IBM PC or MS-DOS code page 437, often abbreviated CP437 and also known as DOS-US or OEM-US, or sometimes misleadingly referred to as the OEM font, High ASCII or Extended ASCII,[1][2] is the original character set of the IBM PC, circa 1981.
In a stricter sense, this character set did not originate as a real code page (in the present sense of the term) but merely as the graphical glyph repertoire available in the ROM of the IBM Monochrome Display Adapter (MDA) and Color Graphics Adapter (CGA) video output cards of the original IBM PC; that is, it was implemented in hardware. The expression "Original Equipment Manufacturer" (OEM) arises from this fact. Today, it is still the primary font in the core of any EGA- and VGA-compatible graphics card, i.e. the text seen on screen when a PC reboots is rendered with this code page.
All these display adapters have a basic 80-column text mode, in which every character cell is represented in the video RAM as a single byte (plus an additional byte which carries information about its colour and/or effect), giving 256 possible values for graphic characters. Thus, beyond the original ASCII graphical character set (values 32 to 126, 95 in total), the implementors put a handful of miscellaneous characters in ROM even for the range 0 to 31, which ASCII reserves for control (non-graphical) purposes.
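The two-bytes-per-cell layout described above can be sketched in Python as follows. This is only an illustration of the memory layout, not of any adapter's firmware: the 25-row screen height, the default attribute value 0x07 and the helper names make_screen and put_char are assumptions introduced here.

```python
COLS = 80   # from the text: basic 80-column text mode
ROWS = 25   # assumption: the usual 25-row screen


def make_screen() -> bytearray:
    """Return a blank text-mode buffer: two bytes per character cell."""
    return bytearray(COLS * ROWS * 2)


def put_char(screen: bytearray, col: int, row: int, code: int, attr: int = 0x07) -> None:
    """Store a CP437 code and its attribute byte in one character cell."""
    if not (0 <= col < COLS and 0 <= row < ROWS):
        raise ValueError("cell outside the screen")
    if not (0 <= code <= 0xFF and 0 <= attr <= 0xFF):
        raise ValueError("code and attribute must each fit in one byte")
    offset = (row * COLS + col) * 2
    screen[offset] = code       # which glyph to show (one of 256 values)
    screen[offset + 1] = attr   # colour and/or effect information


screen = make_screen()
put_char(screen, 0, 0, 0x01)        # code 1 shows as a glyph, not as a control
put_char(screen, 1, 0, 0x41, 0x0F)  # letter A with a different attribute
```

Because the glyph is selected by a full byte, every value from 0 to 255 needs some picture, which is why even the control range received graphics.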
So this code page has two main uses: as an information interchange code (through files and telecommunications), in which the values 0 to 127 play the same role as in ASCII, supplemented by the international text characters 128 to 175 (see the table below), and as a graphical resource for screens and printers (by merely writing the appropriate code into a video RAM character cell or sending it down the line), in which the full range can be used to build fine presentations.
The following is a table representing CP437 using the equivalent Unicode characters. Standard ASCII and ISO 8859-1 (Latin-1) character glyphs, along with the Greek letters, are shown as coloured cells.
Due to the dual use of values in the range 0 to 31 (00h to 1Fh), there are two sets for these, the first being their meanings as ASCII control characters and the second their graphical output on screen/printer.
For value 127 (7Fh), its graphical output is shown in the last table, its meaning being the ASCII control character "DEL" (delete), Unicode value U+007F.
NOTE: the graphical output for characters 0 (00h), 32 (20h) and 255 (FFh) is a blank cell, without marks of any kind.
NOTE: the graphical output chosen for character number 0 is U+2007 FIGURE SPACE (FSP), a space of the same width as digits in variable-pitch fonts.
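The dual reading of codes 0 to 31 and 127 can be illustrated with a small Python sketch. It assumes Python's built-in cp437 codec, which decodes these bytes to the ASCII control characters; the GRAPHICS_LOW string and the helper names as_control and as_graphics are hypothetical, and the glyph list is the commonly used Unicode rendering of the on-screen shapes (with U+2007 for code 0, following the note above).

```python
# Graphical reading of codes 0..31, indexed by code value.
GRAPHICS_LOW = "\u2007☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"
GRAPHIC_DEL = "⌂"  # code 127 (7Fh), the "house"


def as_control(data: bytes) -> str:
    """Interchange reading: 0..31 and 127 stay ASCII control characters."""
    return data.decode("cp437")


def as_graphics(data: bytes) -> str:
    """Display reading: every byte becomes the glyph shown in text mode."""
    out = []
    for b in data:
        if b < 0x20:
            out.append(GRAPHICS_LOW[b])
        elif b == 0x7F:
            out.append(GRAPHIC_DEL)
        else:
            out.append(bytes([b]).decode("cp437"))
    return "".join(out)


sample = b"\x01A\x7f"
control_view = as_control(sample)   # U+0001, "A", U+007F
screen_view = as_graphics(sample)   # smiley, "A", house
```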
In DOS and Windows, most characters from the currently active DOS code page can be inserted by holding down the Alt key and entering the character's three-digit decimal code on the numpad. This technique is called Windows Alt keycodes. One can find out which DOS code page is currently active by issuing the DOS command mode con or chcp.
Difference from ASCII
CP437 is based on ASCII, with the following modifications:
The C0 control range (00h–1Fh in hex) is mapped to graphics characters. The codes can assume their original function as controls, but when placed in display RAM and then viewed in text mode, for example in a screen editor like MS-DOS edit, they show as graphics. The graphics are various, such as smiling faces, card suits and musical notes. Code 127 (7Fh), DEL, similarly shows as a graphic (a house).
The high-bit range, 128 to 255 (80h–FFh), is mapped to various symbols: a few European characters (accented Latin vowels, etc.) in no particular order and not sufficient to represent most Western European languages, box-drawing characters, mathematical symbols and a few Greek letters commonly used in mathematics and physics.
The repertoire of CP437 was taken from the character set of Wang word-processing machines, according to Bill Gates in an interview with Gates and Paul Allen that appeared in the 2 October 1995 edition of Fortune magazine:
"… we were also fascinated by dedicated word processors from Wang, because we believed that general-purpose machines could do that just as well. That's why, when it came time to design the keyboard for the IBM PC, we put the funny Wang character set into the machine—you know, smiley faces and boxes and triangles and stuff. We were thinking we'd like to do a clone of Wang word-processing software someday."
The selection of graphic characters, often accused of being somewhat bizarre, has some internal logic:
Table rows 0 and 1, codes 0 to 31 (00h to 1Fh), are assorted dingbats (complementary and decorative characters). The isolated character 127 (7Fh) also belongs to this group.
Table rows 2 to 7 (except character 127, 7Fh), codes 32 to 126 (20h to 7Eh), are the standard ASCII printable characters.
Table rows 8 to 10 (Ah), codes 128 to 175 (80h to AFh), are a selection of international text characters.
Table rows 11 (Bh) to 13 (Dh), codes 176 to 223 (B0h to DFh), are box-drawing and block characters. This block is arranged in such a way that characters 192 to 223 (C0h to DFh), in rows 12 and 13 (Ch and Dh), all have right arms (except 217, D9h) or right-filled areas (except 221, DDh), for the following technical reason[3]: the original IBM PC MDA display adapter stored the CP437 character glyphs as small bitmaps eight pixels wide, but displayed them every nine pixels on screen (eight plus an additional gap) for visual enhancement. Characters with connecting designs on their right side must therefore duplicate their eighth pixel column into the gap, so that the lines and filled surfaces they build are not visually interrupted when the characters are placed consecutively. This pixel extension is done by special hardware circuitry, and only this character subset is affected.
Table rows 14 (Eh) and 15 (Fh), codes 224 to 255 (E0h to FFh), are devoted to mathematical symbols, of which the first twelve are a selection of Greek letters commonly used in physics. Characters 244 and 245 (F4h and F5h) are the upper and lower portions of an italic long S, the symbol used as the integral sign (∫), and they can be extended with character 179 (B3h), the vertical line of the box-drawing block. Characters 249 and 250 (F9h and FAh) are almost indistinguishable: the first was only a single pixel, while the second resembles the typographic middle dot (·). It is unclear why both were included when one would have been sufficient. Character 255 (FFh) is merely blank, and acts as a kind of non-breaking space for arranging mathematical formulae.
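The ninth-pixel extension described for codes 192 to 223 can be modelled with a short Python sketch. It is a software approximation of what the text attributes to hardware circuitry, not a description of the actual circuit: the function name expand_row and the convention that the most significant bit is the leftmost pixel are assumptions made for illustration.

```python
def expand_row(code: int, row_bits: int) -> int:
    """Widen one 8-pixel glyph row to the 9 pixels shown on screen.

    row_bits holds the stored row, most significant bit leftmost (assumed).
    For codes C0h..DFh the ninth pixel repeats the eighth; for every other
    code the ninth pixel is left blank as a gap.
    """
    if not (0 <= code <= 0xFF and 0 <= row_bits <= 0xFF):
        raise ValueError("code and row_bits must each fit in one byte")
    eighth_pixel = row_bits & 1
    ninth_pixel = eighth_pixel if 0xC0 <= code <= 0xDF else 0
    return (row_bits << 1) | ninth_pixel


# A full-width horizontal stroke in code C4h reaches into the gap,
# so consecutive cells join up without a break.
joined = expand_row(0xC4, 0b11111111)
# The same row pattern in code B3h (outside C0h..DFh) keeps the gap blank.
gapped = expand_row(0xB3, 0b11111111)
```

Here joined has all nine bits set, while gapped has its lowest (ninth-pixel) bit clear.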
Internationalisation
CP437 has a series of international characters, mainly values 128 to 175 (80h to AFh). However, it lacks many characters important to several Western languages:
It lacks many characters for Spanish (Á, Í, Ó, Ú), French (À, Â, È, Ê, Ë, Ì, Î, Ï, Ò, Ô, Œ, œ, Ù, Û), and Portuguese (Ã, ã, Õ, õ).
It has umlauts for German (Ä, ä, Ö, ö, Ü, ü), but sharp S (ß) must be represented with the beta symbol (β).
It has the Scandinavian Æ, æ, Å, å, but lacks Ø and ø (character number 237, the empty set sign, may be used as a surrogate, but is not properly displayed within a word).
Along with the cent (¢), pound sterling (£) and yen/yuan (¥) currency symbols, it has a couple of European currency symbols, for the florin (ƒ, Netherlands) and the peseta (₧, Spain). The presence of the latter is surprising, since the Spanish peseta was never an internationally relevant currency and never had a symbol of its own; it was simply abbreviated as "Pt", "Pta", "Pts", or "Ptas". The only related fact is that Spanish models of the IBM electric typewriter also had a single type devoted to it.
Later MS-DOS character sets, such as CP850 (DOS Latin-1), CP852 (DOS Central-European) and CP737 (DOS Greek), filled the gaps for international use while keeping some compatibility with CP437 by retaining the single and double box-drawing characters and discarding the mixed ones (e.g. horizontal double/vertical single). All CP437 characters are in Unicode and in Microsoft's WGL4 character set, and are therefore in most of the fonts on Microsoft Windows, in the default VGA font of the Linux kernel, and in the ISO 10646 fonts for X11.
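The gaps listed above can be checked mechanically. The following Python sketch assumes that Python's built-in cp437 codec is an acceptable stand-in for the code page table; the helper name missing_in_cp437 is hypothetical.

```python
def missing_in_cp437(text: str) -> list:
    """Return the characters of text that have no CP437 encoding."""
    missing = []
    for ch in text:
        try:
            ch.encode("cp437")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


spanish_capitals = missing_in_cp437("ÁÉÍÓÚ")   # É is present, the rest are not
german_letters = missing_in_cp437("ÄäÖöÜü")    # all umlauts are present
scandinavian = missing_in_cp437("ÆæÅåØø")      # only Ø and ø are missing
```

A check like this is useful before writing text to a device or file that only understands CP437, since unencodable characters would otherwise raise an error or need a surrogate.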
Multiple meaning character glyphs
Along with the characters in the range 0 to 31, which can be interpreted as ASCII controls as well as graphical dingbats, some characters with an ambiguous appearance (in the eyes of their implementors, not in the eyes of a typographer) have overloaded meanings, depending upon context:
225 (E1h) is both the German sharp S (U+00DF, ß) and the Greek lowercase beta (U+03B2, β).
228 (E4h) is both the n-ary summation sign (U+2211, ∑) and the Greek uppercase sigma (U+03A3, Σ).
230 (E6h) is both the micro sign (U+00B5, µ) and the Greek lowercase mu (U+03BC, μ).
234 (EAh) is both the ohm sign (U+2126, Ω) and the Greek uppercase omega (U+03A9, Ω) (note that in Unicode as well, the ohm sign is canonically equivalent to the capital omega, and its use is discouraged in favor of capital omega[1]).
235 (EBh) is the Greek lowercase delta (U+03B4, δ), but it has also been used as an approximate surrogate for the Icelandic lowercase eth (U+00F0, ð) and as a stand-in for the partial derivative sign (U+2202, ∂).
237 (EDh) is mainly the empty set sign (U+2205, ∅). It was also used as the Greek phi symbol in italics (U+03D5, ϕ) to name angles, as the diameter sign (U+2300, ⌀) and as an approximate surrogate for the Latin lowercase O with stroke (U+00F8, ø), but rarely as the Greek lowercase phi (U+03C6, φ), because its original IBM shape seems to be merely a circle crossed by a slash and does not closely resemble this Greek lowercase letter.
238 (EEh) is both the element-of sign (U+2208, ∈) and the Greek lowercase epsilon (U+03B5, ε). Also, in some dot matrix ticket printers (with CP 437 in ROM), it is used today in place of the euro sign (U+20AC, €), in the European countries where the euro is the official currency.
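The remark in the list above that the ohm sign is canonically equivalent to the capital omega can be observed with Unicode normalization. The sketch below uses Python's standard unicodedata module; the variable names are illustrative only. By contrast, the micro sign is only compatibility-equivalent to the Greek mu, so it survives canonical (NFC) normalization and changes only under NFKC.

```python
import unicodedata

ohm_sign = "\u2126"       # OHM SIGN
capital_omega = "\u03a9"  # GREEK CAPITAL LETTER OMEGA
micro_sign = "\u00b5"     # MICRO SIGN
small_mu = "\u03bc"       # GREEK SMALL LETTER MU

# Canonical normalization folds the ohm sign into capital omega.
ohm_nfc = unicodedata.normalize("NFC", ohm_sign)

# The micro sign is left alone by NFC and is folded only by NFKC.
micro_nfc = unicodedata.normalize("NFC", micro_sign)
micro_nfkc = unicodedata.normalize("NFKC", micro_sign)
```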
The main reason for this proliferation of meanings is that the CP437 character set of the original IBM PC MDA and CGA display adapters, as well as that of compatible printers, was fixed in ROM and could not be changed by software, so developers and users tried to take maximum advantage of the available resources.
Implementors of mapping tables to Unicode should note that these "unified" characters may not have a single, unique meaning: the correct choice depends upon context.
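One way to make a mapping table context-sensitive is to start from a default table and apply overrides for a given context. The Python sketch below is only one possible design, not an established API: the function decode_cp437, its context parameter and the MATH_OVERRIDES table are hypothetical, and the defaults come from Python's built-in cp437 codec. The override values are taken from the alternatives listed above.

```python
# Alternative readings preferred in a mathematical context (hypothetical table).
MATH_OVERRIDES = {
    0xE1: "\u03b2",  # Greek small beta instead of Latin sharp s
    0xE4: "\u2211",  # n-ary summation instead of Greek capital sigma
    0xED: "\u2205",  # empty set instead of Greek small phi
    0xEE: "\u2208",  # element-of instead of Greek small epsilon
}


def decode_cp437(data: bytes, context: str = "text") -> str:
    """Decode CP437 bytes, choosing ambiguous glyph meanings by context."""
    if context not in ("text", "math"):
        raise ValueError("context must be 'text' or 'math'")
    if context == "text":
        return data.decode("cp437")
    return "".join(
        MATH_OVERRIDES[b] if b in MATH_OVERRIDES else bytes([b]).decode("cp437")
        for b in data
    )


as_text = decode_cp437(b"Stra\xe1e")          # sharp s in a German word
as_math = decode_cp437(b"x \xee A", "math")   # element-of in a formula
```

The choice of context still has to come from outside the byte stream, for example from the application that produced the file; the bytes themselves carry no hint.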
Microsoft reference Unicode values
In the Microsoft reference documentation, the following CP437 characters have Unicode values assigned which depart from the values given in the table above:
00h = U+0000 NULL
7Fh = U+007F DELETE
E1h = U+00DF LATIN SMALL LETTER SHARP S
EDh = U+03C6 GREEK SMALL LETTER PHI
EEh = U+03B5 GREEK SMALL LETTER EPSILON
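The values listed above can be compared against an existing decoder. The following Python sketch checks them against Python's built-in cp437 codec, on the assumption that this codec follows the same reference mapping; the names MICROSOFT_VALUES and codec_matches are introduced here for illustration.

```python
# Byte value -> Unicode code point, copied from the list above.
MICROSOFT_VALUES = {
    0x00: 0x0000,  # NULL
    0x7F: 0x007F,  # DELETE
    0xE1: 0x00DF,  # LATIN SMALL LETTER SHARP S
    0xED: 0x03C6,  # GREEK SMALL LETTER PHI
    0xEE: 0x03B5,  # GREEK SMALL LETTER EPSILON
}


def codec_matches(values: dict, codec: str = "cp437") -> dict:
    """Report, per byte, whether the codec decodes it to the listed code point."""
    return {
        byte: ord(bytes([byte]).decode(codec)) == code_point
        for byte, code_point in values.items()
    }


result = codec_matches(MICROSOFT_VALUES)
all_match = all(result.values())
```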