|
Article on other languages:
|
The term Base64 refers to a specific MIME content transfer encoding. It is also used as a generic term for any similar encoding scheme that encodes binary data by treating it numerically and translating it into a base 64 representation. The particular choice of base is due to the history of character set encoding: one can choose 64 characters that are both part of the subset common to most encodings, and also printable. This combination leaves the data unlikely to be modified in transit through systems, such as email, which were traditionally not 8-bit clean. The precise choice of characters is difficult. The earliest instances of this type of encoding were created for dialup communication between systems running the same OS - e.g. Uuencode for UNIX, BinHex for the TRS-80 (later adapted for the Macintosh) - and could therefore make more assumptions about what characters were safe to use. For instance, Uuencode uses uppercase letters, digits, and many punctuation characters, but no lowercase, since UNIX was sometimes used with terminals that did not support distinct letter case. Unfortunately for interoperability with non-UNIX systems, some of the punctuation characters do not exist in other traditional character sets. The MIME base64 encoding replaces most of the punctuation characters with the lowercase letters, a reasonable requirement by the time it was designed. MIME Base64 uses A–Z, a–z, and 0–9 for the first 62 digits. There are other similar systems, usually derived from Base64, that share this property but differ in the symbols chosen for the last two digits; an example is UTF-7.
History of base64 encoding schemesPrivacy-Enhanced Mail (PEM)The first known use of the encoding now called MIME Base 64 was in the Privacy-enhanced Electronic Mail (PEM) protocol, proposed by RFC 989 in 1987. PEM defines a "printable encoding" scheme that uses Base 64 encoding to transform an arbitrary sequence of octets to a format that can be expressed in short lines of 7-bit characters, as required by transfer protocols such as SMTP. The current version of PEM (specified in RFC 1421) uses a 64-character alphabet consisting of upper- and lower-case Roman alphabet characters (A–Z, a–z), the numerals (0–9), and the "+" and "/" symbols. The "=" symbol is also used as a special suffix code. The original specification, RFC 989, additionally used the "*" symbol to delimit encoded but unencrypted data within the output stream. To convert data to PEM printable encoding, the first byte is placed in the most significant eight bits of a 24-bit buffer, the next in the middle eight, and the third in the least significant eight bits. If there are fewer than three bytes left to encode (or in total), the remaining buffer bits will be zero. The buffer is then used, six bits at a time, most significant first, as indices into the string: The process is repeated on the remaining data until fewer than four octets remain. If three octets remain, they are processed normally. If fewer than three octets (24 bits) are remaining to encode, the input data is right-padded with zero bits to form an integral multiple of six bits. After encoding padded data, if two octets were remaining to encode, one "=" character is appended to the output; if one octet was remaining, two "=" characters are appended. This signals the decoder that the zero bits added due to padding should be excluded from the reconstructed data. This also guarantees that the encoded output length is a multiple of 4 bytes. PEM requires that all encoded lines consist of exactly 64 printable characters, with the exception of the last line, which may contain fewer printable characters. Lines are delimited by whitespace characters according to local (platform-specific) conventions. MIMEThe MIME (Multipurpose Internet Mail Extensions) specification, defined in RFC 2045, lists "base64" as one of several binary-to-text encoding schemes. MIME's base64 encoding is based on that of the RFC 1421 version of PEM: it uses the same 64-character alphabet and encoding mechanism as PEM, and uses the "=" symbol for output padding in the same way. MIME does not specify a fixed length for base64-encoded lines, but it does specify a maximum line length of 76 characters. Additionally it specifies that any extra-alphabetic characters must be ignored by a compliant decoder, although most implementations use a Thus, the actual length of MIME-compliant base64-encoded binary data is usually about 137% of the original data length, though for very short messages the overhead can be a lot higher because of the overhead of the headers. Very roughly, the final size of base64-encoded binary data is equal to 1.37 times the original data size + 814 bytes (for headers). UTF-7UTF-7, described in RFC 2152, introduced a system called Modified Base64. This data encoding scheme is used to encode UTF-16 as ASCII characters for use in 7-bit transports such as SMTP. It is a variant of the base64 encoding used in MIME. The "Modified Base64" alphabet consists of the MIME base64 alphabet, but does not use the "=" padding character. UTF-7 is intended for use in mail headers (defined in RFC 2047), and the "=" character is reserved in that context as the escape character for "quoted-printable" encoding. Modified base64 simply omits the padding and ends immediately after the last BASE64 digit containing useful bits (leaving 0-4 unused bits in the last base64 digit) OpenPGPOpenPGP, described in RFC 2440, describes Radix-64 encoding, also known as "ASCII Armor". Radix-64 is identical to the "base64" encoding described from MIME, with the addition of a 24-bit CRC checksum. The checksum is calculated on the input data before encoding; the checksum is then encoded with the same base64 algorithm and, using an additional "=" symbol as separator, concatenated to the encoded output data. RFC 3548RFC 3548 (The Base16, Base32, and Base64 Data Encodings) is an informational (non-normative) memo that attempts to unify the RFC 1421 and RFC 2045 specifications of base64 encodings, alternative-alphabet encodings, and the seldom-used Base 32 and Base 16 encodings. RFC 3548 forbids implementations from adding non-alphabetic characters unless they are written to a specification that refers to RFC 3548 and specifically requires otherwise; it also declares that decoder implementations must reject data that contains non-alphabetic characters unless they are written to a specification that refers to RFC 3548 and specifically requires otherwise. RFC 4648This RFC obsoletes RFC 3548 and focuses on base 64/32/16:
ExampleA quote from Thomas Hobbes's Leviathan:
is encoded in MIME's base64 scheme as follows: TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4= In the above quote the encoded value of Man is TWFu. Encoded in ASCII, M, a, n are stored as the bytes
As this example illustrates, Base 64 encoding converts 3 uncoded bytes (in this case, ASCII characters) into 4 encoded ASCII characters. The example below illustrates how shortening the input changes the output padding: Input ends with: carnal pleasure. Output ends with: c3VyZS4= Input ends with: carnal pleasure Output ends with: c3VyZQ== Input ends with: carnal pleasur Output ends with: c3Vy Input ends with: carnal pleasu Output ends with: c3U= Note that the same characters will be encoded differently depending on their position within the three-octet group which is encoded to produce the four characters. For example The String leasure. Encodes to bGVhc3VyZS4= The String easure. Encodes to ZWFzdXJlLg== The String asure. Encodes to YXN1cmUu The String sure. Encodes to c3VyZS4= Online toolsThere are a variety of on-line Base64 tools such as:
URL applicationsbase64 encoding can be helpful when fairly lengthy identifying information is used in an HTTP environment. Hibernate, a database persistence framework for Java objects, uses base64 encoding to encode a relatively large unique id (generally 128-bit UUIDs) into a string for use as an HTTP parameter in HTTP forms or HTTP GET URLs. Also, many applications need to encode binary data in a way that is convenient for inclusion in URLs, including in hidden web form fields, and Base64 is a convenient encoding to render them in not only a compact way, but in a relatively unreadable one when trying to obscure the nature of data from a casual human observer. Using a URL-encoder on standard Base64, however, is inconvenient as it will translate the '+' and '/' characters into special percent-encoded hexadecimal sequences ('+' = '%2B' and '/' = '%2F'). When this is later used with database storage or across heterogeneous systems, they will themselves choke on the '%' character generated by URL-encoders (because the '%' character is also used in ANSI SQL as a wildcard). For this reason, a modified Base64 for URL variant exists, where no padding '=' will be used, and the '+' and '/' characters of standard Base64 are respectively replaced by '-' and '_', so that using URL encoders/decoders is no longer necessary and has no impact on the length of the encoded value, leaving the same encoded form intact for use in relational databases, web forms, and object identifiers in general. Another variant called modified Base64 for regexps uses '!-' instead of '*-' to replace the standard Base64 '+/', because both '+' and '*' may be reserved for regular expressions (note that '[]' used in the IRCu variant above would not work in that context). There are other variants that use '_-' or '._' when the Base64 variant string must be used within valid identifiers for programs, or '.-' for use in XML name tokens (Nmtoken), or even '_:' for use in more restricted XML identifiers (Name). Other applicationsBase64 can be used in a variety of contexts:
See alsoReferencesExternal links
Article keywords: base64 decode, base64 encoding, base64 encode, |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This article is from Wikipedia. All text is available under the terms of the GNU Free Documentation License.
Mercedes Car
This site monitored by SitePinger.net