A character encoding is a code that pairs a set of natural-language characters (such as those of an alphabet or syllabary) with a set of something else, such as numbers or electrical pulses. Common examples include Morse code, which encodes letters of the Roman alphabet as series of long and short depressions of a telegraph key, and ASCII, which encodes letters, numerals, and other symbols both as integers and as 7-bit binary versions of those integers.
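As a small illustration of the ASCII case, the following Python sketch prints the integer code and the 7-bit binary form of a few characters. The helper name ascii_code is hypothetical and chosen only for this example.

```python
def ascii_code(ch: str) -> tuple:
    """Return (integer code, 7-bit binary string) for one ASCII character."""
    code = ord(ch)  # the integer assigned to the character
    if code > 127:
        # ASCII defines only the codes 0 to 127.
        raise ValueError(f"{ch!r} is not in the ASCII repertoire")
    return code, format(code, "07b")  # zero-padded to 7 bits


for ch in ("A", "a", "7"):
    number, bits = ascii_code(ch)
    print(ch, number, bits)
# A 65 1000001
# a 97 1100001
# 7 55 0110111
```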
In some contexts (especially computer storage and communication) it makes sense to distinguish a character repertoire, which is the full set of abstract characters that a system supports, from a coded character set or character encoding, which specifies how to represent characters from that set using a number of integer codes.
In the early days of computing, most systems used only the character repertoire of the ASCII code. This was soon seen to be inadequate, and a number of ad-hoc methods were used to extend it. The need to support multiple writing systems, including the CJK family of scripts, called for a far larger number of characters and for a systematic approach to character encoding in place of the earlier ad-hoc ones.
The full repertoire of Unicode, for example, encompasses over 100,000 characters, each assigned a unique integer code in the range 0 to hexadecimal 10FFFF (a little over 1.1 million, so not all integers in that range represent coded characters). Other common repertoires include ASCII and ISO 8859-1, which are identical to the first 128 and 256 coded characters of Unicode respectively.
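The relationship between these repertoires can be checked with a short Python sketch. It relies on Python's built-in "ascii" and "latin-1" codecs (the latter being Python's name for ISO 8859-1); the variable names are illustrative only.

```python
# Decode every possible byte value with each codec and compare the
# resulting Unicode code points with the original byte values.
ascii_text = bytes(range(128)).decode("ascii")
latin1_text = bytes(range(256)).decode("latin-1")

ascii_matches = all(ord(c) == i for i, c in enumerate(ascii_text))
latin1_matches = all(ord(c) == i for i, c in enumerate(latin1_text))

# Size of the Unicode code range 0 to hexadecimal 10FFFF.
code_range_size = 0x10FFFF + 1

print(ascii_matches, latin1_matches, code_range_size)
# True True 1114112
```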
The term character encoding is sometimes overloaded to also mean how characters are represented as a specific sequence of bits. This involves an encoding form, in which the integer code is converted to a series of integer code values that facilitate storage in a system that uses fixed bit widths. For example, integers greater than 65535 will not fit in 16 bits, so the UTF-16 encoding form mandates that these integers be represented as a surrogate pair of integers that are less than 65536 and that are not assigned to characters (e.g., hex 10000 becomes the pair D800 DC00). An encoding scheme then converts code values to bit sequences, with attention given to things like platform-dependent byte order (e.g., D800 DC00 might become 00 D8 00 DC on a little-endian architecture such as Intel x86). A character set, character map, or code page shortcuts this process by directly mapping abstract characters to specific bit patterns. Unicode Technical Report #17 explains this terminology in depth and provides further examples.
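The two steps described above can be sketched in Python. The function to_surrogate_pair is a hypothetical helper that follows the standard UTF-16 arithmetic; the byte serialisation uses Python's built-in "utf-16-le" and "utf-16-be" codecs.

```python
def to_surrogate_pair(code_point: int) -> tuple:
    """Encoding form: split a code above FFFF into two 16-bit code values."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only codes from 10000 to 10FFFF need a surrogate pair")
    offset = code_point - 0x10000       # a 20-bit value
    high = 0xD800 + (offset >> 10)      # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # lower 10 bits
    return high, low


high, low = to_surrogate_pair(0x10000)
print(f"{high:04X} {low:04X}")          # D800 DC00

# Encoding scheme: the same code values serialised in each byte order.
little_endian = chr(0x10000).encode("utf-16-le")
big_endian = chr(0x10000).encode("utf-16-be")
print(little_endian.hex(), big_endian.hex())   # 00d800dc d800dc00
```

The little-endian result corresponds to the 00 D8 00 DC byte sequence mentioned in the text.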
Since most applications use only a small subset of Unicode, encoding schemes like UTF-8 and UTF-16, and character maps like ASCII, provide efficient ways to represent Unicode characters in computer storage or communications using short binary words. Some of these simple text encodings use data compression techniques to represent a large repertoire with a smaller number of codes.
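To make the storage cost concrete, this Python sketch counts the bytes that a few sample characters occupy under three Unicode encoding schemes. The sample characters and the helper name encoded_sizes are arbitrary choices for illustration.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes text occupies under each encoding scheme."""
    schemes = ("utf-8", "utf-16-le", "utf-32-le")
    return {name: len(text.encode(name)) for name in schemes}


samples = {
    "U+0041 (Latin letter A)": "\u0041",
    "U+00E9 (e with acute accent)": "\u00e9",
    "U+4E2D (a CJK ideograph)": "\u4e2d",
    "U+10000 (outside the 16-bit range)": "\U00010000",
}

for label, ch in samples.items():
    print(label, encoded_sizes(ch))
# U+0041 needs 1, 2 and 4 bytes; U+00E9 needs 2, 2 and 4;
# U+4E2D needs 3, 2 and 4; U+10000 needs 4 bytes in all three.
```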
Popular character encodings:
- ISO 8859
- ASCII
- EBCDIC
- Big-5
- GB2312
- Windows-1252
- ISO 2022
- Unicode (and subsets thereof, such as the 16-bit Basic Multilingual Plane)
Links:
- Character sets registered by Internet Assigned Numbers Authority (http://www.iana.org/assignments/character-sets)
- Unicode Technical Report #17: Character Encoding Model (http://www.unicode.org/unicode/reports/tr17/)
- The Cyrillic Charset soup (http://czyborra.com/charsets/cyrillic.html)