A character encoding is a code that pairs a set of natural language characters (such as an alphabet or syllabary) with a set of something else, such as numbers or electrical pulses. Common examples include Morse code, which encodes letters of the Roman alphabet as series of long and short depressions of a telegraph key, and ASCII, which encodes letters, numerals, and other symbols as integers and as 7-bit binary representations of those integers.
In some contexts (especially computer storage and communication) it makes sense to distinguish a character repertoire, which is the full set of abstract characters that a system supports, from a coded character set or character encoding, which specifies how to represent each character from that repertoire as an integer code.
In the early days of computing, most systems used only the character repertoire of the ASCII code. This was soon seen to be inadequate, and a number of ad-hoc methods were used to extend it. The need to support multiple writing systems, including the CJK family of scripts, called for a far larger number of characters and for a systematic approach to character encoding in place of the earlier ad-hoc extensions.
For example, the full repertoire of Unicode encompasses over 100,000 characters, each being assigned a unique integer code in the range 0 to hexadecimal 10FFFF (a little over 1.1 million, so not all integers in that range represent coded characters). Other common repertoires include ASCII and ISO 8859-1, which are identical to the first 128 and 256 coded characters of Unicode respectively.
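The relationship between these repertoires can be checked directly. The following is a minimal sketch in Python, assuming the standard "ascii" and "latin-1" codecs (Python's names for ASCII and ISO 8859-1); the helper code_points and the constant MAX_CODE_POINT are illustrative names, not part of any standard.

```python
# Upper bound of the Unicode code space, as stated in the text (hex 10FFFF).
MAX_CODE_POINT = 0x10FFFF


def code_points(text):
    """Return the integer code assigned to each character in text."""
    return [ord(ch) for ch in text]


# Decode every possible ASCII byte (0..127) and ISO 8859-1 byte (0..255).
ascii_text = bytes(range(128)).decode("ascii")
latin1_text = bytes(range(256)).decode("latin-1")

# Each byte value maps to the Unicode code point with the same number.
print(code_points("A~"))                              # [65, 126]
print(code_points(ascii_text) == list(range(128)))    # True
print(code_points(latin1_text) == list(range(256)))   # True

# Number of integers in the range 0..10FFFF: a little over 1.1 million.
print(MAX_CODE_POINT + 1)                             # 1114112
```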
The term character encoding is sometimes overloaded to also mean how characters are represented as a specific sequence of bits. This involves an encoding form where the integer code is converted to a series of integer code values that facilitate storage in a system that uses fixed bit widths. For example, integers greater than 65535 will not fit in 16 bits, so the UTF-16 encoding form mandates that these integers be represented as a surrogate pair of integers that are less than 65536 and that are not assigned to characters (e.g., hex 10000 becomes the pair D800 DC00). An encoding scheme then converts code values to bit sequences, with attention given to things like platform-dependent byte order issues (e.g. D800 DC00 might become 00 D8 00 DC on an Intel x86 architecture). A character set or character map or code page shortcuts this process by directly mapping abstract characters to specific bit patterns. Unicode Technical Report #17 explains this terminology in depth and provides further examples.
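The two steps described above, encoding form and encoding scheme, can be sketched in Python as follows. This is an illustration only: the function names to_utf16_code_units and to_bytes are hypothetical, and the arithmetic follows the usual UTF-16 surrogate rule (subtract hex 10000, then split the remaining 20 bits into two halves of 10 bits each).

```python
def to_utf16_code_units(code_point):
    """Encoding form: turn one integer code into 16-bit code values."""
    if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if code_point < 0x10000:
        return [code_point]              # fits in 16 bits as-is
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # lower 10 bits
    return [high, low]


def to_bytes(code_units, byteorder):
    """Encoding scheme: serialize code values with a given byte order."""
    return b"".join(unit.to_bytes(2, byteorder) for unit in code_units)


units = to_utf16_code_units(0x10000)
print([f"{u:04X}" for u in units])       # ['D800', 'DC00']
print(to_bytes(units, "little").hex())   # 00d800dc
print(to_bytes(units, "big").hex())      # d800dc00
```

The little-endian result matches the 00 D8 00 DC byte sequence given in the text, and agrees with Python's built-in "utf-16-le" codec.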
Since most applications use only a small subset of Unicode, encoding schemes like UTF-8 and UTF-16, and character maps like ASCII, provide efficient ways to represent Unicode characters in computer storage or communications using short binary words. Some of these simple text encodings use data compression techniques to represent a large repertoire with a smaller number of codes.
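To illustrate why such schemes are compact for commonly used characters, the sketch below compares how many bytes a few sample characters occupy in UTF-8 and UTF-16. It uses Python's built-in "utf-8" and "utf-16-le" codecs; the sample characters and the helper name encoded_length are chosen only for illustration.

```python
def encoded_length(ch, encoding):
    """Number of bytes one character occupies in the given encoding."""
    return len(ch.encode(encoding))


# A Latin letter, an accented letter, the euro sign, and a character
# outside the 16-bit range (hex 10000).
samples = ["A", "\u00e9", "\u20ac", "\U00010000"]

for ch in samples:
    print(
        f"U+{ord(ch):04X}",
        encoded_length(ch, "utf-8"),
        encoded_length(ch, "utf-16-le"),
    )
```

Characters in the ASCII range take a single byte in UTF-8, while rarer characters take more, so text drawn mostly from a small subset of Unicode stays short.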
Popular character encodings:
- ISO 8859
- ASCII
- EBCDIC
- Big-5
- GB2312
- Windows-1252
- ISO 2022
- Unicode (and subsets thereof, such as the 16-bit 'Basic Multilingual Plane')
Links:
- Character sets registered by Internet Assigned Numbers Authority (http://www.iana.org/assignments/character-sets)
- Unicode Technical Report #17: Character Encoding Model (http://www.unicode.org/unicode/reports/tr17/)
- The Cyrillic Charset soup (http://czyborra.com/charsets/cyrillic.html)