Information, Characters, Unicode
Unicode
? 3 April 2023
1/1
Hidden Moral
Small mistakes can be catastrophic!
Style: Care about every character of your program.
Tip (printf): Care about every character in the program's output.
(Be reasonably tolerant and defensive about the input. "Fail early" and clearly.)
Imperative
Thou shalt care about every character in your program.
Corollaries
Thou shalt know every character in the input.
Thou shalt care about every character in your output.
Information ? Characters
In modern computing, natural-language text is very important information. ("Number-crunching" is less important.) Characters of text are represented in several different ways, and a known character encoding is necessary to exchange text information. For many years an important encoding standard for characters has been US-ASCII, a 7-bit encoding. Since 7 does not divide 32, the ubiquitous word size of computers, 8-bit encodings are more common. Very common is ISO 8859-1, aka "Latin-1," along with other 8-bit encodings of character sets for languages other than English. Currently, a very large multilingual character repertoire known as Unicode is important.
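To illustrate why a known encoding is necessary, here is a minimal Python sketch (Python and the sample string are choices made for this illustration, not part of the original slides). It encodes one text in Latin-1 and in UTF-8, then shows that decoding bytes with the wrong encoding silently produces the wrong characters.

```python
# Hypothetical sample text; the letter 'é' lies outside US-ASCII.
text = "café"

latin1_bytes = text.encode("latin-1")  # one byte per character
utf8_bytes = text.encode("utf-8")      # 'é' takes two bytes in UTF-8

print(latin1_bytes)  # b'caf\xe9'
print(utf8_bytes)    # b'caf\xc3\xa9'

# Decoding with the encoding the sender used recovers the text.
assert utf8_bytes.decode("utf-8") == text

# Decoding UTF-8 bytes as Latin-1 does not fail, but the result is garbled.
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # cafÃ©

# US-ASCII cannot represent 'é' at all, so encoding fails early and clearly.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False
```

The same four characters become four bytes in one encoding and five in the other, so the receiver cannot interpret the bytes without knowing which encoding was used.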
Character Sets
Information ? Characters
Bits are not information until the relevant parties agree on what they represent. A standard is required to successfully communicate a character of text. The bits are mostly arbitrary choices.
binary     oct   dec  hex   char
0110 0001  0141   97  0x61  a     the letter `a'
0110 0010  0142   98  0x62  b     the letter `b'
0110 0011  0143   99  0x63  c     the letter `c'
Blocks of n bits have 2^n different bit patterns and so 2^n characters can be represented.
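The different notations for one bit pattern, and the count of patterns available in n bits, can be checked with a short Python sketch. The helper names `representations` and `pattern_count` are hypothetical, introduced only for this illustration.

```python
def representations(ch):
    """Return the binary, octal, decimal, and hex forms of a character's code."""
    code = ord(ch)
    return (
        format(code, "08b"),   # 8 binary digits, zero-padded
        format(code, "04o"),   # octal with a leading zero
        str(code),             # decimal
        format(code, "#04x"),  # hex with the 0x prefix
    )


def pattern_count(n):
    """Number of distinct bit patterns in a block of n bits."""
    return 2 ** n


for ch in "abc":
    print(*representations(ch), ch)

print(pattern_count(7))  # 128
print(pattern_count(8))  # 256
```

For the letter `a' this prints 01100001 0141 97 0x61 a, which is the same number written in four bases.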
ASCII (American Standard Code for Information Interchange) is a 7-bit character encoding standard for digital communication. It defines 2^7 = 128 bit patterns.
It was one of the first standards for encoding symbols (letters, numbers, and punctuation used in English text). This fixed-width encoding was developed in the 1960s by the standards institution of the United States. It has been in widespread use for information exchange ever since, but is now supplanted by other standards. A survey (2023) suggests that US-ASCII is used by far less than 1% of websites and UTF-8 (described later) by 98% of websites. (But UTF-8 retains US-ASCII: every US-ASCII character has the same encoding in UTF-8.)
The Internet Assigned Numbers Authority (IANA) prefers the name US-ASCII for this character encoding.
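The remark that UTF-8 retains US-ASCII can be checked directly. The following is a minimal Python sketch, assuming Python's built-in codecs (which accept `us-ascii` as an alias for the ASCII codec); the sample string is hypothetical.

```python
# Hypothetical sample text made only of US-ASCII characters.
text = "Hello, world. 0/1"

ascii_bytes = text.encode("us-ascii")
utf8_bytes = text.encode("utf-8")

# For pure US-ASCII text the two encodings give identical bytes.
same = ascii_bytes == utf8_bytes
print(same)  # True

# Every one of the 128 US-ASCII codes decodes to the same character
# under both encodings.
all_codes = bytes(range(128))
all_same = all_codes.decode("us-ascii") == all_codes.decode("utf-8")
print(all_same)  # True
```

This is why a file containing only US-ASCII characters is also a valid UTF-8 file, even though few websites declare US-ASCII as their encoding.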
Some US-ASCII Characters
Each character has a unique bit pattern used to represent it (and a Unicode name as we shall see later).
binary     oct   dec  char  Unicode
0000 1001  0011    9  HT    U+0009  horizontal tabulation
0010 0000  0040   32  SP    U+0020  space
0010 1110  0056   46  .     U+002E  full stop
0010 1111  0057   47  /     U+002F  solidus
0011 0000  0060   48  0     U+0030  digit zero
0011 0001  0061   49  1     U+0031  digit one
Although 8 bits are shown above, only 7 bits are used in the US-ASCII standard.
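Since only 7 bits are used, the leading bit of every US-ASCII byte is 0. A minimal Python sketch of that check follows; the function name `is_us_ascii` is hypothetical, chosen for this illustration.

```python
def is_us_ascii(data: bytes) -> bool:
    """True if every byte has its high (eighth) bit clear, i.e. value < 128."""
    return all((b & 0x80) == 0 for b in data)


# The six characters from the table: tab, space, full stop, solidus, 0, 1.
table_chars = "\t ./01"
for ch in table_chars:
    code = ord(ch)
    # 8 binary digits, octal, decimal, and the Unicode code point notation.
    print(format(code, "08b"), format(code, "04o"), code, f"U+{code:04X}")

print(is_us_ascii(table_chars.encode("us-ascii")))  # True
print(is_us_ascii("é".encode("latin-1")))           # False: 0xE9 has the high bit set
```

A check like this is one way to "fail early" on input that claims to be US-ASCII but is not.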