Introduction to Programming
Computer Programming I
COP 2210
Data Representation
(The ASCII and Unicode Character Sets)
All data stored in a computer must be represented (or encoded) as bits: 0’s and 1’s.
Standard codes were established so that data could be exchanged among different kinds of computers easily.
The two most common coding schemes for characters are the ASCII and Unicode codes, although there are others.
I. ASCII
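• The American Standard Code for Information Interchange (ASCII) encodes each character as a unique bit pattern, normally stored in one 8-bit byte. Standard ASCII itself uses only 7 of those bits (codes 0 to 127); the leading bit is 0.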
• For example, the ASCII code for ‘A’ is 0100 0001
0100 0001₂ = 65₁₀
(0100 0001 in base 2 is 65 in base 10)
So we say that ‘A’ is ASCII character number 65.
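The relationship between a character and its code number can be checked directly in Java. The following is only a minimal sketch; the class and method names (CharCodeDemo, codeOf, toBinary8) are made up for illustration and are not part of the course material.

```java
// Sketch: inspecting the numeric code behind a Java char.
// CharCodeDemo, codeOf, and toBinary8 are hypothetical names for this example.
final class CharCodeDemo {

    // Casting a char to int yields its numeric character code.
    static int codeOf(char c) {
        return (int) c;
    }

    // Returns the code in binary, left-padded with zeros to at least 8 digits.
    static String toBinary8(char c) {
        String bits = Integer.toBinaryString(c);
        StringBuilder padded = new StringBuilder();
        for (int i = bits.length(); i < 8; i++) {
            padded.append('0');
        }
        return padded.append(bits).toString();
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A'));     // 65
        System.out.println(toBinary8('A'));  // 01000001
        System.out.println((char) 65);       // A
    }
}
```

The cast works in both directions: (int) 'A' gives 65, and (char) 65 gives 'A'.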
• Here are some famous ASCII characters:
32         (space or blank)
48..57     ‘0’..‘9’   (digits)
65..90     ‘A’..‘Z’   (UPPERCASE LETTERS)
97..122    ‘a’..‘z’   (lowercase letters)
(others are punctuation or “special” characters; ASCII chars 0 to 31, and 127, are unprintable “control” chars)
• If you have N bits, then you can have 2^N different combinations.
For example, suppose we have only 2 bits. Then we can have 4 (i.e., 2^2) different combinations:
00 01 10 11
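The counting rule can be seen by listing every pattern for a given number of bits. This is a rough sketch under simple assumptions: the BitPatterns class and its patterns method are hypothetical names, and the method is only meant for small n (0 to 30).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: enumerating all bit patterns of a given width.
// BitPatterns and patterns are hypothetical names for this example.
final class BitPatterns {

    // Lists every pattern of n bits in counting order (intended for 0 <= n <= 30).
    static List<String> patterns(int n) {
        List<String> result = new ArrayList<>();
        int count = 1 << n;  // 2 to the power n combinations
        for (int value = 0; value < count; value++) {
            StringBuilder sb = new StringBuilder();
            // Emit the bits from most significant to least significant.
            for (int bit = n - 1; bit >= 0; bit--) {
                sb.append((value >> bit) & 1);
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(patterns(2));         // [00, 01, 10, 11]
        System.out.println(patterns(8).size());  // 256
    }
}
```

Each extra bit doubles the number of patterns, which is why the count is a power of 2.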
• Standard ASCII is a 7-bit code, so it defines 128 (2^7) characters; the 8-bit extensions built on it allow 256 (2^8). Either is more than enough for the English language, but what about all the other languages?
• Although some other languages use the same characters as English, many do not: Greek, Hebrew, and Arabic, for example. In the Chinese language, the individual characters (or glyphs) do not represent sounds but things or ideas, and there are over 60,000 different ones (although only about 20,000 are commonly used today).
II. Unicode
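• Unicode, the successor to ASCII, was originally designed as a 16-bit code. That is, it assigned a unique pattern of 16 bits to each character.
• Given 16 bits, there are 2^16 or 65,536 possible combinations. That once seemed like enough for all the known languages in this quadrant of the galaxy, but Unicode has since grown past 16 bits: its code points now run from U+0000 to U+10FFFF.
• In the Java language, each char is a 16-bit value (a UTF-16 code unit). Characters beyond the first 65,536 are stored as a pair of chars.
• In the interest of compatibility, the first 128 Unicode chars are the ASCII chars (and the first 256 match the ISO 8859-1, or Latin-1, character set)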
• Unicode chars that do not appear on the keyboard may be specified as the escape sequence \u followed by four hexadecimal digits. (The hexadecimal, or base 16, number system will be covered in CDA 3103 – Fundamentals of Computer Systems.)
For example, \u00BF is the code for the “¿” character
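The escape sequence can be tried out in a short Java program. This is only an illustrative sketch: UnicodeEscapeDemo, INVERTED_QUESTION, codeOf, and hexCode are hypothetical names chosen for this example.

```java
// Sketch: using a Unicode escape in a char literal and inspecting its code.
// All names here are hypothetical and chosen only for this example.
final class UnicodeEscapeDemo {

    // The inverted question mark, written as a Unicode escape.
    static final char INVERTED_QUESTION = '\u00BF';

    // Numeric (base 10) code of a char.
    static int codeOf(char c) {
        return (int) c;
    }

    // The same code as four uppercase hexadecimal digits.
    static String hexCode(char c) {
        return String.format("%04X", (int) c);
    }

    public static void main(String[] args) {
        System.out.println(codeOf(INVERTED_QUESTION));   // 191
        System.out.println(hexCode(INVERTED_QUESTION));  // 00BF
        System.out.println(hexCode('A'));                // 0041
        System.out.println('A' == '\u0041');             // true
        System.out.println(Character.SIZE);              // 16 (bits in a Java char)
    }
}
```

Hexadecimal 00BF is 191 in base 10, and hexadecimal 0041 is 65, the same code ‘A’ has in ASCII.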
III. Etc.
• In both ASCII and Unicode, these 3 relationships hold:
The digits (‘0’..‘9’) are consecutive and in order
Uppercase letters (‘A’..‘Z’) are consecutive and in order
Lowercase letters (‘a’..‘z’) are consecutive and in order
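Because these ranges are consecutive, simple arithmetic on character codes works. The sketch below shows a few common uses; the CharArithmetic class and its method names are hypothetical, and the case conversion handles only the letters ‘a’..‘z’ (in real programs the library method Character.toUpperCase is the usual choice).

```java
// Sketch: arithmetic that relies on digits and letters being consecutive.
// CharArithmetic and its method names are hypothetical, for illustration only.
final class CharArithmetic {

    // Converts a digit character to its numeric value: '7' -> 7.
    // Assumes the argument is in the range '0'..'9'.
    static int digitValue(char digit) {
        return digit - '0';
    }

    // Position of an uppercase letter in the alphabet: 'A' -> 0, 'Z' -> 25.
    // Assumes the argument is in the range 'A'..'Z'.
    static int letterIndex(char upper) {
        return upper - 'A';
    }

    // Converts 'a'..'z' to uppercase; every other char is returned unchanged.
    static char toUpperAscii(char c) {
        if (c >= 'a' && c <= 'z') {
            return (char) (c - 'a' + 'A');
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(digitValue('7'));    // 7
        System.out.println(letterIndex('Z'));   // 25
        System.out.println(toUpperAscii('q'));  // Q
    }
}
```

The range test c >= 'a' && c <= 'z' is itself only valid because the lowercase letters are consecutive and in order.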
• In text files (not binary files), all characters (letters, digits, punctuation) are stored as ASCII/Unicode characters