Character Sets and Unicode in Firebird

Firebird Conference 2011, Luxembourg

Session: Character Sets and Firebird
Speaker: Stefan Heymann

Stefan Heymann

consic.de heymann@consic.de

After a short introduction to the world of character sets and Unicode, this session shows you how to make it all work in Firebird. You will learn what all those character sets and collations are, and how to use them properly to get the right characters into the database and onto your screen.

Topics

- Characters
- Character Sets
- Unicode
- Firebird
- Examples

Characters

Glyphs vs. Characters

Latin uppercase A

Glyph, Character, Character Set

- A Glyph is something you can see with your eyes
- A Character is an abstract concept
- Rendering characters as Glyphs is the job of the rendering machine (PostScript, GDI, TrueType, Web Browser, etc.)
- We mostly care about processing the characters
- A Character Set assigns a number to a character:
  - Uppercase A = 65
  - Uppercase B = 66
  - etc.
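
The character-to-number mapping can be tried out directly. The following is a minimal sketch in Python (not part of the original slides); the helper names `code_of` and `character_of` are made up for this illustration and simply wrap the built-ins `ord` and `chr`.

```python
# Illustration only: a character set assigns a number to each character.
# code_of / character_of are hypothetical helper names for this sketch.

def code_of(character: str) -> int:
    """Return the number assigned to a single character."""
    return ord(character)


def character_of(code: int) -> str:
    """Return the character that a number stands for."""
    return chr(code)


for ch in "AB":
    print(f"Uppercase {ch} = {code_of(ch)}")
```

Note that the program only ever handles the numbers; which glyph appears on screen for 65 is decided later by the rendering machine.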

Glyphs

- Not all languages display glyphs as a string of left-to-right, contiguous rectangles
  - Right-to-left (Arabic, Hebrew), top-to-bottom (Japanese, Chinese)
- Several characters can "melt" into one glyph

Character Sets

ASCII: The Mother of Character Sets

- American Standard Code for Information Interchange: ASCII, ISO 646
- 7 bits, characters ranging from 0 to 127 (00..7F)
- 32 invisible control characters (NUL, TAB, CR, LF, FF, BEL, ESC, ...)
- A..Z, a..z, Digits 0..9, Punctuation (;.-?)
- Optimized for English
- Only Latin characters, no accents, no umlauts
- MIME code: US-ASCII
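
The 7-bit limit is easy to see in practice. The sketch below (Python, not from the original slides) checks whether a text fits into the range 00..7F and shows that a word with an umlaut cannot be encoded. The function names `is_ascii` and `encode_ascii` are assumptions made for this example; `"us-ascii"` is the codec name Python accepts for the MIME code mentioned above.

```python
# Illustration only: ASCII covers the codes 0..127 (00..7F) and nothing else.
# is_ascii / encode_ascii are hypothetical helper names for this sketch.

def is_ascii(text: str) -> bool:
    """True if every character has a code in the 7-bit range 0..127."""
    return all(ord(ch) <= 0x7F for ch in text)


def encode_ascii(text: str) -> bytes:
    """Encode text as US-ASCII; raises UnicodeEncodeError outside 0..127."""
    return text.encode("us-ascii")


# The 32 invisible control characters occupy the codes 0..31.
control_chars = [chr(code) for code in range(32)]

print(is_ascii("Firebird"))   # plain English letters fit into 7 bits
print(is_ascii("Übung"))      # the umlaut does not

try:
    encode_ascii("Übung")
except UnicodeEncodeError as error:
    print("not representable in ASCII:", error.reason)
```

This is the practical meaning of "optimized for English": any text with accents or umlauts needs a larger character set.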
