Unicode in Python - UiO
Unicode in Python
Simon Funke, Center for Biomedical Computing, Simula Research Laboratory & Dept. of Informatics, University of Oslo,
based on Kumar McMillan, Unicode In Python, Completely Demystified
Sep 22, 2015
Introduction
Unicode is useful if you want to handle non-English languages in your program. Seen this before?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 10: ordinal not in range(128)
Then you are not handling strings correctly in Python!
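The error can be reproduced in a few lines. This is a minimal sketch written for Python 3 (an assumption; the slides target Python 2, where the same failure typically shows up implicitly when byte strings and unicode strings are mixed). The variable names are illustrative.

```python
# UTF-8 bytes for the word "Bokmål"; the two bytes \xc3\xa5 encode "å".
raw = b"Bokm\xc3\xa5l"

try:
    # ASCII only covers byte values 0-127, so byte 0xc3 cannot be decoded.
    raw.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)

# Decoding with the encoding the bytes were actually written in succeeds.
text = raw.decode("utf-8")
print(ascii(text))  # ascii() keeps the output printable on any terminal
```

The fix is not to avoid non-ASCII data, but to decode bytes with the encoding they were written in.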
Some important terms
Unicode: Unicode is a coded character set. It defines the
characters of all major languages in use today, and defines a mapping between these characters and the integer codes (code points) representing them.
UTF-8: UTF-8 is a character encoding capable of encoding
all possible characters, or code points, in Unicode.
ASCII: ASCII is an old character encoding. It maps the
characters used in the English language to numbers ranging from 0 to 127.
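The three terms can be told apart with a single character. A small sketch, assuming Python 3 (where str is Unicode text); the variable names are illustrative:

```python
# "\u00e5" is the character å (LATIN SMALL LETTER A WITH RING ABOVE).
char = "\u00e5"

# Unicode assigns the character an integer code point.
code_point = ord(char)              # 229, i.e. U+00E5

# UTF-8 turns that code point into a sequence of bytes.
utf8_bytes = char.encode("utf-8")   # b'\xc3\xa5'

# ASCII covers code points 0-127, so plain English letters fit ...
ascii_bytes = "A".encode("ascii")   # b'A'

# ... but å does not.
try:
    char.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

print(code_point, utf8_bytes, ascii_bytes, fits_in_ascii)
```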
When saving a string to a file/database/... Python needs to encode
the string with a character encoding.
When reading a string from a file/database/... Python needs to
decode the string with a character encoding.
The same unicode string might have different representations for different character encodings.
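A short sketch of encoding and decoding, assuming Python 3 syntax (in Python 2 the text literal would be written u"..."). The encodings are picked for illustration only:

```python
text = "Bokm\u00e5l"  # "\u00e5" is the character å

# Encoding: text -> bytes. The result depends on the chosen encoding.
as_utf8 = text.encode("utf-8")       # b'Bokm\xc3\xa5l'
as_latin1 = text.encode("latin-1")   # b'Bokm\xe5l'
as_utf16 = text.encode("utf-16-le")  # two bytes per character here

# Decoding: bytes -> text. The matching encoding restores the string.
assert as_utf8.decode("utf-8") == text
assert as_latin1.decode("latin-1") == text

# Decoding with the wrong encoding raises no error but yields wrong text.
garbled = as_utf8.decode("latin-1")

print(len(as_utf8), len(as_latin1), len(as_utf16))  # 7 6 12
print(ascii(garbled))
```

The last step is the reason to track which encoding a byte string uses: a mismatch does not always fail loudly.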
Bokmål
Let's read a UTF-8 file with the word Bokmål.
#!/usr/bin/env python
import sys

# wget
# noreg.txt is encoded in the UTF-8 character encoding
f = open("noreg.txt", "r")
s_utf8 = f.readline().split("\t")[12]
s_utf8
# Out: 'Bokm\xc3\xa5l'
type(s_utf8)
# Out: str
s_utf8 is a string encoded in UTF-8 format
The encoding assigns a numeric value to each character.
Note that å takes 2 bytes.
Python supports many encodings (over 100).
See the default encoding with sys.getdefaultencoding(). It is typically 'ascii' in Python 2.
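The difference between the stored bytes and the decoded text can be checked on a small file. This is a sketch only: it writes its own temporary file (a hypothetical stand-in for noreg.txt) and uses io.open so that it should run under both Python 2.7 and Python 3.

```python
import io
import os
import sys
import tempfile

# Write a small UTF-8 file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"Bokm\u00e5l\n")  # "\u00e5" is the character å

# Binary mode returns the raw encoded bytes.
with io.open(path, "rb") as f:
    raw = f.readline().strip()

# Text mode with an explicit encoding returns decoded text.
with io.open(path, "r", encoding="utf-8") as f:
    text = f.readline().strip()

print(len(raw))   # 7 bytes: å occupies two of them
print(len(text))  # 6 characters
print(sys.getdefaultencoding())
```

Passing the encoding explicitly avoids relying on the interpreter default, which differs between versions (Python 3 reports 'utf-8').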
Python string types (Python 2)
basestring
 |
 +-- str
 |
 +-- unicode
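In Python 2, str holds bytes and unicode holds text; in Python 3 the corresponding types are bytes and str. A minimal sketch of naming the two roles independently of the interpreter version. The names text_type and binary_type are illustrative, following the convention of the six compatibility library:

```python
import sys

# Pick the text and byte-string types for the running interpreter.
if sys.version_info[0] == 2:
    text_type = unicode  # this name only exists in Python 2
    binary_type = str
else:
    text_type = str
    binary_type = bytes

encoded = b"Bokm\xc3\xa5l"         # UTF-8 bytes, e.g. as read from a file
decoded = encoded.decode("utf-8")  # text

print(isinstance(encoded, binary_type))  # True
print(isinstance(decoded, text_type))    # True
```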