Python Unicode: Difference between revisions

From OLPC

Revision as of 02:27, 12 January 2007

Python has good Unicode support, but it is not necessarily easy to use. Some things to note:

  • You must test your application with real Unicode text, meaning text that cannot be encoded as ASCII. If you test only with plain ASCII text (i.e., a-z, no accents), you can miss a lot of bugs.
  • Be careful not to confuse 8-bit strings (binary data, of type "str") with text (Unicode data, of type "unicode"). It is easy to substitute one for the other as long as everything is ASCII; as soon as non-ASCII text appears, you will get a UnicodeEncodeError or UnicodeDecodeError.
  • The codecs module has some helpers for reading Unicode text from files.
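
The points above can be illustrated with a minimal sketch. It assumes a few things that are not in the original text: the u"" and b"" literals are used so the same code runs on Python 2.6+ and Python 3.3+ (where the "unicode" and "str" types described above are called "str" and "bytes"), the helper names ascii_errors and roundtrip are hypothetical, and codecs.getreader / codecs.getwriter are just one pair of the codecs helpers, chosen here for illustration.

```python
import codecs
import os
import tempfile

# Text containing a non-ASCII character (e with acute accent).
text = u"caf\u00e9"

# Encoding turns text into bytes; decoding turns bytes back into text.
data = text.encode("utf-8")
assert data == b"caf\xc3\xa9"
assert data.decode("utf-8") == text


def ascii_errors():
    """Show that the ASCII codec fails on non-ASCII input in both directions."""
    errors = []
    try:
        text.encode("ascii")      # text -> bytes
    except UnicodeEncodeError:
        errors.append("encode")
    try:
        data.decode("ascii")      # bytes -> text
    except UnicodeDecodeError:
        errors.append("decode")
    return errors


def roundtrip(value, encoding="utf-8"):
    """Write text to a temporary file and read it back through codecs wrappers."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # The writer encodes text to bytes before it reaches the binary file.
        with open(path, "wb") as raw:
            codecs.getwriter(encoding)(raw).write(value)
        # The reader decodes the file's bytes back into text.
        with open(path, "rb") as raw:
            return codecs.getreader(encoding)(raw).read()
    finally:
        os.remove(path)
```

Calling ascii_errors() records both failures, and roundtrip(text) returns the original text unchanged. With pure ASCII input neither error would occur, which is why testing with non-ASCII text matters.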

Resources

Some resources to learn about Unicode: