Unicode
Revision as of 02:38, 18 December 2008
Developer Infos
Python
Strings
Python 2 has two different string types: an 8-bit non-Unicode byte string type (str) and a Unicode string type (unicode), whose code units are 16 or 32 bits wide depending on how the interpreter was built. Unicode string literals are written with a leading u.
 question1 = u'\u00bfHabla espa\u00f1ol?'  # ¿Habla español?
 question2 = u'Wo ist Österreich?'
 print question2  # Wo ist Österreich?
 print question2.encode('iso-8859-1', 'replace')  # displays correctly on a Latin-1 terminal
 print question2.encode('utf-8', 'replace')  # Ö becomes the two bytes 0xC3 0x96, garbled on a Latin-1 terminal
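To make the effect of the two encodings concrete, here is a minimal sketch that compares the encoded byte strings directly instead of printing them. The variable names are illustrative only, and the escape \u00d6 is used for Ö so the snippet does not depend on the source file's encoding.

```python
# Sketch only: variable names are illustrative, not from the original page.
text = u'Wo ist \u00d6sterreich?'

# ISO-8859-1 uses one byte per character; UTF-8 needs two bytes for the O-umlaut.
latin1_bytes = text.encode('iso-8859-1')
utf8_bytes = text.encode('utf-8')

# The 'replace' error handler substitutes '?' for characters the codec cannot represent.
ascii_bytes = text.encode('ascii', 'replace')

# Decoding UTF-8 bytes with the wrong codec produces the familiar garbled text.
mojibake = utf8_bytes.decode('iso-8859-1')

# Decoding with the matching codec restores the original string.
round_trip = utf8_bytes.decode('utf-8')
```

The mismatch in the last-but-one step is what a Latin-1 terminal effectively does when it is handed UTF-8 output.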
Files Input
 import codecs

 # Open a UTF-8 file in read mode.
 infile = codecs.open("infile.txt", "r", "utf-8")
 # Read its contents as one large Unicode string.
 text = infile.read()
 # Close the file.
 infile.close()
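Writing works the same way in reverse. The following sketch, assuming Python 2.6 or later, uses io.open (a swapped-in alternative to the codecs.open call above) to write a Unicode string as UTF-8 and read it back. The temporary directory and the file name outfile.txt are hypothetical.

```python
import io
import os
import tempfile

text = u'\u00bfHabla espa\u00f1ol?'

# Hypothetical location: a file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'outfile.txt')

# Encode to UTF-8 on the way out ...
with io.open(path, 'w', encoding='utf-8') as outfile:
    outfile.write(text)

# ... and decode from UTF-8 on the way back in.
with io.open(path, 'r', encoding='utf-8') as infile:
    read_back = infile.read()

# Two of the 15 characters need two bytes each in UTF-8.
size_in_bytes = os.path.getsize(path)
```

Using a with block closes the file even if reading or writing raises an exception.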
Unicode and Pysqlite
In pysqlite 1.x, you have two ways to trigger the use of a converter:
- The magic "-- types" comment
- Using the converter name as the column type in your table definition, e.g. create table test(mytext unicode)
 # -*- coding: ISO-8859-1 -*-
 import sqlite

 data = u"Österreich"
 con = sqlite.connect(":memory:", client_encoding="utf-8")
 cur = con.cursor()
 # The magic "-- types" comment selects the unicode converter for the next query.
 cur.execute("-- types unicode")
 cur.execute("select %s", (data,))
 print cur.fetchone()
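For comparison, the second approach (naming the converter as the column type) can be sketched with the sqlite3 module from the Python standard library. This is a different and later API than pysqlite 1.x, so treat it as an illustration and not as the pysqlite 1.x behaviour. The converter function to_unicode and the table layout are assumptions made for the example.

```python
import sqlite3

def to_unicode(raw_bytes):
    # Hypothetical converter: sqlite3 passes the stored value as a byte string.
    return raw_bytes.decode('utf-8')

# Associate the declared column type "unicode" with the converter.
sqlite3.register_converter('unicode', to_unicode)

data = u'\u00d6sterreich'

# PARSE_DECLTYPES makes sqlite3 look up converters by declared column type.
con = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES)
cur = con.cursor()
cur.execute('create table test(mytext unicode)')
cur.execute('insert into test(mytext) values (?)', (data,))
cur.execute('select mytext from test')
row = cur.fetchone()
con.close()
```

Note that sqlite3 uses ? as its parameter placeholder, whereas the pysqlite 1.x example above uses %s.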
Further Reading
- Unicode in Python
- The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
- Unicode support for your browser (the XO Browse Activity does NOT support full Unicode)