Python i18n
Revision as of 17:17, 28 September 2007
Notice: This is an ongoing tutorial, and some details still need to be verified to ensure that things are done by the book. Take it as a first test-drive of the steps needed to internationalize (i18n) an activity that will later be localized (l10n) to each country/language/region.
Summary
Wikipedia: The distinction between internationalization and localization is subtle but important. Internationalization is the adaptation of products for potential use virtually everywhere, while localization is the addition of special features for use in a specific locale. The processes are complementary, and must be combined to lead to the objective of a system that works globally.
Process overview
- Set up the appropriate directory & file structure.
- Include gettext in your source code. This basically means you have to:
  - import gettext (its translation function is usually bound to _() in order to avoid clutter and typing)
  - Instrument the handling of strings to use gettext() (e.g. message = 'Begin!' becomes message = _('Begin!'))
- If you use the Sugar classes correctly, they will take care of properly initializing gettext and of creating the .POT in the ./po directory (this means executing the setup.py genpot script)
  - Use xgettext to create a .POT file (e.g. ./po/mysource.pot).
  - If your activity has more than one file, you have to create a ./po/POTFILES.in file.
    The file ./po/POTFILES.in specifies which source files should be used for building the .POT and .PO files. It should list the file names, with paths relative to the project root, each on a single line. [1]
- Set up & translate to a particular language:
  - Use msginit to create a .PO for a specific language code in the ./po directory (e.g. ar, es, pt, rw, etc.)
  - Translate each msgstr into the target language in the file (e.g. mysource.es.po, mysource.pt.po, mysource.rw.po, etc.)
- Reintegrate the translated .PO files into development
  - The translated .PO should go in the ./po directory where Sugar will compile it.
  - Use msgfmt to compile each .PO into its corresponding .MO (e.g. mysource.es.po into mysource.es.mo)
In the end, the ./po directory should contain the following:
- One .POT file (e.g. myactivity.pot)
- One .PO file per localized language (e.g. myactivity.es.po)
- One .MO per .PO file (e.g. myactivity.es.mo)
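As a rough illustration of the POTFILES.in step in the overview, the sketch below writes such a file from a list of source paths. It is written for Python 3; the helper name write_potfiles and the file names activity.py and game.py are made up for the example, and the optional [encoding: UTF-8] line follows the note quoted under "Some tips" rather than anything verified here.

```python
import os
import tempfile


def write_potfiles(po_dir, sources, utf8=True):
    """Write po_dir/POTFILES.in with one source path per line.

    `sources` are paths relative to the project root. All names used
    in this sketch are hypothetical; adapt them to your activity.
    """
    lines = []
    if utf8:
        # Keyword described in the note quoted in the tips section.
        lines.append("[encoding: UTF-8]")
    lines.extend(sources)
    path = os.path.join(po_dir, "POTFILES.in")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return path


demo_po_dir = tempfile.mkdtemp()  # stand-in for the real ./po directory
demo_path = write_potfiles(demo_po_dir, ["activity.py", "game.py"])
with open(demo_path, encoding="utf-8") as handle:
    print(handle.read())
```

Keeping the list in one place like this makes it easy to regenerate the file when source files are added or renamed.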
What about LINGUA.in (or something like that) and other Fedora files?
We are using ISO 639-1 and ISO 639-2. At some point we may have to worry about languages not covered by these lists.
Some tips
- Please, use UTF-8 ... use UTF-8, use UTF-8, use UTF-8, use UTF-8, use UTF-8, use UTF-8... ok? But note:
If you use UTF-8 in the translatable strings of your application, you need to add the special [encoding: UTF-8] keyword before the list of source files in the POTFILES.in file of your application. If this isn't done and there are non-ASCII characters present in the translatable strings, xgettext will exit with a fatal error, and so will the build of your application.
- l10n is not just translating strings!
- Although the translation of strings may be an important part of the l10n effort, it's not the only thing that needs to be localized. Things like currency symbols and layout, decimal numbers and delimiters, date & time formats, timezones, units and more are also a big part of localizing software.
- NOTE: you also want to ensure that two XOs localized differently can actually collaborate! Say a Brazilian kid meets a Uruguayan kid and they decide to collaborate...
- As a developer, you may want to add comments that state the particular sense in which a specific term is used in your source. This helps translators translate it properly and avoid ambiguity. For example:
    /* This is the verb, not the noun */
    g_printf (_("Profile"));
  This will automatically turn into this in the pot and po files:
    #. This is the verb, not the noun
    #: foo.c:42
    msgid "Profile"
    msgstr ""
- What strings should be localized?
- For starters, everything, which obviously includes anything that the end user will be able to read or lay eyes upon. The answer is usually a bit fuzzier for 'internal' things. Debugging strings (intended for developers) are usually not localized and may be skipped, although the XO is intended for developer kids with access to view the code ;). Error strings that end up in logs and will be read by local administrators and technical people should be localized. To avoid confusion when reporting bugs or problems, it is recommended that you give each log message an ID (e.g. a number).
Resources for i18n & l10n
- GNU.org gettext manual
- GNOME-i18n
- Internationalising GNOME applications by Malcolm Tredinnick
- GNOME - L10N Guidelines for Developers
Case Study: i18n & l10n of Kuku
Following the WxPython i18n tutorial, I added the following code at the top of my application:
File: kuku.py
    import os
    import gettext

    # get_bundle_path() is provided by Sugar (assumed location below)
    from sugar.activity.activity import get_bundle_path

    gettext.install('kuku', './locale', unicode=False)
    # Open question from User:Xavi: shouldn't it read unicode=True?

    # one line for each language
    presLan_en = gettext.translation("kuku", os.path.join(get_bundle_path(), 'locale'), languages=['en'])
    presLan_sw = gettext.translation("kuku", os.path.join(get_bundle_path(), 'locale'), languages=['sw'])

    # only install one language - add program logic later
    presLan_en.install()
    # presLan_sw.install()
Here my application is called kuku.py, and I am using 'kuku' as the domain of my i18n. Next I chose which strings needed to be localized within my application file kuku.py and surrounded them with _(). For example:
Before i18n | After i18n |
---|---|
message = 'Begin!' | message = _('Begin!') |
Next I need to create the i18n files. First I create a directory called 'locale' within my activity directory (this is the directory referred to in the presLan_en and presLan_sw lines above). The first step is to make a POT file, for which I use pygettext.py to process kuku.py:
python <path to your python distribution>/Tools/i18n/pygettext.py -o kuku.pot kuku.py
which creates kuku.pot. When first created it looks like
File: kuku.pot
    # SOME DESCRIPTIVE TITLE.
    # Copyright (C) YEAR ORGANIZATION
    # FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
    #
    msgid ""
    msgstr ""
    "Project-Id-Version: PACKAGE VERSION\n"
    "POT-Creation-Date: 2007-06-19 17:45+EDT\n"
    "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
    "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
    "Language-Team: LANGUAGE <LL@li.org>\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=CHARSET\n"
    "Content-Transfer-Encoding: ENCODING\n"
    "Generated-By: pygettext.py 1.5\n"

    #: kuku.py:501
    msgid "Begin!"
    msgstr ""
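To make the extraction step less of a black box, here is a rough Python 3 sketch of the idea behind an extractor: find every call to _() with a literal string and emit a .pot-style entry. This is only an illustration and not how pygettext.py is actually implemented; the function names are invented, and no escaping of quotes or newlines is attempted.

```python
import ast


def find_marked_strings(source, marker="_"):
    """Return sorted (line_number, text) pairs for marker('literal') calls."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == marker
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            found.append((node.lineno, node.args[0].value))
    return sorted(found)


def to_pot_entries(found, filename):
    """Format findings like the entries at the end of a .pot file."""
    return "\n".join(
        '#: %s:%d\nmsgid "%s"\nmsgstr ""\n' % (filename, line, text)
        for line, text in found
    )


sample = "message = _('Begin!')\nlabel = 'not marked'\n"
print(to_pot_entries(find_marked_strings(sample), "kuku.py"))
```

Only the string wrapped in _() is picked up, which is why unmarked strings never reach the translators.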
The last entry is the part we have to translate. I also had to edit the header at the top to replace the CHARSET and ENCODING placeholders. I changed both of these to utf-8 (note that GNU gettext tools usually write 8bit for Content-Transfer-Encoding), so my file now reads:
File: kuku.pot
    # SOME DESCRIPTIVE TITLE.
    # Copyright (C) YEAR ORGANIZATION
    # FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
    #
    msgid ""
    msgstr ""
    "Project-Id-Version: PACKAGE VERSION\n"
    "POT-Creation-Date: 2007-06-19 17:15+EDT\n"
    "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
    "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
    "Language-Team: LANGUAGE <LL@li.org>\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: utf-8\n"
    "Generated-By: pygettext.py 1.5\n"

    #: kuku.py:500
    msgid "Begin!"
    msgstr ""
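The header edit can also be scripted. The following is a minimal sketch, assuming the placeholders appear exactly as pygettext.py wrote them above; the function name fill_pot_header is invented. Its default for Content-Transfer-Encoding is 8bit, the value GNU gettext tools normally write, so pass transfer_encoding="utf-8" to reproduce the file shown above.

```python
def fill_pot_header(pot_text, charset="utf-8", transfer_encoding="8bit"):
    """Replace the CHARSET and ENCODING placeholders in a .pot header."""
    pot_text = pot_text.replace("charset=CHARSET", "charset=" + charset)
    return pot_text.replace(
        "Content-Transfer-Encoding: ENCODING",
        "Content-Transfer-Encoding: " + transfer_encoding,
    )


# Two header lines as they appear in a freshly generated .pot file.
header = (
    '"Content-Type: text/plain; charset=CHARSET\\n"\n'
    '"Content-Transfer-Encoding: ENCODING\\n"\n'
)
print(fill_pot_header(header))
```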
Now I move kuku.pot to ./locale. Then, for each language I want to localize to, I create a subdirectory within ./locale named after its language code. Within each of these subdirectories, I create a subdirectory called LC_MESSAGES. For now I am using English and Swahili, so my directory structure looks like
    locale/
        kuku.pot
        en/
            LC_MESSAGES/
        sw/
            LC_MESSAGES/
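Creating these directories by hand gets tedious as languages are added. A small Python 3 sketch of the same layout follows; the helper name make_locale_dirs is made up, and a temporary directory stands in for the activity directory.

```python
import os
import tempfile


def make_locale_dirs(root, languages):
    """Create root/locale/<lang>/LC_MESSAGES for each language code."""
    created = []
    for lang in languages:
        path = os.path.join(root, "locale", lang, "LC_MESSAGES")
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created


demo_root = tempfile.mkdtemp()  # stand-in for the activity directory
for path in make_locale_dirs(demo_root, ["en", "sw"]):
    print(os.path.relpath(path, demo_root))
```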
Now we do the translations. I copied kuku.pot to ./locale/en/LC_MESSAGES/kuku.po and ./locale/sw/LC_MESSAGES/kuku.po, and performed the translations:
File: ./locale/en/LC_MESSAGES/kuku.po
    #./locale/en/LC_MESSAGES/kuku.po
    # SOME DESCRIPTIVE TITLE.
    # Copyright (C) YEAR ORGANIZATION
    # FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
    #
    msgid ""
    msgstr ""
    "Project-Id-Version: PACKAGE VERSION\n"
    "POT-Creation-Date: 2007-06-19 17:15+EDT\n"
    "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
    "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
    "Language-Team: LANGUAGE <LL@li.org>\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: utf-8\n"
    "Generated-By: pygettext.py 1.5\n"

    #: kuku.py:500
    msgid "Begin!"
    msgstr "Begin!"
File: ./locale/sw/LC_MESSAGES/kuku.po
    #./locale/sw/LC_MESSAGES/kuku.po
    # SOME DESCRIPTIVE TITLE.
    # Copyright (C) YEAR ORGANIZATION
    # FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
    #
    msgid ""
    msgstr ""
    "Project-Id-Version: PACKAGE VERSION\n"
    "POT-Creation-Date: 2007-06-19 17:15+EDT\n"
    "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
    "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
    "Language-Team: LANGUAGE <LL@li.org>\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: utf-8\n"
    "Generated-By: pygettext.py 1.5\n"

    #: kuku.py:500
    msgid "Begin!"
    msgstr "Kuanza!"
Now my directory structure looks like
    locale/
        kuku.pot
        en/
            LC_MESSAGES/
                kuku.po
        sw/
            LC_MESSAGES/
                kuku.po
One last step before we are ready to go. We need to make the binary files used by gettext. We do that with msgfmt.py:
    cd <project path>/locale/en/LC_MESSAGES/
    python <path to your python distribution>/Tools/i18n/msgfmt.py kuku.po
    cd <project path>/locale/sw/LC_MESSAGES/
    python <path to your python distribution>/Tools/i18n/msgfmt.py kuku.po
This creates binary .mo files, and now my directory structure looks like:
    locale/
        kuku.pot
        en/
            LC_MESSAGES/
                kuku.po
                kuku.mo
        sw/
            LC_MESSAGES/
                kuku.po
                kuku.mo
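To see what the compile step produces, the Python 3 sketch below builds the bytes of a tiny .mo catalog in memory and loads it back with the standard gettext module. It is an illustration of the binary layout only, not a replacement for msgfmt.py: it takes a plain dictionary instead of parsing a .po file, and it ignores plural forms and message contexts. The function name build_mo is invented.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Return the bytes of a .mo catalog for a {msgid: msgstr} dict."""
    # The empty msgid carries the header, which tells gettext the charset.
    entries = {"": "Content-Type: text/plain; charset=utf-8\n"}
    entries.update(messages)
    # The format expects msgids in sorted order.
    items = sorted(
        (k.encode("utf-8"), v.encode("utf-8")) for k, v in entries.items()
    )

    count = len(items)
    id_table_start = 7 * 4  # the header is seven 32-bit integers
    str_table_start = id_table_start + 8 * count
    data_start = str_table_start + 8 * count

    # Every string is stored NUL-terminated.
    id_blob = b"".join(k + b"\0" for k, v in items)
    str_blob = b"".join(v + b"\0" for k, v in items)

    id_index, str_index = [], []
    id_pos, str_pos = data_start, data_start + len(id_blob)
    for k, v in items:
        id_index.append((len(k), id_pos))
        str_index.append((len(v), str_pos))
        id_pos += len(k) + 1
        str_pos += len(v) + 1

    # Magic number, version, entry count, table offsets, unused hash table.
    out = struct.pack("<7I", 0x950412DE, 0, count,
                      id_table_start, str_table_start, 0, 0)
    for length, offset in id_index + str_index:
        out += struct.pack("<2I", length, offset)
    return out + id_blob + str_blob


catalog = gettext.GNUTranslations(io.BytesIO(build_mo({"Begin!": "Kuanza!"})))
print(catalog.gettext("Begin!"))
```

Strings that are not in the catalog come back unchanged, which is the same behaviour the application sees for untranslated messages.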
To add new languages, we need to add a subdirectory for each language, perform the translations, create the .mo files, and add the relevant code in the application to select the language.
Sidebar: Getting non-latin text from translation web sites
If you are running GNOME, you can do the following.
Here is an Arabic Google translation of "hi". Open a gnome-terminal and run "cat > tmpfile". Copy and paste the Arabic text into the terminal, and thus into the tmpfile, then press Ctrl-D to finish. This avoids the mangling that happens when encoding information is lost.
In Emacs (and perhaps other editors as well), insert the tmpfile, and that's it. You can test all of this by creating a two-line Python file,
    # -*- coding: utf-8 -*-
    print(u'the string goes here')
And running it in the terminal.
I haven't tried this with po files yet.
Resources
These are the two docs that I used to learn about i18n (with no prior knowledge). Read the WxPython reference first, and instead of using the mki18n.py file mentioned on the WxPython page, use the tools in the Python standard distribution: pygettext.py and msgfmt.py.
See also
- Babel - A collection of tools for internationalizing Python applications.