Python i18n

Revision as of 00:14, 14 September 2007

Notice: This is an ongoing tutorial, and some points still need to be verified to ensure that everything is done by the book. Take it as a first test-drive of the steps needed to internationalize (i18n) an activity that will later be localized (l10n) for each country, language or region.

Summary

Wikipedia
The distinction between internationalization and localization is subtle but important. Internationalization is the adaptation of products for potential use virtually everywhere, while localization is the addition of special features for use in a specific locale. The processes are complementary, and must be combined to lead to the objective of a system that works globally.

Process

  1. Set up the appropriate directory & file structure.
  2. Add gettext to your code. This basically means you have to:
    • import gettext within your source and configure it.
    • Instrument the handling of strings in the source code to use gettext()
      • (usually aliased as _() in order to avoid clutter and typing)
      • e.g. message = 'Begin!' becomes message = _('Begin!')
  3. Use xgettext to create a .POT file (e.g. mysource.pot).
  4. Set up & translate to a particular language:
    • Use msginit to create a .PO for a specific language code (e.g. ar, es, pt, rw, etc.)
    • Translate each msgstr into the target language in that file (e.g. mysource.es.po, mysource.pt.po, mysource.rw.po, etc.)
    • Use msgfmt to compile each .PO into its corresponding .MO (e.g. mysource.es.po into mysource.es.mo)
    • Put the files in the appropriate directories. Python's gettext picks the language from the environment variables LANGUAGE, LC_ALL, LC_MESSAGES and LANG, in that order.
      • .POT into ./locale
      • .PO & .MO into ./locale/xx/LC_MESSAGES (e.g. mysource.es.po and mysource.es.mo go into ./locale/es/LC_MESSAGES); gettext looks the compiled catalog up there as <domain>.mo, so the .MO has to be named after the domain (mysource.mo)

Pending: what about LINGUA.in (or something like that) and other Fedora files?

Case Study: i18n & l10n of Kuku

Following the WxPython i18n tutorial, I added the following code at the top of my application:

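 File: kuku.py
 import gettext
 import os

 # get_bundle_path() comes from Sugar; this import location is an assumption
 from sugar.activity.activity import get_bundle_path

 # make _() available everywhere (unicode= is a Python 2-only argument)
 gettext.install('kuku', './locale', unicode=False)
 
 # one line for each language
 presLan_en = gettext.translation("kuku", os.path.join(get_bundle_path(), 'locale'), languages=['en'])
 presLan_sw = gettext.translation("kuku", os.path.join(get_bundle_path(), 'locale'), languages=['sw'])
 
 # only install one language - add program logic later
 presLan_en.install()
 # presLan_sw.install()
Pending (User:Xavi): shouldn't it read unicode=True? In Python 2 that makes _() return unicode strings, which is usually what you want for non-ASCII translations. In Python 3 the argument no longer exists, because _() always returns str.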
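
The "add program logic later" part could look something like the sketch below. It is written for Python 3 (so there is no unicode argument), and both the helper name install_language and the rule of taking the first two letters of $LANG are assumptions made for illustration, not what Kuku actually does.

```python
import gettext
import os


def install_language(lang, domain="kuku", localedir="./locale"):
    """Install _() into builtins for `lang` (hypothetical helper).

    fallback=True means a missing catalog gives untranslated text
    instead of raising an error.
    """
    catalog = gettext.translation(domain, localedir, languages=[lang], fallback=True)
    catalog.install()
    return catalog


# Assumed selection rule: first two letters of $LANG, else English.
preferred = os.environ.get("LANG", "en")[:2] or "en"
install_language(preferred)
print(_("Begin!"))
```

Calling install_language() again with another code swaps the installed _() at runtime, so a language menu only needs to call this one function.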

Here my application is called kuku.py, and I am using 'kuku' as the domain of my i18n. Next I chose which strings needed to be localized within kuku.py and surrounded each of them with _(). For example:

Before i18n          After i18n
message = 'Begin!'   message = _('Begin!')
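
Strings marked this way are easy to find mechanically. The sketch below illustrates the idea with Python 3's ast module; it is not how pygettext.py is implemented, the function name extract_msgids is made up for this example, and it only handles calls of the form _('literal').

```python
import ast


def extract_msgids(source, filename="<string>"):
    """Collect (line number, msgid) for every _('literal') call in `source`."""
    found = []
    for node in ast.walk(ast.parse(source, filename)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "_"
            and len(node.args) == 1
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            found.append((node.lineno, node.args[0].value))
    return sorted(found)


# Only the second line is marked, so only 'Begin!' is reported.
sample = "title = 'Kuku'\nmessage = _('Begin!')\n"
for lineno, msgid in extract_msgids(sample):
    print('#: kuku.py:%d\nmsgid "%s"\nmsgstr ""' % (lineno, msgid))
```

Unmarked strings such as 'Kuku' are skipped, which is why every user-visible string has to go through _().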

Next I need to create the i18n files. First I create a directory called 'locale' within my activity directory (this is the directory referred to in the presLan_en ... lines above). The first step is to make a POT file, for which I use pygettext.py to process kuku.py:

python <path to your python distribution>/Tools/i18n/pygettext.py -o kuku.pot kuku.py

which creates kuku.pot. When first created it looks like

 File: kuku.pot
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2007-06-19 17:45+EDT\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: ENCODING\n"
"Generated-By: pygettext.py 1.5\n"


#: kuku.py:501
msgid "Begin!"
msgstr ""

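The last little bit is the part we have to translate. First, though, I had to modify the header at the top to replace the CHARSET and ENCODING placeholders. I changed both of these to utf-8 (the conventional value for Content-Transfer-Encoding is 8bit, but utf-8 is what I used), so my file now reads: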

 File: kuku.pot
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2007-06-19 17:15+EDT\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: utf-8\n"
"Generated-By: pygettext.py 1.5\n"


#: kuku.py:500
msgid "Begin!"
msgstr ""
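
If the template is regenerated often, the header edit can be scripted. A minimal sketch, assuming the placeholders appear exactly as pygettext.py wrote them in the listing above; the function name fill_header is hypothetical.

```python
def fill_header(pot_text, charset="utf-8", transfer_encoding="utf-8"):
    """Replace the CHARSET and ENCODING placeholders in a POT header.

    The defaults mirror the values used in this tutorial; pass
    transfer_encoding="8bit" to follow the GNU gettext convention.
    """
    return (
        pot_text
        .replace("charset=CHARSET", "charset=" + charset)
        .replace("Content-Transfer-Encoding: ENCODING",
                 "Content-Transfer-Encoding: " + transfer_encoding)
    )


# Two header lines as they appear in a freshly generated kuku.pot.
template = (
    '"Content-Type: text/plain; charset=CHARSET\\n"\n'
    '"Content-Transfer-Encoding: ENCODING\\n"\n'
)
print(fill_header(template))
```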

Next I moved kuku.pot to ./locale. Then, for each language I want to localize to, I created a subdirectory within ./locale named after its language code. Within each of these subdirectories, I created a subdirectory called LC_MESSAGES. For now I am using English and Swahili, so my directory structure looks like:

locale/
  kuku.pot
  en/
    LC_MESSAGES/
  sw/
    LC_MESSAGES/
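
The same layout can be created from Python, which is convenient once there are more than a couple of languages. A small sketch; the helper name make_locale_tree is an assumption, and the demonstration runs in a throwaway directory rather than a real project.

```python
import os
import tempfile


def make_locale_tree(root, languages):
    """Create <root>/locale/<lang>/LC_MESSAGES for each language code."""
    created = []
    for lang in languages:
        path = os.path.join(root, "locale", lang, "LC_MESSAGES")
        os.makedirs(path, exist_ok=True)  # harmless if it already exists
        created.append(path)
    return created


with tempfile.TemporaryDirectory() as project:
    for path in make_locale_tree(project, ["en", "sw"]):
        print(os.path.relpath(path, project))
```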

Now we do the translations. I copied kuku.pot to ./locale/en/LC_MESSAGES/kuku.po and ./locale/sw/LC_MESSAGES/kuku.po, and performed the translations:

 File: ./locale/en/LC_MESSAGES/kuku.po
#./locale/en/LC_MESSAGES/kuku.po
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2007-06-19 17:15+EDT\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: utf-8\n"
"Generated-By: pygettext.py 1.5\n"


#: kuku.py:500
msgid "Begin!"
msgstr "Begin!"

 File: ./locale/sw/LC_MESSAGES/kuku.po
#./locale/sw/LC_MESSAGES/kuku.po
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2007-06-19 17:15+EDT\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: utf-8\n"
"Generated-By: pygettext.py 1.5\n"


#: kuku.py:500
msgid "Begin!"
msgstr "Kuanza!"
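
Before compiling, it can be handy to check that no msgstr was left empty. The sketch below is a deliberate simplification: it assumes every msgid and msgstr fits on one line without escaped quotes, as in the files above, and the function name untranslated is made up for this example.

```python
def untranslated(po_text):
    """Return the msgids whose msgstr is still empty (single-line entries only)."""
    missing = []
    msgid = None
    for line in po_text.splitlines():
        line = line.strip()
        if line.startswith("msgid "):
            msgid = line[len("msgid "):].strip('"')
        elif line.startswith("msgstr ") and msgid:
            # The header entry has an empty msgid, so it never reaches here.
            if line[len("msgstr "):].strip('"') == "":
                missing.append(msgid)
            msgid = None
    return missing


po_sample = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=utf-8\\n"\n'
    '\n'
    'msgid "Begin!"\n'
    'msgstr ""\n'
)
print(untranslated(po_sample))
```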

Now my directory structure looks like

locale/
       kuku.pot
       en/
          LC_MESSAGES/
                      kuku.po
       sw/
          LC_MESSAGES/
                      kuku.po

One last step before we are ready to go. We need to make the binary files used by gettext. We do that with msgfmt.py:

cd <project path>/locale/en/LC_MESSAGES/
python <path to your python distribution>/Tools/i18n/msgfmt.py kuku.po
cd <project path>/locale/sw/LC_MESSAGES/
python <path to your python distribution>/Tools/i18n/msgfmt.py kuku.po

This creates binary .mo files, and now my directory structure looks like:

locale/
       kuku.pot
       en/
          LC_MESSAGES/
                      kuku.po
                      kuku.mo
       sw/
          LC_MESSAGES/
                      kuku.po
                      kuku.mo
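
To make the compile step less of a black box, here is a rough Python 3 sketch of what ends up inside a .mo file: a small header, two index tables, and the NUL-terminated strings. It is an illustration of the GNU .mo layout rather than a replacement for msgfmt.py, and the function name build_mo is an assumption. The result is loaded back with gettext.GNUTranslations to show that the standard library can read it.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Serialize {msgid: msgstr} into the GNU .mo layout (little-endian)."""
    keys = sorted(messages)
    ids = b""
    strs = b""
    id_index = []   # (length, offset within ids) for each msgid
    str_index = []  # (length, offset within strs) for each msgstr
    for key in keys:
        raw_id = key.encode("utf-8")
        raw_str = messages[key].encode("utf-8")
        id_index.append((len(raw_id), len(ids)))
        str_index.append((len(raw_str), len(strs)))
        ids += raw_id + b"\0"
        strs += raw_str + b"\0"

    header_size = 7 * 4
    ids_start = header_size + 16 * len(keys)  # two tables, 8 bytes per entry
    strs_start = ids_start + len(ids)
    # magic, version, count, msgid table offset, msgstr table offset, no hash table
    out = struct.pack("<7I", 0x950412DE, 0, len(keys),
                      header_size, header_size + 8 * len(keys), 0, 0)
    for length, offset in id_index:
        out += struct.pack("<2I", length, ids_start + offset)
    for length, offset in str_index:
        out += struct.pack("<2I", length, strs_start + offset)
    return out + ids + strs


catalog = {
    "": "Content-Type: text/plain; charset=utf-8\n",  # header entry
    "Begin!": "Kuanza!",
}
translations = gettext.GNUTranslations(io.BytesIO(build_mo(catalog)))
print(translations.gettext("Begin!"))
```

The empty msgid carries the header, which is where gettext reads the charset from; this is why the charset line in the .po header matters.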

To add new languages, we need to add a subdirectory for each language, perform the translations, create the .mo files, and add the relevant code in the application to select the language.
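
The application can discover which languages are installed instead of hard-coding one line per language. A minimal sketch, assuming the directory layout shown above; the function name available_languages is hypothetical, and the demonstration uses empty placeholder files in a temporary directory.

```python
import os
import tempfile


def available_languages(localedir, domain="kuku"):
    """List language codes that have <localedir>/<code>/LC_MESSAGES/<domain>.mo."""
    langs = []
    for name in sorted(os.listdir(localedir)):
        mo_path = os.path.join(localedir, name, "LC_MESSAGES", domain + ".mo")
        if os.path.isfile(mo_path):
            langs.append(name)
    return langs


with tempfile.TemporaryDirectory() as localedir:
    # Build a stand-in for the tree above: kuku.pot plus en/ and sw/ catalogs.
    open(os.path.join(localedir, "kuku.pot"), "w").close()
    for code in ("en", "sw"):
        folder = os.path.join(localedir, code, "LC_MESSAGES")
        os.makedirs(folder)
        open(os.path.join(folder, "kuku.mo"), "wb").close()
    detected = available_languages(localedir)
    print(detected)
```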

Sidebar: Getting non-latin text from translation web sites

If you are running Gnome, you can do the following.

Take, for example, an Arabic Google translation of "hi". Open a gnome-terminal and run "cat > tmpfile". Copy-and-paste the Arabic text into the terminal, then press Ctrl-D to close the file, so that the text ends up in tmpfile. Going through the terminal avoids the mangling that occurs when encoding information is lost.

In Emacs (and perhaps other editors as well), insert the tmpfile into the file you are editing. And that's it. You can test all this by creating a two-line Python file,

# -*- coding: utf-8 -*-
print(u'the string goes here')

and running it in the terminal.

I haven't tried this with po files yet.
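
A quick way to confirm that the pasted text survived intact is to decode the file as UTF-8 and look at the character names. This is a sketch for Python 3; the function name describe is made up, and the sample bytes are an Arabic greeting used only as stand-in data (with a real file you would pass open('tmpfile', 'rb').read() instead).

```python
import unicodedata


def describe(raw_bytes):
    """Decode UTF-8 bytes and pair each character with its Unicode name.

    Decoding raises UnicodeDecodeError if the bytes were mangled on the way.
    """
    text = raw_bytes.decode("utf-8").strip()
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in text]


# Stand-in for the contents of tmpfile.
sample_bytes = "\u0645\u0631\u062d\u0628\u0627\n".encode("utf-8")
for char, name in describe(sample_bytes):
    print(repr(char), name)
```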

Resources

These are the two docs that I used to learn about i18n (with no prior knowledge). Read the WxPython reference first, and instead of using the mki18n.py file mentioned on the WxPython page, use the tools in the Python standard distribution: pygettext.py and msgfmt.py.

Python Reference

WxPython i18n (http://wiki.wxpython.org/Internationalization)

See also

  • Babel - A collection of tools for internationalizing Python applications.