Python i18n: Difference between revisions
{{l10n-nav}} |
|||
<big>Notice:</big> This is an ongoing tutorial, and some things need to be verified to ensure that they are ''done by the book''. Take it as a first test-drive of the steps needed to internationalize (i18n) an activity that will later be localized (l10n) to each country/language/region.
|||
__TOC__ |
__TOC__ |
||
== Summary == |
== Summary == |
||
<blockquote cite="http://en.wikipedia.org/wiki/Internationalization_and_localization"> |
|||
<blockquote> |
|||
'''[http://en.wikipedia.org/wiki/Internationalization_and_localization Wikipedia:]''' The distinction between internationalization and localization is subtle but important. Internationalization is the adaptation of products for potential use virtually everywhere, while localization is the addition of special features for use in a specific locale. The processes are complementary, and must be combined to lead to the objective of a system that works globally. |
|||
</blockquote> |
</blockquote> |
||
For those new to the topic, "i18n" is shorthand for "internationalization", referring to the 18 letters between the I and the N. Similarly, "l10n" is shorthand for "localization". Internationalization prepares an application for localization and must be done once per application. Localization provides local translations and other help and must be done once per locale, or language, into which the application will be deployed.
|||
== Process == |
|||
== Overview of internationalization == |
|||
# Set up the appropriate directory & file structure. |
|||
# Include [http://en.wikipedia.org/wiki/Gettext gettext] into your code. This basically means you have to: |
|||
#* <tt>import gettext</tt> within your source and configure it. |
|||
#* Instrument the handling of strings in the source code to use <tt>gettext()</tt> |
|||
#** (usually imported as '''<tt>_()</tt>''' in order to avoid clutter and typing) |
|||
#** ie: <tt>message = 'Begin!'</tt> becomes <tt>message = _('Begin!')</tt> |
|||
# Use '''<tt>xgettext</tt>''' to create a '''<tt>.POT</tt>''' file (ie: <tt>mysource.pot</tt>). |
|||
# Set up & translate to a particular language: |
|||
#* If your activity has more than one file, you have to create a '''<tt>./po/POTFILES.in</tt>''' file.<blockquote cite="http://developer.gnome.org/doc/tutorials/gnome-i18n/developer.html"> ''The file <tt>./po/POTFILES.in</tt> specifies which source files should be used for building the <tt>.POT</tt> and '''<tt>.PO</tt>''' files. It should list the file names, with paths relative to the project root, each on a single line.'' [http://developer.gnome.org/doc/tutorials/gnome-i18n/developer.html#POTFILES.in-and-POTFILES.skip]</blockquote> |
|||
#* Use '''<tt>msginit</tt>''' to create a '''<tt>.PO</tt>''' for a specific [[ISO 639|language code]] in the <tt>./po</tt> directory (ie: <tt>ar, es, pt, rw,</tt> etc.) |
|||
#* Actual translation of each of the <tt>msgstr</tt> into the target language in the file (ie: <tt>mysource.es.po</tt>, <tt>mysource.pt.po</tt>, <tt>mysource.rw.po</tt>, etc.) |
|||
#* Use of '''<tt>msgfmt</tt>''' to compile the '''<tt>.PO</tt>''' into their corresponding '''<tt>.MO</tt>''' (ie: <tt>mysource.es.po</tt> into <tt>mysource.es.mo</tt> |
|||
#* Put the files in the appropriate directories (verifying the environment variables <tt>LC_MESSAGES</tt>, <tt>LANG</tt> & <tt>LANGUAGES</tt>) {{Pending|verify '''which''' are really used}} |
|||
#** '''<tt>.POT</tt>''' into <tt>./po</tt> |
|||
#** '''<tt>.PO</tt>''' & '''<tt>.MO</tt>''' into <tt>./po/[[ISO 639|xx]]</tt> (ie: <tt>./po/'''es'''</tt> for <tt>mysources.'''es'''.po & mysources.'''es'''.mo</tt>) |
|||
There are two major tasks that require deep knowledge in creating a translation: knowing which strings to translate and knowing the correct translations. All the other internationalization steps are a mechanical cookbook of cut-and-paste code with some system administration tasks.
|||
{{Pending|what about <tt>LINGUA.in</tt> - or something like that - and other Fedora files?}} |
|||
Knowing which strings to translate separates human language strings from internal computer strings. This 'instrumenting' requires the knowledge of the developer and '''every developer should do this during development'''. Fortunately, it's just a matter of wrapping strings in a function call. Here is an example: |
|||
=== Some tips === |
|||
from gettext import gettext as _ |
|||
* Please, use UTF-8 ... use UTF-8, use UTF-8, use UTF-8, use UTF-8, use UTF-8, use UTF-8... ok? '''But note:''' |
|||
# Naming a function _ may seem odd, but it saves typing and
|||
*: <blockquote cite="http://developer.gnome.org/doc/tutorials/gnome-i18n/developer.html">If you use UTF-8 in the translateable strings of your application, you need to add the special [encoding: UTF-8] keyword before the list of source files in the POTFILES.in file of your application. If this isn't done and there are non-ASCII characters present in the translateable strings, xgettext will exit with a fatal error, and so will the build of your application.</blockquote> |
|||
# improves code readability. |
|||
* l10n is '''not''' just translating strings! |
|||
*: Although the translation of strings may be an important part of the l10n effort, it's not the only thing that needs to be localized. Things like currency symbols and layout, decimal numbers and delimiters, date & time formats, timezones, units and more are also a big part of localizing software.
|||
print _("Greetings") |
|||
*: '''NOTE:''' you also want to ensure that two XOs localized differently can actually collaborate! Say a Brazilian kid meets a Uruguayan kid and they decide to collaborate...
|||
# prints the word "Greetings" in the language being used. |
|||
# This is really a function call to the function gettext in module gettext. |
|||
# If no other internationalization is done, this will just print "Greetings" |
|||
greetings_file = file("greetings", "r") |
|||
# Open the file named "greetings" for read access. This "greetings" and "r" |
|||
# should not be translated because they are internal to the computer, the |
|||
# file system, or the operating system. These strings are not wrapped with |
|||
# function calls. |
|||
... |
|||
The second major activity is translating. Translating is a human activity requiring creativity with limited context. Each human language, and perhaps each regional variant of a language, can have its own translation. For example, the word "greetings" might be entered as follows:
|||
{| class="wikitable" |
|||
|- |
|||
! Language and region |
|||
! Locale identifier |
|||
! Translation of "Greetings" |
|||
|- |
|||
| English in the United States |
|||
| en-US |
|||
| Greetings |
|||
|- |
|||
| English in Australia |
|||
| en-AU |
|||
| G'day |
|||
|- |
|||
| French in France |
|||
| fr-FR |
|||
| Bonjour |
|||
|- |
|||
| French in Canada |
|||
| fr-CA |
|||
| Bonjour, eh? |
|||
|- |
|||
| Spanish, where no regional translation available |
|||
| es |
|||
| Hola |
|||
|} |
|||
The rest of this article describes the process and mechanisms for getting the program to print out the correct translation. |
|||
== Process overview == |
|||
This process is fairly mechanical. The details may change with future Sugar releases as they have with previous releases. As a consequence, different existing projects show some variations with the same goal. The goal is to get translated strings into <tt>.po</tt> files, which are then compiled into binary <tt>.mo</tt> files in a directory named something like
|||
''<tt>./locale/xx-yy/LC_MESSAGES/my_application.mo</tt>'' where ''<tt>xx-yy</tt>'' is the language and region (locale code) of the translation. gettext itself spells these codes with an underscore, as in <tt>pt_BR</tt>.
|||
* If you have not done so, instrument your source code, as described [[Python_i18n#Overview_of_internationalization|above]]. This basically means you have to: |
|||
** <tt>import gettext</tt> (usually imported as '''<tt>from gettext import gettext as _</tt>''' in order to avoid clutter and typing) |
|||
** Instrument the handling of strings to use <tt>gettext()</tt> (ie: <tt>message = 'Begin!'</tt> becomes <tt>message = _('Begin!')</tt>) |
|||
* Set up a subdirectory named '''<tt>./po</tt>''' from the head of your project. |
|||
* Create a '''<tt>./po/POTFILES.in</tt>''' file that contains an encoding line and the name of each file in your source tree.[http://developer.gnome.org/doc/tutorials/gnome-i18n/developer.html#POTFILES.in-and-POTFILES.skip] |
|||
**Use UTF-8 as your encoding string. Yes, UTF-8. That's UTF-8. UTF-16 is right out. Odd crashes will occur. |
|||
**Failure to include an encoding string will cause <tt>xgettext</tt> to fail later or just cause random crashes in your code. |
|||
**Here is a sample <tt>POTFILES.in</tt> file: |
|||
encoding: UTF-8 |
|||
calculate.py |
|||
eqnparser.py |
|||
* Create a '''<tt>.POT</tt>''' (gettext Portable Object Template) which contains all the instrumented strings you wish to translate. That is, all strings for which you called <tt>gettext</tt>. |
|||
** If using the Sugar classes correctly, they will take care of proper initialization of <tt>gettext</tt> and the creation of the '''<tt>.POT</tt>''' in the <tt>./po</tt> directory. This occurs when you execute the <tt>setup.py genpot</tt> script. Manually, you can: |
|||
** Use '''<tt>xgettext</tt>''' to create a '''<tt>.POT</tt>''' file. For example: <tt>cd po; xgettext -f POTFILES.in ../mysource.py --output=mysource.pot</tt>.
|||
*Set up & translate to a particular language: |
|||
** Use '''<tt>msginit</tt>''' (from console) to create a '<tt>.po</tt>' (gettext Portable Object) for a specific [[ISO 639|language code]] in the <tt>./po</tt> directory (ie: <tt>ar, es, pt_BR, rw,</tt> etc.). For example, '''<tt>cd po; msginit -l es</tt>''' generates the '<tt>es.po</tt>' Spanish .po file for your activity. This contains a filled-in '<tt>msgid</tt>' for each instrumented string and an empty '<tt>msgstr</tt>' to hold each translation.
|||
** Edit the <tt>.po</tt> file to provide the actual translation of each <tt>msgstr</tt> into the target language. Repeat for each file (ie: <tt>mysource.es.po</tt>, <tt>mysource.pt.po</tt>, <tt>mysource.rw.po</tt>, etc.) |
|||
* Reintegrate the translated '''<tt>.po</tt>''' files into the development |
|||
** The translated '''<tt>.po</tt>''' should go in the <tt>./po</tt> directory where Sugar will compile it. |
|||
*** Use '''<tt>msgfmt</tt>''' to compile each '''<tt>.po</tt>''' into its corresponding '''<tt>.mo</tt>''' (gettext Machine Object). For example, <tt>mysource.es.po</tt> compiles into <tt>mysource.es.mo</tt>.
|||
In the end you should have in the <tt>./po</tt> directory, the following: |
|||
* One '''<tt>.POT</tt>''' file (ie: <tt>myactivity.pot</tt>) |
|||
* One '''<tt>.PO</tt>''' file per localized language (ie: <tt>myactivity.es.po</tt>) |
|||
* One '''<tt>.MO</tt>''' per '''<tt>.PO</tt>''' file (ie: <tt>myactivity.es.mo</tt>) |
|||
* A set of installed '''<tt>.MO</tt>''' files somewhere (ie: <tt>/usr/share/locale/es/LC_MESSAGES/myactivity.mo</tt>) per localized language |
|||
== Some tips == |
|||
* This process barely works because it does not cope with code maintenance. For example, it can be difficult to update a <tt>.PO</tt> file without losing the previously entered translations. GNU gettext's <tt>msgmerge</tt> is meant for this job (it merges a regenerated <tt>.POT</tt> into an existing <tt>.PO</tt> while keeping the translations already entered), but no clear consensus on a workflow has arisen.
|||
* [[Pootle]] is an online tool used in much of the translation work for XO applications. It tracks progress on translations and automates much of the mechanical management. You can check out a running Pootle server at http://translate.sugarlabs.org. |
|||
* More on using UTF-8: |
|||
<blockquote cite="http://developer.gnome.org/doc/tutorials/gnome-i18n/developer.html">If you use UTF-8 in the translateable strings of your application, you need to add the special [encoding: UTF-8] keyword before the list of source files in the POTFILES.in file of your application. If this isn't done and there are non-ASCII characters present in the translateable strings, xgettext will exit with a fatal error, and so will the build of your application.</blockquote> |
|||
* Localization, l10n, is '''not''' just translating strings! |
|||
** Translation of strings is an important part of the localization effort. It makes an application useful, but not elegant, in a foreign language. |
|||
** Other localizations the developer will need to insert include decimal numbers (e.g., "4,242.10" in English versus "4.242,10" in German), currency symbols and layout, date and time formats, and choices of units. |
|||
** Some localization help can be found using the [http://docs.python.org/lib/module-locale.html Python locale module]. This provides a method for automatically converting decimal numbers, date and time formats, and learning the appropriate currency and unit formats.
|||
** Localization, or L10N, can encompass many "soft" optimizations relating to culturally specific information or layout. This is usually classified as either "extra credit" or "trying too hard". |
|||
** Two other tools used in L10N are Cairo and Pango. Pango lays out and renders text in the world's scripts, and Cairo is the 2D graphics library it typically draws with.
|||
* Some applications may want to ensure also that two XOs localized differently can actually collaborate. For example, a Brazilian kid meets an Uruguayan and decides to collaborate. This can be an added level of complexity if translated strings are passed around. |
|||
* As a developer, you may want to add some comments to denote the particular sense in which a specific term is used in your source. This will ensure that the [[translators]] will be able to translate it properly avoiding ambiguity — ie: |
* As a developer, you may want to add some comments to denote the particular sense in which a specific term is used in your source. This will ensure that the [[translators]] will be able to translate it properly avoiding ambiguity — ie: |
||
<blockquote cite="http://developer.gnome.org/doc/tutorials/gnome-i18n/developer.html#Use-comments"> |
<blockquote cite="http://developer.gnome.org/doc/tutorials/gnome-i18n/developer.html#Use-comments"> |
||
<pre> |
<pre> |
||
/* This is the verb, meaning to describe, not the noun or side of a face. */ |
|||
g_printf (_("Profile")); |
g_printf (_("Profile")); |
||
This will automatically turn into |
This will automatically turn into the commented code below in the .pot file and .po files: |
||
#. This is the verb, not the noun |
#. This is the verb, meaning to describe, not the noun or side of a face. |
||
#: foo.c:42 |
#: foo.c:42 |
||
msgid "Profile" |
msgid "Profile" |
||
msgstr ""</pre></blockquote> |
msgstr ""</pre></blockquote> |
||
* What strings should be localized? |
* What strings should be localized? |
||
* |
** '''Everything''' written in a human language should be localized. This obviously includes anything that the end user will be able to read or lay eyes upon. The answer usually is a bit more fuzzy for 'internal' things. '''Debugging strings''' (intended for developers) are usually not localized—although the XO is intended for developer kids with access to the ''view code'' ;)— and may be avoided. '''Error strings''', which end up in logs and such, that will be read by local administrators and technical people should be localized—in order to avoid confusion when reporting bugs or problems, it is recommended you ID your log messages (ie: using a number). |
||
** Strings used by the operating system or internal to the computer should not be localized. Localizing these strings would cause the program to function incorrectly. |
|||
* What localization changes are coming in Python 3.0 and Python 2.6? |
|||
=== Resources for i18n & l10n === |
|||
** Python 2.6 is compatible with current version of Python. Python 3.0 has different semantics than current versions of Python and will take longer to adopt. Both will ship in late 2008. |
|||
** Python 2.6 and 3.0 support new formatting options, including an "n" presentation type for <tt>format()</tt> and <tt>str.format()</tt> that formats numbers according to the current locale.
|||
** Python 3.0 will make clear distinctions between byte strings and unicode strings. This will lead to fewer mistakes that confuse i18n strings that have been decoded for display with internal byte strings that have been encoded for file I/O.
|||
* Translating format strings. |
|||
* [http://www.gnu.org/manual/gettext/html_chapter/gettext_toc.html GNU.org gettext manual] |
|||
** Often times text that is displayed to the user requires formatting, for example when embedding numbers and other strings. For example: |
|||
* [http://developer.gnome.org/doc/tutorials/gnome-i18n/developer.html GNOME-i18n] |
|||
* [http://www.gnome.org/~malcolm/i18n/ Internationalising GNOME applications] by Malcolm Tredinnick |
|||
* [http://developer.gnome.org/doc/tutorials/gnome-i18n/developer.html GNOME - L10N Guidelines for Developers] |
|||
<blockquote cite="http://developer.gnome.org/doc/tutorials/gnome-i18n/developer.html#Use-comments"><pre> |
|||
== Case Study: i18n & l10n of [[Kuku]] == |
|||
print '%s has a score of %03d.' % (buddy.get_name(), 200) </pre></blockquote> |
|||
This can be frustrating to [[translators]] as they will not be able to reorder words in the sentence. Instead, use format strings with named arguments and interpolate with a map. |
|||
<blockquote cite="http://developer.gnome.org/doc/tutorials/gnome-i18n/developer.html#Use-comments"><pre> |
|||
print '%(name)s has a score of %(score)03d.' % {'name': buddy.get_name(), 'score': 200} </pre></blockquote> |
|||
== Obsolete Case Study: i18n & l10n of [[Kuku]] == |
|||
:'''This section is out of date!''' [[User:MitchellNCharity|MitchellNCharity]] 17:14, 29 November 2007 (EST) |
|||
:* Sugar now handles gettext initialization (gettext.install, etc) for you. Don't add it to your activity. |
|||
:* The directory structure is now simpler. Just a po/ directory, with po/TheActivityName.pot, and po/es.po, po/pt.po, etc. |
|||
Following the [http://wiki.wxpython.org/Internationalization WxPython i18n tutorial], I added the following code at the top of [[Kuku|my application]]: |
Following the [http://wiki.wxpython.org/Internationalization WxPython i18n tutorial], I added the following code at the top of [[Kuku|my application]]: |
||
:'''This is no longer needed!''' |
|||
{{ Box File | kuku.py | 2= |
{{ Box File | kuku.py | 2= |
||
Line 74: | Line 162: | ||
# presLan_sw.install() |
# presLan_sw.install() |
||
</pre> |
</pre> |
||
{{Pending|1=# shouldn't it read '''unicode=True''' ? [[User:Xavi]]}}}} |
|||
Here my application is called <tt>kuku.py</tt>, and I am using 'kuku' to be the domain of my i18n. Now I choose which strings I needed to localize within my application file <tt>kuku.py</tt> - these strings I surrounded with |
Here my application is called <tt>kuku.py</tt>, and I am using 'kuku' to be the domain of my i18n. Now I choose which strings I needed to localize within my application file <tt>kuku.py</tt> - these strings I surrounded with |
||
Line 145: | Line 232: | ||
</pre>}} |
</pre>}} |
||
Now I moved <tt>kuku.pot</tt> to <tt>./locale</tt> . Then for each language I want to localize to, I create subdirectories within <tt>./locale</tt> according to their [http://www. |
Now I moved <tt>kuku.pot</tt> to <tt>./locale</tt>. Then for each language I want to localize to, I create subdirectories within <tt>./locale</tt> according to their [http://www.loc.gov/standards/iso639-2/php/code_list.php language codes]. Within each of these subdirectories, I create subdirectories called <tt>LC_MESSAGES</tt>. For now I am using English and Swahili, so my directory structure looks like
||
<pre> |
<pre> |
||
Line 245: | Line 332: | ||
To add new languages, we need to add a subdirectory for each language, perform the translations, create the <tt>.mo</tt> files, and add the relevant code in the application to select the language. |
To add new languages, we need to add a subdirectory for each language, perform the translations, create the <tt>.mo</tt> files, and add the relevant code in the application to select the language. |
||
== |
== Getting non-latin text from translation web sites == |
||
If you are running Gnome, you can do the following. |
If you are running Gnome, you can do the following. |
||
Line 262: | Line 349: | ||
These are the two docs that I used to learn about i18n (with no prior knowledge). Read the [http://wiki.wxpython.org/Internationalization WxPython reference] first, and instead of using the <tt>mki18n.py</tt> file mentioned on the WxPython page, use the tools in the [[Python]] standard distribution: <tt>pygettext.py</tt> and <tt>msgfmt.py</tt>.
||
[http://docs.python.org/lib/node738.html Python Reference] |
*[http://docs.python.org/lib/node738.html Python Reference] ''see also'' ([http://docs.python.org/lib/module-locale.html module-locale]) |
||
*[http://wiki.wxpython.org/Internationalization WxPython i18n] |
|||
Here is a well written example of translation with copious comments. |
|||
[http://wiki.wxpython.org/Internationalization WxPython i18n] |
|||
* [http://www.learningpython.com/2006/12/03/translating-your-pythonpygtk-application/ Translating Your Python PyGTK Application] |
|||
Here are some general resources |
|||
== See also == |
|||
* [http://www.gnu.org/software/gettext/manual/gettext.html GNU.org gettext manual] |
|||
* [http://developer.gnome.org/doc/tutorials/gnome-i18n/developer.html GNOME-i18n] |
|||
* [http://www.gnome.org/~malcolm/i18n/ Internationalising GNOME applications] by Malcolm Tredinnick |
|||
* [http://developer.gnome.org/doc/tutorials/gnome-i18n/developer.html GNOME - L10N Guidelines for Developers] |
|||
== See also == |
|||
* [[Activity bundles]] |
|||
*[http://babel.edgewall.org/ Babel] - A collection of tools for internationalizing Python applications. |
*[http://babel.edgewall.org/ Babel] - A collection of tools for internationalizing Python applications. |
||
Line 275: | Line 370: | ||
[[Category:HowTo]] |
[[Category:HowTo]] |
||
[[Category:Developers]] |
[[Category:Developers]] |
||
[[Category:Language_support]] |
Latest revision as of 14:58, 8 August 2012
Summary
Wikipedia: The distinction between internationalization and localization is subtle but important. Internationalization is the adaptation of products for potential use virtually everywhere, while localization is the addition of special features for use in a specific locale. The processes are complementary, and must be combined to lead to the objective of a system that works globally.
For those new to the topic, "i18n" is shorthand for "internationalization", referring to the 18 letters between the I and the N. Similarly, "l10n" is shorthand for "localization". Internationalization prepares an application for localization and must be done once per application. Localization provides local translations and other help and must be done once per locale, or language, into which the application will be deployed.
Overview of internationalization
There are two major tasks that require deep knowledge in creating a translation: knowing which strings to translate and knowing the correct translations. All the other internationalization steps are a mechanical cookbook of cut-and-paste code with some system administration tasks.
Knowing which strings to translate separates human language strings from internal computer strings. This 'instrumenting' requires the knowledge of the developer and every developer should do this during development. Fortunately, it's just a matter of wrapping strings in a function call. Here is an example:
    from gettext import gettext as _
    # Naming a function _ may seem odd, but it saves typing and
    # improves code readability.

    print _("Greetings")
    # Prints the word "Greetings" in the language being used.
    # This is really a function call to the function gettext in module gettext.
    # If no other internationalization is done, this will just print "Greetings".

    greetings_file = file("greetings", "r")
    # Open the file named "greetings" for read access. This "greetings" and "r"
    # should not be translated because they are internal to the computer, the
    # file system, or the operating system. These strings are not wrapped with
    # function calls.
    ...
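The last comment in the example can be checked directly. The following is a minimal sketch, written for Python 3 (unlike the Python 2 example above), that asks gettext for a Spanish catalog in a directory known to be empty. The domain name `my_application` and the temporary directory are assumptions made for this illustration; `fallback=True` is the standard-library option that returns a pass-through translator instead of raising an error when no `.mo` file is found.

```python
import gettext
import tempfile

# Look for a catalog in a directory that is known to be empty, so the lookup fails.
with tempfile.TemporaryDirectory() as empty_localedir:
    translator = gettext.translation(
        "my_application",          # hypothetical domain name
        localedir=empty_localedir,
        languages=["es"],
        fallback=True,             # return NullTranslations instead of raising
    )

_ = translator.gettext
message = _("Greetings")
print(message)  # no catalog was found, so the original string comes back
```

Without `fallback=True`, the same call raises an error when the catalog is missing, which is usually not what an activity wants at start-up.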
The second major activity is translating. Translating is a human activity requiring creativity with limited context. Each human language, and perhaps each regional variant of a language, can have its own translation. For example, the word "greetings" might be entered as follows:
Language and region | Locale identifier | Translation of "Greetings" |
---|---|---|
English in the United States | en-US | Greetings |
English in Australia | en-AU | G'day |
French in France | fr-FR | Bonjour |
French in Canada | fr-CA | Bonjour, eh? |
Spanish, where no regional translation available | es | Hola |
The rest of this article describes the process and mechanisms for getting the program to print out the correct translation.
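The "Spanish, where no regional translation available" row describes a fallback from a regional locale to its base language. A rough Python sketch of that lookup follows; the `GREETINGS` dictionary and the `lookup_greeting` helper are hypothetical names invented for this illustration and are not part of gettext, which performs a comparable fallback internally when it searches for catalogs.

```python
# Hypothetical catalog keyed by locale identifier, mirroring the table above.
GREETINGS = {
    "en-US": "Greetings",
    "en-AU": "G'day",
    "fr-FR": "Bonjour",
    "fr-CA": "Bonjour, eh?",
    "es": "Hola",
}


def lookup_greeting(locale_id, default="Greetings"):
    """Return the most specific greeting available for locale_id."""
    # Accept both "fr-CA" and the POSIX spelling "fr_CA".
    normalized = locale_id.replace("_", "-")
    if normalized in GREETINGS:
        return GREETINGS[normalized]
    # No regional entry: fall back to the bare language code, then to the default.
    language = normalized.split("-")[0]
    return GREETINGS.get(language, default)


print(lookup_greeting("fr_CA"))  # exact regional match
print(lookup_greeting("es-MX"))  # falls back to the "es" entry
print(lookup_greeting("sw"))     # nothing available, the untranslated default is used
```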
Process overview
This process is fairly mechanical. The details may change with future Sugar releases as they have with previous releases. As a consequence, different existing projects show some variations with the same goal. The goal is to get translated strings into .po files, which are then compiled into binary .mo files in a directory named something like ./locale/xx-yy/LC_MESSAGES/my_application.mo where xx-yy is the language and region (locale code) of the translation. gettext itself spells these codes with an underscore, as in pt_BR.
- If you have not done so, instrument your source code, as described above. This basically means you have to:
- import gettext (usually imported as from gettext import gettext as _ in order to avoid clutter and typing)
- Instrument the handling of strings to use gettext() (ie: message = 'Begin!' becomes message = _('Begin!'))
- Set up a subdirectory named ./po from the head of your project.
- Create a ./po/POTFILES.in file that contains an encoding line and the name of each file in your source tree.[1]
- Use UTF-8 as your encoding string. Yes, UTF-8. That's UTF-8. UTF-16 is right out. Odd crashes will occur.
- Failure to include an encoding string will cause xgettext to fail later or just cause random crashes in your code.
- Here is a sample POTFILES.in file:
    encoding: UTF-8
    calculate.py
    eqnparser.py
- Create a .POT (gettext Portable Object Template) which contains all the instrumented strings you wish to translate. That is, all strings for which you called gettext.
- If using the Sugar classes correctly, they will take care of proper initialization of gettext and the creation of the .POT in the ./po directory. This occurs when you execute the setup.py genpot script. Manually, you can:
- Use xgettext to create a .POT file. For example: cd po; xgettext -f POTFILES.in ../mysource.py --output=mysource.pot
- Set up & translate to a particular language:
- Use msginit (from console) to create a '.po' (gettext Portable Object) for a specific language code in the ./po directory (ie: ar, es, pt_BR, rw, etc.). For example, cd po; msginit -l es generates the 'es.po' Spanish .po file for your activity. This contains a filled-in 'msgid' for each instrumented string and an empty 'msgstr' to hold each translation.
- Edit the .po file to provide the actual translation of each msgstr into the target language. Repeat for each file (ie: mysource.es.po, mysource.pt.po, mysource.rw.po, etc.)
- Reintegrate the translated .po files into the development tree.
- The translated .po should go in the ./po directory where Sugar will compile it.
- Use msgfmt to compile each .po into its corresponding .mo (gettext Machine Object). For example, mysource.es.po compiles into mysource.es.mo.
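The manual steps above can be summarized as three command lines. The sketch below only assembles and prints them; it does not run anything. The helper name `gettext_commands`, the domain `myactivity`, and the output paths are assumptions made for illustration, and the exact layout Sugar expects may differ. The options used (`--language`, `--from-code`, `--output`, `--locale`, `--input`, `--output-file`, `--no-translator`) are standard GNU gettext options.

```python
import shutil


def gettext_commands(domain, language, sources):
    """Build the xgettext, msginit and msgfmt command lines for one language."""
    pot = "po/%s.pot" % domain
    po = "po/%s.po" % language
    mo = "locale/%s/LC_MESSAGES/%s.mo" % (language, domain)
    return [
        # 1. Extract the instrumented strings into the template.
        ["xgettext", "--language=Python", "--from-code=UTF-8",
         "--output=" + pot] + list(sources),
        # 2. Start a new, empty translation for one language.
        ["msginit", "--no-translator", "--locale=" + language,
         "--input=" + pot, "--output-file=" + po],
        # 3. Compile the edited translation into its binary form.
        ["msgfmt", "--output-file=" + mo, po],
    ]


commands = gettext_commands("myactivity", "es", ["calculate.py", "eqnparser.py"])
for argv in commands:
    status = "found" if shutil.which(argv[0]) else "not installed"
    print("%-8s (%s): %s" % (argv[0], status, " ".join(argv)))
```

Each list could be passed to `subprocess.run` once the target directories exist; msgfmt does not create `locale/es/LC_MESSAGES` on its own.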
In the end you should have in the ./po directory, the following:
- One .POT file (ie: myactivity.pot)
- One .PO file per localized language (ie: myactivity.es.po)
- One .MO per .PO file (ie: myactivity.es.mo)
- A set of installed .MO files somewhere (ie: /usr/share/locale/es/LC_MESSAGES/myactivity.mo) per localized language
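To see how an installed .MO file is found at run time, the following Python 3 sketch builds the `locale/es/LC_MESSAGES/` layout in a temporary directory and loads it with the standard `gettext` module. Normally msgfmt produces the binary file; so that the sketch runs without the GNU tools, the helper `write_mo` (a hypothetical stand-in written for this example) emits the little-endian GNU .mo layout by hand. The domain `myactivity` and the single-entry catalog are also assumptions.

```python
import gettext
import os
import struct
import tempfile


def write_mo(path, catalog):
    """Write catalog (msgid -> msgstr) as a little-endian GNU .mo file."""
    # The empty msgid carries the header; gettext reads the charset from it.
    entries = {"": "Content-Type: text/plain; charset=UTF-8\n"}
    entries.update(catalog)
    pairs = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in entries.items())

    count = len(pairs)
    ids_start = 28 + 16 * count  # 28-byte header plus two 8-byte-per-entry tables
    strs_start = ids_start + sum(len(key) + 1 for key, _value in pairs)

    id_index, str_index, ids, strs = b"", b"", b"", b""
    for key, value in pairs:
        id_index += struct.pack("<II", len(key), ids_start + len(ids))
        str_index += struct.pack("<II", len(value), strs_start + len(strs))
        ids += key + b"\0"
        strs += value + b"\0"

    # magic, version, entry count, msgid table offset, msgstr table offset,
    # hash table size and offset (no hash table is written)
    header = struct.pack("<7I", 0x950412DE, 0, count, 28, 28 + 8 * count, 0, 0)
    with open(path, "wb") as mo_file:
        mo_file.write(header + id_index + str_index + ids + strs)


with tempfile.TemporaryDirectory() as root:
    localedir = os.path.join(root, "locale")
    mo_dir = os.path.join(localedir, "es", "LC_MESSAGES")
    os.makedirs(mo_dir)
    write_mo(os.path.join(mo_dir, "myactivity.mo"), {"Greetings": "Hola"})

    spanish = gettext.translation("myactivity", localedir=localedir, languages=["es"])
    greeting = spanish.gettext("Greetings")
    untranslated = spanish.gettext("Begin!")

print(greeting)      # translated, because the catalog has an entry for it
print(untranslated)  # no entry in the catalog, so the msgid is returned unchanged
```

The point of the sketch is the lookup path: `gettext.translation` joins the locale directory, the language code, `LC_MESSAGES`, and the domain name plus `.mo`, which is why the files have to be installed under exactly that structure.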
Some tips
- This process barely works because it does not cope with code maintenance. For example, it can be difficult to update a .PO file without losing the previously entered translations. GNU gettext's msgmerge is meant for this job (it merges a regenerated .POT into an existing .PO while keeping the translations already entered), but no clear consensus on a workflow has arisen.
- Pootle is an online tool used in much of the translation work for XO applications. It tracks progress on translations and automates much of the mechanical management. You can check out a running Pootle server at http://translate.sugarlabs.org.
- More on using UTF-8:
If you use UTF-8 in the translateable strings of your application, you need to add the special [encoding: UTF-8] keyword before the list of source files in the POTFILES.in file of your application. If this isn't done and there are non-ASCII characters present in the translateable strings, xgettext will exit with a fatal error, and so will the build of your application.
- Localization, l10n, is not just translating strings!
- Translation of strings is an important part of the localization effort. It makes an application useful, but not elegant, in a foreign language.
- Other localizations the developer will need to insert include decimal numbers (e.g., "4,242.10" in English versus "4.242,10" in German), currency symbols and layout, date and time formats, and choices of units.
- Some localization help can be found using the Python locale module. This provides a method for automatically converting decimal numbers, date and time formats, and learning the appropriate currency and unit formats.
- Localization, or L10N, can encompass many "soft" optimizations relating to culturally specific information or layout. This is usually classified as either "extra credit" or "trying too hard".
- Two other tools used in L10N are Cairo and Pango. Pango lays out and renders text in the world's scripts, and Cairo is the 2D graphics library it typically draws with.
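As a small illustration of the locale module mentioned above, the Python 3 sketch below formats a number under a named locale and restores the previous setting afterwards. The helper `format_amount` is a hypothetical name. Which locales are installed varies from system to system, so the sketch returns `None` when a locale is unavailable instead of assuming that, say, a German locale exists.

```python
import locale


def format_amount(value, locale_name):
    """Format value with two decimals under locale_name, or return None
    if that locale is not installed on this system."""
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
    except locale.Error:
        return None
    try:
        # grouping=True inserts the locale's thousands separator, if it has one.
        return locale.format_string("%.2f", value, grouping=True)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


c_result = format_amount(4242.10, "C")            # the "C" locale always exists
german = format_amount(4242.10, "de_DE.UTF-8")    # None unless this locale is installed
missing = format_amount(4242.10, "xx_NOPE.bogus")

print(c_result, german, missing)
```

Because `setlocale` changes process-wide state, code like this is best kept out of threads that format text concurrently.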
- Some applications may want to ensure also that two XOs localized differently can actually collaborate. For example, a Brazilian kid meets an Uruguayan and decides to collaborate. This can be an added level of complexity if translated strings are passed around.
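One way to sidestep the problem of passing translated strings around, sketched here under stated assumptions, is to send the untranslated message identifier between machines and let each receiver translate it locally. The JSON message format, the `CATALOGS` dictionary and both helper functions are hypothetical; a real activity would look the string up through gettext on each XO.

```python
import json

# Hypothetical per-locale catalogs standing in for each XO's installed .mo files.
CATALOGS = {
    "pt_BR": {"Your turn": "Sua vez"},
    "es_UY": {"Your turn": "Tu turno"},
}


def make_wire_message(msgid):
    """Serialize the untranslated msgid for sending to a collaborator."""
    return json.dumps({"msgid": msgid})


def render(wire_message, locale_code):
    """Translate a received msgid using the receiver's own catalog."""
    msgid = json.loads(wire_message)["msgid"]
    return CATALOGS.get(locale_code, {}).get(msgid, msgid)


wire = make_wire_message("Your turn")
print(render(wire, "pt_BR"))  # shown on the Brazilian XO
print(render(wire, "es_UY"))  # shown on the Uruguayan XO
```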
- As a developer, you may want to add some comments to denote the particular sense in which a specific term is used in your source. This will ensure that the translators will be able to translate it properly avoiding ambiguity — ie:
    /* This is the verb, meaning to describe, not the noun or side of a face. */
    g_printf (_("Profile"));

This will automatically turn into the commented code below in the .pot file and .po files:

    #. This is the verb, meaning to describe, not the noun or side of a face.
    #: foo.c:42
    msgid "Profile"
    msgstr ""
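The example above is C. In Python source the same idea is expressed with an ordinary comment placed directly before the marked string, as in the sketch below. xgettext copies such comments into the .pot file when it is run with its `--add-comments` option; the `TRANSLATORS:` tag used here is a common convention and an assumption of this sketch, not something Sugar is known to require.

```python
from gettext import gettext as _

# TRANSLATORS: This is the verb, meaning to describe, not the noun or side of a face.
label = _("Profile")

print(label)
```

Running `xgettext --add-comments=TRANSLATORS: ...` over a file like this restricts extraction to comments that start with the tag, which keeps unrelated code comments out of the translators' view.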
- What strings should be localized?
- Everything written in a human language should be localized. This obviously includes anything that the end user will be able to read or lay eyes upon. The answer usually is a bit more fuzzy for 'internal' things. Debugging strings (intended for developers) are usually not localized—although the XO is intended for developer kids with access to the view code ;)— and may be avoided. Error strings, which end up in logs and such, that will be read by local administrators and technical people should be localized—in order to avoid confusion when reporting bugs or problems, it is recommended you ID your log messages (ie: using a number).
- Strings used by the operating system or internal to the computer should not be localized. Localizing these strings would cause the program to function incorrectly.
- What localization changes are coming in Python 3.0 and Python 2.6?
- Python 2.6 is compatible with the current version of Python. Python 3.0 has different semantics from current versions of Python and will take longer to adopt. Both will ship in late 2008.
- Python 2.6 and 3.0 support new formatting options, including a locale-aware "n" number format used with format() and str.format().
- Python 3.0 will make a clear distinction between byte strings and unicode strings. This will lead to fewer mistakes that confuse i18n strings decoded for display with internal byte strings encoded for file I/O.
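A small Python 3 sketch of both points follows. The helper name localized_number is made up for this example, and the "C" locale is used only because it is always available; a real activity would normally use the user's own locale.

```python
import locale


def localized_number(value, locale_name=""):
    """Format value with the "n" presentation type under the given locale."""
    previous = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        return format(value, "n")
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)  # always restore


# The "C" locale has no thousands separator, so the digits come back unchanged.
print(localized_number(1234567, "C"))

# Text for display is str; data for file I/O is bytes. Conversion is explicit.
raw = "Kuanza!".encode("utf-8")   # bytes, ready to be written to a file
text = raw.decode("utf-8")        # str, ready to be shown to the user

try:
    raw + text                    # mixing the two types is an error in Python 3
except TypeError:
    print("bytes and str cannot be mixed implicitly")
```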
- Translating format strings.
- Text that is displayed to the user often requires formatting, for example when embedding numbers and other strings:
print '%s has a score of %03d.' % (buddy.get_name(), 200)
This can be frustrating to translators because they cannot reorder the words in the sentence. Instead, use format strings with named arguments and interpolate with a mapping.
print '%(name)s has a score of %(score)03d.' % {'name': buddy.get_name(), 'score': 200}
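To see why named arguments help, here is a minimal Python 3 sketch in which a translation puts the placeholders in a different order. The catalog is a plain dictionary standing in for a real .mo file, and the "translated" string is a made-up reordering rather than a real translation.

```python
TEMPLATE = "%(name)s has a score of %(score)03d."

# Hypothetical catalog standing in for a compiled translation. The entry
# places the score before the name, which positional %s/%d could not do.
HYPOTHETICAL_CATALOG = {
    TEMPLATE: "Score %(score)03d: %(name)s.",
}


def _(message):
    # Stand-in for gettext's _(): look the message up, fall back to the original.
    return HYPOTHETICAL_CATALOG.get(message, message)


def score_line(name, score):
    """Translate first, then interpolate with a mapping."""
    return _(TEMPLATE) % {"name": name, "score": score}


print(score_line("Ana", 200))
```

The template is translated before interpolation, so the translator sees the placeholders rather than the substituted values.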
Obsolete Case Study: i18n & l10n of Kuku
- This section is out of date! MitchellNCharity 17:14, 29 November 2007 (EST)
- Sugar now handles gettext initialization (gettext.install, etc) for you. Don't add it to your activity.
- The directory structure is now simpler. Just a po/ directory, with po/TheActivityName.pot, and po/es.po, po/pt.po, etc.
Following the WxPython i18n tutorial, I added the following code at the top of my application:
- This is no longer needed!
File: kuku.py |
import os
import gettext
gettext.install('kuku', './locale', unicode=False)
# one line for each language
presLan_en = gettext.translation("kuku", os.path.join(get_bundle_path(), 'locale'), languages=['en'])
presLan_sw = gettext.translation("kuku", os.path.join(get_bundle_path(), 'locale'), languages=['sw'])
# only install one language - add program logic later
presLan_en.install()
# presLan_sw.install()
Here my application is called kuku.py, and I am using 'kuku' as the domain of my i18n. Next I chose which strings needed to be localized within kuku.py and surrounded them with _(). For example:
Before i18n | After i18n |
---|---|
message = 'Begin!' | message = _('Begin!') |
Next I need to create the i18n files. First I create a directory called 'locale' within my activity directory (this is the directory referred to in the presLan_en and presLan_sw lines above). The first step is to make a POT file, for which I use pygettext.py to process kuku.py:
python <path to your python distribution>/Tools/i18n/pygettext.py -o kuku.pot kuku.py
which creates kuku.pot. When first created it looks like
File: kuku.pot |
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2007-06-19 17:45+EDT\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: ENCODING\n"
"Generated-By: pygettext.py 1.5\n"

#: kuku.py:501
msgid "Begin!"
msgstr ""
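The last entry is the part we have to translate. I also had to edit the header at the top to replace the CHARSET and ENCODING placeholders. I changed both of these to utf-8 (the conventional value for Content-Transfer-Encoding is 8bit, but utf-8 is what I used), so my file now reads: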
File: kuku.pot |
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2007-06-19 17:15+EDT\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: utf-8\n"
"Generated-By: pygettext.py 1.5\n"

#: kuku.py:500
msgid "Begin!"
msgstr ""
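The extraction step can be approximated in a few lines of Python 3. This is a rough sketch of the idea, not how pygettext.py is implemented: the function names are made up, only plain _('literal') calls are recognized, and quotes inside a msgid are not escaped.

```python
import ast


def extract_msgids(source, filename="<string>"):
    """Return sorted (line number, msgid) pairs for every _('literal') call."""
    found = []
    for node in ast.walk(ast.parse(source, filename)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "_"
                and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            found.append((node.lineno, node.args[0].value))
    return sorted(found)


def to_pot_entries(found, filename):
    """Render the pairs as .pot entries (no escaping: a simplification)."""
    return "\n".join(
        '#: %s:%d\nmsgid "%s"\nmsgstr ""\n' % (filename, lineno, msgid)
        for lineno, msgid in found
    )


sample = "message = _('Begin!')\n"
print(to_pot_entries(extract_msgids(sample), "kuku.py"))
```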
Now I moved kuku.pot to ./locale. Then, for each language I want to localize to, I create a subdirectory within ./locale named after its language code. Within each of these subdirectories, I create a subdirectory called LC_MESSAGES. For now I am using English and Swahili, so my directory structure looks like
locale/
    kuku.pot
    en/
        LC_MESSAGES/
    sw/
        LC_MESSAGES/
Now we do translations. I copied kuku.pot to ./locale/en/LC_MESSAGES/kuku.po and ./locale/sw/LC_MESSAGES/kuku.po, and performed the translations:
File: ./locale/en/LC_MESSAGES/kuku.po |
#./locale/en/LC_MESSAGES/kuku.po
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2007-06-19 17:15+EDT\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: utf-8\n"
"Generated-By: pygettext.py 1.5\n"

#: kuku.py:500
msgid "Begin!"
msgstr "Begin!"
File: ./locale/sw/LC_MESSAGES/kuku.po |
#./locale/sw/LC_MESSAGES/kuku.po
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2007-06-19 17:15+EDT\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: utf-8\n"
"Generated-By: pygettext.py 1.5\n"

#: kuku.py:500
msgid "Begin!"
msgstr "Kuanza!"
Now my directory structure looks like
locale/
    kuku.pot
    en/
        LC_MESSAGES/
            kuku.po
    sw/
        LC_MESSAGES/
            kuku.po
One last step before we are ready to go. We need to make the binary files used by gettext. We do that with msgfmt.py:
cd <project path>/locale/en/LC_MESSAGES/
python <path to your python distribution>/Tools/i18n/msgfmt.py kuku.po
cd <project path>/locale/sw/LC_MESSAGES/
python <path to your python distribution>/Tools/i18n/msgfmt.py kuku.po
This creates binary .mo files, and now my directory structure looks like:
locale/
    kuku.pot
    en/
        LC_MESSAGES/
            kuku.po
            kuku.mo
    sw/
        LC_MESSAGES/
            kuku.po
            kuku.mo
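For the curious, the binary .mo layout is simple enough to sketch in Python 3. The build_mo function below is a made-up illustration that covers only plain entries (no plurals, no contexts, no hash table). It is not a replacement for msgfmt.py. The result is read back with gettext.GNUTranslations to show that the standard library accepts it.

```python
import gettext
import io
import struct


def build_mo(catalog):
    """Serialize a {msgid: msgstr} dict into the GNU .mo binary layout."""
    # Entries must be sorted by msgid; the empty msgid carries the header.
    items = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in catalog.items())
    count = len(items)
    ids_table = 7 * 4                   # the fixed header is seven 32-bit words
    strs_table = ids_table + count * 8  # each table row is (length, offset)
    offset = strs_table + count * 8     # string data starts after both tables

    # magic, format version, entry count, two table offsets, empty hash table
    out = struct.pack("<7I", 0x950412DE, 0, count, ids_table, strs_table, 0, 0)
    for msgid, _msgstr in items:
        out += struct.pack("<2I", len(msgid), offset)
        offset += len(msgid) + 1        # +1 for the NUL terminator
    for _msgid, msgstr in items:
        out += struct.pack("<2I", len(msgstr), offset)
        offset += len(msgstr) + 1
    out += b"".join(msgid + b"\0" for msgid, _msgstr in items)
    out += b"".join(msgstr + b"\0" for _msgid, msgstr in items)
    return out


catalog = {
    "": "Content-Type: text/plain; charset=utf-8\n",
    "Begin!": "Kuanza!",
}
translations = gettext.GNUTranslations(io.BytesIO(build_mo(catalog)))
print(translations.gettext("Begin!"))
```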
To add new languages, we need to add a subdirectory for each language, perform the translations, create the .mo files, and add the relevant code in the application to select the language.
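The language-selection logic could look something like the Python 3 sketch below. The load_translation helper and its default arguments are assumptions made for this example. gettext.translation() and its fallback parameter are part of the standard library. With fallback=True, a missing .mo file yields the untranslated strings instead of an error.

```python
import gettext


def load_translation(language, localedir="locale", domain="kuku"):
    """Load <localedir>/<language>/LC_MESSAGES/<domain>.mo if it exists."""
    return gettext.translation(
        domain,
        localedir=localedir,
        languages=[language],
        fallback=True,  # return NullTranslations instead of raising
    )


translation = load_translation("sw")
_ = translation.gettext
print(_("Begin!"))
```

Choosing the language from a preference or environment setting then only means passing a different code to load_translation.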
Getting non-latin text from translation web sites
If you are running Gnome, you can do the following.
Here is an Arabic Google translation of "hi". Open a gnome-terminal and run "cat > tmpfile". Copy and paste the Arabic into the terminal, then press Ctrl-D to end the input, and the text is saved in tmpfile. This avoids the mangling that happens when encoding information is lost.
In emacs (and perhaps other editors as well?), insert the tmpfile. And that's it. You can test this all by creating a two line python file,
# -*- coding: utf-8 -*-
print u'the string goes here'
And running it in the terminal.
I haven't tried this with po files yet.
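As a quick check that non-Latin text survives being written to and read from a file, a Python 3 round trip can be sketched like this. The helper name round_trip is made up, and the sample string is the Arabic word for "hello" written with Unicode escapes so that the example itself cannot be mangled by an editor.

```python
import os
import tempfile

# Arabic "marhaba" (hello), spelled with escapes to keep this file pure ASCII.
ARABIC_HELLO = "\u0645\u0631\u062d\u0628\u0627"


def round_trip(text):
    """Write text to a temporary UTF-8 file and read it back."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read()
    finally:
        os.remove(path)


# Print a boolean rather than the text, so a non-UTF-8 terminal cannot fail.
print(round_trip(ARABIC_HELLO) == ARABIC_HELLO)
```

Naming the encoding explicitly on both open() calls is what keeps the encoding information from being lost.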
Resources
These are the two docs that I used to learn about i18n (with no prior knowledge). Read the WxPython reference first and, instead of using the mki18n.py file mentioned on the WxPython page, use the tools in the Python standard distribution: pygettext.py and msgfmt.py.
- Python Reference see also (module-locale)
- WxPython i18n
Here is a well written example of translation with copious comments.
Here are some general resources
- GNU.org gettext manual
- GNOME-i18n
- Internationalising GNOME applications by Malcolm Tredinnick
- GNOME - L10N Guidelines for Developers
See also
- Activity bundles
- Babel - A collection of tools for internationalizing Python applications.