Pootle

From OLPC
Revision as of 12:43, 11 October 2007 by Xavi (talk | contribs) (moving out from IRC channel #olpc-l10n into #olpc-content)
IRC #olpc-content

NOTICE 
The use of Pootle is currently under test and should by no means be taken or considered to be part of the Localization process for the XO.

These are the notes taken and rough sketches for the processes involved in the localization effort in order to use Pootle as a more liberal l10n platform.

There are several scenarios that depend on the roles and their associated responsibilities (e.g. translators, coders, administrators). Below we outline the two most important ones from the translators' point of view: the opportunistic translator (fixing a typo or translating a few missing strings) and the registered translator (who is somewhat more committed to the whole l10n effort).

If we consider that in the ultimate case children will be dealing with code (thus with gettext), we want things and processes to be as simple and straightforward as possible, so that in the end anybody will be able to translate. With this in mind, we aim for an environment that will:

  • allow anybody to make suggestions,
  • allow registered users to make translations, and
  • allow administrators to commit.

If the quality is not satisfactory, we can probably revert to more 'traditional' and bureaucratic structures; or try to develop tools that guarantee or hint appropriately without sacrificing the liberty and agility that will be required by children.

Basic Scenarios

Opportunistic translator

This user just wants to help. She/he doesn't want to get tangled in the administrative tasks. The only possible collaboration available is to suggest translations (which will be reviewed by users who have been granted the Review permission in a particular language).

After navigating on the OLPC Pootle server to project: olpc, language: spanish, you finally reach a file (e.g. TamTamSynthLab).

The interface will display a series of PO entries. One of them will have the focus (if none does, hovering over an entry makes an 'Edit' link appear that enables it), showing an entry field and the following controls:

  • Back & Skip buttons: jump to the previous or next entry
  • Copy button: copies the original (msgid) value and continues in edit mode
  • Suggest button: the actual collaboration of suggesting a translation
  • Fuzzy checkbox: denotes whether the suggested translation is 100% trustworthy or not
  • Special characters specific to the language (see #Languages)
  • grow / shrink: allows growing and shrinking of the entry field (see #User options)
  • Translator comments field: comments either extracted from the source code, or added by other translators

The opportunistic translator then proceeds to navigate the file/s entering suggestions (to be processed later by the reviewers).

Admin notes 
The Suggest permission must be granted on a per-language-project basis.
Each language may have specific or special characters that may not be available on the user's keyboard, but these can be provided in the #Languages specification.

Registered translator

Except for the mandatory pre-condition to register, which enables the extra [Submit] button when translating, the overall process is quite similar to that of the #Opportunistic translator. On the other hand, several user-specific permissions may apply (e.g. off-line translation) and the GUI will adapt and offer them. Also, as a registered translator, you may be assigned specific files or strings to translate and/or review, which allows better coordination of the overall effort.

Admin notes 
In a collaborative and low-entry-barrier process, the administrator/s should enable the default user to actually translate — Navigation: admin | projects | project_name | language | permissions +Translate.
The default #user permissions apply to any registered user (unless they are overridden in a case-by-case approach). In the out-of-the-box install they include: View, Suggest, Archive & Compile PO files

User Scenarios

Register as a translator

  1. Head towards the pootle site
  2. Follow the register link and fill in the following fields: Username, Password, Confirm password, Full Name & Email Address. Clicking the [Register Account] button sends a confirmation message with the activation code to that email address.
  3. Following the link in the email activates the newly created account, after which you will be asked to log in and taken to your user page.

User options

In your user page there's a link to Change options that will allow you to define some things:

  • Projects you wish to participate in (for the moment only olpc should be considered a valid option)
  • Languages you wish to collaborate in. If you don't find your language, as an administrator you can associate it to the project (see #Languages) or as a mere mortal, you should contact an administrator.
  • Other options like personal data & translation UI are present.

After configuring, don't forget to hit the [Save changes] button to make them effective, after which you can return to your #user page by means of the Home Page link.

User page

aka: 'Home', or 'My account' page.

You can reach your user page following the Home or My account link (depending on where you are) and it will show the appropriate links to the selected projects grouped by languages:

  • The language link takes you to the statistics page per project
  • The project-in-a-language link takes you to the statistics page displaying all its files

Advanced User Scenarios

Reviewer

A user who has been granted the Review permission may accept or reject the suggestions made by users (who must have the Suggest permission). The way to do this (is there another?) is to go to a language/project combination (e.g. spanish-olpc) and follow the Show Editing Functions link. Here you have two alternatives: review the whole set of suggestions in the language-project set, or work on the suggestions present in a specific file (i.e. all suggestions or only the TamTamSynthLab suggestions).

Once in the review UI, the reviewer has four options: Accept, Reject, Back or Skip. They are self-evident, but are listed here nevertheless:

Accept: the reviewer accepts the suggestion, which is then registered.
Reject: the suggested translation is rejected. (Open question: does that mean it is erased for the user or for the file?)
Back: goes to the previous suggestion (presumably a previous suggestion that was neither accepted nor rejected).
Skip: goes to the next available suggestion.

In the case where multiple suggestions have been made, each one has its Accept & Reject buttons, while only the last will have the Back & Skip buttons.

Regardless of the multiplicity of suggestions, each suggested string will be diffed with the current string highlighting the changes. It also displays (if available) the name of the user that made the suggestion.
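
To illustrate what such a diff between the current string and a suggestion looks like, here is a minimal sketch using Python's standard difflib module. It is not Pootle's own code; the function name highlight_changes and the [-...-] / {+...+} markers are assumptions made up for this example.

```python
import difflib


def highlight_changes(current: str, suggested: str) -> str:
    """Word-level diff: removed words as [-...-], added words as {+...+}."""
    cur_words = current.split()
    sug_words = suggested.split()
    matcher = difflib.SequenceMatcher(a=cur_words, b=sug_words)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(cur_words[i1:i2])
            continue
        if i1 != i2:  # words present only in the current string
            out.append("[-" + " ".join(cur_words[i1:i2]) + "-]")
        if j1 != j2:  # words present only in the suggestion
            out.append("{+" + " ".join(sug_words[j1:j2]) + "+}")
    return " ".join(out)


if __name__ == "__main__":
    print(highlight_changes("Guardar el archivo", "Guardar el fichero"))
```

A reviewer-facing UI would render the markers as colours instead of brackets, but the underlying comparison is the same idea.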

Editing functions

This section of the UI displays on a per-file basis several functionalities that depend on the permissions of the user:

Translate My Strings: enters the translate UI, limiting visibility to the strings assigned to the user. (to explore)
Quick Translate My Strings: same as above, difference unknown. (to explore)
Quick Translate: enters the translate UI over the whole set? (to explore)
Translate All: (to explore)
PO file: used by off-line translators to retrieve the whole .PO file.
XLIFF file: used by off-line translators and interfaces to retrieve the whole file in said format. (to explore)
Qt .ts file: (to explore)
CSV file: (to explore)

Show checks

This functionality is a real helper (at least for Latin scripts) as it runs a series of checks on the translations. Besides the obvious translated / fuzzy / untranslated counts, Pootle verifies:

  • simplecaps — for extra capital letters
  • startcaps — initial capital letter matches between source & translation
  • startpunc — if the original doesn't start with a letter, the translation probably shouldn't either
  • unchanged — the source & translation are the same
  • others!! — there are several others! Apparently, from traces in a somefile.po.stats the checks performed are:
    check-validchars, check-numbers, check-unchanged, check-doublespacing, check-purepunc, check-isreview, check-nplurals, check-brackets, blank, check-endpunc, check-xmltags, check-escapes, check-spellcheck, check-endwhitespace, check-functions, check-doublewords, check-singlequoting, check-simplecaps, check-blank, check-emails, check-startwhitespace, check-accelerators, check-long, check-musttranslatewords, has-suggestion, check-puncspacing, check-notranslatewords, check-variables, check-doublequoting, check-kdecomments, check-short, fuzzy, check-untranslated, check-simpleplurals, check-sentencecount, check-isfuzzy, check-startpunc, check-compendiumconflicts, check-tabs, check-newlines, check-urls, check-filepaths, check-startcaps, translated, check-printf, check-acronyms, and sourcewordcounts, targetwordcounts.

Obviously, these checks, as any automated language process, are to be taken as a guidance and not as a rule.
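
As a rough illustration of what three of the checks listed above do, here is a small sketch that follows only the one-line descriptions given in this page. It is not the checker code that Pootle actually runs, and the function run_checks is a hypothetical helper.

```python
def startcaps(source: str, target: str) -> bool:
    """Pass when both strings agree on starting with a capital letter."""
    def starts_upper(text: str) -> bool:
        for ch in text:
            if ch.isalpha():
                return ch.isupper()
        return False
    return starts_upper(source) == starts_upper(target)


def startpunc(source: str, target: str) -> bool:
    """Pass unless the source starts with a non-letter and the target with a letter."""
    if not source or not target or source[0].isalpha():
        return True
    return not target[0].isalpha()


def unchanged(source: str, target: str) -> bool:
    """Pass when the translation differs from the source."""
    return source.strip() != target.strip()


CHECKS = {"startcaps": startcaps, "startpunc": startpunc, "unchanged": unchanged}


def run_checks(source: str, target: str) -> list:
    """Return the names of the checks that the pair fails."""
    return [name for name, check in CHECKS.items() if not check(source, target)]


if __name__ == "__main__":
    print(run_checks("Save file", "guardar archivo"))
```

A failed check is only a hint for the translator or reviewer: "OK" translated as "OK" fails unchanged, yet may be perfectly correct.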

Zip of folder

Actually, any 'grouping' of PO files may be downloaded as a ZIP file if the user has the archive right. In other words, you can download the files in a language, goal, file, etc. It may be possible that Pootle extracts and recombines from several files in order to provide the zip with say 'My Strings' for offline work.

Administrator Scenarios

Administration

As an administrator, in your home page you have access to the Admin page which offers: Users, Languages & Projects.

Users 
is a simple interface allowing the manual addition of users, the editing of their names & (invisible) passwords (e.g. resetting them), and where you can activate, de-activate and remove a user.
Languages 
allows the maintenance of the list of available languages: the code (based on ISO 639), a descriptive text, the special characters (used in the translating UI), the number of plurals and its equation. Note: removing a language here would seem to affect only the ability to associate it to projects, with no apparent impact on the previously defined associations.
Projects 
this is the initial page where things start to come together. Besides being able to add a project by defining the values for the Project Code, Full Name, Project Description, Checker Style, File Type and Create MO Files fields, you can also Remove Project. The most mysterious parameter is Checker Style which offers Standard, creativecommons, kde, openoffice, mozilla and gnome as options.

The above covers the broad, high-level configuration, which must be followed by the project and language configuration:

Project languages 
A project needs to be informed of which languages it will have. This is accomplished by following the link of a specific project resulting in a page where you can add them. The only apparent way to remove a language is to delete its directory from outside of Pootle.
Project language permissions 
Each language mentioned above is a link that allows you to configure the #user permissions in said project-language combination. You can grant/remove specific rights to a particular user, that will only apply in said project+language context.
NOTES 
Several languages were removed from the initial (default) list with the intention to limit or focus on the core green languages: am, ar, en, es, fr, ha, hi, ig, ne, pt, ro, ru, rw, th, ur, & yo.
Still pending: Obtain and verify the plural formulas for all.
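
To make the 'number of plurals and its equation' concrete, here is a sketch of how a few such formulas behave. The formulas below are the ones commonly quoted in gettext documentation for these four languages, rewritten as Python; they are given as an assumption to be verified, not as the values configured in our Pootle, and the remaining green languages are deliberately left out. The names PLURAL_RULES and check_rule are made up for this example.

```python
def plural_two_forms(n: int) -> int:
    # gettext: nplurals=2; plural=(n != 1)
    return int(n != 1)


def plural_french(n: int) -> int:
    # gettext: nplurals=2; plural=(n > 1)
    return int(n > 1)


def plural_russian(n: int) -> int:
    # gettext: nplurals=3; three-way rule based on the last digits
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1
    return 2


# language code -> (number of plural forms, formula)
PLURAL_RULES = {
    "en": (2, plural_two_forms),
    "es": (2, plural_two_forms),
    "fr": (2, plural_french),
    "ru": (3, plural_russian),
}


def check_rule(lang: str, upto: int = 200) -> bool:
    """Verify that the formula only yields indexes below nplurals."""
    nplurals, formula = PLURAL_RULES[lang]
    return all(0 <= formula(n) < nplurals for n in range(upto))


if __name__ == "__main__":
    for lang in PLURAL_RULES:
        print(lang, check_rule(lang))
```

A sanity check of this kind only proves that a formula is well-formed; whether it is linguistically right still has to be confirmed by a speaker of each language.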

Defining goals

see Pootle::Goals

Goals are required in order to be able to assign translations, thus enabling the organizing and prioritizing of work within a given language-project (iow, no global goals). Also worth noting, goals seem to be nothing more than a 'bundle name', with no other data like deadlines, comments or anything else.

In order to define a goal you must enable Show Goals in the project page; on the right there is then an entry field to add a new goal.

  1. Define a goal: in the project page (e.g. spanish-olpc), toggle Show Goals
    • by default a null-goal (i.e. Not in a goal) exists as a catch-all.
    • on the right, a box with an entry field allows you to give a name and Add Goal.
  2. Assign files to a goal
    • as we are trying to include files into a goal, click on Not in a goal
    • make sure that Show Editing Functions is enabled, clicking on it if necessary
    • once in the language-project-goals view (showing the 'Not in a goal' files), each file has a pull-down list to pick and set the goal.

After setting the goals you can view the files bundled in a goal, and later assign work.

Assigning translations

see Pootle::Assigning

This is a nice administrative functionality that will actually (or probably) make the translator's life simpler. As an administrator you can assign certain users to goals and also to specific files (either to translate everything, only the unassigned strings, or a couple of other variants).

By assigning users to goals/files, those users will then be able to use special links (e.g. Translate My Strings or Quick Translate My Strings) that will feed them their workload, and thus allow them to focus where the project needs it. Still, the interface and handling are not quite describable at this moment, as many aspects come together and some things need to be better understood in order to get the most out of it (while trying not to complicate things too much).

Open question: how do you remove a user from a given goal?

Setting up version control systems

see Pootle::Version control

These are some notes of the trials, not actual documentation.

Things have been messier than expected. First of all, GIT is not supported in the stable version, and the newer (unstable) version is reorganizing several aspects of Pootle making it too risky to jump onto the bleeding edge... Thus the decision to backport the git support.

On top of this, although volunteer gnrfan joined the fray (doing the backport), our experience with git is way below average and we are totally in the dark about how Pootle actually obtains and stores the POT files. We seem to have found a way (using symlinks in the Pootle subdirectories pointing to the repository files), but this solution, although it may work, is a bit obscure and not even hinted at by the Pootle documentation:

To have any sort of integration with version control from within Pootle, it is necessary to check out the translation files into their correct places in their Pootle projects. The CVS or SVN meta files (CVS/ or .svn/) need to be there. This has to be done outside of Pootle.

Our efforts to contact the Pootle community and elicit some sort of answer have not been very effective for the moment... but hope is not lost! :)

Fantasyland

The basic idea we are aiming for is the following:

We are assuming that the Pootle server has the ability to read & write throughout the xyzzy_project/po in dev.laptop.org through the git.

The overall process would be:

Developers do their i18n part resulting in one or more .POT files generated in their respective /po directories (and in theory, forget all about l10n).

  1. First time (iow, a new project with a .POT)
    • Pootle would sync, downloading the .POT and making it available for all languages to translate
    • Pootle makes the POT (now POs) available for each language, and they are translated
    • Pootle commits the (new) POs to the git server in d.l.o
  2. Updates in the i18n part: new versions of the POT
    • Pootle in its sync process should note the update, download the new version, modify all the POs accordingly and replace the local POT with the 'real' POT
    • Pootle would commit as usual
  3. Updates in the l10n part: d.l.o has a new version of some PO
    • Pootle syncs, modifying the local PO using the d.l.o version as the latest valid version and reference, meaning that the local versions need to be corrected against it
      (The documentation mentions this is so, passing the local differences as 'suggestions'.)

The current problem we are facing is getting Pootle to read from git, iow, procuring the POTs! We haven't found a way (be it interface or code) to get the POTs. The update and commit are visible in the web GUI...

Pootle workflow as per #pootle channel

Pootle has some very nice features, but integration with the repositories is not really one of them, particularly the initial bootstrapping of a project. As extracted from a chat with friedel in IRC #pootle, the usual / standard workflow with no repositories is as follows:

  1. The POT file is manually injected into /po/project_name/templates
  2. Doing the Update from templates for a specific language in a project either re-syncs the existing PO (doing a merge and, in the case of conflicts, demoting them to suggestions; this needs testing) or makes the PO available for said language (in /po/project_name/lang_code).
  3. Originally, Pootle's job ended here: the re-injection of the PO & MO files into the original project was left for the language or project coordinator to do.

With the inclusion of repositories, apparently the only part really integrated is the commit phase which would somehow trigger a push of the PO files. Unfortunately the 'pulling' from the repositories is not so automatic. Therefore, the probable (and suggested in IRC) workflow we'll implement requires manual intervention and/or the development of some scripts, and would probably look like this:

  1. Have a local git repository that syncs with d.l.o (as any other repository)
  2. Manually create the project
  3. Manually inject the POT file (most likely a symlink to the POT in the local repository)
  4. Update from templates would be performed for all languages
    This step would actually create (or sync) the PO files for all languages in the project together with some associated internal files to Pootle. The actual PO file should then be symlinked and added to the appropriate place in the (local) repository.
    Another possibility would be to create the project directory as a symlink into the repository (and probably use the .gitignore to filter out the Pootle files). This alternative (which could be simpler) seems to clash with the idea that all POT and PO files are stored in the /po directory in d.l.o because Pootle handles each language as a subdirectory (under the assumption that each project has several POT files, something that even Etoys has reverted and now has just one big POT).
  5. Translate at will & perform commits at will / as necessary, staying alert for:
    • If the POT changes along the way, this should be somehow noted and an Update from templates should be carried out.
    • If a new POT file (not an update) is generated, it must be manually injected together with the corresponding PO files per language.
    • If a POT is deprecated (eliminated) the appropriate local removal should ensue.
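
Step 3 of the workflow above (injecting the POT as a symlink into the project's templates directory) could be scripted along these lines. This is only a sketch under assumptions: the function name inject_pot is made up, the directory layout is the /po/project_name/templates one described in this page, and the paths in the usage lines are hypothetical placeholders.

```python
from pathlib import Path


def inject_pot(pot_in_repo: Path, pootle_root: Path, project: str) -> Path:
    """Symlink a POT from the local git clone into <pootle_root>/<project>/templates."""
    templates = pootle_root / project / "templates"
    templates.mkdir(parents=True, exist_ok=True)
    link = templates / pot_in_repo.name
    # Replace a stale link or file left over from a previous injection.
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(pot_in_repo.resolve())
    return link


if __name__ == "__main__":
    # Hypothetical locations: adjust to the real clone and Pootle directories.
    repo_pot = Path("local-clone/po/xyzzy.pot")
    if repo_pot.exists():
        print(inject_pot(repo_pot, Path("pootle/po"), "olpc"))
```

After this, the Update from templates of step 4 still has to be triggered for each language; the script only covers the manual injection.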

The manual nature of Pootle's repository integration is currently the weakest point, as it adds administrative overhead and coordination issues. Some of it could be eased by either developing some scripts or modifying Pootle itself (e.g. the creation of a new PO file could be tweaked to generate the symlink instead of the local file and to issue the appropriate command to add it to the repository; or we could just settle for a simple symlink of the project's directory, so that all files would reside in the local repository). To do: verify any possible conflict in the naming conventions of the target PO, particularly with t.fp.o.
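
One of the scripts mentioned above could simply watch for the three situations to stay alert for: an updated POT, a new POT and a removed POT. The following is a sketch of that idea, not an existing Pootle tool; the function names and the JSON state file are assumptions made for this example.

```python
import hashlib
import json
from pathlib import Path


def pot_digest(path: Path) -> str:
    """Fingerprint of a POT file's content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_pots(po_dir: Path, state_file: Path) -> dict:
    """Compare the POTs in po_dir with the digests saved on the previous run."""
    previous = json.loads(state_file.read_text()) if state_file.exists() else {}
    current = {p.name: pot_digest(p) for p in sorted(po_dir.glob("*.pot"))}
    report = {
        # never seen before: inject manually, together with the PO files
        "new": sorted(set(current) - set(previous)),
        # content differs: an Update from templates is needed
        "updated": sorted(
            name for name in current
            if name in previous and current[name] != previous[name]
        ),
        # gone from the repository: remove locally
        "removed": sorted(set(previous) - set(current)),
    }
    state_file.write_text(json.dumps(current, indent=2))
    return report
```

Run after each sync of the local repository, the report tells the administrator which manual action is pending; it does not perform any of them.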

Twilight Zone

music cues in... tiritiri...

The current status is that we need a repository for testing. Now, testing on live things is usually not very polite, and definitely hazardous if anything goes wrong. But we need to test against something... The initial thought was to get a limited account on dev.laptop.org to test with and, Alfonso being the owner of EduKT, that was our first choice. The problem was the handling of the keys (a delicate issue) and the fact that he had other obligations that consumed his time, leaving us out of the loop. And let's face it, we were stubborn and quite non-inventive, which led us to 'not be able to test'. All this is to propose the following test:

  1. Make a TrueClone of dev.laptop.org
    Not just the git repository, but also the server part (iow, make a copy of dev.laptop.org, not just the data within) in order to avoid further tests or any possibility to confuse 'local' parameters with the real thing
  2. Then use the Pootle user to create its git clone as its working copy of the POTs & POs.
  3. When Pootle 'commits' it will be committing to the TrueClone... voila!

This will avoid all the hassle of coordinating with dev.laptop.org admins (iow, avoid consuming their time and resources for testing) and gives us full independence to screw up! :)

Man why didn't we think of this before... for the moment, let's see how hard it may be to make a TrueClone... that would be git+ssh... Google, here we go...

Defining & using terminology

see Pootle::Terminology
see Pootle::Matching

This is an extremely helpful functionality, particularly if we want to open the translation process to non-professional and #Opportunistic translators because it aids them in preserving the terminology that has been decided upon by the more involved segments of the community (ie: will the olpc.es translate "cat" as "gato" or "mish"?)

The structure of the terminology file is that of a standard .PO file (i.e. msgid / msgstr), and (apparently) many could co-exist (e.g. color.po, hig.po, etc.). We still need to explore the ways to override the default terminology searches per project.
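
As an illustration, a minimal terminology file and a naive word-level lookup could look like the sketch below. The 'cat' entry comes from the example above; the 'file' entry, the parser and the function names are hypothetical, and the parser only handles single-line msgid / msgstr pairs. This is not how Pootle itself matches terminology.

```python
import re

# A minimalistic terminology file: plain msgid / msgstr pairs.
TERMINOLOGY_PO = '''\
msgid "cat"
msgstr "gato"

msgid "file"
msgstr "archivo"
'''

ENTRY = re.compile(r'msgid "(.*)"\nmsgstr "(.*)"')


def load_terms(po_text: str) -> dict:
    """Collect the single-line msgid -> msgstr pairs, skipping an empty msgid."""
    return {m.group(1): m.group(2) for m in ENTRY.finditer(po_text) if m.group(1)}


def propose_terms(source: str, terms: dict) -> dict:
    """Return the agreed translation for every known term found in the source."""
    words = re.findall(r"\w+", source.lower())
    return {word: terms[word] for word in words if word in terms}


if __name__ == "__main__":
    print(propose_terms("Open the file with Cat", load_terms(TERMINOLOGY_PO)))
```

The point of the exercise is that the proposals come from a file the community controls, so an opportunistic translator sees the agreed term while typing.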

Although the documentation is not crystal clear on where and how these terminology files would go, a simple test has been made and works.

The steps, as an administrator, were:

  1. Enable the languages for which you want to have terminology:
  2. Join the project

You are 'done'. When later translating anything in the 'spanish' branch, the xavitest.po will be used to propose translations in a dynamic way.

really done? Need to verify process

Taking advantage of translation memory

see Pootle::TM

Very much like terminology, but instead of working at the word level, the matching is performed on the whole string, proposing complete translations that have already been used. This saves some time but, more importantly, provides some level of consistency between translations. On the other hand, instead of being a dynamic feature, this is more of a batch or static process: starting from a base translation memory file, one may generate suggestions for a specific file or for a whole set of them.
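
The whole-string matching idea can be sketched with Python's standard difflib module. This is an illustration only: the matching algorithm and thresholds actually used by Pootle's translation memory may well differ, and the function tm_suggest, the cutoff value and the sample memory are assumptions.

```python
import difflib


def tm_suggest(source: str, memory: dict, cutoff: float = 0.75) -> list:
    """Return (known source, its translation, similarity) tuples, best match first."""
    if source in memory:  # exact hit: reuse the translation as is
        return [(source, memory[source], 1.0)]
    hits = difflib.get_close_matches(source, list(memory), n=3, cutoff=cutoff)
    return [
        (hit, memory[hit], round(difflib.SequenceMatcher(None, source, hit).ratio(), 2))
        for hit in hits
    ]


if __name__ == "__main__":
    # Hypothetical memory built from strings that were already translated.
    memory = {
        "Save the file": "Guardar el archivo",
        "Open the file": "Abrir el archivo",
    }
    for known, translation, score in tm_suggest("Save the files", memory):
        print(score, known, "->", translation)
```

In a batch process, a loop of this kind over every untranslated msgid of a PO would produce the suggestions that end up next to the file.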

pre-test notes 
These are some ideas, doubts and why-not stuff that came up while reading the documentation.
File per file is probably not desirable (ie: use one PO to generate a memory for another PO), so some sort of 'olpc' translation memory file (per language) should be made.
How do you generate the initial all-encompassing tm file?
Every time a new language or PO is added, the associated memories for them (as targets) should be (automatically?) created...
Updating a POT (iow, a new version) should trigger the update of all languages.
The updating (which is not done in real-time) would still need to be done in a reasonable time period.

User permissions

see Pootle::Permissions

Permissions are granted or revoked within a specific language-project pair, and optionally, within that scope, to specific users. This means that the nobody & default users have local, instead of global, behaviors.

The following is the list of the available permissions handled by Pootle. The descriptions are the result of observation and deduction (we haven't found specific documentation on them), so they should be read as if they carried a '#, fuzzy' tag...

NOTE: The system administrator flag is not reachable through the GUI, as it resides as a flag (rights.siteadmin) in the user.prefs file. This permission allows the users possessing it to administer all the projects, languages and functionality (you should always have one user with it).

Permission | Description | Default users | Comments
View | Allows the browsing of the PO files and their translation. | nobody, default |
Suggest | Allows suggesting a translation. | default | This should be enabled for nobody if we want to make things simpler for the #Opportunistic translator.
Translate | Allows submitting a translation. | none | By default there are no users allowed to translate, forcing the administrator to grant this right (bureaucratic and restrictive).
Overwrite | When uploading a PO file, it allows the user to overwrite any existing file (iow, no merge of changes). | none | Handle with care.
Review | Allows approving/rejecting suggestions made by users capable of suggesting translations. | none | See #Reviewer.
Archive | Would allow the user to download a set of PO files in zip format, probably those of a language/project. | none |
Compile PO files | Allows a user to generate / compile the PO into MO files. | none | It is unknown where the MO files will reside, or how they will be transferred or made available. Need to explore.
Assign | Allows a user to assign files (or chunks of files) amongst users that have been granted the Translate permission. | none | Need to explore.
Administrate | Its granularity is not well defined (or understood): administrate everything (seems like it), or just a project? or just a language? or just a language-in-a-project? | none | Need to explore.
Commit | The holy grail or point of all this: make a translation available. | none | Again, the granularity is not well defined or understood. Need to explore.
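
The way these rights appear to resolve within one language-project pair can be modelled as follows. This is a sketch of the behaviour as observed in this page (anonymous visitors get the nobody rights, registered users get the default rights unless they have their own entry); it is not Pootle's implementation, and the sample data and names are hypothetical.

```python
from typing import Optional


def effective_rights(pair_prefs: dict, user: Optional[str]) -> set:
    """Rights of a user within a single language-project pair."""
    if user is None:  # not logged in
        return set(pair_prefs.get("nobody", ()))
    # A specific entry overrides the default rights instead of adding to them.
    return set(pair_prefs.get(user, pair_prefs.get("default", ())))


# Hypothetical rights for the spanish-olpc pair.
SPANISH_OLPC = {
    "nobody": ["view", "suggest"],
    "default": ["view", "suggest", "translate"],
    "xavi": ["view", "suggest", "translate", "review", "commit"],
}

if __name__ == "__main__":
    for who in (None, "somebody", "xavi"):
        print(who, sorted(effective_rights(SPANISH_OLPC, who)))
```

The sample data reflects the low-entry-barrier setup aimed for in this page: anybody may suggest, registered users may translate, and only named users review and commit.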

Advanced site administration

Lowering latency through web servers 
see Pootle::Apache
see Pootle::NginX
Translator statistics 
see Pootle::LogStats
Configurable logos 
see Pootle::Changes
Removing / Renaming files 
According to user friedel in IRC#pootle, you can 'safely' delete a file (and associates—ie: xyzzy.es.po.pending) without much to worry about. This has to be done outside of Pootle.

Proving Grounds

We are currently testing in an ad-hoc trial site

Language | Translated | Fuzzy | Untranslated | Total | Diff. Spanish
arabic | 198 (78%) | 7 (2%) | 48 (18%) | 253 | 1425
portuguese BR | 174 (84%) | 12 (5%) | 19 (9%) | 205 | 1473
french | 144 (67%) | 15 (7%) | 53 (25%) | 212 | 1466
portuguese | 12 (42%) | 0 (0%) | 16 (57%) | 28 | 1650
spanish | 1336 (79%) | 342 (20%) | 0 (0%) | 1678 |

Observations

  • You can download the PO, so there's no forcing of the on-line UI.
  • In the translation interface, fuzzy entries are grayed out and there's a gray vertical line separating the terms

File structure

F/D+Type | File/dir | Purpose | Notes
D-auto | .../po | root of the Pootle files |
D-auto | .../po/terminology/xx | the directory where terminology files are stored on a per language basis (normal, but minimalistic, PO files). | Given that POTs are manually injected and updated, maybe the terminology directory is polymorphic, so a templates language directory could be used, enabling system-wide terminology.
D-gui | .../po/project | each 'project' has its own root |
D-manual | .../po/project/templates | the directory where the POTs associated with the project must go (not accessible through the web GUI) |
D-gui | .../po/project/xx | within each 'project' a directory is set up for each language (using ISO 639 coding). | Created when a language is added to the project. Unknown how to remove it (probably just deletion).
F-auto | .../po/project/xx/pootle-project-xx.prefs | for each [language+project], holds the rights (for nobody, default & specific users) and the goals (name and list of files) |
F-auto | .../po/project/xx/pootle-project-xx.stats | holds six numbers for each PO file (presumably translated words, translated strings, non-translated words, non-translated strings, total words, and total strings) |
F-gui/rcs | somefile.po | the actual PO file. |
F-auto | somefile.po.pending | if suggestions have been made, they are stored here in a pseudo-PO format that modifies the msgid (appends a "_: suggested by username\n" line). | See #Bug dealing with Suggestions.
F-auto | somefile.po.stats | statistics about the internals of the file, apparently only about the checks and their results. |
F-manual | somefile.po.tm | the results of applying a translation memory to that file, presumably the suggested string translations. |

To Do

Usage

  • Define goals
  • Assign translations
  • Merge of uploaded PO
  • Check if there is any kind of verification when a PO file is uploaded to ensure the up-to-dateness of the original POT on which it is based (e.g. a PO is uploaded but the POT on which it is based is an older version)
  • Delete files (this is done outside of Pootle, directly on them... some things will protest, but it is the 'accepted procedure')
  • Update from templates (this needs the git interface on one side, but also if a POT is updated by whichever means, the PO should be merged/solved)
  • Define a workflow! — the #Basic Scenarios above just cover the translator part. After the git interface is tested, the handling of the developer input, and the Pootle output must be defined.

Config

  • Interface with GIT — both on importing POTs & POs from it & exporting POs back to it.
  • For completion's sake, language plurals & associated formulas must be defined (a bit pointless if developers don't actually do i18n right though, but...)

Done

  • user creation (both through the admin & registration interface that uses confirmation codes via mail)
  • add / remove languages (basically reduced the set to green countries languages in order to focus attention)
  • add project (only one, for 'olpc'); deleting is performed through the GUI.
  • associate languages & projects
  • define permissions for nobody, default and specific user
  • upload PO file (note: it doesn't verify against the templates, so you may end up uploading anything anywhere... handle with care)
  • translate on line — may be a bit slow although it could be the browser, connection or server (see #Advanced site administration)
  • used the suggest functionality, hit a bug with review (see #Bug dealing with Suggestions)

Glitches

  • Something related to encodings was wrong during the setup
    Alfonso commented several language specs that apparently were iso8859 instead of UTF ??

Bug dealing with Suggestions

There seems to be something awry with the suggestions... somehow there is a mismatch between the msgid for which the suggestion was made and the msgid displayed in the review process. IOW, you suggest foo as a translation of bar, but the review process will show it as a suggestion for xyzzy!! This is a problem in the review.

Talking in IRC #pootle, the following emerged: the mechanism used to associate a given suggestion is based on the #: comments (usually documenting the source code line where the string is located). This mechanism fails when more than one string is extracted from the same source line.
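
The failure mode can be reproduced with a toy model: if entries are looked up by their #: location, two strings extracted from the same source line share one key and one of them shadows the other. This is a simplified model of the mechanism as described in IRC, not Pootle's actual code; the source location synth.py:42 and the function names are made up.

```python
def index_by_location(entries: list) -> dict:
    """Index msgids by their '#:' location, as the suggestion lookup seems to do."""
    index = {}
    for location, msgid in entries:
        index[location] = msgid  # a later entry silently replaces an earlier one
    return index


def colliding_locations(entries: list) -> dict:
    """Report every location that is shared by more than one msgid."""
    seen = {}
    for location, msgid in entries:
        seen.setdefault(location, []).append(msgid)
    return {loc: ids for loc, ids in seen.items() if len(ids) > 1}


if __name__ == "__main__":
    # Two strings extracted from the same (hypothetical) source line.
    entries = [("synth.py:42", "bar"), ("synth.py:42", "xyzzy"), ("synth.py:57", "baz")]
    print(index_by_location(entries)["synth.py:42"])
    print(colliding_locations(entries))
```

A helper like colliding_locations could be run over a POT to find the entries that are at risk before suggestions are enabled for it.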

Possible solutions & workarounds 
The problem is relatively serious, as it inhibits the #Opportunistic translator (or any other translator) from making suggestions (as they may get lost).
  • modify the source code to avoid having more than one gettext string per line.
  • modify the POT to avoid duplicate comments (iow, a manual process)
  • disable the suggestions
    • explore the possibility of having the offending POT in another project where suggestions are not allowed (thus preventing reviewers from getting bitten by it).
  • patch Pootle (or file a bug)
