Bityi (translating code editor)/design

From OLPC
Revision as of 23:36, 21 August 2007 by Homunq

I've been thinking about the design issues for coding i18n in Develop. The problem is that if you want to translate identifiers at all, you immediately have to work with multiple dictionaries, one for each module you import. At first I just slogged in and started coding something that would keep a whole import tree in its head and magically decide for you where any change to the dictionary should end up. It took me about a day to realize that thence lay only nasty pointed teeth (I should have realized sooner).

So after thinking about what simplifying assumptions I can make, I have a basic design that I'm kinda proud of. I'll try to explain it below, keeping the "just LOOK at how hairy the problems are if you don't do it my way" details to a minimum. Trust me, they're hairy. Nonetheless, I know y'all need no map from me: if you see another path, or a minor modification, that still avoids the hair and teeth, by all means suggest it. And of course, speak up if you see some hair on my path that I've missed.

Summary:

1-As discussed already, any identifier or keyword with a translation is presented on screen in the user's preferred language, but stored on disk in English.

1a-Translation is based on the concept of alphanumeric words (starting with a letter or _ and continuing with letters, digits, or _). The only parsing done before translation is to separate code, strings, and comments (the latter two are generally left untranslated, with certain exceptions based on simple ASCII markup). This makes the solution relatively easy to generalize to other programming languages. (It also means that the prefixes "_" and "__" are not necessarily preserved.)

2-The identifier-translating dictionary for any given file, say "somemodule.py", is kept in a parallel file in the same directory, say ".somemodule.p4n".

2a-The editor would have to understand import statements and be able to find the relevant .p4n files. "from" and "as" modifiers would be ignored, except when combined, because even a single imported item could carry all the identifiers with it.

3-This dictionary ONLY contains translations for the "public interface" of somemodule.py, that is, those identifiers which are used in importer modules. It also defines a single, unchanging "preferred language" for that file, which is the assumed language for all non-translated identifiers in that file.

4-There is good UI support for creating a new translation for a word. However, the assumed user model is that words will be translated INTO the user's preferred language; FROM the context of an importer module (you would generally not add translations for a module from within that module itself, since you generally wouldn't even have a module open whose preferred language is not your own); and therefore WITH an explicit user decision about which module the translation belongs in. (If users want to use their language for an identifier X that is in English, they must have had a reason to write it in English rather than in their own language, so they presumably know which imported module it comes from.)

5-As a consequence of points 1 and 4, adding a translation to a module whose preferred language is not English changes that file's Python code on disk. (By contrast, adding a translation to a file whose preferred language is English only ever produces safe on-screen changes.) To let the EDITOR intelligently propagate these changes to other importers of the changed module, and let the INTERPRETER dumbly keep working for those importers until the editor gets to them, the changed file (and its dictionary) is given a new name (for instance "importedmodule.i18n.v1.py"). The old version is not deleted and keeps its old name.

6-Due to the notable disadvantages of point 5 (polluting the filesystem and, worse, the import/PYTHONPATH namespace with old versions, while the best version of a file would always have a name like "importedmodule.i18n.v37.py"), there would be one change to the Python core to facilitate cleanup. If someone deleted all the old copies and renamed that best version back to plain importedmodule.py, the default __import__ function would know to find it when it couldn't find importedmodule.i18n.v37.py. This new feature would have no impact on any existing Python code, and, to be honest, I think its presence in the "changes in python3001" lists would be (minor but useful) propaganda for the new i18n features.

(Obviously, a good delinting tool would take care of all the issues created by point 5 at once.)

7-Docstrings and comments, as always, are a separate issue, but I think they're also a soluble one.
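The translation step in points 1, 1a and 3 can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not Develop's code: the `ModuleDictionary` class, the `translate_words` function and the nested-dict layout are hypothetical, since the .p4n file format is not specified here. One regular expression separates strings, comments and words; only words are looked up, and running the same function with the inverted mapping turns the on-screen text back into the English stored on disk.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ModuleDictionary:
    """Hypothetical in-memory form of one module's .p4n dictionary (point 3).

    Only the module's public identifiers (and keywords) would appear here.
    preferred_language is recorded but not used by this sketch.
    """
    preferred_language: str
    # translations[lang][english_word] -> word shown to a user of `lang`
    translations: dict = field(default_factory=dict)

    def to_screen_map(self, lang):
        return self.translations.get(lang, {})

    def to_disk_map(self, lang):
        # Inverse mapping; assumes the translations are one-to-one.
        return {local: english for english, local in self.to_screen_map(lang).items()}


# Point 1a: strings and comments are matched first so that words inside them
# are skipped. A "word" starts with a letter or "_" and continues with letters,
# digits or "_" (Unicode-aware, so translated names like "función" match too).
# Numeric literals such as 1e5 are not special-cased (the "e5" would be looked up).
TOKEN = re.compile(
    r'(?P<string>"""[\s\S]*?"""' r"|'''[\s\S]*?'''"
    r'|"(?:\\.|[^"\\\n])*"' r"|'(?:\\.|[^'\\\n])*')"
    r"|(?P<comment>#[^\n]*)"
    r"|(?P<word>[^\W\d]\w*)"
)


def translate_words(source, mapping):
    """Replace each code word that has an entry in `mapping`; keep everything else."""
    def replace(match):
        word = match.group("word")
        return match.group(0) if word is None else mapping.get(word, word)
    return TOKEN.sub(replace, source)


spanish = ModuleDictionary("en", {"es": {"print_total": "imprimir_total", "total": "suma"}})
on_disk = 'def print_total(total):  # total stays here\n    print("total", total)\n'
on_screen = translate_words(on_disk, spanish.to_screen_map("es"))
print(on_screen, end="")
# def imprimir_total(suma):  # total stays here
#     print("total", suma)
print(translate_words(on_screen, spanish.to_disk_map("es")) == on_disk)  # True
```

The round trip back to disk only works if a module's translations are one-to-one; two English identifiers sharing the same translation could not be written back unambiguously.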

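The file handling in points 2, 2a, 5 and 6 might look roughly like the following sketch. All function names are hypothetical. Import statements are read with Python's standard `ast` module, only plain modules (not packages) are looked up, and the combined "from ... as" exception from point 2a is not modeled. Point 6 is approximated by an ordinary helper that tries the versioned file name first and then the cleaned-up name, not by the proposed change to `__import__`. The helper loads files by path because an ordinary `import importedmodule.i18n.v37` statement would treat the dots as package separators.

```python
import ast
import importlib.util
import os
import re
import tempfile


def dictionary_path(module_path):
    """Point 2: somemodule.py -> .somemodule.p4n in the same directory."""
    directory, filename = os.path.split(module_path)
    stem = os.path.splitext(filename)[0]
    return os.path.join(directory, "." + stem + ".p4n")


def imported_module_names(source):
    """Point 2a: modules named in import statements.

    "from"/"as" details are dropped and relative imports are skipped.
    """
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            names.append(node.module)
    return names


def find_dictionaries(source, search_dirs):
    """Map each imported plain module that has a .p4n beside it to that .p4n path."""
    found = {}
    for name in imported_module_names(source):
        relative = name.replace(".", os.sep) + ".py"
        for directory in search_dirs:
            module_path = os.path.join(directory, relative)
            if os.path.isfile(module_path) and os.path.isfile(dictionary_path(module_path)):
                found[name] = dictionary_path(module_path)
                break
    return found


VERSIONED = re.compile(r"(?P<base>.+)\.i18n\.v(?P<n>\d+)\.py")


def next_version_path(module_path):
    """Point 5: importedmodule.py -> importedmodule.i18n.v1.py, v1 -> v2, and so on."""
    directory, filename = os.path.split(module_path)
    match = VERSIONED.fullmatch(filename)
    if match:
        base, n = match.group("base"), int(match.group("n")) + 1
    else:
        base, n = os.path.splitext(filename)[0], 1
    return os.path.join(directory, f"{base}.i18n.v{n}.py")


def load_with_fallback(directory, base, version):
    """Point 6, as a helper: load base.i18n.v<version>.py, else base.py."""
    candidates = [f"{base}.i18n.v{version}.py", f"{base}.py"]
    for filename in candidates:
        path = os.path.join(directory, filename)
        if os.path.isfile(path):
            spec = importlib.util.spec_from_file_location(base, path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)  # not registered in sys.modules
            return module
    raise ModuleNotFoundError(f"none of {candidates} found in {directory}")


# Example: only the cleaned-up importedmodule.py exists, so it is used.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "importedmodule.py"), "w") as f:
        f.write("GREETING = 'hola'\n")
    print(load_with_fallback(tmp, "importedmodule", 37).GREETING)  # hola
```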
.................

Is all this clear? Do y'all understand why it's necessary? Do you have any other ideas, or see problems with the above that I missed? Do you think I've made any intolerable or unnecessary compromises? Or do you just think that it's absolutely brilliant?

Homunq 17:14, 10 August 2007 (EDT)

Later thoughts

After discussion on email and further thought, I have decided to initially implement a two-level design.

Level 1: files with an intrinsic preferred language and no translation of internals (no private translation dict). This would work essentially as outlined above. These files would be editable only in their preferred language, though they would be importable from any language.

Level 2: files with no intrinsic preferred language and an internal/private translation dict. These are editable in any language. Any identifiers added while editing in a non-English language are tagged on disk with the editor's language at the time they were created, until they can be translated. When editing in a non-English language, untranslated English identifiers are marked as such instead of just being presented as-is.
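A Level 2 private dictionary could be modeled as in the following sketch. The class and method names, the `«word»` marker for untranslated English identifiers, and the choice to keep the language tag inside the dictionary (rather than in the source text) are all assumptions made for illustration. The sketch tracks only the dictionary; rewriting the source on disk once a pending identifier gets an English name is not shown.

```python
from dataclasses import dataclass, field

MARK = "«{}»"  # hypothetical on-screen marker for untranslated English names


@dataclass
class PrivateDictionary:
    """Hypothetical model of a Level 2 file's private translation dict."""
    # english identifier -> {language: translated identifier}
    translations: dict = field(default_factory=dict)
    # identifier -> language it was typed in, for identifiers with no English
    # name yet (the on-disk language tag; kept in the dictionary in this sketch)
    pending: dict = field(default_factory=dict)

    def add_identifier(self, word, editor_language):
        """Record a new identifier typed while editing in editor_language."""
        if editor_language != "en":
            self.pending[word] = editor_language

    def give_english_name(self, word, english):
        """Translate a pending identifier; from now on it is known by its English name."""
        language = self.pending.pop(word)
        self.translations.setdefault(english, {})[language] = word

    def display(self, word, editor_language):
        """How the stored identifier `word` is shown when editing in editor_language."""
        if word in self.pending or editor_language == "en":
            return word  # pending words stay in the language they were typed in
        translated = self.translations.get(word, {}).get(editor_language)
        return translated if translated is not None else MARK.format(word)


d = PrivateDictionary(translations={"total": {"es": "suma"}})
d.add_identifier("cuenta", "es")
print(d.display("total", "es"), d.display("count", "es"), d.display("cuenta", "fr"))
# suma «count» cuenta
```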

Conversion from level 1 to level 2 would entail changing the file on disk. This is conceived as a step that someone would take not when first sharing the file, but only once the module's public interface is relatively well translated.

Current status:

See Bityi (translating code editor)