Dictionaries

From OLPC
Revision as of 10:00, 6 December 2008 by Skierpage (talk | contribs) (→‎Bundled dictionaries: mention state for 8.2.0)

Monolingual and bilingual dictionaries for a remarkable number of languages are available on the Web or as Free Software, as are thesauri and dictionaries of specialized terms (medicine, Net jargon, etc.). You are invited to create one for your language, or to contribute to an existing project. Dictionaries for input methods, text-to-speech and speech recognition are not listed here. There are also tools for creating dictionaries of various kinds for use with various software, including vocabulary drills for language study.

Bundled dictionaries

Some of the core software already supports dictionaries, primarily for spell checking: XULRunner, and thus Browse, uses hunspell; AbiWord, and thus Write, uses enchant. <trac>6104</trac> proposes unifying the dictionaries, probably so that they all use hunspell; this is an issue for the underlying Fedora software distribution.

The eSpeak text-to-speech library has a special dictionary.

In Release 8.2.0, it seems neither Browse nor Write has spell checking, and neither has a dictionary. Only the Firefox activity has spell checking, using its own local dictionaries directory. -- skierpage 10:00, 6 December 2008 (UTC)

Other dictionaries

dicts.info has a bundled dictionary developed by Zdenek Broz, in Ar/En/Es/Fr/Pt/Ro/Ru. It has definitions in English from Princeton's WordNet, but not all are appropriate for children. It has pictures from the dicts.info picture dictionary, which are all free but lack source information; we are looking for well-sourced free images to replace these.

Other dictionaries for OLPC

Here is a set of 2500-word dictionaries, for use by OLPC and any of our projects, but not currently under one of our approved licenses, thanks to Babylon:

Web

Potentially appropriately licensed dictionary files available online

[1] is a collection of many dictionaries, including a Universal Dictionary that provides word-to-word-to-word mappings for many languages, with thousands of words. This dictionary doesn't appear to be licensed in a way that allows free redistribution, but the authors' objection to redistribution stems from the potential for users to be stuck with out-of-date dictionaries. Perhaps they would relicense if OLPC asked nicely.

[2] is a collection of dictionaries created by Freelang. Their license allows for redistribution, or modification, but not both, and is phrased (in English) in terms of French copyright law. [3]

[4] is Ergane, a program designed to promote Esperanto by providing a translation system that uses Esperanto as the common intermediate index language. It advertises its wordlists as being "free of copyright and can be copied, distributed and changed without legal restrictions. You can use them in any way you like, even for commercial purposes!". The wordlists generally date from 2004-2006, and their quality is unknown. Translating via Esperanto may or may not be very effective.
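To make the idea of an intermediate index language concrete, here is a minimal sketch of pivot translation. It is not Ergane's implementation: the function name translate_via_pivot and the tiny wordlists are hypothetical, written only for illustration, and each wordlist is assumed to map one word to a list of candidate translations.

```python
def translate_via_pivot(word, source_to_pivot, pivot_to_target):
    """Return target-language candidates for `word`, routed through a pivot language."""
    results = []
    for pivot_word in source_to_pivot.get(word, []):
        for candidate in pivot_to_target.get(pivot_word, []):
            # Keep first-seen order and drop duplicates.
            if candidate not in results:
                results.append(candidate)
    return results


# Hypothetical sample data, not taken from Ergane's wordlists.
english_to_esperanto = {
    "dog": ["hundo"],
    "spring": ["printempo", "fonto", "risorto"],  # season, water source, coil
}
esperanto_to_spanish = {
    "hundo": ["perro"],
    "printempo": ["primavera"],
    "fonto": ["fuente"],
    "risorto": ["resorte"],
}

print(translate_via_pivot("dog", english_to_esperanto, esperanto_to_spanish))
print(translate_via_pivot("spring", english_to_esperanto, esperanto_to_spanish))
```

An ambiguous source word such as "spring" comes back with one candidate per sense, and a word missing from either wordlist returns nothing, which is one reason the quality of pivot translation depends heavily on the wordlists.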

[5] is a set of translation pairs extracted from Wiktionary. They are all of the form English-X. Some, like Spanish and Portuguese, are quite extensive (8,000-10,000 word pairs); others, like Urdu and Nepali, are very small. Quality is unknown. They are now over a year old, so it may be worthwhile to ask the author for the scripts and rerun them ourselves. These dictionaries are definitely distributed under an acceptable license.

[6] is the dictionary from Pythonol, a Python program whose intent is to help English speakers learn Spanish. The dictionary appears to be very complete (>70000 word pairs). It is exclusively English-Spanish. The dictionary appears to be licensed under a one-off license intended for software, based on the GPL but with some unusual "anti-profit" restrictions. The license does permit redistribution with modification under copyleft-like terms, so it is likely acceptable, if unpalatable.

Free Software using dictionaries

StarDict and viewers

We are using StarDict as our default dictionary viewer. It is fine for displaying a language with definitions, or two languages with translations. It is not yet good at displaying many languages at once in a space-efficient way. (The desired use is a database with 40 languages and 40^2 source-target views, with words in the source language and definitions in the target language, and with each word and definition appearing exactly once in each language.)
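One way to picture the desired database is a store keyed by concept, where each language contributes its word and definition once and every source-target view is derived on demand. The sketch below is only an illustration of that layout, not StarDict's file format or API; the class MultilingualDictionary, its methods, and the sample entries are all hypothetical.

```python
from collections import defaultdict


class MultilingualDictionary:
    """Stores each (word, definition) once per language, keyed by a shared concept id."""

    def __init__(self):
        # concept id -> {language code -> (word, definition)}
        self._entries = defaultdict(dict)

    def add(self, concept, lang, word, definition):
        self._entries[concept][lang] = (word, definition)

    def view(self, source, target):
        """Return sorted (source word, target definition) pairs for one language pair."""
        pairs = []
        for by_lang in self._entries.values():
            if source in by_lang and target in by_lang:
                pairs.append((by_lang[source][0], by_lang[target][1]))
        return sorted(pairs)


# Hypothetical sample entries for a single concept.
d = MultilingualDictionary()
d.add("water", "en", "water", "what we drink")
d.add("water", "es", "agua", "lo que bebemos")
d.add("water", "fr", "eau", "ce que nous buvons")

print(d.view("en", "es"))  # English words with Spanish definitions
print(d.view("fr", "en"))  # French words with English definitions
```

With this layout, 40 languages need only 40 sets of entries, and any of the 40^2 views (including same-language ones such as view("en", "en")) is computed rather than stored.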

This list is meant to indicate the range of software available. It is in no way complete.

Typing

Aneto O. was working on a typewriter activity that never quite made a fully functional activity bundle. Some new work is starting in this area, as of December 2007.

Spelling

  • Debian Junior Writing (editors and spelling checker)
  • aspell about 40 languages
  • ispell about 35 languages
  • myspell about 40 languages

Dictionary servers

  • dict more than 50 languages
  • Serpento dict server with full Unicode support

Other

  • dict-moby-thesaurus Moby Thesaurus
  • dict-bouvier English legal dictionary for US
  • dict-foldoc Free OnLine dictionary of computing terms
  • dict-vera Computer acronyms
  • The On-Line Hacker Jargon File, version 4.4.4
  • leksbot Botany and biology
  • rhyme

Chinese / Kanji characters

  • giten Japanese Kanji dictionary
  • kanjidic Japanese Kanji dictionary
  • kiten Japanese Kanji dictionary
  • hanzim Chinese dictionary
  • pydict English/Chinese dictionary
  • stardict English/Chinese dictionary