Input methods

From OLPC
Revision as of 13:20, 28 August 2007 by 125.46.36.223 (talk)


To input text in a particular language and writing system, we need a Unicode font to display it, a rendering engine that knows how to lay it out, and a keyboard layout or Input Method Editor (IME) that provides a way to enter all of the needed characters. Most alphabetic and syllabic writing systems can be typed on fairly simple keyboards that produce one Unicode character per key combination, using the ordinary typing keys together with Meta (usually Alt) and Compose keys. Any of several keys, including the Menu and Windows keys, can be set to act as Compose. On Latin keyboards, Compose-a-' then produces á, Compose-c-, produces ç, and so on. Any accented letter that Unicode includes in precomposed form can be entered this way. This covers the letters of any widely used pre-Unicode character set, such as Latin-1 (ISO-8859-1), which supports French, German, Spanish, Italian, the Scandinavian languages, and other languages that need only the accented letters in Latin-1.
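As a rough illustration of how a Compose sequence can resolve to a single precomposed character, the sketch below maps the second key of a sequence to a Unicode combining mark and lets NFC normalization find the precomposed form. The MARK_KEYS table and the compose function are hypothetical names for this example. Real Compose tables list each sequence explicitly, so the normalization step here stands in for that lookup.

```python
import unicodedata

# Hypothetical mapping from the mark key of a Compose sequence to a
# Unicode combining mark; real Compose tables are far larger.
MARK_KEYS = {
    "'": "\u0301",  # combining acute accent
    "`": "\u0300",  # combining grave accent
    ",": "\u0327",  # combining cedilla
    '"': "\u0308",  # combining diaeresis
}


def compose(base, mark_key):
    """Return base + mark, precomposed when Unicode has such a character."""
    combined = base + MARK_KEYS[mark_key]
    # NFC replaces a base letter plus combining mark with the
    # precomposed code point whenever one exists.
    return unicodedata.normalize("NFC", combined)


print(compose("a", "'"))  # á (U+00E1)
print(compose("c", ","))  # ç (U+00E7)
```

Both results are single code points, which is what lets a one-character-per-key-combination keyboard cover them.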

Letters with multiple diacritics can be entered as a sequence of keystrokes on simple keyboards of this type, while more elaborate input methods can place more than one Unicode character code into the input buffer for a single key combination. Yoruba is an example of a language that poses this choice, because it has vowel letters with both an acute accent above and a dot below, and these are not available precomposed in Unicode.
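The Yoruba case can be checked with Python's standard unicodedata module. This is a minimal sketch: the describe helper is a hypothetical name for this example, and the input is simply the letter e followed by a combining dot below and a combining acute accent.

```python
import unicodedata


def describe(text):
    """List each code point in text with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# e + combining dot below + combining acute accent
typed = "e\u0323\u0301"
composed = unicodedata.normalize("NFC", typed)

for line in describe(composed):
    print(line)
# U+1EB9 LATIN SMALL LETTER E WITH DOT BELOW
# U+0301 COMBINING ACUTE ACCENT
```

Normalization can merge the dot below into the base letter, but the acute accent stays a separate code point, so an input method for this letter has to emit two character codes.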

The most elaborate IMEs are for input of CJKV characters for Chinese, Japanese, Korean, and the historical Vietnamese Chữ Nôm writing. Each of these languages requires several thousand characters at a minimum, and users also want much more extensive CJKV sets, including a number of Hong Kong characters and other recent additions, or the tens of thousands of historical characters important for scholarship.

Several hundred methods for entering CJKV characters have been invented over several decades. Among the most important (due to efficiency of use or ease of learning, or in a few cases both) are language-specific phonetic conversion systems for Chinese, Japanese, or Korean, and shape-based systems that are in principle independent of language, but in practice specific to particular countries up to now.

See also countries, languages, writing systems, fonts, locales, and keyboard layouts.

Tools

Tools for keyboard layouts: to come. The loadkeys utility loads keyboard layouts.

Tools for IMEs, to come.

Input Methods

Phonetic conversion

The concept of phonetic conversion is that text in any CJKV language, typed in an alphabet or other sound-based writing system, can be converted to the corresponding characters using dictionary lookup together with grammatical and semantic analysis. The first successful phonetic conversion word processor, in 1981, was the Xerox 8010 J-Star, an outgrowth of the Xerox Alto computer and the Smalltalk programming language. Thanks go to Alan Kay for the Alto and Smalltalk ideas, and to Joseph Becker for the language handling software. Phonetic conversion to CJKV characters exists for the following combinations, in many variations.

  • Romanization (Latin alphabet) or Zhuyin to either Traditional or Simplified Chinese hanzi 漢字
  • Romaja 로마자 (Latin alphabet) or Hangeul 한글 Korean alphabet to hanja 漢字
  • Romaji ローマ字 (Latin alphabet) or hiragana ひらがな syllabary to Japanese kanji 漢字
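The dictionary-lookup part of phonetic conversion can be sketched in a few lines. Everything here is an assumption made for illustration: the CANDIDATES table is a toy dictionary with three entries, and segment and convert are hypothetical helper names. The sketch does greedy longest-match lookup only and takes the first candidate each time. It does not attempt the grammatical or semantic analysis that a real converter uses to rank candidates.

```python
# Hypothetical toy dictionary: Pinyin input -> candidate characters,
# listed in an arbitrary order chosen for this example.
CANDIDATES = {
    "zhong": ["中", "种", "重"],
    "wen": ["文", "问", "闻"],
    "zhongwen": ["中文"],
}


def segment(text, dictionary):
    """Split unspaced input into dictionary keys, longest match first."""
    pieces = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in dictionary:
                pieces.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no dictionary entry for {text[i:]!r}")
    return pieces


def convert(text, dictionary=CANDIDATES):
    """Join the first candidate for each segment."""
    return "".join(dictionary[p][0] for p in segment(text, dictionary))


print(segment("wenzhong", CANDIDATES))  # ['wen', 'zhong']
print(convert("zhongwen"))  # 中文
print(convert("wenzhong"))  # 文中
```

Each syllable maps to several candidates, which is why practical systems add context analysis or let the user pick from a list.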

Phonetic conversion systems depend on a native alphabetic or syllabic representation, or on one or more Romanizations of the target language.

  • Chinese: Pinyin 拼音, Gwoyeu Romatzyh 國語羅馬字, Wade-Giles, and Yale are a few of hundreds
  • Japanese: Hepburn, Kunrei-shiki, Nippon-shiki, Yale
  • Korean: McCune-Reischauer (MR), Revised Romanization of Korean (RR), Yale

(Yes, the Yale Department of Linguistics was busy on the issue for decades.)
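Because several Romanizations coexist, an input method often accepts more than one spelling for the same sound. The sketch below, with a deliberately tiny table and a hypothetical to_hiragana helper, accepts both Hepburn and Kunrei-shiki spellings for a handful of Japanese syllables. A usable table would cover the whole syllabary.

```python
# A few syllables only; the table is an illustrative fragment.
SHARED = {"su": "す"}
HEPBURN = {"shi": "し", "chi": "ち", "tsu": "つ", "fu": "ふ", "ji": "じ"}
KUNREI = {"si": "し", "ti": "ち", "tu": "つ", "hu": "ふ", "zi": "じ"}
TABLE = {**SHARED, **HEPBURN, **KUNREI}


def to_hiragana(romaji):
    """Convert romanized input to hiragana, trying longer spellings first."""
    out = []
    i = 0
    while i < len(romaji):
        for size in (3, 2, 1):
            chunk = romaji[i:i + size]
            if chunk in TABLE:
                out.append(TABLE[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"cannot convert {romaji[i:]!r}")
    return "".join(out)


print(to_hiragana("sushi"), to_hiragana("susi"))  # すし すし
print(to_hiragana("fuji"), to_hiragana("huzi"))  # ふじ ふじ
```

Merging the tables works here because the two systems never assign the same spelling to different kana within this fragment.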

Dasher - gesture text entry

Dasher is an information-efficient text-entry interface, driven by natural continuous pointing gestures. It is a competitive text-entry system wherever a full-size keyboard cannot be used - for example, when operating a computer one-handed, by joystick, touchscreen, trackball, or mouse