Wordnet Activity

From OLPC
Revision as of 06:51, 4 April 2008 by Shikhar (talk | contribs) (added Category:GSoC proposals)

This is a draft idea for a language-learning activity for the OLPC.

Short Description

Exploit WordNet to create an adapted, lexically oriented English learning task.


Extended Description

WordNet® is a large lexical database of English, developed under the direction of George A. Miller. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. WordNet was initially conceived as a model of the mental lexicon and is thus rich in the lexico-semantic relations that any language learner encounters.



Structure

WordNet is a dictionary based on psycholinguistic principles. Currently it accounts for about:

  • 95600 word forms
  • 51500 simple words
  • 44000 collocations
  • 70000 word meanings

The lexical information is organized as a semantic network of concepts (based on word meaning rather than word form). It can thus be seen as a lexical system based on conceptual lookup.


Improving English language skills with WordNet

The apprentice is invited to choose a word and explore the relations of WordNet, which are presented to him via a friendly and intuitive graphical interface. The main pedagogical advantage of such an activity is that the apprentice is invited to explore. Thus he not only enriches his vocabulary but also implicitly familiarizes himself with the "building blocks of semantics". This will not be a language learning activity for beginners but rather a friendly and innovative way of improving one's language skills.


Example

  • Apprentice chooses to explore the words around "car"
  • He is invited to choose from a list of meanings.
  • Apprentice chooses the first meaning: "a motor vehicle with four wheels; usually propelled by an internal combustion engine"
  • The relation chosen is the "is a" relation
  • A slice of the WordNet taxonomy is presented to the apprentice, with "car" as the "focus word" in the center of the screen. Up in the taxonomy, the apprentice sees that "car" is a type of "motor vehicle", which itself is a kind of "vehicle". On the same level, he sees that there are also many other types of motor vehicles, like "truck", "motorbike" or "go-kart". One level down he sees that there are many types of car, like "ambulance", "jeep", "race car" etc.
  • By hovering over any word, its definition pops up.
  • By clicking on the word, the view shifts, and it becomes the "focus word", and the relations shown are recalculated.
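
The slice shown around a focus word can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `PARENT` table is hand-written toy data that mirrors the "car" example (a real activity would read the links from WordNet through an existing Python module), each word has a single parent, and the `focus_slice` name is hypothetical.

```python
# Toy "is a" links (child -> parent). Hand-written for illustration only;
# the real activity would obtain these relations from WordNet.
PARENT = {
    "motor vehicle": "vehicle",
    "car": "motor vehicle",
    "truck": "motor vehicle",
    "motorbike": "motor vehicle",
    "go-kart": "motor vehicle",
    "ambulance": "car",
    "jeep": "car",
    "race car": "car",
}


def focus_slice(word):
    """Return the words the browse screen would show around a focus word."""
    # Walk up the chain of parents, nearest ancestor first.
    ancestors = []
    node = word
    while node in PARENT:
        node = PARENT[node]
        ancestors.append(node)

    parent = PARENT.get(word)
    # Siblings share the focus word's parent; a root word has none.
    siblings = []
    if parent is not None:
        siblings = sorted(w for w, p in PARENT.items() if p == parent and w != word)
    # Children are the words whose parent is the focus word.
    children = sorted(w for w, p in PARENT.items() if p == word)

    return {
        "focus": word,
        "ancestors": ancestors,
        "siblings": siblings,
        "children": children,
    }


view = focus_slice("car")
print(view["ancestors"])
print(view["siblings"])
print(view["children"])
```

Clicking on a word then amounts to calling `focus_slice` again with the clicked word, which recalculates everything displayed.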


Interface

In addition to the main browsing screen, the user interface will present a number of secondary buttons or menus for navigation and control of the activity. A help menu will be placed on the screen and will display a message that explains to the user what the current screen is about. A simple language generation technique (of the canned-text type) will be devised in order to produce word-specific help messages. Since this is meant to be a fun educational activity for ordinary people (and not for linguists or ontologists), care has to be taken to design a simple and inviting interface that presents as much information as possible via non-verbal media (colors, shapes etc.).
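
A canned-text help message could be as simple as a template per relation, filled in with the current focus word. The sketch below assumes hypothetical template wording and a hypothetical `help_message` function; the actual messages would be written during the specification phase.

```python
# Hypothetical canned-text templates, one per relation shown on screen.
HELP_TEMPLATES = {
    "is a": 'Words above "{word}" are more general. Words below it are kinds of "{word}".',
    "part of": 'Words above "{word}" are wholes that contain it. Words below it are its parts.',
}

# Fallback text for relations that have no dedicated template.
DEFAULT_HELP = "Click a word to explore it, or hover over it to read its meaning."


def help_message(word, relation):
    """Build a word-specific help text for the current screen."""
    template = HELP_TEMPLATES.get(relation, DEFAULT_HELP)
    return template.format(word=word)


print(help_message("car", "is a"))
```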


History function

Some way of keeping track of the user's activity will be designed. Such a structured history function will be useful not only for immediate use (back/forward buttons) but also to generate summaries of sessions, mark first-time visits to nodes etc. The summary generation can later be extended to produce savable reports.
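
One possible shape for such a history is sketched below. The `History` class and its method names are assumptions made for illustration, not a settled design; the sketch follows the usual web browser convention that visiting a new word after going back discards the "forward" entries.

```python
class History:
    """Back/forward navigation that also remembers which words were seen."""

    def __init__(self):
        self._trail = []     # words in the order they were focused
        self._position = -1  # index of the current word in _trail
        self._visits = {}    # word -> number of visits

    def visit(self, word):
        """Focus a new word; return True if this is a first-time visit."""
        # Drop any entries that were reachable with the forward button.
        del self._trail[self._position + 1:]
        self._trail.append(word)
        self._position += 1
        first_time = word not in self._visits
        self._visits[word] = self._visits.get(word, 0) + 1
        return first_time

    def current(self):
        return self._trail[self._position] if self._trail else None

    def back(self):
        if self._position > 0:
            self._position -= 1
        return self.current()

    def forward(self):
        if self._position < len(self._trail) - 1:
            self._position += 1
        return self.current()

    def summary(self):
        """One line per distinct word, in order of first visit."""
        return [f"{word}: {count} visit(s)" for word, count in self._visits.items()]


history = History()
history.visit("car")
history.visit("motor vehicle")
history.back()
print(history.current())
print(history.summary())
```

The visit counts are what would feed both the first-visit marking and the session summaries.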


The "Link Words" game

Parallel to the main "Browsing Activity", we can use WordNet's hierarchy to implement a game-like sub-activity. It takes a random pair of words and computes the shortest path between them in the tree, keeping only pairs whose path does not exceed a given number of steps. The apprentice is then invited to find this path, with the goal expressed as a maximum number of steps.


Example

  • Go from sun to moon in 4 steps in the "is-a" relation
  • 1. START: "sun"
  • 2. GO UP: "star"
  • 3. GO UP: "celestial body"
  • 4. GO DOWN: "satellite"
  • 5. GO DOWN: "moon"
  • You won!
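
The shortest path between two words can be found with a breadth-first search over the "is-a" links, moving either up to the parent or down to a child at each step. The sketch below is a minimal illustration: the `PARENT` table is toy data reproducing the example above (the actual WordNet links for these words may differ), and the function names are hypothetical.

```python
from collections import deque

# Toy "is-a" links (child -> parent) reproducing the sun/moon example.
PARENT = {
    "sun": "star",
    "star": "celestial body",
    "satellite": "celestial body",
    "planet": "celestial body",
    "moon": "satellite",
}


def neighbours(word):
    """Words reachable in one step: the parent (up) and the children (down)."""
    up = [PARENT[word]] if word in PARENT else []
    down = [w for w, p in PARENT.items() if p == word]
    return up + down


def shortest_path(start, goal):
    """Breadth-first search; returns the list of words from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # the two words are not connected


path = shortest_path("sun", "moon")
print(path)
print(len(path) - 1, "steps")
```

A random pair would be accepted for the game only if `len(path) - 1` does not exceed the chosen maximum number of steps.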


Implementation

Python and GTK based activity. Cairo can be used for drawing and all the fancy graphics. The application will use existing Python modules that can work with WordNet. Careful attention will be paid to the use of colors and symbols when designing the interface, in order to keep the representations as homogeneous as possible across the lexicon. Thus, once acquainted with the color/symbol codes, the apprentice will effortlessly assimilate all the added semantic information that enriches the lexical base.


UML diagrams

  • A use case diagram

Olpc wn use case.png


  • The simplest case

Olpc wn seq1.png


  • A more complex one, with some relation browsing

Olpc wn seq2.png


User interface mock up

Posselect.JPG

  • For each part of speech the related meanings are displayed and the apprentice chooses one

Meanselect.JPG

  • The main browse screen is displayed. This is the core of the activity. In this particular case the word is the verb "clean" with the selected (and displayed) meaning. "Clean" is a way to change something, and "brush", "wash" etc. are ways to clean. For the other parts of speech the relations are different.

Relatedwords.JPG

  • For each related word the meaning is displayed when the mouse cursor is over the bubble.

Meanlokup.JPG

  • A mouse click on the related word shifts the focus and either goes one level up or down in the WordNet hierarchy. In this case one level down.

Focusshift.JPG


Deliverables

  • Source code of the application
  • Information about the bugs and problems encountered during the testing process
  • Source code documentation and user manual


Time frame

  • 1: 15.04 - 25.05 WordNet exploration, project modelling and detailed specification.
  • 2: 26.05 - 09.07 Actual coding and implementation
  • 3: 15.07 - 11.08 Testing, debugging, user manual and code documentation.


Possible extensions

  • A possible extension to the application is integration with the existing speech activities.
  • A possible integration in a reader or web browser activity allowing direct lookup of words (the apprentice right-clicks on a word in HulaHop, selects "Lookup in WN" and the word is looked up directly in the WordNet activity). For this extension, POS disambiguation can be designed using a POS tagger such as the TreeTagger.
  • Integration with available online resources (Wikipedia, Wiktionary etc.) - This means that any word from the activity can be looked up in an online resource.
  • A possible automated web search via the Google image search service. - This is maybe somewhat utopian - tests have to be made to see to what extent the images from Google correspond to the query word, and whether there are ways of selecting images from the context, maybe with an adapted image-oriented information extraction algorithm.


Project limitations

  • Simplifying WordNet in a suitable manner for children
  • WordNet is intended to be a model of the human lexicon and not a teaching tool. This implies that there are a certain number of issues to be taken into consideration.
    • 1. There is a great number of entries that can be considered obscene/adult. They have to be removed. This can be done by projecting a list of taboo words onto the database.
    • 2. The hierarchical organization of the concepts varies considerably according to POS. For example, the top nodes of the verb tree are organised in domains. For the nouns there are quite a few abstract nodes like "psychological feature". This raises several issues.
      • 2.1 Some simplification of the hierarchy may be needed. This will probably not involve "cropping" the database itself but instead marking some nodes as "not pertinent" so that they will not be displayed.
      • 2.2 The browser screen can become saturated when browsing top-level nodes, as there will be many lower-level elements to display. This can be addressed by an on-screen scrolling system.
      • 2.3 Some visuals can be created according to the top-level categorization of WordNet. It would be nice if, when the apprentice is browsing the relations of, say, the verb "dance", an illustration of the "move" domain were shown.
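
Points 1 and 2.1 can both be handled at display time, without touching the database. The sketch below assumes a hypothetical taboo list and a hypothetical set of "not pertinent" nodes; the `PARENT` links are toy data for illustration and are not taken from WordNet. Hidden nodes are skipped when looking for the word to display one level up, so their descendants stay connected to the rest of the hierarchy.

```python
# Hypothetical word lists; the real ones would be compiled during the project.
TABOO = {"badword"}
NOT_PERTINENT = {"psychological feature"}

# Toy "is a" links (child -> parent), for illustration only.
PARENT = {
    "psychological feature": "abstraction",
    "cognition": "psychological feature",
    "memory": "cognition",
    "badword": "cognition",
}


def displayable(words):
    """Keep only the words that may be shown to the apprentice."""
    return [w for w in words if w not in TABOO and w not in NOT_PERTINENT]


def visible_parent(word):
    """Nearest ancestor that may be displayed, skipping hidden nodes."""
    node = PARENT.get(word)
    while node is not None and node in NOT_PERTINENT:
        node = PARENT.get(node)
    return node


print(displayable(["memory", "badword", "cognition"]))
print(visible_parent("cognition"))
```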


About Me

My name is Nikola Tulechki and I am currently doing a master's degree in computational linguistics in Toulouse, France. I'm an active GNU/Linux user and have experience with using Perl and Python for NLP. In the course of my studies I was astounded by how little the general public knows about my chosen field and by how many applications of computational linguistics research are not yet explored. I have been following the evolution of the OLPC with interest for quite a while now, and find the idea and the philosophy behind the project admirable. I'm currently "playing" with the OLPC OS and Sugar (on a virtual machine) and am enthusiastic about participating in the development process for this platform.

You can contact me at [slrasta@gmail.com]