Projects/Automatic translation software

Moses toolkit

  http://www.statmt.org/moses

Getting started

 1. Run a virtual machine of the OLPC (skip this step if you have an actual OLPC). I personally use VirtualBox (by Sun/Oracle; it's free and open source):
        http://www.virtualbox.org/
     Prepackaged VM files are available here:
        http://dev.laptop.org/pub/virtualbox/
     I use version 656 because I have an OLPC running the same version, but take your pick.
 2. Download the model file onto the OLPC:
        http://groups.inf.ed.ac.uk/hoang/hieu/olpc/en-ht.tgz
     and the decoder:
        http://groups.inf.ed.ac.uk/hoang/hieu/olpc/moses

     This may be a bit tricky, as the Web browser on the OLPC takes some getting used to. Alternatively, you can download the files from a terminal with wget, which you first need to install:
       su root
       yum install wget

 3. Make the moses decoder executable:
       chmod +x moses

 4. Unpack the model archive and cd into the directory:
       tar zxf en-ht.tgz
       cd model

 5. Run the decoder and wait about 30 seconds
       ../moses -f moses.ini
     until this line appears:
       Created input-output object : [1.000] seconds

 6. Type in some English and watch it translate:
          i am a doctor
          Translating: i am a doctor 

          reading bin ttable
          size of OFF_T 8
          binary phrasefile loaded, default OFF_T: -1
          Collecting options took 0.200 seconds
          Search took 0.280 seconds
          BEST TRANSLATION: mwen se yon doktè [1111]  [total=-0.520] <<0.000, -4.000, 0.000, -0.511, 0.000, 0.000, 0.000, 0.000, 0.000, -16.452, 0.000, -5.911, -0.693, -3.622, 1.000>>
          mwen se yon doktè 
          Translation took 0.280 seconds
          Finished translating
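
If you want to use the translation from a script rather than read it off the screen, the verbose output has to be filtered. Below is a minimal Python sketch that pulls the translated text and the total score out of a BEST TRANSLATION line. The regular expression is inferred only from the sample output above, the name parse_best_translation is made up for illustration, and other Moses versions or settings may print this line differently.

```python
import re

# Pattern guessed from the single sample line shown above:
#   BEST TRANSLATION: <text> [<coverage bits>]  [total=<score>] <<...>>
# It is an assumption, not a documented Moses output format.
BEST_RE = re.compile(
    r"^\s*BEST TRANSLATION:\s*(?P<text>.*?)\s*\[[01]+\]\s*"
    r"\[total=(?P<total>-?\d+(?:\.\d+)?)\]"
)


def parse_best_translation(line):
    """Return (translated_text, total_score), or None if the line does not match."""
    match = BEST_RE.match(line)
    if match is None:
        return None
    return match.group("text"), float(match.group("total"))


sample = (
    "BEST TRANSLATION: mwen se yon doktè [1111]  [total=-0.520] "
    "<<0.000, -4.000, 0.000, -0.511>>"
)
print(parse_best_translation(sample))
print(parse_best_translation("Translation took 0.280 seconds"))
```

Lines that do not look like a BEST TRANSLATION line yield None, so the function can be applied to every line the decoder prints and the non-matching ones skipped.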

  http://www.statmt.org/matrix/

Project Ideas

 1. Create a client-server application that runs the resource-intensive translation on a server. The client would be a Web browser or a custom Python app.
     - Skills required: Python, Apache, C++
 2. Fork the decoder source code to enable it to run on the OLPC. Minimize memory consumption and discard code unlikely to be used by the application.
     - Skills required: C++
 3. Minimize the work the decoder has to do by using a greedy search instead of a beam search, or by using a very tight beam and other pruning thresholds.
     - Skills required: C++, statistical machine translation

 4. Different language pairs
 5. Speech-to-speech translation
 6. Integrating Optical Character Recognition (OCR) with translation
 7. Enable sharing of user vocabulary via the OLPC Mesh network
 8. Distributed training of data on the OLPC
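
To make idea 3 more concrete, here is a toy Python sketch of the trade-off between greedy search and beam search. It is not Moses code: it translates word by word with no reordering, the words, options and scores are invented, and the names decode, options and bigram are hypothetical. With beam_size=1 the same function behaves as a greedy search.

```python
# Toy monotone, word-by-word decoder. All data below is invented for
# illustration; this is not how Moses stores or scores anything.

def decode(words, options, bigram, beam_size, unseen=-5.0):
    """Return (best_output, best_score, hypotheses_scored)."""
    beam = [(0.0, ("<s>",))]  # (log score so far, output words so far)
    scored = 0
    for word in words:
        expanded = []
        for score, out in beam:
            for target, tm_score in options[word]:
                # Unseen word pairs get a fixed penalty.
                lm_score = bigram.get((out[-1], target), unseen)
                expanded.append((score + tm_score + lm_score, out + (target,)))
                scored += 1
        # Keep only the beam_size best partial translations.
        expanded.sort(key=lambda hyp: hyp[0], reverse=True)
        beam = expanded[:beam_size]
    best_score, best_out = beam[0]
    return list(best_out[1:]), best_score, scored


# Source word -> list of (target word, translation log score).
options = {"a": [("x", -1.0), ("y", -1.5)], "b": [("z", -1.0)]}
# (previous target word, target word) -> language model log score.
bigram = {("<s>", "x"): -1.0, ("<s>", "y"): -1.0, ("y", "z"): -0.5}

greedy = decode(["a", "b"], options, bigram, beam_size=1)
wide = decode(["a", "b"], options, bigram, beam_size=2)
print("greedy:", greedy)
print("beam 2:", wide)
```

In this toy data the greedy search scores fewer hypotheses but commits to the locally best first word and ends with a worse total score, while the wider beam does more work and finds the better output. That is the trade-off a tight beam on the OLPC would have to balance.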

  http://www.statmt.org/moses/

  Moses Support

 430 MHz CPU (AMD Geode x86 processor)
 237 MB RAM
 1 GB flash disk
 Linux OS

Contents

Getting started

Mission Statement and Objectives

Project Ideas

Progress

12th March 2009

3rd May 2009

31st May 2009

12th July 2009

Credits