Talk:WikiReader

From OLPC
Revision as of 03:02, 8 May 2008 by Sj

Old getting-started list

Old to-do list for someone getting involved, from mel:

We need a Python programmer (or a small team of Python programmers working together) to start the project off by porting the server code from Ruby to Python, so that it'll be easier to run on the XO. Ruby's quite easy to read, and you don't have to be a Ruby programmer to do this (but it helps if you know Python). The code is short and simple (fewer than 300 lines), so the port should take no more than a weekend. Here's a suggested step-by-step procedure.

  1. Read this page to get an idea of what we're trying to do.
  2. Read the project homepage to get an overview of what the app does. Also see the Google Code project.
  3. Download the source code and take a look around. Notice that most of the code is either shell scripts or C, but there's a folder of Ruby (rb) code. This is the code we want to port.
  4. (Optional but recommended): Download and install Ruby and test out the existing code so you can see the app in action. Follow the instructions in the "Getting Started" section of the README file (in the source code you just downloaded) to get a Wikipedia data file parsed and the web server running. We'd recommend using a smaller Wikipedia than the English-language one.
  5. Take a look at the files in the rb folder. There are four main ones to port to Python (the rest are very short "helper" files and should take just a few minutes to rewrite).
    1. bzipreader.rb (Ruby interface to c/bzipreader.c; supports streaming bz2 files): probably the most difficult, since you'll have to interface your Python code with C (bzipreader.c). If someone has a tutorial or resources on how to do this, please post the link here.
    2. index.rb (generate an article-to-block index using bzipreader.rb)
    3. server.rb (Mongrel-based server for using WP dumps with a web browser): we'd suggest using Python's built-in web server module, BaseHTTPServer, for this.
    4. xmlprocess.rb (generate a stripped, XML-less file from a vanilla WP dump): this wouldn't have to be ported, since we could prepare the archive elsewhere and just serve it on the XO.
  6. Put the new files (bzipreader.py, index.py, server.py, etc.) in a "py" folder and delete the "rb" one when you're done porting.
  7. Remember to license your work under the GPL (you must, since the original code is GPL) by putting a copy of the license in your folder (or just leaving the COPYING file from the original source in).
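For bzipreader.rb (step 5), it may be possible to skip the C glue entirely: Python's standard `bz2` module can decompress incrementally. The sketch below streams a .bz2 file chunk by chunk; `stream_bz2` is a hypothetical name, not something from the WikiReader source. It only covers streaming. If the index relies on jumping straight to an individual bzip2 block inside one stream, this sketch does not attempt that, and wrapping c/bzipreader.c (for example with the standard `ctypes` module) would still be needed.

```python
import bz2
import io
from typing import BinaryIO, Iterator


def stream_bz2(fileobj: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield decompressed data from a .bz2 file object, one piece at a time.

    Compressed input is read in chunk_size pieces, so the whole archive never
    has to be held in memory. Several bzip2 streams concatenated back to back
    are decompressed one after another.
    """
    decomp = bz2.BZ2Decompressor()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break  # end of the compressed input
        while chunk:
            out = decomp.decompress(chunk)
            if out:
                yield out
            if decomp.eof:
                # This stream is finished; any bytes after it belong to the
                # next stream, so feed them to a fresh decompressor.
                chunk = decomp.unused_data
                decomp = bz2.BZ2Decompressor()
            else:
                chunk = b""


if __name__ == "__main__":
    # Demo: two compressed streams glued together, read 7 bytes at a time.
    data = bz2.compress(b"hello ") + bz2.compress(b"world")
    print(b"".join(stream_bz2(io.BytesIO(data), chunk_size=7)).decode())
```

For plain sequential reading, `bz2.open(path, "rb")` does the same job in one line; the explicit decompressor just makes the streaming step visible.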
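index.rb builds an article-to-block index. Here is a minimal sketch of that idea in Python, under two loudly stated assumptions: the stripped file marks the start of each article with a line beginning `TITLE: ` (a made-up format, not necessarily what xmlprocess.rb emits), and something like bzipreader already hands you each block's decompressed text together with its block number. `build_index`, `save_index`, and `load_index` are hypothetical names.

```python
import io
from typing import Dict, Iterable, TextIO, Tuple

# Hypothetical marker for the first line of each article in the stripped
# file; replace it with whatever xmlprocess actually writes.
TITLE_PREFIX = "TITLE: "


def build_index(blocks: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    """Map each article title to the number of the block it starts in.

    `blocks` yields (block_number, decompressed_text) pairs, e.g. produced by
    the bzip reader. A title line split across two blocks is not handled.
    """
    index: Dict[str, int] = {}
    for block_no, text in blocks:
        for line in text.splitlines():
            if line.startswith(TITLE_PREFIX):
                title = line[len(TITLE_PREFIX):].strip()
                index.setdefault(title, block_no)  # keep the first occurrence
    return index


def save_index(index: Dict[str, int], out: TextIO) -> None:
    """Write the index as sorted 'title<TAB>block' lines."""
    for title, block_no in sorted(index.items()):
        out.write(f"{title}\t{block_no}\n")


def load_index(src: TextIO) -> Dict[str, int]:
    """Read an index written by save_index, skipping blank lines."""
    index: Dict[str, int] = {}
    for line in src:
        line = line.rstrip("\n")
        if not line:
            continue
        title, block_no = line.rsplit("\t", 1)
        index[title] = int(block_no)
    return index


if __name__ == "__main__":
    demo_blocks = [(0, "TITLE: Apple\nA fruit.\n"), (1, "TITLE: Banana\nAnother fruit.\n")]
    buf = io.StringIO()
    save_index(build_index(demo_blocks), buf)
    print(buf.getvalue(), end="")
```

A tab-separated text file keeps the index human-readable and loadable without any extra dependencies on the XO.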
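For server.rb, note that the suggested BaseHTTPServer module was renamed `http.server` in Python 3. The sketch below serves articles at `/wiki/Title` URLs, a URL scheme assumed here rather than copied from server.rb. `lookup_article` and its `ARTICLES` dict are hypothetical stand-ins for the index lookup and block decompression the real port would do.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.parse import unquote, urlsplit

WIKI_PREFIX = "/wiki/"  # assumed URL scheme, e.g. /wiki/Main_Page

# Hypothetical stand-in for the real data: the ported server would look the
# title up in the index and decompress the matching block instead.
ARTICLES = {"Main Page": "Welcome to WikiReader."}


def lookup_article(title: str) -> Optional[str]:
    return ARTICLES.get(title)


def title_from_path(path: str) -> Optional[str]:
    """Turn '/wiki/Some_Title?x=1' into 'Some Title'; None for other paths."""
    route = urlsplit(path).path
    if not route.startswith(WIKI_PREFIX):
        return None
    return unquote(route[len(WIKI_PREFIX):]).replace("_", " ")


class WikiHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        title = title_from_path(self.path)
        text = lookup_article(title) if title is not None else None
        if text is None:
            self.send_error(404, "Article not found")
            return
        safe_title = html.escape(title)
        body = (
            f"<html><head><title>{safe_title}</title></head><body>"
            f"<h1>{safe_title}</h1><p>{html.escape(text)}</p></body></html>"
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8000) -> HTTPServer:
    # Bind to localhost only, assuming the browser runs on the same machine.
    return HTTPServer(("127.0.0.1", port), WikiHandler)


if __name__ == "__main__":
    server = make_server()
    print("Serving on http://127.0.0.1:8000/wiki/Main_Page")
    server.serve_forever()
```

Run it and open http://127.0.0.1:8000/wiki/Main_Page in the browser. A plain `HTTPServer` handles one request at a time, which is probably enough for a single reader on a single XO.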
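xmlprocess.rb doesn't need porting, since the stripped archive can be prepared elsewhere. If a Python version is ever wanted, though, the standard `xml.etree.ElementTree.iterparse` can walk a MediaWiki dump without loading it all into memory. This sketch pulls out each page's title and wikitext and writes them after a hypothetical `TITLE: ` marker line; it only removes the XML wrapping, and any further cleanup xmlprocess.rb performs is not reproduced. All function names are illustrative.

```python
import bz2
import io
import sys
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterable, Iterator, TextIO, Tuple

# Hypothetical marker written before each article; not taken from xmlprocess.rb.
TITLE_PREFIX = "TITLE: "


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(xml_file: BinaryIO) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) for each <page> in a MediaWiki XML dump."""
    title, text = None, None
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""  # with several revisions, the last one wins
        elif name == "page":
            if title is not None:
                yield title, text or ""
            title, text = None, None
            elem.clear()  # drop the finished page's children to save memory


def write_stripped(pages: Iterable[Tuple[str, str]], out: TextIO) -> None:
    for title, text in pages:
        out.write(f"{TITLE_PREFIX}{title}\n{text}\n")


if __name__ == "__main__":
    # Usage: python xmlprocess.py pages-articles.xml.bz2 > stripped.txt
    out = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")
    with bz2.open(sys.argv[1], "rb") as dump:
        write_stripped(iter_pages(dump), out)
    out.flush()
```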