WikiBrowse Editing

From OLPC
For the wiki-editing framework, see also: Hyperopia


This technique allows you to edit the encyclopedia articles in an existing Wikislice, using the WikiBrowse software.

You can then merge your edits into the compressed article database and build a new '.xo' file for distribution.

This is a very rough solution -- not a finished product.

Usage

Set up the "server"

Note: You can run this on a machine for "local" editing, or on a server. In the case of a server, there is currently no authentication or security of any kind in this mode -- use it only within a safe local network.

Get the latest wikiserver code

git clone git://dev.laptop.org/users/martin/wikiserver

Unpack Wikipedia-XX.xo. In this example, we experiment with the Spanish Wikipedia-20 bundle from http://dev.laptop.org/~cjb/eswiki/0.84/Wikipedia-20.xo

unzip Wikipedia-20.xo

Now copy the data files from Wikipedia-20.xo to the directory where the wikiserver code is. The data files usually reside in a directory with a language-country prefix. In this case, es_PE -- because the very first Wikipedia bundle was made for Perú:

cp -r Wikipedia.activity/es_PE wikiserver/

Prepare a directory to store your edits

mkdir ~/wikipediaedits/

Running and stopping the server

To run the server, change into the wikiserver directory and run:

(python server.py es_PE/es_PE.xml.bz2 8000 ~/wikipediaedits/ 2>&1 ) | tee wikiserver.log

To stop the server, press Ctrl-C. On every run, the server writes a log file (wikiserver.log). If you hit a bug or a problem, please include the log file in your report.

Edit content

If you are running the above on a network server, open your web browser and go to http://<name-or-IP-of-server>:8000/ . The port must match the one passed to server.py (8000 in the command above).

If you are running it on a single machine, use http://localhost:8000/

Each page has an 'edit' link that leads to a form. Edit and submit your changes. The UI is extremely simple.

Review changes

There is no web-based UI for change review at the moment. However, you can review the changes from the command line:

diff -ur ~/wikipediaedits/wiki.orig ~/wikipediaedits/wiki
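
For a quick overview before reading the full diff, the same comparison can be sketched in Python. This is only an illustration, not part of the wikiserver code: it assumes each article is stored as one file under wiki.orig and wiki, and the function names changed_articles and article_diff are hypothetical.

```python
import difflib
from pathlib import Path


def changed_articles(orig_dir, edited_dir):
    """Return sorted relative names of files that are new or differ from the original."""
    orig_dir, edited_dir = Path(orig_dir), Path(edited_dir)
    changed = []
    for edited in sorted(p for p in edited_dir.rglob("*") if p.is_file()):
        rel = edited.relative_to(edited_dir)
        original = orig_dir / rel
        # A file counts as changed if it has no original or its bytes differ.
        if not original.is_file() or original.read_bytes() != edited.read_bytes():
            changed.append(rel.as_posix())
    return changed


def article_diff(orig_dir, edited_dir, name):
    """Return a unified diff for one article as a single string."""
    def read_lines(path):
        if not path.is_file():
            return []
        text = path.read_text(encoding="utf-8", errors="replace")
        return text.splitlines(keepends=True)

    old = read_lines(Path(orig_dir) / name)
    new = read_lines(Path(edited_dir) / name)
    return "".join(
        difflib.unified_diff(old, new, "wiki.orig/" + name, "wiki/" + name)
    )


if __name__ == "__main__":
    # Assumed location: the edits directory created during setup.
    base = Path.home() / "wikipediaedits"
    for name in changed_articles(base / "wiki.orig", base / "wiki"):
        print(article_diff(base / "wiki.orig", base / "wiki", name))
```

Listing the changed names first makes it easier to split the review among several people; the diff command above remains the simplest way to see everything at once.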

Prepare/install the merge/update tools

Compile the tools -- this is only required once.

sudo yum install rubygem-RubyInline ruby \
     bzip2-devel automake autotools make gcc
cd woip/c
./bootstrap.sh
make lsearcher bzipreader blocks
cd ../../locate.freebsd
make all

Merge edits into data files

We will create a new set of data files, based on the old data files plus your edits.

# create a destination directory
mkdir es_PE_edited
# run the merge, this can take a long time
bzcat es_PE/es_PE.xml.bz2.processed | \
   tools/mergeupdates.py ~/wikipediaedits | bzip2 -c \
   > es_PE_edited/es_PE.xml.bz2.processed

This process takes a while. As it runs through the file, it will indicate when it is overriding a particular content file, for example:

Merging ~/wikiedit/Andorra
Merging ~/wikiedit/Física

Once the main data file (.processed) is ready, reindex it:

./woip/sh/process-update es_PE_edited/es_PE.xml.bz2.processed

The resulting files will be in es_PE_edited.

Create a new Wikipedia.xo

  • Replace the files in Wikipedia.activity/es_PE with the files from your es_PE_edited directory.
  • Update the version string in activity.info -- please check with developers on devel@lists.laptop.org to pick a version number that will not conflict.
  • Use zip to re-create the bundle file
  • Test...
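
The zip step can also be scripted. The sketch below is one possible way to do it with Python's standard zipfile module, not an official packaging tool. It assumes the archive should keep Wikipedia.activity as its top-level directory, matching what unzip produced earlier; the function name build_bundle and the output file name in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def build_bundle(activity_dir, bundle_path):
    """Zip activity_dir into bundle_path, keeping the top-level directory name."""
    activity_dir = Path(activity_dir).resolve()
    root = activity_dir.parent
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        for path in sorted(activity_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent directory, so entries
                # look like "Wikipedia.activity/es_PE/...".
                bundle.write(path, path.relative_to(root).as_posix())
    return bundle_path
```

For example, build_bundle("Wikipedia.activity", "Wikipedia-edited.xo") would write the new bundle next to the unpacked directory. Listing the result with unzip -l and comparing it against the original bundle is a cheap sanity check before testing on a laptop.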

Workflow

The use of this editing facility with a group of users can be coordinated with a web-based spreadsheet such as Google Docs. You can import the file es_PE.xml.bz2.index.txt into a spreadsheet to get a listing of all the pages to review.
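
If you want tracking columns ready before importing, a small script can turn the index into a CSV checklist. This is a rough sketch under a loud assumption: the layout of es_PE.xml.bz2.index.txt is not documented here, so the code simply treats every non-empty line as one entry and copies it verbatim. The function name index_to_checklist and the column names are hypothetical.

```python
import csv


def index_to_checklist(index_path, csv_path):
    """Write one CSV row per non-empty index line, with blank tracking columns.

    Assumption: each non-empty line of the index file is one entry.
    Returns the number of entries written.
    """
    rows = 0
    with open(index_path, encoding="utf-8", errors="replace") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["index_entry", "reviewer", "status"])
        for line in src:
            entry = line.strip()
            if entry:
                writer.writerow([entry, "", ""])
                rows += 1
    return rows
```

If the index lines turn out to carry more than a title, adjust the parsing before sharing the sheet with reviewers.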

Development notes

The software works as is. Motivated programmers might be interested in tackling this informal to-do list:

  • Implement HTTP Auth 'Basic' for simple user/password protection
  • Use git for history.
    • Init a git repo, commit to it on every edit.
    • We will have to add files opportunistically -- it is a huge cost to git add the whole dataset.
    • Show file history via a gitweb/cgit cgi
  • Better UI
  • Track Seen/audited status for all pages for more integrated workflow
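
As a starting point for the HTTP Basic auth item, the credential check itself is small. The sketch below is not taken from the wikiserver code, and how server.py exposes request headers is not covered here; it only shows how a value of the Authorization header could be validated. The function name check_basic_auth is hypothetical.

```python
import base64
import hmac


def check_basic_auth(header, username, password):
    """Return True if an Authorization header value carries the expected credentials."""
    if not header:
        return False
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 and bytes that are not valid UTF-8.
        return False
    expected = (username + ":" + password).encode("utf-8")
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(decoded.encode("utf-8"), expected)
```

When the check fails, the server would normally answer with status 401 and a WWW-Authenticate: Basic header so that the browser prompts for credentials. Basic auth sends the password effectively in clear text, so without HTTPS this only keeps out casual visitors on a trusted network.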