Hyperopia

From OLPC
Revision as of 02:56, 28 April 2011 by 108.20.250.205 (talk) (Creating large snapshots)

Hyperopia is a planned wiki-editor framework for editing a subset of a wiki while offline. When a connection is available again, it will synchronize with the source wiki, pushing any local changes and/or pulling new updates from it.

Related bugs

Fixing the WikiBrowse toolchain so that creating new wikislices works:
<trac>10510</trac> - make wikipedia.xo work on F11 and F14
<trac>10526</trac> - sync mwlib with the latest version upstream.


Default format

In theory, Hyperopia could work with multiple formats. The current format is WikiBrowse's: documents are stored in wikimarkup to simplify updates and changes. Initial support is provided for MediaWiki markup, including templates and math.


Updates

Updates will be posted to the source wiki using three-way diffs to simplify the process. When this proves too complicated, the update can instead be posted to a new page, with a link to the diff between it and the latest revision posted on the article's talk page.

'Too complicated' is a customizable threshold: at the conservative end it can mean 'an intervening edit has occurred'; at the other end of the spectrum, 'a merge conflict cannot be reasonably resolved'.
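The decision logic above can be sketched in a few lines. This is a hypothetical illustration, not Hyperopia's actual code: the function name `plan_sync` and the returned action labels are made up, and the `conservative` flag corresponds to the 'an intervening edit has occurred' threshold described above.

```python
def plan_sync(base, local, remote, conservative=True):
    """Decide how to publish an offline edit (illustrative sketch).

    base   -- the article text as it was when the snapshot was taken
    local  -- the text after offline edits
    remote -- the current text on the source wiki
    """
    if local == base:
        return ("pull", remote)       # nothing edited locally; take upstream
    if remote == base:
        return ("push", local)        # no intervening edit: push cleanly
    if local == remote:
        return ("noop", local)        # both sides made the identical change
    if conservative:
        # 'Too complicated': post the local version to a new page and
        # link a diff from the article's talk page instead of merging.
        return ("talk-page", local)
    return ("merge", None)            # attempt a real three-way merge
```

A less conservative policy would replace the final branch with an actual line-level three-way merge, falling back to the talk-page route only on an unresolvable conflict.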


Creating a new snapshot

There are few complete tools for creating new snapshots. Part of the Hyperopia framework will be simple methods for generating these in a suitable format.

Currently the Collection extension for MediaWiki makes it easy to create a ZIM export of a set of articles. That is a good example of an interface and workflow for compiling and downloading a snapshot, but it does not yet export to a format that supports lossless editing and republishing of changes.
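For lossless round-tripping, a snapshot needs the raw wikitext plus the revision ID of each article (the revision ID becomes the 'base' for the three-way diff when pushing changes back). The MediaWiki API already exposes both; a sketch of building such a query, assuming a standard `api.php` endpoint (the helper name is hypothetical):

```python
from urllib.parse import urlencode

def snapshot_request_url(api_base, titles):
    """Build a MediaWiki API URL that returns raw wikitext plus
    revision IDs for the given article titles."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "ids|content",   # revid is the base for later merging
        "rvslots": "main",
        "titles": "|".join(titles),
        "format": "json",
        "formatversion": "2",
    }
    return api_base + "?" + urlencode(params)
```

Fetching that URL and storing the `content` and `revid` fields per page would give a snapshot that can be edited offline and diffed against the exact revision it came from.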


Creating large snapshots

Snapshots such as Wikipedia for Schools, WikiBrowse, or Wikipedia 1.0 range from 100 MB to over 10 GB in size.

We need a better workflow for creating these sorts of snapshots, which teams of people currently spend a lot of time building, partly by hand and partly with one-off scripts. A sample interface might include the following options:

snapshot source material
"snapshot type" (wiktionary, abridged wikipedia, wikipedia by category, wikisource, other/custom ...)

Here one could include some very specific custom options, e.g. "1000 articles every <PROJECTNAME> should have", &c. An option to browse existing snapshots could replace this choice and the manual choice of parameters.

snapshot parameters
"language[s]"
"articles"  (trusted only, by popularity, by wp1.0 score, all)
"article stubs" (yes, no, only popular ones)
"article length" (1st para, lede, summary, full)
"image size" (none, thumbnails, full)
"target size"  (<50M, 200M, 1G, 4G, 16G, 64G, any size)
"image % of total"  (none, 20%, 50%, 80%)
"templates" (yes, no, oh please no)
export format[s]
"export format"  (zim, wikireader, woip, mw-xml, pdf, odt)

Here pdf and odt would simply be very long, somewhat unorganized collections, like a traditional encyclopedia, with autogenerated metadata: TOCs, page numbers, &c. There could be more specific export formats, such as an XO format that wraps a woip or mw-xml export into the directory structure and zipfile needed for a new .xo file.
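The parameters listed above interact: "target size" and "image % of total" together fix the byte budgets for text and images. A sketch of how a snapshot builder might represent them (the class and field names are hypothetical, not part of any existing tool):

```python
from dataclasses import dataclass

@dataclass
class SnapshotSpec:
    """Hypothetical snapshot parameters, mirroring the options above."""
    languages: tuple = ("en",)
    articles: str = "by popularity"     # trusted only | by popularity | by wp1.0 score | all
    stubs: str = "no"                   # yes | no | only popular ones
    article_length: str = "full"        # 1st para | lede | summary | full
    image_size: str = "thumbnails"      # none | thumbnails | full
    target_bytes: int = 200 * 2**20     # the "200M" choice
    image_fraction: float = 0.2         # the "image % of total" choice
    export_formats: tuple = ("zim",)

    def image_budget(self):
        """Bytes available for images under the target size."""
        return int(self.target_bytes * self.image_fraction)

    def text_budget(self):
        """Bytes left for article text after the image budget."""
        return self.target_bytes - self.image_budget()
```

A builder could then rank candidate articles (by popularity or wp1.0 score) and add them until `text_budget()` is exhausted, which is one way the size choices would constrain the article choices.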


Some of the choices above would limit the selection available for the others.

'WP by category' could include some of the larger sorts of snapshots that can currently be generated as books, especially if those could be updated automatically with page-scoring and WikiTrust data. New custom snapshots could start from existing snapshots, combining them or extending them to a different set of languages.

See also