Hyperopia

'''Hyperopia''' is a planned wikieditor framework for editing a subset of a wiki while offline. It will support synchronizing with source wikis when online again - pushing any local changes made and/or pulling new updates from it.

== Related bugs ==

;Fixing the WikiBrowse toolchain so that creating new wikislices works
: <trac>10510</trac> - make wikipedia.xo work on F11 and F14
: <trac>10526</trac> - sync mwlib with the latest version upstream.



== Trajectory ==


* Design a web-based landing page interface with access to repositories of offline content for download and local installation. The web page can be demonstrated prior to the complete availability of binary executables for all major platforms.
* Consolidate the bzipped archive and plaintext index into a single file. The .hype file will be composed of the archive followed by the index followed by the offset. (At some point, this script should be run server-side.)
* The entire executable will simply run a webserver on an unused port and request that the landing page be opened in the default web browser. A minimal sketch of these last two steps follows this list.
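The exact layout of the combined file and the launcher behaviour are not pinned down above, so the following is only a rough sketch under stated assumptions: the trailing offset is taken to be an 8-byte integer marking where the index begins, and all function names are illustrative rather than existing Hyperopia code.

<syntaxhighlight lang="python">
import functools
import http.server
import socketserver
import struct
import threading
import webbrowser

# Assumed layout: [bzipped archive][plaintext index][8-byte offset of index start].
OFFSET_TRAILER = struct.Struct("<Q")


def pack_hype(archive_path, index_path, out_path):
    """Concatenate the bzipped archive and the plaintext index into one .hype file."""
    with open(archive_path, "rb") as f:
        archive = f.read()
    with open(index_path, "rb") as f:
        index = f.read()
    with open(out_path, "wb") as out:
        out.write(archive)
        out.write(index)
        out.write(OFFSET_TRAILER.pack(len(archive)))  # byte position where the index begins


def split_hype(hype_path):
    """Return (archive_bytes, index_bytes) from a packed .hype file."""
    with open(hype_path, "rb") as f:
        data = f.read()
    (index_start,) = OFFSET_TRAILER.unpack(data[-OFFSET_TRAILER.size:])
    return data[:index_start], data[index_start:-OFFSET_TRAILER.size]


def serve_landing_page(directory="."):
    """Run a web server on an unused port and open the landing page in the default browser."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    with socketserver.TCPServer(("127.0.0.1", 0), handler) as httpd:  # port 0 = any free port
        port = httpd.server_address[1]
        threading.Timer(0.5, webbrowser.open, args=("http://127.0.0.1:%d/" % port,)).start()
        httpd.serve_forever()
</syntaxhighlight>

Keeping the offset in a fixed-size trailer at the end would let a reader locate the index with a single seek instead of scanning the archive portion.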

== Bounties ==

* OpenZIM library support in Python is a high-priority bounty, though until editability concerns with the format are resolved, OpenZIM archives will be read-only in Hyperopia.
*: Pyzim already exists; see the PediaPress work, and file bugs on zimlib if necessary.


* Bounties themselves should be highlighted on the project page, and there could be more substantial instructions in the git repository.

== Default format ==

In theory, Hyperopia could work with multiple formats. The format currently in use is that of WikiBrowse. Documents are stored in wikimarkup to simplify updates and changes. Initial support is provided for MediaWiki, including templates and math markup.
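Since the documents are stored as raw wikimarkup, an existing wikitext parser can inspect them without any lossy conversion. As an illustration only (mwparserfromhell is a real parser, but not necessarily the one Hyperopia will use; the toolchain above references mwlib), listing the templates in a stored article looks like this:

<syntaxhighlight lang="python">
# Illustration only: mwparserfromhell is one existing wikitext parser;
# Hyperopia's own toolchain (e.g. mwlib) may differ.
import mwparserfromhell

article_source = """'''Hyperopia''' is a planned wikieditor framework.
{{Infobox software|name=Hyperopia}}
Math such as <math>E = mc^2</math> stays verbatim in the stored source."""

wikicode = mwparserfromhell.parse(article_source)

# Templates round-trip unchanged because the raw markup is what gets stored.
for template in wikicode.filter_templates():
    print("template:", template.name)

# Plain-text view, e.g. for building a search index.
print(wikicode.strip_code())
</syntaxhighlight>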


== Updates ==

Updates will be posted to the source wiki using three-way diffs to simplify the process. When this seems too complicated, the update can be posted to a new page instead, with a link to the diff between it and the latest revision posted on the article's talk page.

'Complicated' is a customizable concept: in conservative cases it can mean 'when an intervening edit has occurred'; at the other end of the spectrum, 'when a merge conflict cannot be reasonably resolved'.
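As a rough sketch of that rule, assuming GNU diff3 is available on the system and using hypothetical path arguments for the three revisions involved (a conservative policy could skip the merge entirely whenever the base and remote texts differ at all):

<syntaxhighlight lang="python">
import subprocess

def merge_update(local_path, base_path, remote_path):
    """Three-way merge of an offline edit against the current article text.

    local_path  - the offline (edited) revision
    base_path   - the revision the snapshot was taken from
    remote_path - the latest revision on the source wiki

    Returns the merged wikitext if the merge is clean, or None in the
    'too complicated' case, where the edit should instead go to a new page
    with a link to the diff posted on the article's talk page.
    """
    result = subprocess.run(
        ["diff3", "-m", local_path, base_path, remote_path],
        capture_output=True, text=True)
    if result.returncode == 0:
        return result.stdout   # clean merge: safe to push directly
    return None                # conflicts: fall back to the talk-page route
</syntaxhighlight>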


== Creating a new snapshot ==

There are few complete tools for creating new snapshots. Part of the Hyperopia framework will be simple methods for generating these in a suitable format.

Currently the Collections extension for MediaWiki makes it easy to create a ZIM export of a set of articles. That is a good example of an interface/workflow for compiling and downloading a snapshot, but it does not yet export to a format that would support lossless editing and republishing of changes.


== Creating large snapshots ==

Snapshots such as Wikipedia for Schools, WikiBrowse, or Wikipedia 1.0 are 100 MB to 10+ GB in size.

We need a better workflow for creating these sorts of snapshots, which teams of people currently spend a lot of time assembling, partly by hand and with one-off scripts. A sample interface might include the following options:

; snapshot source material
: "snapshot type" (wiktionary, abridged wikipedia, wikipedia by category, wikisource, other/custom ...)

Here one could include some very specific custom options, such as "1000 articles every <PROJECTNAME> should have", &c. An option to browse existing snapshots could replace this choice and the manual choice of parameters.

; snapshot parameters
: "language[s]"
: "articles" (trusted only, by popularity, by wp1.0 score, all)
: "article stubs" (yes, no, only popular ones)
: "article length" (1st para, lede, summary, full)
: "image size" (none, thumbnails, full)
: "target size" (<50M, 200M, 1G, 4G, 16G, 64G, any size)
: "image % of total" (none, 20%, 50%, 80%)
: "templates" (yes, no, oh please no)
; export format[s]
: "export format" (zim, wikireader, woip, mw-xml, pdf, odt)

Here pdf and odt would simply be very long, somewhat unorganized collections, like a traditional encyclopedia, with autogenerated metadata (TOCs, page numbers, &c.). There could be more specific export formats, such as an XO format which wrapped a woip or mw-xml export into the directory structure and zipfile needed for a new .xo file.


Some of the choices above would limit the selection available for the others.

'WP by category' could include some of the larger sorts of snapshots that can currently be generated as books, especially if one can update those automatically with page-scoring and WikiTrust data. New custom snapshots could start from existing snapshots, combining them or extending them to a different set of languages.
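One way to make that workflow scriptable would be to capture the chosen options as a declarative snapshot specification for the export tools to consume. The sketch below is hypothetical; the keys simply mirror the option list above and are not an existing schema.

<syntaxhighlight lang="python">
# Hypothetical snapshot specification; the keys and values mirror the sample
# interface above and are not a fixed schema.
snapshot_spec = {
    "snapshot_type": "abridged wikipedia",   # wiktionary, wikipedia by category, wikisource, ...
    "languages": ["en", "es"],
    "articles": "by wp1.0 score",            # trusted only / by popularity / all
    "article_stubs": "only popular ones",
    "article_length": "summary",             # 1st para / lede / summary / full
    "image_size": "thumbnails",              # none / thumbnails / full
    "target_size": "1G",                     # <50M / 200M / ... / any size
    "image_percent_of_total": 20,            # none / 20% / 50% / 80%
    "templates": False,
    "export_formats": ["zim", "mw-xml"],     # zim, wikireader, woip, mw-xml, pdf, odt
}
</syntaxhighlight>

A builder consuming such a spec could also enforce the cross-constraints noted above, rejecting combinations where, for example, the requested image settings cannot fit within the target size.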

== See also ==