Bundle project

From OLPC
Revision as of 15:19, 21 November 2007 by Sj (talk | contribs) (scope)
Jump to: navigation, search

guidelines for bundling

  • scripts: capture and publish any scripts used to create the bundle. Others who see missing metadata or other elements will want to be able to rebuild the bundle with different parameters. Taking care with documenting these scripts is the easiest way to guarantee compatibility with attribution and other licensing as well.
  • licenses: note licensing and attributino as granularly as the original creators did. every image in a collectino of images, every article in a set of articles, every definition in a set of definitions. If there is a simple way to pass on the aggregate history of collaborative works, include that; else include a link to the source history for the work (or a script that has options for extracting history, latest-author, date, and similar in the format of the original archive).
  • other metadata: see the #metadata section below. capture the original URL or source, and as many of the intervening authors, uploaders, and upload dates as possible, to help accurately identify the provenance of a work.
  • check source archives for APIs for gathering such data. Many sites, including modern mediawiki sites, have an API that will directly give you most information you need without #screenscraping.

screenscraping

...and regular expressions

extracting licenses from Wikimedia Commons

:%s/<a href="[^h][^>]*>\([^<]*\)<\/a>/\1/gc

:%s/<hr\_p\{-}

.*<\/p>//c (rm gfdl-template excess)

topics and scope

Try to pick a topic that can be covered elegantly in a compact bundle. Most laptop bundles should be under 10M in size. If you think you have a topic that can't possibly be covered this way, consider covering a smaller scope, the same scope with less depth, or the same topic at a different level of abstraction. Larger collections (up to 1G in size) can be packaged for a school library.