Bundle project

From OLPC
Revision as of 18:52, 21 November 2007 by Sj (bundling project)

guidelines for bundling

  • scripts: capture and publish any scripts used to create the bundle. Others who notice missing metadata or other elements will want to rebuild the bundle with different parameters. Documenting these scripts carefully is also the easiest way to ensure the bundle complies with attribution and other licensing requirements.
  • licenses: note licensing and attribution as granularly as the original creators did, e.g. for every image in a collection of images, every article in a set of articles, and every definition in a set of definitions. If there is a simple way to pass on the aggregate history of collaborative works, include it; otherwise include a link to the source history for the work (or a script with options for extracting history, latest author, date, and similar fields in the format of the original archive).
  • other metadata: see the #metadata section below. Capture the original URL or source, and as many of the intervening authors, uploaders, and upload dates as possible, to help identify the provenance of a work accurately.
  • APIs: check source archives for APIs that provide such data. Many sites, including modern MediaWiki sites, have an API that will give you most of the information you need directly, without #screenscraping.
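As a minimal sketch of the API route, assuming Python and the Wikimedia Commons endpoint (https://commons.wikimedia.org/w/api.php), the code below builds MediaWiki Action API queries: prop=imageinfo with iiprop=extmetadata for per-file license and author fields, and prop=revisions for a page's edit history. The function names (imageinfo_url, history_url, parse_imageinfo, fetch) and the record layout are hypothetical choices for this example, not a format defined on this page. By default imageinfo describes only the current file version, so "uploader" here is whoever uploaded that version; the iilimit parameter can be added to list earlier uploads.

```python
"""Sketch: gather license and provenance metadata from a MediaWiki API.

Assumptions: the Wikimedia Commons endpoint below, JSON formatversion=2, and a
hypothetical per-file record layout chosen only for illustration.
"""
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://commons.wikimedia.org/w/api.php"  # assumed endpoint


def imageinfo_url(titles):
    """Build an imageinfo query for a list of File: titles."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "imageinfo",
        "iiprop": "url|user|timestamp|extmetadata",
        "titles": "|".join(titles),
    }
    return API + "?" + urlencode(params)


def history_url(title, limit=50):
    """Build a revisions query: author, date and summary of recent edits."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "revisions",
        "rvprop": "user|timestamp|comment",
        "rvlimit": str(limit),
        "titles": title,
    }
    return API + "?" + urlencode(params)


def parse_imageinfo(data):
    """Turn an imageinfo response into one provenance record per page."""
    records = []
    for page in data.get("query", {}).get("pages", []):
        # Missing pages carry no imageinfo; fall back to an empty dict.
        info = (page.get("imageinfo") or [{}])[0]
        meta = info.get("extmetadata", {})
        records.append({
            "title": page.get("title"),
            "source_url": info.get("descriptionurl"),
            "uploader": info.get("user"),  # uploader of the current version
            "upload_date": info.get("timestamp"),
            "license": meta.get("LicenseShortName", {}).get("value"),
            "author": meta.get("Artist", {}).get("value"),
        })
    return records


def fetch(titles):
    """Query the live API (network access required) and parse the result."""
    req = Request(imageinfo_url(titles),
                  headers={"User-Agent": "bundle-metadata-sketch/0.1"})
    with urlopen(req) as resp:
        return parse_imageinfo(json.load(resp))
```

A bundling script could call fetch() on each batch of file titles and write the records alongside the bundle. Values such as Artist may contain HTML markup and need cleanup before they are stored.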

screenscraping

...and regular expressions

extracting licenses from Wikimedia Commons

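A Vim substitution for saved page HTML: it replaces each link whose href does not begin with "h" (relative wiki links and "#" anchors) with its plain link text, leaving absolute http:// and https:// links in place. The c flag asks for confirmation at each match:

:%s/<a href="[^h][^>]*>\([^<]*\)<\/a>/\1/gc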
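The same substitution can be expressed with Python's re module, for example when the cleanup should live in a published bundling script rather than an interactive Vim session. This is a sketch; the name strip_relative_links is hypothetical. Unlike the Vim command it replaces every match without asking for confirmation, and because Python character classes such as [^<] also match newlines, a match may span several lines where Vim's would not.

```python
import re

# Same pattern as the Vim command: an <a> tag whose href does not start with "h"
# (relative or "#" links rather than http:// or https://) and whose link text
# contains no nested tags. The captured group is the link text.
RELATIVE_LINK = re.compile(r'<a href="[^h][^>]*>([^<]*)</a>')


def strip_relative_links(html: str) -> str:
    """Replace each matching link with its text; other links are left intact."""
    return RELATIVE_LINK.sub(r"\1", html)
```

Links whose content includes markup (such as an image inside the link) do not match the pattern and pass through unchanged, just as with the Vim version.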