Content bundle making script

From OLPC
Revision as of 14:18, 14 May 2008 by Cjl (talk | contribs) (add inadvertant copyright infringement warning)

Intro

This is a quick hack of a script that takes an index page on the web and pulls it, along with all the pages it links to, into a library bundle. It is useful for things like Wikislices.

When the script finishes (it prints nothing and may take several minutes while it downloads files), you should have a bundlename.xol file in the directory you ran the script from, ready to go on the XO.
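Since the script builds the bundle by zipping a folder and renaming the archive, a quick sanity check is to list the archive and look for the metadata file. The following is a minimal sketch, assuming unzip is installed; bundle_has_metadata is a hypothetical helper and is not part of the script on this page.

```shell
#!/bin/sh
# bundle_has_metadata BUNDLE.xol
# Hypothetical helper: succeeds if the bundle contains
# <bundlename>/library/library.info, where <bundlename> is the
# file name without its .xol extension.
bundle_has_metadata() {
    bhm_name=$(basename "$1" .xol)
    # unzip -l lists the stored paths without extracting anything
    unzip -l "$1" | grep -qF "$bhm_name/library/library.info"
}
```

For example, bundle_has_metadata healthbundle.xol && echo "looks ok" prints the message only when the metadata file is present.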

Caution

This script is only suitable for starting pages whose link topology is well understood. Even going one layer deep from a Wikipedia page may retrieve copyrighted content and make the resulting bundle unsuitable for redistribution by OLPC. This is an experimental tool, and the resulting bundles should not be deposited in the OLPC Library without careful review of potential copyright issues.
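One way to get a feel for the link topology before building a bundle is to list the links on the index page and review them by hand. The following is a minimal sketch, assuming the index page has already been saved to a local file; list_links is a hypothetical helper that is not part of the script on this page. It only catches double-quoted href attributes, and it relies on grep -o, which GNU and BSD grep support but POSIX does not require.

```shell
#!/bin/sh
# list_links FILE
# Hypothetical helper: print each distinct href target found in FILE,
# one per line, sorted. Only href="..." with double quotes is matched.
list_links() {
    grep -o 'href="[^"]*"' "$1" |
        sed -e 's/^href="//' -e 's/"$//' |
        sort -u
}
```

For example, after saving the page with wget -O index.html followed by the url, list_links index.html prints the pages a one-level-deep download would be likely to follow.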

Improvements

These are things I know should be fixed in the script - if you have the time and inclination, please do them.

  • test the script on several index-type sites, try the generated .xol files on your XO, and see whether they work. Post results to the talk page.
  • make the library.info file generation less horribly hackish. Right now the script just appends all the needed information to the file, and the implementation is very brittle because the library.info format and default values are hard-coded into the script.
  • make the script clean up after itself. It leaves a bunch of cruft hanging around that isn't the .xol file.
  • if this is sufficiently useful, consider asking for Project hosting for the script - please cc me on the request if you do this!
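As a starting point for the library.info item above, the metadata could be written by a single function from a here-document, so the format lives in one place and a rerun overwrites the file instead of appending to it. This is only a sketch: write_library_info and its optional third argument are hypothetical, and the field names and default values are simply the ones the current script already writes.

```shell
#!/bin/sh
# write_library_info NAME LOCALE [OUTFILE]
# Hypothetical helper: write the same fields the current script emits.
# OUTFILE defaults to library.info in the current directory.
write_library_info() {
    wli_name=$1
    wli_locale=$2
    wli_out=${3:-library.info}
    # '>' truncates the file, so running this twice does not duplicate lines
    cat > "$wli_out" <<EOF
[Library]
name = $wli_name
global_name = $wli_name
long_name = $wli_name
library_version = 1
host_version = 1
l10n = false
locale = $wli_locale
EOF
}
```

Calling write_library_info healthbundle en inside the library folder would produce the same eight lines the script generates today.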

Usage

Copy the script into a file named bundlemaker in a directory on your machine and make it executable. Usage is: bundlemaker url bundlename languagecode

where

  • url is the url you want to turn into your index page
  • bundlename is the name you want your bundle to have (the output file is bundlename.xol); it is also the name of the folder the files are downloaded to
  • languagecode is the 2-letter language code the materials are written in; for instance, en, es, pt

example:

./bundlemaker http://www.nlm.nih.gov/medlineplus/healthtopics.html healthbundle en
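The script only checks that three arguments were given, so a mistyped language code ends up in library.info unchanged. A minimal sketch of a stricter check follows; is_language_code is a hypothetical helper, not part of the script, and it only tests the shape of the code (two letters in the range a-z), not whether the language exists.

```shell
#!/bin/sh
# is_language_code CODE
# Hypothetical helper: succeeds if CODE is exactly two characters,
# each in the range a-z. Range matching can depend on the locale,
# so run with LC_ALL=C if that matters.
is_language_code() {
    case $1 in
        [a-z][a-z]) return 0 ;;
        *) return 1 ;;
    esac
}
```

A caller could use it as is_language_code "$3" || exit 1 before starting the download.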

Script

#!/bin/sh
# usage: bundlemaker <url> <bundlename> <language>
# $1 - url to download from
# $2 - folder name to download to, also name of bundle
# $3 - language code this content is in (en, es, pt, etc.)

# make sure there are 3 arguments
if [ $# -ne 3 ]; then
    echo "Usage: bundlemaker URL BUNDLENAME LANGUAGECODE" 1>&2
    # 127 conventionally means "command not found", so use a plain error status
    exit 1
fi

# get the files from <url>
# place them in a folder called <bundlename> in cwd
# create a log at <bundlename>-log for debugging
# arguments are quoted so a url containing ? or & is not globbed or split
wget -r -p -nH -l1 -o "${2}-log" -P "$2" "$1"

# create the metadata
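# stop if the download folder is missing, rather than writing into cwd
cd "$2" || exit 1
mkdir -p library
cd library || exit 1
# note: this is a stupid script! please fix it!
# see http://wiki.laptop.org/go/Sample_library.info_file
# the first line uses > so that a rerun overwrites library.info instead of appending
# values are quoted so [Library] is not treated as a glob pattern
echo "[Library]" > library.info
echo "name = $2" >> library.info
echo "global_name = $2" >> library.info
echo "long_name = $2" >> library.info
echo "library_version = 1" >> library.info
echo "host_version = 1" >> library.info
echo "l10n = false" >> library.info
echo "locale = $3" >> library.info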

# zip and rename as bundle
cd ../.. || exit 1
# zip writes <bundlename>.zip, which is then renamed to <bundlename>.xol
zip -9 -rq "$2" "$2"
mv "$2.zip" "$2.xol"

# delete extra files
# TODO: implement
exit 0
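
The cleanup step is still a TODO in the script. One possible shape for it is sketched below, assuming the only leftovers are the download folder and the wget log that the script itself creates; cleanup_bundle_files is a hypothetical helper and has not been tried on an XO.

```shell
#!/bin/sh
# cleanup_bundle_files BUNDLENAME
# Hypothetical helper: remove the download folder and the wget log,
# leaving BUNDLENAME.xol in place. Refuses to run with an empty name
# so it can never expand to a bare "rm -rf" on the wrong path.
cleanup_bundle_files() {
    [ -n "$1" ] || return 1
    rm -rf -- "$1" "$1-log"
}
```

If adopted, it would be called as cleanup_bundle_files "$2" after the mv step and before exit 0.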