Afghanistan School Server
Introduction
This page is an overview of the School Server work that has been done in Afghanistan. Our objective is to maximize the content available where connectivity is limited and to minimize bandwidth requirements (if any).
For that we have a few key systems:
- Webdump - a small PHP script, based on HTTrack, designed to be run on remote servers. It copies websites for offline use and then packs each copy so that it can be downloaded as one .tar.gz file. We can select some good educational websites and then keep local copies on each school server.
- Nutch - an open source search engine from Apache, based on Lucene. It can build a search index of all the content on the school server, including HTML files, media files, PDFs, etc.
- Joomla - used to provide a simple school portal where teachers can post announcements, school news, galleries, etc.
Most of this is still work in progress, and so is the documentation. We will put a school server on the Internet as soon as possible so that everyone can try it out.
Webdump
Documentation to be added here.
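Until the Webdump documentation is written, the packing step described in the introduction can be sketched roughly as follows. This is not Webdump's own code: Webdump is a PHP script, whereas this is a minimal Python illustration. It assumes HTTrack (or a similar tool) has already mirrored a site into a local folder, and the function name `pack_mirror` and all paths are hypothetical.

```python
import tarfile
from pathlib import Path


def pack_mirror(mirror_dir: str, archive_path: str) -> str:
    """Pack an already-mirrored site folder into one gzip-compressed tarball.

    Hypothetical helper for illustration; it does not do the mirroring itself.
    """
    src = Path(mirror_dir)
    if not src.is_dir():
        raise NotADirectoryError(f"not a directory: {src}")
    # "w:gz" writes a gzip-compressed tar archive. Using arcname keeps the
    # stored paths relative, so the archive unpacks into a single top-level
    # folder named after the mirrored site.
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src, arcname=src.name)
    return archive_path
```

A school server would then only need to download that one file and unpack it under its web root.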
Nutch
Rough notes follow. Note that localizing Nutch is not entirely straightforward; please look in the Nutch Wiki, where I have made notes on this topic.
- Extract the Nutch 1.0 archive in /opt
- Create a text file in /opt/nutch-1.0/urls, e.g. /opt/nutch-1.0/urls/schoolserver, containing the line http://schoolserver/afbase/ - this is the list of URLs from which the crawl begins
- Add +http://schoolserver/ to /opt/nutch-1.0/conf/crawl-urlfilter.txt
- Copy nutch-1.0.war to apache-tomcat-dir/webapps
- Once the .war has been extracted into a folder under webapps, modify the properties in that folder to point to the crawl folder - see http://zillionics.com/resources/articles/NutchGuideForDummies.htm
- Modify conf/nutch-site.xml to include the crawler properties - see http://zillionics.com/resources/articles/NutchGuideForDummies.htm
- Modify conf/nutch-default.xml and change plugin.includes to include msword and pdf - see http://www.mail-archive.com/nutch-user@lucene.apache.org/msg09580.html
- Modify conf/nutch-default.xml so that the entire page gets downloaded - set every content.limit property to -1, for example:
<property>
  <name>http.content.limit</name>
  <value>-1</value>
  <description>The length limit for downloaded content, in bytes.
  If this value is nonnegative (>=0), content longer than it will be truncated;
  otherwise, no truncation at all.
  </description>
</property>
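The file edits in the steps above (seed list, URL filter rule, content limits) could be scripted along these lines. This is only a sketch: the function names are hypothetical, and it assumes that the URL filter evaluates rules top to bottom with the first match winning, so the accept rule is placed ahead of a catch-all `-.` line if one exists. It also assumes the configuration file uses the usual `<configuration>` / `<property>` layout. Work on a copy of the files, since ElementTree drops comments and processing instructions when it writes XML back out.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def write_seed_file(urls_dir: str, name: str, seed_url: str) -> Path:
    """Create the seed list: a file under the urls folder, one URL per line."""
    folder = Path(urls_dir)
    folder.mkdir(parents=True, exist_ok=True)
    seed = folder / name
    seed.write_text(seed_url + "\n", encoding="utf-8")
    return seed


def add_filter_rule(filter_text: str, rule: str) -> str:
    """Return filter_text with rule inserted ahead of a '-.' catch-all line.

    Assumption: the first matching rule wins, so an accept rule placed
    after '-.' would never be reached.
    """
    lines = filter_text.splitlines()
    if rule in lines:
        return filter_text  # already present, nothing to do
    for i, line in enumerate(lines):
        if line.strip() == "-.":
            lines.insert(i, rule)
            break
    else:
        lines.append(rule)  # no catch-all line found: append at the end
    return "\n".join(lines) + "\n"


def lift_content_limits(config_xml: str) -> str:
    """Set every property whose name ends in .content.limit to -1."""
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        name = prop.findtext("name", default="")
        value = prop.find("value")
        if name.endswith(".content.limit") and value is not None:
            value.text = "-1"
    # Comments and processing instructions from the input are not preserved.
    return ET.tostring(root, encoding="unicode")
```

For the setup described here, the calls would be `write_seed_file("/opt/nutch-1.0/urls", "schoolserver", "http://schoolserver/afbase/")` and `add_filter_rule(text, "+http://schoolserver/")`, where `text` is the current content of /opt/nutch-1.0/conf/crawl-urlfilter.txt.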