Simple Digital Library Index: Difference between revisions

The system is written in Java and XSL. Assemble a folder with the contents organized however you wish, and the metadata is extracted from each file according to its format (e.g. meta tags in HTML files, tags in audio files, document properties in Word documents).

The appearance and interface can be customized simply by editing the HTML in the XSL sheets.

===How It Works===

* Pass 1 generates an XML file of Dublin Core metadata, using a class that implements an interface for extracting metadata from the file.
* Pass 2 generates the HTML files that serve as the browsing index.
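
The two passes above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not SDLI's actual code: the class name <code>TwoPassSketch</code>, the method names, and the inline stylesheet are all hypothetical, and Pass 1 is reduced to building the XML from values that have already been extracted. Pass 2 uses the standard <code>javax.xml.transform</code> API to apply an XSL sheet, which is where the HTML would be customized.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch of the two passes; names are not taken from the SDLI source.
final class TwoPassSketch {
    static final String ENTRY_NS = "http://olpc.af/ns/olpclibentry";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Assumed stylesheet: turns one entry into a tiny HTML page.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
      + " xmlns:e=\"" + ENTRY_NS + "\" xmlns:dc=\"" + DC_NS + "\""
      + " exclude-result-prefixes=\"e dc\">"
      + "<xsl:output method=\"html\" indent=\"no\"/>"
      + "<xsl:template match=\"/e:entry\">"
      + "<html><body><h1><xsl:value-of select=\"dc:title\"/></h1>"
      + "<ul><xsl:for-each select=\"dc:subject\">"
      + "<li><xsl:value-of select=\".\"/></li>"
      + "</xsl:for-each></ul></body></html>"
      + "</xsl:template></xsl:stylesheet>";

    // Pass 1 (simplified): serialize already-extracted values as Dublin Core XML.
    static String toDcXml(String title, List<String> subjects) {
        StringBuilder sb = new StringBuilder();
        sb.append("<entry xmlns=\"").append(ENTRY_NS)
          .append("\" xmlns:dc=\"").append(DC_NS).append("\">");
        sb.append("<dc:title>").append(escape(title)).append("</dc:title>");
        for (String subject : subjects) {
            sb.append("<dc:subject>").append(escape(subject)).append("</dc:subject>");
        }
        return sb.append("</entry>").toString();
    }

    // Escape the characters that are special in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Pass 2: apply the XSL sheet to the Dublin Core XML to get static HTML.
    static String toHtml(String dcXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(dcXml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = toDcXml("Library Base Folder", List.of("Mathematics", "Physics"));
        System.out.println(toHtml(xml));
    }
}
```

Keeping the HTML in the stylesheet rather than in Java code is what would let the look of the library be changed without recompiling anything.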



===Current Status===

Currently supports Ogg audio files and HTML files. It is a work in progress, in early testing at OLPC Afghanistan. We are currently requesting project hosting space to upload the source etc. It will be released under a GPL license.

Latest revision as of 19:16, 1 November 2009

Introduction

Simple Digital Library Index (SDLI) is designed to make it quick and easy to assemble digital libraries for schools. The resulting library is accessible, content can be added quickly in bulk, and the whole library is easy to replicate so that it is available in multiple locations, even over asynchronous or very slow connections. SDLI provides a GUI that makes it simple for the library maker to build the library. It needs no server-side components and generates a master index XML file.

[Images: SDLI Browsing Interface; SDLI Creator GUI]

Features

  • Generates a plain old HTML browsing interface - no MySQL, PHP, etc. required
  • Generates an index of all the metadata and files in the library - you can download this and use it the same way as a repository
  • Generates a basic JavaScript search system that works even offline
  • Very simple to add content - just assemble folders of content and, if required, tag the files using their own format
  • Very easy to replicate - only static HTML files are generated, so the library can be copied to any medium or made accessible by putting it into a web server (e.g. Apache) directory
  • Suitable for offline use, or for use on a school server that has no internet connection
  • Can be localized simply by using an XML dictionary file (we are looking into whether this can be generated from po files etc.). HTML pages are generated automatically for each language specified in the config file.
  • Can be made full-text searchable by feeding the index page to Nutch (http://lucene.apache.org/nutch)

Roadmap

This is under active development. The key things we're working on:

  • Adding HTML 5 manifests to each generated entry page, so that selecting an entry for offline use downloads all required components
  • Adding to the reader activity so that one can browse the index and then select downloads
  • Adding Ogg Video support
  • Adding ODF document support
  • Adding a wizard for tagging HTML content / offline websites

Getting Started

You can download the distribution from http://www.paiwastoon.af/otherdownloads/sdli-0.01-dist.tar.gz . Make sure you have a JRE installed, then either double-click the jar file (on Windows) or cd to its directory and run java -jar SimpleLibraryIndexSystem.jar

Source available from: http://dev.laptop.org/git/projects/sdli

Website Downloads

We recommend using Webdump or httrack to download the website, then making a dummy page that carries the metadata, as in the sample files.

File Formats

  • Ogg Audio - extracts metadata (Title; Subject is taken from Genre)
  • PDF - extracts title, subject, etc.
  • HTML - reads the Dublin Core meta tags
  • Ogg Video - work in progress

Adding file formats is easy - there's a Java interface that defines what needs to be done. See the existing implementations in dcxmlgenerators.
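
To give a feel for what such an extractor might look like, here is a minimal sketch. The interface name <code>MetadataExtractor</code>, its two methods, and the <code>HtmlMetaExtractor</code> class are hypothetical and are not taken from dcxmlgenerators, so check the real interface in the source before writing a new format handler. The sketch assumes HTML files carry Dublin Core values in the common <code>&lt;meta name="DC.title" content="..."&gt;</code> form, with the name attribute written before content.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the real interface lives in dcxmlgenerators and may differ.
final class ExtractorSketch {

    // Assumed contract: one implementation per file format.
    interface MetadataExtractor {
        boolean supports(String fileName);
        Map<String, List<String>> extract(String content);
    }

    // Reads <meta name="DC.xxx" content="..."> tags from HTML text.
    static final class HtmlMetaExtractor implements MetadataExtractor {
        private static final Pattern META = Pattern.compile(
            "<meta\\s+name=\"DC\\.([A-Za-z]+)\"\\s+content=\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

        @Override
        public boolean supports(String fileName) {
            String lower = fileName.toLowerCase(Locale.ROOT);
            return lower.endsWith(".html") || lower.endsWith(".htm");
        }

        @Override
        public Map<String, List<String>> extract(String content) {
            // A field such as subject may repeat, so each key maps to a list.
            Map<String, List<String>> fields = new LinkedHashMap<>();
            Matcher m = META.matcher(content);
            while (m.find()) {
                String field = m.group(1).toLowerCase(Locale.ROOT);
                fields.computeIfAbsent(field, k -> new ArrayList<>()).add(m.group(2));
            }
            return fields;
        }
    }

    public static void main(String[] args) {
        String html = "<html><head>"
            + "<meta name=\"DC.title\" content=\"Fractions\">"
            + "<meta name=\"DC.subject\" content=\"Mathematics\">"
            + "</head><body></body></html>";
        System.out.println(new HtmlMetaExtractor().extract(html));
    }
}
```

A regular expression keeps the sketch short, but a real extractor would probably want a proper HTML parser so that attribute order and quoting style do not matter.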

Sample of Dublin Core XML File used

<?xml version="1.0" encoding="UTF-8"?>

<!--
    Document   : dcxml.xml
    Created on : July 7, 2009, 3:42 PM
    Author     : mike
    Description:
        Purpose of the document follows.
-->

<entry xmlns="http://olpc.af/ns/olpclibentry"  xmlns:dc="http://purl.org/dc/elements/1.1/">
    
    <dc:title>Library Base Folder</dc:title>
    <dc:subject>Mathematics</dc:subject>
    <dc:subject>Physics</dc:subject>
    <dc:type>Audio</dc:type>
    <dc:description>Folder of stuff</dc:description>
    <dc:language>en</dc:language>
    <dc:language>ps</dc:language>
    <dc:language>fa</dc:language>
    <dc:contributor>ERTV</dc:contributor>
</entry>
  • Run the index creator by changing to the directory that contains the jar and running java -jar SimpleLibraryIndexSystem.jar (where SimpleLibraryIndexSystem.jar is the name of the jar file)
  • Open the generated index.html in a browser
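
Because the entry file shown above is plain namespaced XML, other tools can read it with the standard Java DOM API. The following is a small sketch of that idea, not part of SDLI itself; the class name <code>DcEntryReader</code> and the method <code>values</code> are made up for illustration. The parser has to be namespace-aware so that the <code>dc:</code> elements can be looked up by their namespace URI rather than by prefix.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper for reading Dublin Core values out of an entry file.
final class DcEntryReader {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Returns the text of every dc:<element> in the document, in document order.
    static List<String> values(String xml, String element) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(DC_NS, element);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse entry XML", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<entry xmlns=\"http://olpc.af/ns/olpclibentry\""
          + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
          + "<dc:title>Library Base Folder</dc:title>"
          + "<dc:subject>Mathematics</dc:subject>"
          + "<dc:subject>Physics</dc:subject>"
          + "<dc:language>en</dc:language>"
          + "<dc:language>ps</dc:language>"
          + "<dc:language>fa</dc:language>"
          + "</entry>";
        System.out.println("title:    " + values(xml, "title"));
        System.out.println("subject:  " + values(xml, "subject"));
        System.out.println("language: " + values(xml, "language"));
    }
}
```

Repeated elements such as dc:subject and dc:language come back as lists, which matches how the sample file tags one entry with several subjects and languages.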

Alternatives

Greenstone seems to be the best known. However, it was quite complex and non-standard to set up on Linux, it took considerable effort to tag all the data into a collection, and the search system did not include the text being searched for. It could have been quite nice for one big online collection, but for distributed collections it was not quite what we were looking for. Nevertheless, some very nice content is being made available this way.

Various systems have ways of making galleries etc., but most require PHP, MySQL, or some other server stack. We wanted something that is as simple as possible to replicate and copy by any means.

Moodle was designed more as a learning management system than as a library.