OurStoriesXML

From OLPC

Latest revision as of 19:28, 7 April 2008

This page documents the markup language used to store metadata for interviews recorded by the Our_Stories activity.

Some fields are generated automatically during interview recording, i.e., the date and file reference (and probably language and country of origin / geo-data, which could be appended automatically by the school servers). Other fields, such as description fields and keyword tags, are optionally entered by the interviewers, or filled in by reviewers or teachers going over the stories afterwards.

Quick Example (v0.1)

<?xml version="1.0" ?>
<ourstories version="0.1">
  <story date="2007-10-29 07:13:04" 
        filename="story_filename.ogg" 
        title="story_title"
        country="country"
        city="city"
        source="UNICEF"
        speaker="Ioanna P."/>
</ourstories>

Example (v0.2)

<?xml version="1.0" ?>
<ourstories version="0.2">
  <filesystem directory="/home/olpc/ourstories/" />
  <story metalanguage="en"
         source="St. Noah Girls school"
         title="swimming lessons, by Marie"
         thumbnailfile="marie.jpg"
         thumbnailtitle="marie with microphone"
         language="lg"
         date="2007-10-29 12:10:07 UTC">
     <description>
         Joba interviews Marie about her experiences with swimming.
     </description>
     <location country="Uganda"
               city = "Entebbe"
               latlong = "00.04N, 32.28E" />
     <media mediatype="audio" format="ogg"
            language="lg"
            recorddate="2007-10-29 07:13:04 UTC" 
            filename="marie.ogg" 
            filetitle="joba interviewing marie"
            people="Joba Buturo, Marie Frey"
            title="swimming lessons" />
     <media mediatype="image"  
            recorddate="2007-10-29 09:11:50 UTC" 
            filename="marie2.jpg" 
            filetitle="joba and marie"
            people="Joba Buturo, Marie Frey"
            title="photo together" /> 
  </story>
</ourstories>


Desired tags and fields as of 2/2008:

  • tag: ourstories
    • version - see below; the version of the XML format being used. Currently 0.2

  • tag: filesystem
    • directory - the directory in which files are stored

  • tag: story (usually just one, with data in English; could have one per localized language)
    • metalanguage - optional, default "en". The language in which the metadata in this XML file is recorded.
    • source - name of school/organization which collected the recording, e.g., "UNICEF", "Khairat School"
    • title - the title used as link text on landing pages; if no title is available, we use "Story By <first name of speaker>"; if no name is available either, we use "UNTITLED"
    • thumbnailfile - the name of the image file to be used as a photo or thumbnail. Please avoid using non-alphanumeric characters in filenames (e.g., punctuation and spaces; hyphens and underscores are ok).
    • thumbnailtitle - alternate text for the thumbnail
    • language - the language used to sort and display the story in searches; an ISO 639 code (the v0.2 example uses "lg")
    • date - yyyy-mm-dd hh:mm:ss TZN - the date used to sort and display the story; usually the last-recording date or the upload date
  • tag: description - a short description, given as the text of a child element of story
  • keywords - optional, keyword tags
  • tag: location (usually just one per story; the first location given is used to sort onto the world map)
    • country
    • city
    • latlong - latitude, longitude of the location of recording; as precise as anonymity allows, or can be later matched to the latlong of the server used to upload.
  • tag: media (often just one recording per story, audio or video)
    • mediatype - one of (audio, video, image, text)
    • format - optional; the file format used, e.g., ogg, wav or mpeg (the v0.2 example writes it without a leading dot, as "ogg")
    • language - the primary language used for the interview, as an ISO 639 language code. (TODO: how to handle multiple languages? comma-separated?)
    • recorddate - yyyy-mm-dd hh:mm:ss TZN
    • people - people involved in / recorded in the story, names separated by commas. If none are given, we use "Anonymous"
    • filename - the name of the media file (alphanumeric)
    • filetitle - the title of the file, a brief description
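
The fallback rules for title and people in the list above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes that the first name listed in people belongs to the speaker, which the format does not guarantee (in the v0.2 example the interviewer is listed first).

```python
def split_people(people):
    """Split a comma-separated 'people' value into a list of trimmed names."""
    return [name.strip() for name in (people or "").split(",") if name.strip()]


def display_people(people):
    """Return the people string to show, or "Anonymous" when none are given."""
    names = split_people(people)
    return ", ".join(names) if names else "Anonymous"


def display_title(title, people):
    """Return the link text for a story, applying the documented fallbacks."""
    if title and title.strip():
        return title.strip()
    names = split_people(people)
    if names:
        # Assumption: the first listed person is the speaker.
        first_name = names[0].split()[0]
        return "Story By " + first_name
    return "UNTITLED"


if __name__ == "__main__":
    print(display_title("", "Marie Frey"))
    print(display_title(None, None))
    print(display_people(""))
```

A real implementation would need some way to tell speakers from interviewers before choosing the name for the fallback title.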


All fields should contain UTF-8 text strings.
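
The filename and date conventions from the field list can also be checked mechanically. The Python sketch below is one possible reading of them, not part of the format: the helper names are hypothetical, the filename check assumes ASCII letters and digits plus a single dot before the extension, and the date parser only attaches a timezone when TZN is "UTC" (other zone names are returned as plain strings).

```python
import re
from datetime import datetime, timezone

# Assumption: letters, digits, hyphens and underscores, plus one optional extension.
SAFE_FILENAME = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?")


def is_safe_filename(name):
    """True if the name avoids spaces and punctuation other than - and _."""
    return SAFE_FILENAME.fullmatch(name) is not None


def parse_story_date(value):
    """Parse "yyyy-mm-dd hh:mm:ss [TZN]" into (datetime, zone name or None)."""
    parts = value.strip().split()
    if len(parts) not in (2, 3):
        raise ValueError("expected 'yyyy-mm-dd hh:mm:ss [TZN]', got %r" % value)
    stamp = datetime.strptime(" ".join(parts[:2]), "%Y-%m-%d %H:%M:%S")
    zone = parts[2] if len(parts) == 3 else None
    if zone == "UTC":
        # Only UTC is resolved here; other zone names are left to the caller.
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp, zone


if __name__ == "__main__":
    print(is_safe_filename("marie.ogg"))
    print(is_safe_filename("my story.ogg"))
    print(parse_story_date("2007-10-29 12:10:07 UTC"))
```

The optional zone keeps v0.1 dates, which carry no TZN part, readable by the same function.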

Other fields to be added:

  • Name of server used to upload
  • multiple authors
    • Transcribers
    • Translators
    • Dubbing/translation notes

Other data to gather on OurStories

  • Narrative flow, unstructured groupings
  • External tags / additions to the above (cf. how Connexions does it)

Parsing
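
No parser is specified on this page. As a rough illustration only, a v0.2 file could be read with Python's standard xml.etree.ElementTree module as sketched below. The function name and the layout of the returned dictionaries are assumptions made for this example, and the sample is a trimmed copy of the v0.2 example above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the v0.2 example from this page.
SAMPLE_V02 = """<?xml version="1.0" ?>
<ourstories version="0.2">
  <filesystem directory="/home/olpc/ourstories/" />
  <story metalanguage="en" source="St. Noah Girls school"
         title="swimming lessons, by Marie" language="lg"
         date="2007-10-29 12:10:07 UTC">
    <description>
      Joba interviews Marie about her experiences with swimming.
    </description>
    <location country="Uganda" city="Entebbe" latlong="00.04N, 32.28E" />
    <media mediatype="audio" format="ogg" language="lg"
           filename="marie.ogg" people="Joba Buturo, Marie Frey" />
    <media mediatype="image" filename="marie2.jpg"
           people="Joba Buturo, Marie Frey" />
  </story>
</ourstories>"""


def parse_ourstories(xml_text):
    """Return the document as plain dicts and lists (hypothetical layout)."""
    root = ET.fromstring(xml_text)
    filesystem = root.find("filesystem")
    stories = []
    for story in root.findall("story"):
        stories.append({
            "attributes": dict(story.attrib),
            "description": (story.findtext("description") or "").strip(),
            "locations": [dict(loc.attrib) for loc in story.findall("location")],
            "media": [dict(item.attrib) for item in story.findall("media")],
        })
    return {
        "version": root.get("version"),
        "directory": filesystem.get("directory") if filesystem is not None else None,
        "stories": stories,
    }


if __name__ == "__main__":
    parsed = parse_ourstories(SAMPLE_V02)
    for story in parsed["stories"]:
        print(story["attributes"].get("title"), "-", len(story["media"]), "media file(s)")
```

Keeping locations and media as lists reflects the note that a story may carry more than one of each, with the first location used for the world map.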

Versioning

version 0.2 - 2/25/2008

  • Updated/unified xml fields.
  • Separated fields into subtags based on the possibility of multiple values
  • Added metalocalization of xml data

version 0.1 - 10/29/2007

  • Initial attempt to define an XML-style metadata markup language.
  • Fields available: date, oggfile, title as attributes of a story
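
To illustrate how the two versions relate, the Python sketch below converts a v0.1 document into the v0.2 layout. The mapping is an assumption made for this example and is not defined anywhere on this page: country and city move to a location tag, the file reference and speaker move to a single media tag with mediatype "audio", and the story date is reused as recorddate. The function name is hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET

# The v0.1 quick example from this page.
SAMPLE_V01 = """<?xml version="1.0" ?>
<ourstories version="0.1">
  <story date="2007-10-29 07:13:04"
         filename="story_filename.ogg"
         title="story_title"
         country="country"
         city="city"
         source="UNICEF"
         speaker="Ioanna P."/>
</ourstories>"""


def upgrade_v01_to_v02(xml_text):
    """Build a v0.2 element tree from v0.1 text (assumed mapping, see above)."""
    old_root = ET.fromstring(xml_text)
    new_root = ET.Element("ourstories", {"version": "0.2"})
    for old in old_root.findall("story"):
        story = ET.SubElement(new_root, "story")
        for name in ("source", "title", "date"):
            if old.get(name) is not None:
                story.set(name, old.get(name))
        place = {key: old.get(key) for key in ("country", "city")
                 if old.get(key) is not None}
        if place:
            ET.SubElement(story, "location", place)
        # Assumption: v0.1 stories always describe one audio recording.
        media = {"mediatype": "audio"}
        filename = old.get("filename")
        if filename is None and old.get("oggfile") is not None:
            # Early v0.1 files used "oggfile"; keep only the file name part.
            filename = posixpath.basename(old.get("oggfile"))
        if filename is not None:
            media["filename"] = filename
        if old.get("date") is not None:
            media["recorddate"] = old.get("date")
        if old.get("speaker"):
            media["people"] = old.get("speaker")
        ET.SubElement(story, "media", media)
    return new_root


if __name__ == "__main__":
    print(ET.tostring(upgrade_v01_to_v02(SAMPLE_V01), encoding="unicode"))
```

Dates are copied unchanged, so converted files lack the TZN part used in v0.2 unless it is added separately.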

Interested participants

  • Allan Doyle
  • John Huang
  • ...