Annotation

From OLPC
Jump to navigation Jump to search

We want to support annotation of any document, in a generalized way that can be supported by a unified aggregation and sharing system (where annotations/comments are similar to other objects in the object store). Media that should support annotation include documents and images; perhaps also any webpage or item viewed through a browser. In the extreme one can imagine adding notes to any moment in time using a laptop; associated as well as possible with a specific item with its own identifier, or a specific activity, or at least a combination of timestamp and screenshot and context.

We should support elegant libraries for displaying aggregated notes; levels of publicity (and perhaps ways to change this after the fact for clusters of notes) and ways to highlight annotations and reviews as they take place.

See content stamping for a specific kind of annotation that supports reviewing.

Desired Features

Aggregation

When annotations are separate from the underlying work, one can see a constellation of notes from many people. A few views which we want to readily support:

  • no comments
  • my own comments
  • comments from a group (myself/class/teachers)
  • all comments
  • new comments

Types of annotation to support:

  • point-and-click annotation associated with a spot on an image or page
  • selection annotation associated with a string in a document or region in an image
  • block annotation associated with a paragraph or block in a document or region in an image
  • document-level annotation such as tags or reviews

API

Atom

(Per discussion with Ian Bicking and Joshua Gay)

A proposal for an Atom representation of an annotation based on the Atom syndication standard.

Here's an example from that document:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  
  <title>Example Feed</title> 
  <link href="http://example.org/"/>
  <updated>2003-12-13T18:30:02Z</updated>
  <author> 
    <name>John Doe</name>
  </author> 
  <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
  
  <entry>
    <title>Atom-Powered Robots Run Amok</title>
    <link href="http://example.org/2003/12/13/atom03"/>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <summary>Some text.</summary>
  </entry>
  
</feed>

An annotation is a single Atom entry:

<entry
  xmlns:ann="http://wiki.laptop.org/go/Annotation">
  <title></title>
  <link href="{document being commented on}" />
  <id>urn:uuid:blahblahblah</id> 
  <updated>YYYY-MM-DDTHH:MM:DD</updated>
  <source>{uri}</source>
  <content type="html">
    Delete this term
  </content>
  <category term="copyedit" scheme="http://laptop.org" />
  <category term="{tag}" scheme="?" />
  <ann:selected-text>Some text that was highlighted</ann:selected-text>
  <ann:pointer>/html[1]/body[1]/div[2]</ann:pointer> 
  <author>
    <uri>{open ID URI}</uri>
  </author>
</entry>

Entries can be posted with the Atom Publishing Protocol another IETF standard.

This is based on a APP Collection. This is some base URI, e.g.,:

 http://localhost/APP/

You POST to this URI with this entry as the content body. The server will respond with a Location header that indicates where the entry has been placed. This value will be put into the <source> tag, indicating where the entry is now stored. Later updates to the comment are done by PUTting to this URI, with the new entry. Removing the comment is done by DELETE'ing the URI.

When you do GET /APP/ you will get an Atom feed. This is basically a set of entries enclosed in a <feed> element.

When you load a page, you would load any comment feeds in which you have indicated interest. This would happen asynchronously -- locally-hosted or cached comments will come up quickly, but potentially other comments would come up more slowly. This also involves fetching data from other domains, which is currently barred in Javascript with XMLHttpRequest; we'll have to create an exception to the permissions.

Threading Comments

Annotations can have comments; effectively annotations on the annotation. This can be done using the in-reply-to extension. Lets say you leave a comment on the previous example entry, you'd have:

 <entry
  xmlns:ann="http://wiki.laptop.org/go/Annotation"
  xmlns:thr="http://purl.org/syndication/thread/1.0">
  <title></title>
  <link href="{source uri from other entry; or to original document?}" />
  <id>urn:uuid:04f9820d03</id> 
  <thr:in-reply-to ref="urn:uuid:blahblahblah"
   href="{source uri from other entry}"
   type="application/atom+xml" />
  <updated>YYYY-MM-DDTHH:MM:DD</updated>
  <content type="html">
    This annotation is dumb.
  </content>
  <author>
    <uri>{open ID URI}</uri>
  </author>
</entry>

Redundancy and Threading

If an annotation is attached to another annotation that isn't available (maybe it hasn't been uploaded, maybe it's been deleted, maybe it's private, etc), then the tree-like threading of the comments starts to fall apart. To make this more reparable, some redundant information will be included in the Atom entry. A link to the most-parent document will be retained through all entries. In the case of an annotation that is attached to a specific piece of text, that text will also be copied. Then the UI may place the orphaned comment someplace appropriate. The orphaned comment should still look lost -- it's not in its proper context, and ideally it would be moved or deleted or edited to make it more appropriate.

XSS Security

We will be injecting other people's HTML into content. We must be sure this HTML does not contain dangerous stuff, like Javascript that itself calls XMLHttpRequests. We must be sure to scrub the HTML carefully. It is difficult to do this in Javascript, but that would be most secure (on the client when loading the comments). We could require XHTML, embedded in the Atom, to do this. Or, we could rely on server-side filtering of the HTML.