Annotation
Latest revision as of 06:12, 17 December 2008
We want to support annotation of any document, in a generalized way that can be supported by a unified aggregation and sharing system (where annotations/comments are similar to other objects in the object store). Media that should support annotation include documents and images, and perhaps also any webpage or item viewed through a browser. In the extreme, one can imagine adding notes to any moment in time using a laptop, associated as well as possible with a specific item that has its own identifier, a specific activity, or at least a combination of timestamp, screenshot, and context.
We should support elegant libraries for displaying aggregated notes, levels of publicity (and perhaps ways to change this after the fact for clusters of notes), and ways to highlight annotations and reviews as they take place.
In October 2008, there was discussion on sugar@ about annotation in Browse in particular, and its interaction with the Journal.
Types of annotation
An annotation is any kind of data imposed onto another page/document/object. Generally you do not need the permission of the author to add these comments or discussion. You may share your annotations with other users, or they may be private.
An annotation may be:
- A comment that applies to a specific range of text
- Something directed at a coordinate location in a PDF or image
- A comment applied to a document generally
- A comment applied to another annotation (forming a threaded discussion)
- A rating or recommendation
- A copyedit intended for the author
- No comment, but simply the highlighting of a range of text or a pointer to something in a PDF (indicating a vague sense of "this is important or interesting")
As a result there are many optional aspects to an annotation -- the comment text is optional, the text range is optional, tags are optional, ratings are optional, etc.
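The optional aspects listed above can be sketched as a single record type in which only the target is required. This is a minimal illustration, not a proposed schema: the class name, field names, and the kind() classification are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    """One annotation; only the target is required (hypothetical field names)."""
    target_url: str                       # document (or annotation) being annotated
    author: Optional[str] = None          # e.g. an e-mail address or identity URI
    body: Optional[str] = None            # comment text; None for a bare highlight
    selected_text: Optional[str] = None   # highlighted range of text, if any
    pointer: Optional[str] = None         # location inside the target, if any
    rating: Optional[int] = None          # 0..5, if the annotation carries a rating
    tags: List[str] = field(default_factory=list)
    in_reply_to: Optional[str] = None     # id of a parent annotation, for threads

    def kind(self) -> str:
        """Classify the annotation by which optional parts are present."""
        if self.in_reply_to is not None:
            return "reply"
        if self.body is None and (self.selected_text or self.pointer):
            return "highlight"
        if self.rating is not None and self.body is None:
            return "rating"
        if self.selected_text or self.pointer:
            return "inline comment"
        return "document comment"
```

A bare highlight is then simply an Annotation with selected_text set and no body, and a rating is one with only rating set.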
Ratings and tags
Del.icio.us is a quick example.
Inline comments and notes
Heat maps such as co-ment and stet, like svn blame for text, allow a quick overview of thousands of granular comments within the context of a larger work.
Reviews
See content stamping for a specific kind of annotation that supports reviewing.
Other reviews include traditional reviews: long essays on a reasonably long work.
Desired Features
Aggregation
It is useful to aggregate annotations. In the simplest case, we want to retrieve annotations from several sources.
Automatically aggregated annotations can also be useful. An aggregator may pull together annotations from many sources and republish a selection of the annotations. For example, the aggregator may drop what it judges to be spam, or only republish what it judges to be the most interesting annotations.
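As a rough sketch of such an aggregator, the function below merges entries from several sources, skips duplicates, and republishes only what a caller-supplied judgment accepts. The dict layout (an "id" and a "body" key) and the toy spam rule are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List

Entry = Dict[str, str]  # hypothetical: one annotation as a plain dict with an "id"


def aggregate(sources: Iterable[Iterable[Entry]],
              keep: Callable[[Entry], bool] = lambda entry: True) -> List[Entry]:
    """Merge entries from several sources, dropping duplicates and rejected entries."""
    seen = set()
    merged = []
    for source in sources:
        for entry in source:
            if entry["id"] in seen or not keep(entry):
                continue
            seen.add(entry["id"])
            merged.append(entry)
    return merged


def looks_like_spam(entry: Entry) -> bool:
    """Toy spam judgment (an assumption): flag bodies containing several links."""
    return entry.get("body", "").count("http://") >= 3
```

Passing keep=lambda e: not looks_like_spam(e) gives the spam-dropping aggregator. A "most interesting" selection would be a different keep function.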
Querying
A standard method of querying annotation feeds is necessary for the interaction of aggregators and clients. We identified the following aspects of annotations where querying would be useful:
- Annotation title
- Annotation body
- Target URL - Clients query with this term to find annotations for a specific URL.
- Target Content-Type - Useful for differentiating between annotations on images, videos, text, etc.
- In-reply-to - Return annotations replying to an annotation.
- Author - Find annotations from an author, by e-mail or name.
- Updated/creation date - Show entries updated or created during specific time periods.
- Feed - Show entries from a specific origin feed.
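A query over these aspects might look like the sketch below, where each criterion is optional and all supplied criteria must match. No query protocol is defined here, so the function name, parameter names, and dict keys are hypothetical and only a few of the listed aspects are covered.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Optional


def query(entries: Iterable[Dict],
          target_url: Optional[str] = None,
          author: Optional[str] = None,
          in_reply_to: Optional[str] = None,
          updated_since: Optional[datetime] = None) -> List[Dict]:
    """Return the entries that match every criterion that was supplied."""
    results = []
    for entry in entries:
        if target_url is not None and entry.get("target_url") != target_url:
            continue
        if author is not None and entry.get("author") != author:
            continue
        if in_reply_to is not None and entry.get("in_reply_to") != in_reply_to:
            continue
        if updated_since is not None and entry["updated"] < updated_since:
            continue
        results.append(entry)
    return results
```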
Specifying the Content Being Annotated
We weren't able to find any existing protocols for specifying target content, so we identified the two main use cases:
- Annotating a page as a whole (Digg-like).
- Annotating specific sections of a page.
These are of course related.
By specifying the original publishing URL of the entry as the annotation target, one can annotate an annotation.
Threading
RFC 4685 covers Atom threading in detail.
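On the client side, threading amounts to grouping entries by the annotation they reply to. The sketch below assumes entries are plain dicts with hypothetical "id", "body", and optional "in_reply_to" keys. It does not parse Atom or implement RFC 4685 itself. A reply whose parent is unavailable is shown at top level rather than being lost.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_threads(entries: List[Dict]) -> Tuple[List[Dict], Dict[str, List[Dict]]]:
    """Split entries into top-level annotations and replies keyed by parent id."""
    known_ids = {entry["id"] for entry in entries}
    replies = defaultdict(list)
    roots = []
    for entry in entries:
        parent = entry.get("in_reply_to")
        if parent in known_ids:
            replies[parent].append(entry)
        else:
            # No parent, or the parent is unavailable: treat as top level.
            roots.append(entry)
    return roots, dict(replies)


def render(roots: List[Dict], replies: Dict[str, List[Dict]], depth: int = 0) -> List[str]:
    """Return the thread as indented lines, two spaces per reply level."""
    lines = []
    for entry in roots:
        lines.append("  " * depth + entry["body"])
        lines.extend(render(replies.get(entry["id"], []), replies, depth + 1))
    return lines
```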
Rating
A simple optional value between 0 and 5 indicating the poster's rating of the target.
We settled on adding an <ann:rating>N</ann:rating> equivalent, which gives a user rating for the target page.
hReview was considered, but it seemed overkill for simply adding a rating. One idea from hReview may still be useful: a rating on a category, like <category term="history" ann:rating="5" />, could indicate a rating against a particular criterion (e.g., this is a very good history text).
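To make the ann:rating idea concrete, here is a minimal sketch that writes and reads such an element with Python's standard xml.etree.ElementTree. The namespace URI bound to the ann prefix is a placeholder chosen for the example, and the entry omits the id and updated elements that a complete Atom entry would also need.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ATOM_NS = "http://www.w3.org/2005/Atom"
ANN_NS = "http://example.org/ns/annotation"  # placeholder namespace (assumption)


def rating_entry(target_url: str, rating: int) -> ET.Element:
    """Build a bare Atom entry that carries only a rating for target_url."""
    if not 0 <= rating <= 5:
        raise ValueError("rating must be between 0 and 5")
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title")  # Atom requires a title; left empty
    ET.SubElement(entry, f"{{{ATOM_NS}}}link", href=target_url)
    ET.SubElement(entry, f"{{{ANN_NS}}}rating").text = str(rating)
    return entry


def read_rating(entry: ET.Element) -> Optional[int]:
    """Return the rating carried by an entry, or None if it has none."""
    node = entry.find(f"{{{ANN_NS}}}rating")
    return int(node.text) if node is not None else None
```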
Tagging/Categorisation
Tagging/categorisation is not fundamental to annotation, but the advantages it brings to the exploration and discovery of new content are significant and worthwhile.
There are several tagging formats. We couldn't identify any significant advantage of using these formats over the atom:category element. Others have a similar opinion, though obviously there is no consensus.
Publishing
Viewing Annotations
When annotations are separate from the underlying work, one can see a constellation of notes from many people. A few views which we want to readily support:
- no comments
- my own comments
- comments from a group (myself/class/teachers)
- all comments
- new comments
We also want to limit the types of annotation viewed to an area of interest:
- Point-and-click annotation associated with a spot on an image or page
- Selection annotation associated with a string in a document or region in an image
- Block annotation associated with a paragraph or block in a document or region in an image
- Document-level annotation such as tags or reviews
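The views and type limits above can be sketched as one filter. The view names, the "kind" values, and the dict keys are hypothetical labels for the lists above, not an agreed vocabulary.

```python
from datetime import datetime
from typing import Dict, FrozenSet, Iterable, List, Optional, Set

VIEWS = {"none", "mine", "group", "all", "new"}


def select_view(entries: Iterable[Dict], view: str, me: str,
                group: FrozenSet[str] = frozenset(),
                last_seen: Optional[datetime] = None,
                kinds: Optional[Set[str]] = None) -> List[Dict]:
    """Filter entries by view, optionally limited to certain annotation kinds."""
    if view not in VIEWS:
        raise ValueError(f"unknown view: {view}")
    if view == "none":
        return []
    selected = []
    for entry in entries:
        if view == "mine" and entry["author"] != me:
            continue
        if view == "group" and entry["author"] not in group:
            continue
        if view == "new" and last_seen is not None and entry["updated"] <= last_seen:
            continue
        if kinds is not None and entry["kind"] not in kinds:
            continue
        selected.append(entry)
    return selected
```

For example, a reader could ask for view="group" with kinds={"selection", "block"} to see only inline notes from a class.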
Annotation in Browse
(See also the discussion on the talk page.)
Browse can use plugins to view PDFs and media files. At the same time, it can track annotations made during that interaction, and can store the last point or page viewed or read. This should be stored somehow in the Journal, and be available on resuming that interaction with the same file.
- Question: should you be able to annotate a document and store the annotation locally when you don't have the document at hand and only saw it in passing? If so, how?
- Question: is there a reasonably reliable way to have at hand a set of related annotations even when looking at a different but similar file? Say two editions of the same work, a later revision of the same image or page, &c. This depends on how flexibly documents are identified (whether there is a metric on identification to allow a notion of similarity between docs) and how flexibly annotations are linked to specific parts of documents (whether their validity is clear when the original subpart they refer to changes or disappears).
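One possible angle on the second question, offered only as a sketch: if an annotation stores a copy of the text it was attached to, a client can try to find that text again in a revised document and accept a near match. The example below uses Python's standard difflib. The function name and the 0.8 similarity threshold are arbitrary assumptions, and this does not address how documents themselves are identified as similar.

```python
import difflib
from typing import Optional, Tuple


def reanchor(selected_text: str, revised_document: str,
             min_ratio: float = 0.8) -> Optional[Tuple[int, int]]:
    """Find where annotated text best matches in a revised document.

    Returns (start, end) character offsets, or None if nothing is close enough.
    """
    exact = revised_document.find(selected_text)
    if exact != -1:
        return exact, exact + len(selected_text)
    matcher = difflib.SequenceMatcher(None, selected_text, revised_document,
                                      autojunk=False)
    match = matcher.find_longest_match(0, len(selected_text),
                                       0, len(revised_document))
    if match.size == 0:
        return None
    # Align a window of the same length around the longest common run.
    start = max(0, match.b - match.a)
    end = min(len(revised_document), start + len(selected_text))
    candidate = revised_document[start:end]
    ratio = difflib.SequenceMatcher(None, selected_text, candidate,
                                    autojunk=False).ratio()
    return (start, end) if ratio >= min_ratio else None
```

When the result is None the annotation is orphaned, and the interface would need to show it without its original context.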
Implementation ideas
API Proposals
Here are two proposals.
- Original Annotation API Proposal by Ian Bicking and Joshua Gay
- Comment Anywhere Annotation Protocol Proposal by Alec Thomas and Alan Green
XSS Security
We will be injecting other people's HTML into content. We must be sure this HTML does not contain dangerous stuff, like Javascript that itself calls XMLHttpRequests. We must be sure to scrub the HTML carefully. It is difficult to do this in Javascript, but that would be most secure (on the client when loading the comments). We could require XHTML, embedded in the Atom, to do this. Or, we could rely on server-side filtering of the HTML.
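To illustrate what scrubbing could mean, here is a small allowlist filter built on Python's standard html.parser. It is a sketch of the server-side option only and is not hardened for production use. The allowed tags, attributes, and URL schemes are assumptions chosen for the example, and a maintained HTML cleaning library would be the safer choice in practice.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"a", "b", "blockquote", "br", "code", "em", "i",
                "li", "ol", "p", "strong", "ul"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = ("http://", "https://", "mailto:")
DROP_CONTENT = {"script", "style"}  # discard these tags and everything inside
VOID_TAGS = {"br"}


class Sanitizer(HTMLParser):
    """Re-emit only allowlisted tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, ()) or value is None:
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text: str) -> str:
    """Return html_text with everything outside the allowlist removed."""
    parser = Sanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Event-handler attributes such as onclick are dropped because only href on links is allowlisted, and javascript: URLs are rejected by the scheme check.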
References
- Server-side HTML filtering in lxml.html.clean -- Ian Bicking
- We're working on an Atom store for tagging (a related kind of annotation) called TaggerStore -- it's still at an early stage -- Ian Bicking
Prior Work
There's a lot of prior work in this area which is worth learning from. For example:
Straightforward document commenting interfaces
- Stet (used to display comments on the GPL v3 draft)
- Django book comment system
Annotation systems
- Annotea
Complete transliterature projects and descriptions
- Project Xanadu / Transliterature / Transquoting
- Good motivation and wild diagrams, for a quite comprehensive reworking of links between texts and metadata and annotations.
Annotation scripts
- Annotation, Commentary
Specific metadata-gathering projects
- Bitzi Bitpedia
- What is most relevant here? Their readings don't indicate much of substance to learn from. Though they seem to care about matching files to specific fingerprints in an intelligent way and to have some academic good intentions, I don't see any interfaces that allow for finding or clustering related works or versions of the same work, and they show little success in dealing with comments, reviews, and similar annotations. (Plus their actual implementation is crippled by ads.)
- Open Library & Wikicite