Experiments with unordered paths: Difference between revisions



== Random links ==
* [http://lists.laptop.org/pipermail/sugar/2008-September/008599.html Tagged Journal Proposal], based on this work
* [http://lists.laptop.org/pipermail/sugar/2008-September/008432.html Earlier Epiphany discussion] (thanks, Eduardo!)
* [http://lucene.apache.org/java/2_3_2/fileformats.html#Per-Index Files Apache Lucene's index formats]
* [http://web.archive.org/web/20010711201252/hotwired.lycos.com/webmonkey/templates/print_template.htmlt?meta=/webmonkey/97/16/index2a_meta.html Roll your own search engine (in perl)]
=== Other journal-like interfaces ===
* [http://www.iola.dk/nemo/ Nemo]
* [http://live.gnome.org/PaperBox Paperbox]

=== Desktop search ===
* [http://strigi.sourceforge.net/ Strigi]
* [http://en.wikipedia.org/wiki/Tracker_(desktop_search_software) Tracker]
* [http://www.freedesktop.org/wiki/Specifications/shared-filemetadata-spec Shared file metadata spec]
* [http://www.lesbonscomptes.com/recoll/usermanual/index.html Recoll]
* [http://www.gnome.org/~seth/storage/ GNOME Storage] -- ambitious, and [http://en.wikipedia.org/wiki/GNOME_Storage dead]
* [http://beagle-project.org/About Beagle] (oink)
* [http://www.gnome.org/projects/tracker/faq.html Tracker]
* [[Olpcfs]]
==== Comparisons ====
* [http://www.wikinfo.org/index.php/Comparison_of_desktop_search_software Comparison of desktop search software]
* [http://mail.gnome.org/archives/tracker-list/2007-January/msg00171.html Additional comparison]
* [http://www.freesoftwaremagazine.com/columns/desktop_search_tools_gnu_linux_tracker_recoll_strigi_deskbar Desktop search wars]
=== Object-Relational Mappers ===
* [http://www.sqlalchemy.org/ SQLAlchemy]
* [http://www.sqlobject.org/ SQLObject]
* [http://jystewart.net/process/2008/02/using-the-django-orm-as-a-standalone-component/ Using Django]
=== Database tools ===
* [http://xappy.org/docs/0.5/introduction.html Xappy]

Revision as of 21:11, 27 September 2008

The Journal -- like many "Web 2.0" applications -- is built around the idea of tag search. In discussions about extending the Journal to more traditional file management tasks (how should mounted USB keys appear in the Journal? how should the Journal appear if mounted as a filesystem?), I have always taken it as an article of faith that "ordered tags" would be necessary to translate the directory tree metaphor into tag search. In filesystems, a/b is not the same file as b/a; in tag sets, "a b" is exactly the same search as "b a".

I was challenged by Eben and Eduardo, among others, who were unconvinced by my intuition that ordering was important in filesystem paths. Their intuition told them that additional context, in the form of additional tags in the search, was all that was necessary. Sure, Bach/Disc1 was a different directory from Beethoven/Disc1, but it was the "Bach" and "Beethoven" tags that were important, not the ordering. Bach/Disc1 and Disc1/Bach might be the same thing, and that's okay.

I decided to actually do the experiment. I wrote a short script that went through all the files on my laptop -- crammed to the brim with stuff from the past decade: legacy code, various organizational strategies -- and tried to prove that path component ordering was important. Surely this search would come up with some compelling examples of distinct directories that became identical once you ignored the order of their path components.

My first search found no ambiguities. My mind exploded.

...

Later, I found a bug in my script. With the bug fixed, I could find a handful of existing directories that were made ambiguous by ignoring the path ordering, but nothing compelling: only 21 such directories among the 900,000 files present in my home directory! It turns out that repeated components are important -- x/y/x is different from x/y -- but ordering is not.
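
The original script is not reproduced on this page, so the following is only a minimal sketch of the kind of check described above, not the code that produced the numbers. It groups directory paths under two hypothetical keys: `multiset_key` ignores ordering but keeps repeated components, while `set_key` also drops repeats. All function names and the sample paths are assumptions made for illustration, and paths are assumed to be "/"-separated.

```python
import os
from collections import defaultdict


def components(path):
    """Split a '/'-separated path into its non-empty components."""
    return tuple(part for part in path.split("/") if part)


def multiset_key(path):
    """Ignore ordering but keep repeats: x/y/x and x/y stay distinct."""
    return tuple(sorted(components(path)))


def set_key(path):
    """Ignore ordering and repeats: x/y/x and x/y become identical."""
    return frozenset(components(path))


def collisions(paths, key):
    """Return groups of distinct paths that share the same key."""
    groups = defaultdict(set)
    for path in paths:
        groups[key(path)].add(path)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


def directories_under(root):
    """Yield every directory below root, relative to root (POSIX paths assumed)."""
    for dirpath, _dirnames, _filenames in os.walk(root):
        yield os.path.relpath(dirpath, root)


# Made-up sample paths, used instead of a real home directory.
sample = [
    "music/Bach/Disc1",
    "music/Beethoven/Disc1",
    "Disc1/Bach/music",
    "x/y",
    "x/y/x",
]

print(collisions(sample, multiset_key))
print(collisions(sample, set_key))
```

To try it on a real tree, pass `directories_under(os.path.expanduser("~"))` as `paths`. In the sample, only the reordered Bach directories collide under `multiset_key`, while `set_key` additionally merges x/y with x/y/x.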

Furthermore, only about 3 unique tags were necessary to reach any directory in my home directory. Instead of:

$ cd ~/Projects/OLPC/git/sugar-toolkit/sugar/graphics

I ought to be able to use the tags "OLPC graphics" instead -- much shorter!
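
How the "about 3 tags" figure was measured is not stated here, so this is just one possible way to compute a shortest tag query, under stated assumptions. A query is taken to "reach" a directory if no other directory in the candidate list carries all of the query's tags; nested directories would need an extra tie-break rule that this sketch leaves out. The function names are hypothetical, and every path in the sample except the sugar-toolkit one is invented.

```python
from itertools import combinations


def tags_of(path):
    """Treat each non-empty path component as a tag."""
    return frozenset(part for part in path.split("/") if part)


def shortest_unique_tags(target, all_dirs):
    """Return the smallest tag tuple that matches target and no other directory.

    Brute force: try every 1-tag query, then every 2-tag query, and so on.
    Returns None if even the full tag set is shared with another directory.
    """
    target_tags = tags_of(target)
    others = [tags_of(d) for d in all_dirs if d != target]
    for size in range(1, len(target_tags) + 1):
        for combo in combinations(sorted(target_tags), size):
            query = frozenset(combo)
            if not any(query <= other for other in others):
                return combo
    return None


# Only the first path comes from the text above; the rest are made up.
dirs = [
    "Projects/OLPC/git/sugar-toolkit/sugar/graphics",
    "Projects/OLPC/git/sugar-toolkit/sugar/activity",
    "Projects/OLPC/git/sugar/artwork",
    "Projects/gimp/graphics",
]

print(shortest_unique_tags(dirs[0], dirs))
```

With this sample, no single tag is enough ("graphics" also matches the gimp directory), and the first two-tag query that works is "OLPC graphics".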

On this page I will collect some of my further experiments with "unordered paths", attempting to get some experience using a system structured in this fashion to inform the redesign of the Journal for 9.1.0.

To come:

  • A "cd" replacement that uses tags instead of paths, implements intelligent tab-completion, and offers suggestions for how to reach places faster in the future. (draft source code)
  • Links to Eduardo's walkthrough of the "dynamic tag" system in Epiphany, and how that might inform the next-gen Journal
  • Implementing fast tag search and completion
  • What this might look like as a filesystem
  • Security considerations in a world with unordered paths (User:Mstone ought to help here!)
  • Statistics and experience reports!
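
As a starting point for the tag search and completion items in the list above, here is a rough sketch of an inverted index from tags to directories. It is not the draft "cd" replacement mentioned in the list; `TagIndex`, `search`, and `complete` are hypothetical names, and the sample paths (apart from the sugar-toolkit one) are invented.

```python
from bisect import bisect_left
from collections import defaultdict


class TagIndex:
    """Inverted index mapping each path component (tag) to the paths that contain it."""

    def __init__(self, paths):
        self.by_tag = defaultdict(set)
        for path in paths:
            for tag in path.split("/"):
                if tag:
                    self.by_tag[tag].add(path)
        # Sorted tag list lets prefix completion start with a binary search.
        self.tags = sorted(self.by_tag)

    def search(self, tags):
        """Return the paths carrying every tag in `tags`, in any order."""
        sets = [self.by_tag.get(tag, set()) for tag in tags]
        if not sets:
            return []
        return sorted(set.intersection(*sets))

    def complete(self, chosen, prefix=""):
        """Suggest tags starting with `prefix` that still match something
        once combined with the already chosen tags."""
        matches = set(self.search(chosen)) if chosen else None
        suggestions = []
        for tag in self.tags[bisect_left(self.tags, prefix):]:
            if not tag.startswith(prefix):
                break  # sorted order: no later tag can share the prefix
            if tag in chosen:
                continue
            if matches is None or self.by_tag[tag] & matches:
                suggestions.append(tag)
        return suggestions


index = TagIndex([
    "Projects/OLPC/git/sugar-toolkit/sugar/graphics",
    "Projects/OLPC/git/sugar-toolkit/sugar/activity",
    "Projects/OLPC/git/sugar/artwork",
    "Projects/gimp/graphics",
])

print(index.search(["graphics", "OLPC"]))
print(index.complete(["OLPC"], "su"))
print(index.complete(["gimp"], "g"))
```

Because `search` intersects sets, the order of the tags in a query does not matter, which is the property the experiment above was testing. Filtering completions against the current matches keeps tab-completion from offering tags that would lead nowhere.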
