Experiments with unordered paths

From OLPC


== Random Links ==
* [http://lists.laptop.org/pipermail/sugar/2008-September/008599.html Tagged Journal Proposal], based on this work
* [http://lists.laptop.org/pipermail/sugar/2008-September/008432.html Earlier Epiphany discussion] (thanks, Eduardo!)
* [http://plg.uwaterloo.ca/~claclark/fast2005.pdf Security implications of search]
* [http://www.perl.com/pub/a/2003/02/19/engine.html?page=2 Building a vector space search engine in perl]

Revision as of 17:39, 26 September 2008

The Journal -- like many "Web 2.0" applications -- is built around the idea of tag search. In discussions about extending the Journal to more traditional file management tasks -- how should mounted USB keys appear in the Journal? how should the Journal appear if mounted as a filesystem? -- I have always taken as an article of faith that "ordered tags" would be necessary to translate the directory tree metaphor into tag search. In filesystems, a/b is not the same file as b/a; in tag sets, "a b" is exactly the same search as "b a".

I was challenged by Eben and Eduardo, among others, who were unconvinced by my intuition that ordering was important in filesystem paths. Their intuition told them that additional context was all that was necessary -- additional tags in the search. Sure, Bach/Disc1 was a different directory from Beethoven/Disc1, but it was the "Bach" and "Beethoven" tags that were important, not the ordering. Bach/Disc1 and Disc1/Bach might be the same thing, and that's okay.

I decided to actually do the experiment. I wrote a short script which went through all the files on my laptop -- crammed to the brim with stuff from the past decade, legacy code, various organizational strategies -- and tried to prove that path component ordering was important. Surely this search would come up with some compelling examples of different directories that were identical if you ignored the order of the path components.

My first search found no ambiguities. My mind exploded.

...

Later, I found a bug in my script. Now I could find a handful of existing directories that were made ambiguous by ignoring the path ordering, but nothing compelling. Only 21 such directories among the 900,000 files present in my home directory! It turns out that repeated components are important -- x/y/x is different from x/y -- but ordering is not.
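
The original script is not shown here, so the following is only a minimal sketch of how such a check might look, not the code that was actually run. It treats each directory path as a multiset of components (a sorted tuple, so that x/y/x stays distinct from x/y) and reports groups of different directories that collapse to the same multiset. The names unordered_key and find_ambiguous are made up for illustration.

```python
from collections import defaultdict

def unordered_key(path):
    """Return the path's components as a sorted tuple.

    Sorting discards the ordering but keeps repeated components,
    so "x/y/x" and "x/y" still get different keys.
    """
    parts = [p for p in path.split("/") if p]
    return tuple(sorted(parts))

def find_ambiguous(dirs):
    """Group directories that become identical once ordering is ignored."""
    groups = defaultdict(set)
    for d in dirs:
        groups[unordered_key(d)].add(d)
    # Keep only keys shared by more than one distinct directory.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)

# Hypothetical sample data, not taken from the real home directory.
sample = ["a/b", "b/a", "x/y/x", "x/y", "Bach/Disc1", "Beethoven/Disc1"]
print(find_ambiguous(sample))
```

To run this over a real tree, the list of directories could be collected with os.walk, converting each path to a "/"-separated form relative to the root.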

Furthermore, only about 3 unique tags were necessary to reach any directory in my home directory. Instead of:

$ cd ~/Projects/OLPC/git/sugar-toolkit/sugar/graphics

I ought to be able to use the tags "OLPC graphics" instead -- much shorter!
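
As a rough illustration of how a "shortest tag set" could be computed, here is a brute-force sketch. It rests on an assumption the text above does not spell out: a query matches every directory whose components include all the query tags, and it "reaches" a directory only when that directory is the single shallowest match. The function names and the sample directory list are hypothetical.

```python
from itertools import combinations

def components(path):
    return [p for p in path.split("/") if p]

def matches(query, path):
    """A query matches when every tag is one of the path's components."""
    return set(query) <= set(components(path))

def resolve(query, dirs):
    """Return the single shallowest match, or None if there is none or a tie."""
    hits = [d for d in dirs if matches(query, d)]
    if not hits:
        return None
    depth = min(len(components(d)) for d in hits)
    best = [d for d in hits if len(components(d)) == depth]
    return best[0] if len(best) == 1 else None

def shortest_query(target, dirs):
    """Try ever larger tag subsets until one resolves to the target."""
    tags = sorted(set(components(target)))
    for size in range(1, len(tags) + 1):
        for query in combinations(tags, size):
            if resolve(query, dirs) == target:
                return query
    return None

# Hypothetical directory list built around the example path above.
sample = [
    "Projects",
    "Projects/OLPC",
    "Projects/OLPC/git",
    "Projects/OLPC/git/sugar-toolkit",
    "Projects/OLPC/git/sugar-toolkit/sugar",
    "Projects/OLPC/git/sugar-toolkit/sugar/graphics",
    "Projects/Other/graphics",
]
print(shortest_query("Projects/OLPC/git/sugar-toolkit/sugar/graphics", sample))
```

Under these assumptions "graphics" alone is not enough, because the shallower Projects/Other/graphics wins, but adding "OLPC" settles it. The search is exponential in the number of components, which is tolerable only because paths are short.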

On this page I will collect some of my further experiments with "unordered paths", attempting to get some experience using a system structured in this fashion to inform the redesign of the Journal for 9.1.0.

To come:

  • A "cd" replacement that uses tags instead of paths, implements intelligent tab-completion, and offers suggestions for how to reach places faster in the future.
  • Links to Eduardo's walkthrough of the "dynamic tag" system in Epiphany, and how that might inform the next-gen Journal
  • Implementing fast tag search and completion
  • What this might look like as a filesystem
  • Security considerations in a world with unordered paths (User:Mstone ought to help here!)
  • Statistics and experience reports!
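
To make the tab-completion idea in the list above a little more concrete, here is one possible sketch; it is not the planned "cd" replacement. Given the tags typed so far and a partial tag, it suggests tags that co-occur with the chosen ones in at least one directory. The function complete and its parameters are made up for illustration, and nothing here attempts to be fast: it rescans every directory on each call.

```python
def complete(chosen, prefix, dirs):
    """Suggest tags that start with `prefix` and co-occur with `chosen`.

    chosen: tags already entered
    prefix: the partially typed next tag (may be empty)
    dirs:   iterable of "/"-separated directory paths
    """
    chosen = set(chosen)
    candidates = set()
    for d in dirs:
        tags = {p for p in d.split("/") if p}
        # Only directories that carry every chosen tag can contribute.
        if chosen <= tags:
            candidates.update(t for t in tags - chosen if t.startswith(prefix))
    return sorted(candidates)

# Hypothetical sample data.
sample = [
    "Projects/OLPC/git",
    "Projects/OLPC/graphics",
    "Music/Bach/Disc1",
    "Music/Beethoven/Disc1",
]
print(complete(["OLPC"], "g", sample))
```

Because the tags are unordered, completing after "Bach" offers both "Disc1" and "Music": a parent component is as valid a next tag as a child.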

Random Links