Journal and Overlays

Revision as of 16:36, 1 June 2007 by Mcfletch (School Server Issues)

This page is a first pass at working out how to integrate union file systems with the Journal. Using AUFS-style overlays, it should be possible to support both Sugar-specific and legacy applications in a fairly natural way, as far as integration with the Journal mechanism is concerned.

Journal Population (automated)

on start (pre-launch) create an overlay tree like so:

  • core-system image
  • application's installation image
  • opened file r/o image (empty initially)
  • current r/w image (empty initially)
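
The pre-launch layer setup above can be sketched as follows. This is a minimal sketch: the layer names and the aufs-style `br=` option string are illustrative, and a real manager would pass the option string to an actual mount call rather than just returning it.

```python
import os
import tempfile

# Layer names, ordered bottom (read-only) to top (writable),
# mirroring the list above.  Names are illustrative.
LAYERS = ["core-system", "app-install", "opened-ro", "current-rw"]

def create_overlay_tree(root):
    """Create the per-launch layer directories and return an
    aufs-style branch option string (branches listed top-first)."""
    paths = {}
    for name in LAYERS:
        path = os.path.join(root, name)
        os.makedirs(path, exist_ok=True)
        paths[name] = path
    # aufs lists branches highest-priority first; only the top
    # "current-rw" layer is writable.
    branches = ":".join(
        "%s=%s" % (paths[name], "rw" if name == "current-rw" else "ro")
        for name in reversed(LAYERS)
    )
    return paths, "br=" + branches
```

Only the top layer is marked writable, so legacy applications can save freely while the system and application images stay pristine.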

on individual file open:

  • if present on the system already
    • hard link the individual file from origin into r/o image
  • otherwise acquire and store in the r/o image directory
  • file name choice
    • auto-resolve (original + suffix if necessary)?
    • or on conflict choose overwrite/rename (allows you to "revert" to a previous version of the file)
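
The open-time logic above (hard-link when the file is already on the same filesystem, otherwise acquire a copy, with the original-plus-suffix naming policy) might look like this; the function names are assumptions:

```python
import os
import shutil

def resolve_name(directory, filename):
    """Auto-resolve a name conflict by appending a numeric suffix
    (the "original + suffix" policy suggested above)."""
    base, ext = os.path.splitext(filename)
    candidate, n = filename, 1
    while os.path.exists(os.path.join(directory, candidate)):
        candidate = "%s-%d%s" % (base, n, ext)
        n += 1
    return candidate

def open_into_ro_image(source, ro_image):
    """Bring a file into the read-only layer: hard-link if it is
    already on the same filesystem, otherwise copy it in."""
    name = resolve_name(ro_image, os.path.basename(source))
    dest = os.path.join(ro_image, name)
    try:
        os.link(source, dest)       # same filesystem: cheap hard link
    except OSError:
        shutil.copy2(source, dest)  # cross-device: acquire a copy
    return dest
```

The overwrite/rename-on-conflict variant would replace `resolve_name` with a prompt to the user.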

on open journal (project name-space):

  • load all files from journal record in r/o layer of image
    • allow for conflict resolution
  • then allow selection of the particular file to open

on file save:

  • metadata entry optional (tags for journal)
  • stores as a name within the journal's "project space"
  • no special instrumentation required for legacy applications

on application close (handled by overlay/chroot manager):

  • create Journal entry (with application information)
    • record referenced file uris (files loaded into space)
    • record (real, underlying) filenames in journal for new/changed files
    • do not move or alter files on disk
  • does cut/copy and paste produce a "reference"?
  • allow Journal notes/metadata and the like
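
A minimal sketch of what the overlay/chroot manager could record at close time, assuming a simple dict-based entry format (the field names are illustrative, not the real Journal schema):

```python
import os
import time

def make_journal_entry(app_info, rw_image, opened_uris):
    """Build a Journal entry when the overlay/chroot manager sees the
    application exit.  New/changed files are recorded by their real,
    underlying names; nothing on disk is moved or altered."""
    changed = []
    for dirpath, _dirs, files in os.walk(rw_image):
        for name in files:
            changed.append(os.path.join(dirpath, name))
    return {
        "application": app_info,          # e.g. {"name": "AbiWord"}
        "timestamp": time.time(),
        "referenced": list(opened_uris),  # files loaded into the space
        "changed_files": sorted(changed), # new/changed, left in place
        "metadata": {},                   # optional notes/tags added later
    }
```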

Backup of Journal

on backup:

  • record backup uri/uuid for the file + local file
  • record encryption key for file (encrypted w/ the uri)
  • upload encrypted package with metadata and changed files
  • record backup (tentative) flag (should have a signed return-receipt from the top-level backup server before we trust that it's really "safe")

Sharing Journal Entries

The Journal should allow for sharing individual files or Journal entries (project name-spaces) with individuals, groups (e.g. classes or groups of friends or project-mates) or the whole world.

Sharing Operation

  • if local, share file data + uri + document key
  • if not local, share uri + document key
  • default no share
    • "class" share option (need to find your class)
  • sign request to share for backup
  • encrypt sharing certificate for target users if not public
  • what about the underlying files?
    • do we want one uuid and key per file? yes
    • select which files to share rather than the whole environment
  • journal entry is itself a sharable file
    • share referenced resources (when we own them)? prompt at share time
    • what about derivative works? Does the original source need to sign off on sharing?
  • publish sharing certificate via whatever means
    • email
    • direct share via IM
    • publish on server
      • personal log channels (tags)
    • references fields
    • pass to someone else to forward


  • references field in certificate
  • normally text payload, but allow any content type
  • share resource as normal
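
A sketch of assembling the sharing payload under the rules above (share data + uri + key when local, uri + key otherwise; encrypt unless public); the field names and audience values are assumptions:

```python
def make_sharing_certificate(entry_uri, document_key, audience,
                             have_local_data=False, data=None):
    """Assemble the sharing payload: include file data only when we
    hold it locally, and flag the certificate for encryption toward
    the target users unless it is public."""
    cert = {
        "uri": entry_uri,
        "document_key": document_key,
        "audience": audience,  # "none" | "class" | "public" | list of users
        "references": [],      # free-form references field, any content type
    }
    if have_local_data:
        cert["data"] = data
    cert["needs_encryption"] = audience != "public"
    return cert
```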

Blog/Syndication Interfaces

  • display public postings
  • allow filter by tag and date
  • comments inserted by a web "user" go into a separate account, linked via references fields
  • private comments shared directly with user

Server Considerations

Backup Issues

  • needs to know who is backing up, how much, etc.
  • should only give to:
    • backup server (signed cert)
    • if desperate, someone who sees a backup server regularly
    • requires a mechanism to track connections

School Server Backup Issues

Small issues needing to be decided for the school server.

  • how to signal laptop that we are ready to go?
    • don't want 300 uploads at 08:00 and none at 08:01
    • especially as users will just be starting to work
    • do backup when we have local bandwidth spare (prefer, but get it done within X time)
  • how to prioritize uploads/downloads?
    • how urgent is your upload?
    • how much have you transferred already?
  • how to serve files for sharing/cache?
    • https server?
    • uploads as https file uploads?
  • quotas need to be dynamic
    • when the cache is full, no more uploads are accepted
    • who maintains on failure?
  • "I have 25MB for backup, what do you want to store?"

Download Issues

  • signed sharing cert
  • uuid of already shared (fail if not avail)
  • public-log-view
  • uuid needs to include account information

Global Access Mechanism

  • where does the root server go to get the data for a request?
    • the same place it went to store it; it must have the password to the GMail account to have recorded the data in the first place
    • the fact that one server is storing those passwords is a possible issue
  • filter out requests for which it doesn't hold sharing certificates
  • every machine should have its own data-backup account password
    • central server for "outside" access
    • (plus) every machine have access software for self-account

Teacher's View of Logs

  • the teacher sees only what is shared with them (just as with any other person)
  • if not public it's private, so the child doesn't automatically give access to the teacher
  • comments to child are text notes in the teacher's journal shared with child
  • marks and the like may or may not be shared
  • default tag-set for each course (to allow children to subscribe to a given rss feed for each course)
  • default "friends" group for each course

Scarce-resource Optimizations

We can't back up everything forever. Students have 2 or 3 GB on the GMailFS, and they can produce gigabyte-sized files with the video camera. We need some mechanism for letting the Journal specify when to use a different strategy for backup and retention.

That same mechanism needs to be able to operate on the laptops themselves so that they know what to retain when they run out of space and what to prioritize when uploading backups.

Alternate Storage Strategies

  • some simple way to suggest a particular strategy
    • no-backup/backup-until/backup-until-read-by
  • server-level
    • backup when possible, but drop "replaced" versions of files when full
  • default in certain types to quick deletion or until-read
    • e.g. for voicemail, IM the default would be to drop relatively easily
    • what about abuse potential (share a nasty-gram and have it disappear after X period)?
  • take file size into account for choosing a backup-format strategy
    • diff-based storage (really big files that change by adding (e.g. traditional mbox files))
    • replacing storage (really big files that are whole-state storage)
    • versioned storage (default, small files that change substantially)
  • timeout hints
    • automated stale-dating of data-flows, such as RSS feeds and the like, dropped unless marked important and/or referenced/published
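
Strategy selection from the three storage formats above might reduce to a small decision function; the size threshold and flags are assumptions:

```python
LARGE = 10 * 1024 * 1024  # 10 MB threshold: an assumption, not a spec

def choose_strategy(size_bytes, append_only=False, whole_state=False):
    """Pick one of the three backup-format strategies listed above."""
    if size_bytes >= LARGE and append_only:
        return "diff"       # big file that grows by appending (e.g. mbox)
    if size_bytes >= LARGE and whole_state:
        return "replace"    # big whole-state file: keep latest only
    return "versioned"      # default: small files that change substantially
```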

Personal Importance Level

Not all files are created equal. The picture of your foot you shot when you were bored is less important than the picture of your baby brother's first steps. The system cannot guess at this, so if we get into scarce-resource situations we'd like to have a hint as to how important something is.

  • automatically give precedence to last version of same resource within my uri space
    • (saved to same file as reference)
  • references increase priority if not explicitly set
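
Those precedence rules could be folded into a single retention-priority hint, sketched here with illustrative weights:

```python
def retention_priority(is_latest_version, explicit_importance=None,
                       reference_count=0):
    """Retention hint: an explicit setting wins outright; otherwise the
    latest version of a resource in our uri space takes precedence over
    older ones, and each reference bumps the score a little."""
    if explicit_importance is not None:
        return explicit_importance
    score = 1.0 if is_latest_version else 0.5
    return score + 0.1 * reference_count
```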

Journal Based File Open

  • choose whole journal entry (namespace)
    • then choose file from it
  • browse into entry to choose single file to import into new space
  • browse journal to find old versions
  • filter journal to find items (tag, type, text, etc)
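
The filtering step might look like this, assuming a simple list-of-dicts entry store with illustrative field names:

```python
def filter_journal(entries, tag=None, mime_type=None, text=None):
    """Filter Journal entries by tag, MIME type, and/or title text,
    per the last bullet above.  All filters are optional and combine
    with AND semantics."""
    results = []
    for entry in entries:
        if tag is not None and tag not in entry.get("tags", []):
            continue
        if mime_type is not None and entry.get("mime_type") != mime_type:
            continue
        if text is not None and text.lower() not in entry.get("title", "").lower():
            continue
        results.append(entry)
    return results
```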