Journal and Overlays

From OLPC


This page is an early sketch of how to integrate union file systems with the Journal. With AUFS-type overlays it should be possible to integrate both Sugar-specific and legacy applications into the Journal mechanism in a fairly natural way.

This proposal rejects some of the assumptions in the Bitfrost document-storage description (OLPC_Bitfrost#P_DOCUMENT). In particular, we do not assume that all applications running under Bitfrost are going to use an abstracted file storage API.

Instead we explicitly allow applications to call open() directly, and rely on the union file system's copy-on-write semantics to provide a convenient "trap" that holds all data written during a particular session with an application. This should let legacy applications participate more fully in the Journaling process and should thus speed application porting. The same mechanism makes the system more reasonable to deploy outside the OLPC environment, making it more likely that other systems will adopt the approach to improve their own security.

We do, however, still assume that the standard "file open" dialogs in the various GUI environments will be replaced with a stub. The stub launches an out-of-process Journal browsing application (briefly switching focus to that "activity"), which arranges with the overlay-management system to provide access to the selected files. We assume this is less invasive than rewriting applications' file-access operations to use a non-standard file-access mechanism.

Journal Population (automated)

on start (pre-launch) create an overlay tree like so:

  • core-system image
  • application's installation image
  • opened/referenced file r/o image (empty initially)
  • current r/w image (empty initially)
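
The layering above can be sketched as a small helper that orders the four branches and formats them as a mount option. This is only an illustration: the directory paths and function names are hypothetical, and the "br=path=rw:path=ro" form follows common AUFS usage and should be checked against the union file system actually deployed. The sketch builds the option string only; it does not call mount.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    path: str
    writable: bool = False


def build_session_branches(core, app, readonly, readwrite):
    """Return the branches top-most first: r/w, r/o files, application, core."""
    return [
        Branch(readwrite, writable=True),  # traps everything the session writes
        Branch(readonly),                  # opened/referenced files
        Branch(app),                       # application's installation image
        Branch(core),                      # core-system image
    ]


def aufs_branch_option(branches):
    """Format the branches as an AUFS-style 'br=' option (assumed syntax)."""
    parts = [f"{b.path}={'rw' if b.writable else 'ro'}" for b in branches]
    return "br=" + ":".join(parts)
```

Only the top branch is writable, so every write made through open() lands in the session's r/w image and the lower layers stay untouched.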

on individual file open:

  • if present on the system already
    • hard link the individual file from origin into r/o image
  • otherwise acquire and store in the r/o image directory
  • file name choice
    • auto-resolve (original + suffix if necessary)?
    • or on conflict choose overwrite/rename (allows you to "revert" to a previous version of the file)
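
A minimal sketch of the "hard link with auto-resolve" option, assuming the origin file and the r/o image directory live on the same file system (hard links cannot cross file systems). The function name and the "-N" suffix scheme are hypothetical; the overwrite/rename prompt is not covered.

```python
import os


def link_into_readonly(origin, ro_dir):
    """Hard link `origin` into `ro_dir`, adding a numeric suffix on name conflict."""
    base = os.path.basename(origin)
    stem, ext = os.path.splitext(base)
    candidate = os.path.join(ro_dir, base)
    counter = 1
    while True:
        try:
            # os.link fails if the target name exists, so there is no
            # check-then-create race between sessions.
            os.link(origin, candidate)
            return candidate
        except FileExistsError:
            candidate = os.path.join(ro_dir, f"{stem}-{counter}{ext}")
            counter += 1
```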

on open journal (project name-space):

  • load all files from journal record in r/o layer of image
    • allow for conflict resolution
  • then allow selection of the particular file to open

on file save:

  • metadata entry optional (tags for journal)
  • stores as a name within the journal's "project space"
  • no special instrumentation required for legacy applications

on application close (handled by overlay/chroot manager):

  • create Journal entry (with application information)
    • record referenced file uris (files loaded into space)
    • record (real, underlying) filenames in journal for new/changed files
    • do not move or alter files on disk
  • does cut/copy and paste produce a "reference"?
  • allow Journal notes/metadata and the like
  • if the application or computer crashes we can recover simply by reading the directory that is left open and adding it to the journal (though likely minus an accurate "end time").
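
The close-time step might look roughly like the following, assuming the overlay manager knows the session's r/w directory and the list of referenced uris. The entry layout and field names are hypothetical. The same function serves crash recovery: it is run over the directory that was left behind, with no end time.

```python
import os


def build_journal_entry(app_name, rw_dir, referenced_uris, start_time, end_time=None):
    """Describe a session by listing the files trapped in its r/w layer.

    Files are only read from disk, never moved or altered.
    """
    changed = []
    for root, _dirs, files in os.walk(rw_dir):
        for name in files:
            changed.append(os.path.join(root, name))
    return {
        "application": app_name,
        "referenced": list(referenced_uris),
        "changed_files": sorted(changed),  # real, underlying filenames
        "start_time": start_time,
        "end_time": end_time,  # None when recovered after a crash
    }
```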

Backup of Journal

on backup

  • record backup uri/uuid for the file + local file
  • record encryption key for file (encrypted w/ the uri)
  • upload encrypted package with metadata and changed files
  • record backup (tentative) flag (should have a signed return-receipt from the top-level backup server before we trust that it's really "safe")
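
The tentative flag could be tracked along these lines. The record fields are hypothetical, and signature checking is deliberately left to a caller-supplied verify function, since the receipt format and signing scheme are not defined here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BackupRecord:
    uri: str
    local_path: str
    tentative: bool = True  # stays True until a valid receipt arrives


def confirm_backup(record: BackupRecord, receipt: bytes,
                   verify: Callable[[str, bytes], bool]) -> BackupRecord:
    """Clear the tentative flag only if the receipt verifies for this uri."""
    if verify(record.uri, receipt):
        record.tentative = False
    return record
```

Until confirm_backup succeeds, the laptop should treat its local copy as the only trustworthy one.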

Sharing Journal Entries

The Journal should allow for sharing individual files or Journal entries (project name-spaces) with individuals, groups (e.g. classes or groups of friends or project-mates) or the whole world.

Sharing Operation

  • if local, share file data + uri + document key
  • if not local, share uri + document key
  • default no share
    • "class" share option (need to find your class, same problem as with teacher's "role" setup on day one)
  • sign request to share for backup server (so that the server will allow access to the encrypted bytes)
  • encrypt sharing certificate for target users if not public
  • what about the underlying files?
    • do we want one uuid and key per file? yes
    • select which files to share rather than the whole environment
  • journal entry is itself a sharable file
    • share referenced resources? (when we own them) question on share
    • what about derivative works? Does the original source need to sign off on sharing?
  • publish sharing certificate via whatever means
    • email
    • direct share via IM
    • publish on server
      • personal log channels (tags)
    • references fields
    • pass to someone else to forward

Comments

  • references field in certificate
  • normally text payload, but allow any content type
  • share resource as normal

Blog/Syndication Interfaces

Each child would have a "virtual blog" with the ability to have "channels" or "tags". The whole set would be served off a single server running off of the central storage interface (the gmail gateway). Children would only blog that which they declare "shared with world". There should be UI warnings about sharing too much (education about when/how to blog).

  • display "public" (shared-with-world) postings (with interpretation to make it a regular rss/html file)
  • allow filter by tag and date (to allow the channels)
  • comments would be inserted by web "user" into separate account with references fields, just as if someone with a laptop had commented on the resource
  • private comments shared directly with user, again, as a text document shared with the user by the "web" user
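
As a rough sketch, a "channel" is just a filtered view over the public postings. The Posting fields and the channel function below are hypothetical names used for illustration; rendering to rss/html is not shown.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Posting:
    title: str
    published: date
    tags: frozenset = frozenset()
    shared_with_world: bool = False


def channel(postings, tag=None, since=None):
    """Return public postings, newest first, optionally filtered by tag and date."""
    selected = [
        p for p in postings
        if p.shared_with_world
        and (tag is None or tag in p.tags)
        and (since is None or p.published >= since)
    ]
    return sorted(selected, key=lambda p: p.published, reverse=True)
```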

Server Considerations

Backup Issues

  • needs to know who, how much, etc.
  • should only give to:
    • backup server (signed cert)
    • if desperate, someone who sees a backup server regularly
      • requires a mechanism to track connections, see University of Toronto System's group (Jing Su's) work on similar operations

School Server Backup Issues

Small issues needing to be decided for the school server.

  • how to signal laptop that we are ready to go?
    • don't want 300 uploads at 08:00 and none at 08:01
    • especially as users will be starting work
    • do backup when we have local bandwidth spare (prefer, but get it done within X time)
  • how to prioritize uploads/downloads?
    • how urgent is your upload?
    • how much have you transferred already?
  • how to serve files for sharing/cache?
    • https server?
    • uploads as https file uploads?
  • quotas need to be dynamic
    • on cache full no more accepted
    • who maintains on failure?
  • "I have 25MB for backup, what do you want to store?"
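
One possible answer to the "300 uploads at 08:00" problem is to give each laptop a stable offset inside an upload window and to rank competing uploads with a simple score. Both functions are assumptions made for illustration: the hashing scheme, the window length and the scoring formula (which favours urgent and nearly complete uploads) are not decided anywhere in this proposal.

```python
import hashlib


def upload_offset_seconds(laptop_id: str, window_seconds: int) -> int:
    """Map a laptop id to a stable start offset within the upload window."""
    digest = hashlib.sha256(laptop_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds


def upload_priority(urgency: float, bytes_done: int, bytes_total: int) -> float:
    """Higher scores are served first (hypothetical scoring rule)."""
    progress = bytes_done / bytes_total if bytes_total else 1.0
    return urgency + progress
```

Because the offset is derived from the laptop id, no coordination message is needed to spread the start times; the server only has to announce the window.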

Download Issues

  • signed sharing cert
  • uuid of already shared (fail if not avail)
  • public-log-view
  • uuid needs to include account information

Global Access Mechanism

  • where does the root server go to get the data for a request?
    • the same place it stored the data; it must already hold the password to the gmail account in order to record the data in the first place
    • the fact that one server is storing those passwords is a possible issue
  • filter out those for whom it doesn't have sharing certificates
  • every machine should have its own data backup account password
    • central server for "outside" access
    • (plus) every machine has access software for its own account

Teacher's View of Logs

  • the teacher sees what is shared with the teacher (just as with any other person)
  • if not public it's private, so the child doesn't automatically give access to the teacher
  • comments to child are text notes in the teacher's journal shared with child
  • marks and the like may or may not be shared
  • default tag-set for each course (to allow children to subscribe to a given rss feed for each course)
  • default "friends" group for each course

Scarce-resource Optimizations

We can't back up everything forever. Students have 2 or 3 GB on the GMailFS and they can produce gigabyte-sized files with the video camera. We need some mechanism for letting the Journal specify when to use a different strategy for backup and retention.

That same mechanism needs to be able to operate on the laptops themselves so that they know what to retain when they run out of space and what to prioritize when uploading backups.

Alternate Storage Strategies

  • some simple way to suggest a particular strategy
    • no-backup/backup-until/backup-until-read-by
  • server-level
    • backup when possible, but drop "replaced" versions of files when full
  • default in certain types to quick deletion or until-read
    • e.g. for voicemail, IM the default would be to drop relatively easily
    • what about abuse potential (share a nasty-gram and have it disappear after X period)?
  • take file size into account for choosing a backup-format strategy
    • diff-based storage (really big files that change by adding, e.g. traditional mbox files)
    • replacing storage (really big files that are whole-state storage)
    • versioned storage (default, small files that change substantially)
  • timeout hints
    • automated stale-dating of data-flows, such as RSS feeds and the like, dropped unless marked important and/or referenced/published
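
The size-based choice among the three storage formats can be sketched as below. The 50 MB threshold and the append_only hint are assumptions; in practice the hint would have to come from the file type or from Journal metadata.

```python
LARGE_FILE_BYTES = 50 * 1024 * 1024  # hypothetical cut-off for "really big"


def choose_backup_format(size_bytes: int, append_only: bool) -> str:
    """Pick a backup format from file size and how the file changes."""
    if size_bytes < LARGE_FILE_BYTES:
        return "versioned"  # default: small files that change substantially
    # Really big files: store diffs if they grow by appending (e.g. mbox),
    # otherwise replace the previous copy outright.
    return "diff" if append_only else "replacing"
```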

Personal Importance Level

Not all files are created equal. The picture of your foot you shot when you were bored is less important than the picture of your baby brother's first steps. The system cannot guess at this, so if we get into scarce-resource situations we'd like to have a hint as to how important something is.

  • automatically give precedence to last version of same resource within my uri space
    • (saved to same file as reference)
  • references increase priority if not explicitly set
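
A minimal sketch of how these two rules could combine into a single importance value. The numeric scale and the function name are hypothetical, and it is an assumption that an explicitly set level always wins over the automatic hints.

```python
def effective_importance(explicit, reference_count, is_latest_version):
    """Return an importance score; higher means keep and back up first."""
    if explicit is not None:
        return explicit  # the user's own hint overrides everything else
    score = reference_count  # each reference raises the priority
    if is_latest_version:
        score += 1  # last version of a resource beats older versions
    return score
```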

Journal Based File Open

  • choose whole journal entry (namespace)
    • then choose file from it
  • browse into entry to choose single file to import into new space
  • browse journal to find old versions
  • filter journal to find items (tag, type, text, etc)