Union File Systems

From OLPC

The plan in BitFrost is to use a Union File System (likely AUFS) in order to provide recovery and similar operations. This is a straw-man proposal describing a use-pattern for AUFS within BitFrost. See also Journal and Overlays for discussion of using a Union File System to support Journal operations for activities.

In all cases, unless otherwise stated, the trees would be stored on the Flash drive as regular directories. I am allowing for scenarios with multiple users per machine here, in the hope of getting researchers outside the project interested in working on the system.

Experience suggests that libraries, services and applications often require updates to patch security holes (and to extend functionality). Those updates occasionally fail and cause damage themselves. The approach proposed here uses the union file system to allow reasonably frequent updates, even to core systems, and to allow recovery from a botched update.

Core System Software Tree

System software in this sense includes everything required to launch a new application, from the initrd loader on the boot partition up through the Sugar Shell (application launching user interface).

The primary goal here is to allow for stable "rollback" to previous versions of the system. The normal system-updating code would have to initiate updates using a protected API for creating new system overlays.

Note: The system-updating service will need to be protected by running solely from the root file-system (no update images involved). As such the system-updating service will need to be very stable.

  • r/o base image (P_SF_CORE protected images)
  • r/o system COW-generated update images (P_SF_CORE protected images)
    • These would likely be a chain of updates
      • e.g. system security/functionality updates for the laptop could be rolled back in the event of failure
      • Over time we would want to be able to migrate these changes into the r/o base image e.g. system updates > 2 months old would merge down to the base image
    • In other words, update process "SECURITY_UPDATE_2007_05_20" would create a new COW file system branch directly on the core r/o file system, do the updates using RPMs/scripts or whatever, and then remount the COW as a read-only branch
    • Note: the overlay management service must run from the P_SF_CORE area. It should not be run with user overlays, so that users cannot override the rules built into it until they know the implications
  • r/o "user" (root) system update images (P_SF_RUN protected images)
    • I'd assume we'd want these to use date-based backups (See the user data activity area below for an idea on that)
    • Unclear how this works on a multi-user machine, but I guess that's something for external operations to figure out
    • Unclear whether we can apply these on top of an updated core system; it should be technically possible, but there might be conflicts in the semantics of the result
  • Not sure whether we can have a "sensitive data" overlay for things that should only be readable by the system (the idea being that we simply do *not* provide access to that plane in an application's chroot)
    • Thinking of password files, any machine-level encryption keys and the like
    • Bitfrost suggests that we'll only copy in the individual libraries that the application claims it needs, but that requires altering the application's installation mechanisms. It would be faster, and less porting work, to simply "wrap" the installers of standard packages: a standard RPM-based GUI installer (click-and-run) could be wrapped so that it requests creation of the overlay and then runs the standard installer in the chroot
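
The update chain described in this section can be sketched as an ordered list of branches handed to the union mount. This is a minimal sketch only: the `Branch` type, the helper names and the `/overlays/...` paths are hypothetical, and the `br:` option string follows the aufs mount syntax as I understand it (topmost branch first, each tagged `=rw` or `=ro`). Nothing is actually mounted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Branch:
    path: str               # directory holding this layer (hypothetical layout)
    writable: bool = False  # only the layer being updated is ever writable


def aufs_branch_option(branches: List[Branch]) -> str:
    """Build a `br:` mount option; the first branch is the topmost layer."""
    if not branches:
        raise ValueError("at least one branch is required")
    parts = [f"{b.path}={'rw' if b.writable else 'ro'}" for b in branches]
    return "br:" + ":".join(parts)


def begin_update(chain: List[Branch], new_dir: str) -> List[Branch]:
    """Put a fresh r/w COW branch on top of the existing r/o chain."""
    return [Branch(new_dir, writable=True)] + list(chain)


def seal_update(chain: List[Branch]) -> List[Branch]:
    """After the update has run, every branch becomes read-only."""
    return [Branch(b.path, writable=False) for b in chain]
```

Rolling back an update then amounts to leaving its branch out of the list before building the option string; the base image itself is never written to.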

Temporary Storage Area

We will have a temporary data file system in RAM rather than on Flash. It would probably not be unioned; whether it should be sharable is an open question.

Application Installation Image

As per BitFrost, the application will be installed to a read-only file-system overlay on top of the core file-system (minus any "sensitive-data overlay"). This will allow the application to update shared libraries or otherwise perform "invasive" installations without affecting other applications or the system itself.

We will need to address how this applies in multiple-user scenarios. If user K installs a malicious "OpenOffice", how does another user know it is not the OpenOffice they expected?

  • r/o system image
  • r/o installation image
    • Generated during installation using a r/w COW on top of the core system image
      • Allow for new versions of dependencies or other system updates
    • r/o update/patch/plugin images
      • Not sure about utility of these, I think most platforms use full replacement for application updates these days
      • Plugin installation could be handled this way, but it doesn't really address any issues of plugin safety other than being able to remove plugins cleanly, and the registration process would be constrained somewhat...
    • Installation should be *without* access to system's sensitive data overlay if possible

Application Session Images

This section discusses how to produce the run-time chroot images in which a particular session of a particular activity will run. The chroot is intended to implement many of the BitFrost requirements for data-safety while still allowing for legacy applications to be run with a simple recompile for the platform.

Journal Approach

This is the "pure" OLPC approach outlined in Journal and Overlays. It assumes that we can override the standard File Open dialogues in every activity running on the laptop to trigger a Journal browser that runs out-of-process and lets the user select a file for linking into the system.

  • System Software
  • Application Installation Image
    • User Customisation Layer (For IDEs and the like)
  • Read-only referenced-file layer
    • Files/journal environments link referenced values into this layer
  • Read-write created-file layer
    • All file access queries link/generate files in this layer
    • Journal queries might include such things as "all images", which would then add all of those images to the read-only layer
  • Possible "volatile" layer mounted in a particular area for applications/users to use for volatile files (e.g. databases or similar files that should be backed up only in a single version)
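
Linking journal-selected files into the read-only referenced-file layer could look roughly like the sketch below. It assumes the layer is a plain directory on the same Flash file system as the journal's files, so that a hard link is possible; the function name, the `-1`, `-2` renaming scheme and the copy fallback are my own assumptions, not anything specified by the Journal design.

```python
import os
import shutil
from typing import Iterable, List


def link_into_layer(sources: Iterable[str], layer_dir: str) -> List[str]:
    """Make each source file visible in layer_dir; return the paths created."""
    os.makedirs(layer_dir, exist_ok=True)
    created = []
    for src in sources:
        name = os.path.basename(src)
        stem, ext = os.path.splitext(name)
        dest = os.path.join(layer_dir, name)
        n = 1
        while os.path.lexists(dest):  # never clobber an existing name
            dest = os.path.join(layer_dir, f"{stem}-{n}{ext}")
            n += 1
        try:
            os.link(src, dest)        # hard link: no extra Flash space used
        except OSError:
            shutil.copy2(src, dest)   # fallback, e.g. source on another device
        created.append(dest)
    return created
```

A journal query such as "all images" would pass every matching path as `sources`; the activity then sees the files through the union without being able to modify the originals.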

Legacy (Versioned) Approach

This is a less pure approach that does not assume the ability to override all File Open dialogues or the like. Instead of having a "clean" environment for each run of an activity, the activity keeps a collection of all files created by the activity over time in a versioned file-system tree.

This has the distinct disadvantage that the environment tends to get cluttered with things you don't need, while making importing of files into the environment from another application a bit of a pain (non-standard operation).

  • Software: r/o core system (base, security, user overlays), r/o installation image (+ possible update images)
  • Software-as-Data: r/w user customisation image (probably only for "develop"-active activities and likely only per-user)
  • Data: r/o previous state images (data directories), r/w current state image (data directories)
  • r/o previous-state images (versioned)
    • monthly, weekly and daily backup snapshots
    • migration/coalescing over time so that after a week daily snapshots move into a weekly snapshot and after a month the weekly snapshots move into the core image for the activity
  • r/w current-working image
    • preferably instrumented to allow for monitoring changes and registering created/altered files in the journal
      • the file version registered in the journal would, however, only exist until coalescing, so the backup mechanism would need to have encrypted and uploaded the file to the backup service before coalescing occurred
      • in the simple case, when you close an activity, examine the session's file-system overlay and see whether there are files in the r/w overlay; if there are, something has changed and should be added to the journal. The problem is that activities left open for months would never get into the journal this way, so the "per-day" version overlay replacement would need to do the journaling as well
    • it would be elegant if we could allow sharing among applications by having the journal-based file selection make a file visible within the current-working image by hard-linking it into the appropriate "previous-state" image
    • That is, when you select a 120MB recording from the "video tutorial activity" in the journal that you want to present to your teacher in the "presentation activity", the journal would find the file on disk (or in backups, or wherever) and hard-link it into the appropriate day's overlay for the *current* application. The current application would then have COW access to the file. Problems occur with naming conflicts in the current directory (particularly where you've already deleted a file of that name in a later session); we might need a check before linking the file in, plus automatic renaming, which could be confusing. We almost want a special r/o layer just below the r/w layer to make this work reliably.
  • r/w unversioned image (sub-tree, e.g. "~/volatile" and "/var/volatile")
    • for storage of things like databases where legacy applications are storing history and the like internally in a big file that contains all of the user's work
  • For each user's "profile" (personal data such as general encryption key, backup/log encryption keys, identifying photo and the like)
    • Versioned storage, as for an activity, but only available (directly) to the system's blessed user-profile-editor activity

Services Required

Overlay Manager

About the only software that should run outside the core chroot. This is the software that knows how to create and tear down overlay file systems.

  • Update software from trusted repositories (system images)
    • Generate and register new system overlays for the core system images automatically
      • Should automatically assign a user-friendly descriptive name for the selection box
      • Should automatically assign a directory name
      • Should automatically record date applied
      • Would likely be some form of simple standardised format for describing the overlays along with standard locations for the various overlays to be stored
    • Migrate overlays into core over time
  • Provide "play-spaces" for user overlays
    • Generate and register new user overlays for the core system images
    • Allow user to trigger merges down into base user overlay (or automatically merge if using a time-based version system)
  • Provide introspection/listing tools for finding overlays
    • Provide way to take given list of overlays and request subset of them as a mounted filesystem (to abstract away issues such as using unionfs or aufs)
    • Register user's last-selected set of system/user filesystem overlays (to make them the default for the next time)
  • Activity Installation
    • Register activity (permissions and the like)
    • Create activity's overlay filesystem
    • Install the activity inside the chroot filesystem with permission restrictions
    • Switch the filesystem to read only mode
  • Activity Instantiation
    • Look up the activity, create the unioned filesystem
      • core system image (including user overlays)
      • r/o referenced file overlay
      • r/w created/altered file overlay (COW)
    • chroot the activity
    • Provide a file-open-request (dbus) for loading journal-based file(s) into namespace
      • Provide linking of files into the activity's r/o referenced file plane
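
As a rough illustration of the "simple standardised format" for describing overlays and of requesting a subset of them as a mounted file system, here is a sketch that stores the registry as JSON. The field names, the three `kind` values and the ordering rule (user overlays above system updates above the base image, newest first within a kind) are assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List, Set

# Higher rank sits higher in the union (assumed ordering).
RANK = {"base": 0, "system": 1, "user": 2}


@dataclass(frozen=True)
class OverlayRecord:
    friendly_name: str  # shown in the boot-time selection box
    directory: str      # where the overlay tree is stored
    applied: str        # ISO date (YYYY-MM-DD) the overlay was applied
    kind: str           # "base", "system" or "user"


def dump_registry(records: Iterable[OverlayRecord]) -> str:
    return json.dumps([asdict(r) for r in records], indent=2)


def load_registry(text: str) -> List[OverlayRecord]:
    return [OverlayRecord(**item) for item in json.loads(text)]


def mount_order(records: Iterable[OverlayRecord], enabled: Set[str]) -> List[str]:
    """Directories to union, topmost first; the base image is always included."""
    chosen = [r for r in records if r.kind == "base" or r.directory in enabled]
    chosen.sort(key=lambda r: (RANK[r.kind], r.applied), reverse=True)
    return [r.directory for r in chosen]
```

The list returned by `mount_order` is what a unionfs- or aufs-specific back end would turn into an actual mount, which keeps the choice of union file system out of the registry format.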

A user interface is required as well. During boot, the user needs to be able to escape the normal boot sequence to respecify the set of file-planes to enable in the image. UI use cases:

  • To roll back to a previous day's state after playing (and breaking), un-check your last day's change plane
  • To undo a failed official system update, un-check the system update
    • Likely have to automatically un-check the system updates *since* then as well to maintain consistency
    • Some way to configure permanent disabling/removal of a file-system plane, if someone leaves an overlay disabled until it would normally be integrated
    • What do we do with the user's customisations if there's an intermediate layer removed? Can we reliably detect conflicts?
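
The consistency rule for undoing a system update (un-checking one update also un-checks every update applied after it) is small enough to state as code. This is only a sketch; the function name is hypothetical and the chain is assumed to be ordered oldest first.

```python
from typing import List, Tuple


def uncheck_update(chain: List[str], failed: str) -> Tuple[List[str], List[str]]:
    """Split an oldest-first update chain into (still enabled, disabled)."""
    if failed not in chain:
        raise ValueError(f"unknown update: {failed}")
    cut = chain.index(failed)
    # Everything from the failed update onwards is dropped from the union.
    return chain[:cut], chain[cut:]
```

It does not attempt to answer the open question of what happens to user customisations that were made on top of a now-disabled layer.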

Time-versioned Storage

In cases where we would like a time-versioned storage system, it is possible to construct the image using a series of r/o "previous data" images and a r/w current image. Note, however, that time-versioned storage is coarser-grained and harder to understand than "session-versioned" storage, probably by enough that we want to avoid it as much as possible.

  • The r/w COW system should version on write: the "latest revisions" layer is available at all times, and a new COW layer is created when the latest-revisions layer contains changes older than some period X
    • System-level mechanism to replace current-day overlay with next-day overlay (if there is anything in the previous-day overlay, otherwise it just becomes the today-overlay)
  • Mechanism to coalesce older overlays into the core image, likely triggered by the laptop's backup scripts so that no information is lost when coalescing (only coalesce if the versions that would be lost are already backed up)
    • Likely require ability to do user-triggered coalescing as well, in order to allow for resolving "out-of-space" conditions
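
One way to read the coalescing rule is sketched below: walk the snapshots oldest first and stop at the first one that is either too recent or not yet backed up, since merging a newer overlay into the core image while an older one is still separate would break the layering. The `Snapshot` type, the `keep_days` default and the `force` flag (for user-triggered coalescing in an out-of-space condition, which here skips both checks) are assumptions for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class Snapshot:
    day: date        # day this r/o overlay covers
    backed_up: bool  # True once the backup service holds its versions


def coalescable(snapshots: Iterable[Snapshot], today: date,
                keep_days: int = 7, force: bool = False) -> List[Snapshot]:
    """Oldest-first run of snapshots that may be merged into the core image."""
    cutoff = today - timedelta(days=keep_days)
    result = []
    for snap in sorted(snapshots, key=lambda s: s.day):
        if not force and (snap.day >= cutoff or not snap.backed_up):
            break  # keep this snapshot and everything newer as separate layers
        result.append(snap)
    return result
```

With `force=True` the user accepts losing versions that have not been backed up, so the interface would presumably need to warn about that before proceeding.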

See Journal and Overlays for further exploration...

Issues

  • Performance (IIUC we would wind up with dozens of extra stat() calls per normal access to a file), and issues related to versioning/semantic conflicts between overlays when intervening overlays are removed
  • "Library Overlays" also pops up (e.g. a numpy overlay that would be shared among dozens of activities, but isn't necessarily a part of the "core" system)
  • Heuristics for cleanup and backup prioritisation (see Journal and Overlays for some discussion of this).
  • Providing a way for applications such as email clients or browsers to have a persistent cross-instance data-store (e.g. the mail repository, or the cookie-set of the browser). Although that introduces a major attack vector, legacy applications will require the functionality.