UBIFS on XO

From OLPC
Revision as of 05:39, 10 October 2008 by Dsaxena (talk | contribs)
Jump to: navigation, search

Problem Statement

The current file system used on the XO ([JFFS2]) was developed in the days of 32/64MiB NOR flash devices and does not scale well to the 1GiB NAND devices we are using today and certainly will not deal well with larger devices in gen 1.5 and gen 2 systems. The main issues we have seen with JFFS2 are as follows:

  • JFFS2 needs to scan all blocks to mount the filesystem and the time do so scales linearly as the file system fills up. This can can result in up to 30s mount times on very full devices. (Note: Need to quantify this)
  • The JFFS2 garbage collection algorithm increasingly consumes CPU cycles as the system fills up to the point of making the system unusable.
  • JFFS2 does not provide a method to enable/disable compression on a per-inode basis leading to the CPU wasting cycles in an attempt to compress already compressed data such as multimedia (mp3, mpg4, ogg) and web-downloaded data (activity bundles, tar.gz, etc).

In the time since the XO was initially developed, several NAND file system alternatives have matured to the point of being considered viable alternatives. Those of primary interest are:

  • YAFFS2 : This file system has been around for several years and in active use in the embedded space. It does not provide compression out of the box as the devices that use it often carry already compressed data such as MP3 and MPG files. YAFFS is not upstream and the primary developer has not expressed interest in pushing it upstream. While YAFFS has been proven on the field, the lack of compression and the fact that is not upstream are major demerits towards it replacing JFFS2 on the XO.
  • UBIFS: A new file system recently merged into the kernel designed for use on large NAND flash devices. UBIFS supports a per-inode compression flag, fast mount times, power-failure safe updates, and other features that are important for the XO. UBIFS was developed by Nokia and IBM and is actively maintained in the upstream kernel.org tree.

UBI and UBIFS Overview

Unlike JFFS2, which sits on top of the kernel's block layer via the mtdblock driver, UBIFS sits on top of the Unsorted Block Image ([1]) layer which does not present itself as a block device. UBI was designed to handle wear-leveling and bad blocks on modern NAND devices efficiently and to release the file system above from any knowledge of these operations. UBI deals with flash in units of Physical Erase Blocks (PEBs).

Erase Block Mapping and Wear Leveling

An MTD device may contain one or more UBI volumes, each of which may hold a UBI aware filesystem (UBIFS), a non-UBI flash file system (JFFS2, YAFFS, CRAMFS), or simply a static binary data. In the case of non-UBI filesystems, UBI provides an MTD glue layer that exports the UBI device as an MTD device. Each volume within the device appears as a set of contiguous Logical Erase Blocks (LEBs) that are mapped and remaped to any PEB on the device. In the following example, a MTD device has been divided into ROMFS, JFFS2, and UBIFS volumes and as the red arrows show, the file system access to a given LEB are remapped throughout the device. By mapping PEBs this way, UBI can implement wear leveling across the whole device. With a pure MTD approach, where the device was simply broken up into 3 MTD partitions, each with a file system on top, each region of the device would be managed as if it were a physically different device. If one FS was erasing much more than another, its region of flash would run into erase limits while the rest of the flash would still be usable.

UBI Overview

Bad Erase Block Handling

In addition to cross device wear leveling, UBI also handles bad blocks transparently by reserving a pool (1% by default) of the available PEBs on a device for this purpose. If a bad EC is found, UBI will transparently move the contents to one of the reserved blocks.

Write Back Support

Unlike JFFS2, UBIFS supports write-back operation (enabled by default). This has performance benefits as it allows the system to delay I/O access so they are not on the critical path; however, it changes the way the underlying filesystem behaves from an application developer's perspective. System level programmers are encouraged to read the documentation and this email post.

Compression

UBIFS, like JFFS2, supports LZO and ZLIB compression. Unlike JFFS2, it can be enabled and disabled on a per-inode basis via a "chatter -c" shell command or a call to the FS_IOC_GETFLAGS ioctl.