UBIFS on XO
Executive Summary
An overview of the UBIFS file system and the project to use it as a replacement for JFFS2. UBIFS and JFFS2 are file systems designed for MTDs (memory technology devices) such as the NAND flash in the XO.
Problem Statement
The current file system used on the XO (JFFS2) was developed in the days of 32/64MiB NOR flash devices. It does not scale well to the 1GiB NAND devices we use today and will fare worse on the larger devices planned for gen 1.5 and gen 2 systems. The main issues we have seen with JFFS2 are as follows:
- JFFS2 needs to scan all blocks to mount the file system, and the time to do so scales linearly as the file system fills up. This can result in mount times of up to 30s on very full devices. (Note: Need to quantify this)
- The JFFS2 garbage collection algorithm increasingly consumes CPU cycles as the system fills up to the point of making the system unusable.
- JFFS2 does not provide a method to enable/disable compression on a per-inode basis, so the CPU wastes cycles trying to compress already-compressed data such as multimedia (mp3, mpg4, ogg) and web-downloaded data (activity bundles, tar.gz, etc).
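The cost described in the last item can be illustrated with a minimal sketch. It uses Python's zlib module and random bytes as a stand-in for already-compressed media; both choices are assumptions made for illustration and are not measurements taken on the XO or on JFFS2.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


# Highly redundant text stands in for ordinary uncompressed data.
text = b"the quick brown fox jumps over the lazy dog\n" * 2000
# Random bytes stand in for already-compressed media (mp3, ogg, tar.gz).
media_like = os.urandom(len(text))

print(f"text:       {compression_ratio(text):.3f}")
print(f"media-like: {compression_ratio(media_like):.3f}")
```

The redundant text shrinks to a small fraction of its size, while the media-like data does not shrink at all, so every cycle spent compressing it is wasted.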
In the time since the XO was initially developed, several NAND file system alternatives have matured to the point of being considered viable, including YAFFS2, UBIFS, AXFS, and BtrFS, which is still in development. UBIFS is primarily of interest for the XO because it is upstream, has an active community of users, and is actively maintained and funded by a major cell communications device vendor. AXFS is not upstream and may not meet our needs. BtrFS is close to being integrated into kernel.org, but only as experimental, in-development code. YAFFS2 is deployed in quantity, but the maintainer has not expressed interest in pushing it upstream and it does not support compression.
UBI and UBIFS Overview
Unlike JFFS2, which sits directly on top of an MTD device (the mtdblock device is used only to name the device at mount time), UBIFS sits on top of the Unsorted Block Images (UBI) layer [1], which in turn sits on top of MTD and does not present itself as a block device. UBI was designed to handle wear leveling and bad blocks on modern NAND devices efficiently and to relieve the file system above of any knowledge of these operations. UBI deals with flash in units of Physical Erase Blocks (PEBs).
Erase Block Mapping and Wear Leveling
An MTD device may contain one or more UBI volumes, each of which may hold a UBI-aware file system (UBIFS), a non-UBI flash file system (JFFS2, YAFFS, CRAMFS), or simply static binary data. In the case of non-UBI file systems, UBI provides an MTD glue layer that exports the UBI volume as an MTD device. Each volume appears as a set of contiguous Logical Erase Blocks (LEBs) that are mapped and remapped to any PEB on the device. In the following example, an MTD device has been divided into ROMFS, JFFS2, and UBIFS volumes and, as the red arrows show, file system accesses to a given LEB are remapped throughout the device. By mapping LEBs to PEBs this way, UBI can implement wear leveling across the whole device. With a pure MTD approach, where the device is simply broken up into 3 MTD partitions, each with a file system on top, each region of the device is managed as if it were a physically separate device. If one file system erases much more than another, its region of flash runs into erase limits while the rest of the flash is still usable.
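The difference between device-wide and per-partition wear leveling can be sketched with a toy model. The class below is hypothetical and is not UBI's algorithm: it assumes every rewrite of a LEB goes to the least-worn free PEB, and it ignores static data, bad blocks, and UBI's on-flash headers.

```python
class UbiSketch:
    """Toy LEB-to-PEB mapper; names and policy are illustrative only."""

    def __init__(self, num_pebs):
        self.erase_count = [0] * num_pebs
        self.leb_to_peb = {}               # (volume, leb) -> peb
        self.free = set(range(num_pebs))

    def write_leb(self, volume, leb):
        """Remap a LEB to the least-worn free PEB, then erase the old PEB."""
        new_peb = min(self.free, key=lambda p: (self.erase_count[p], p))
        self.free.remove(new_peb)
        old_peb = self.leb_to_peb.get((volume, leb))
        self.leb_to_peb[(volume, leb)] = new_peb
        if old_peb is not None:
            self.erase_count[old_peb] += 1  # old copy is erased and reused
            self.free.add(old_peb)
        return new_peb


def rewrite(num_pebs, writes):
    """Rewrite one hot LEB repeatedly and return per-PEB erase counts."""
    ubi = UbiSketch(num_pebs)
    for _ in range(writes):
        ubi.write_leb("ubifs", 0)
    return ubi.erase_count


# The same hot LEB, rewritten 81 times:
print(max(rewrite(8, 81)))  # pool spans the whole 8-PEB device
print(max(rewrite(2, 81)))  # pool confined to a 2-PEB partition
```

In this model the worst-case erase count is four times higher when the hot LEB is confined to a small partition, which is the effect the paragraph above describes.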
Bad Erase Block Handling
In addition to cross-device wear leveling, UBI also handles bad blocks transparently by reserving a pool (1% by default) of the available PEBs on a device for this purpose. If a bad PEB is found, UBI will transparently move its contents to one of the reserved blocks.
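To get a feel for what the reserve costs, here is a back-of-the-envelope sketch. The 128KiB erase block size and the round-up rule are assumptions made for illustration; the actual values depend on the NAND part and on how UBI computes the reserve.

```python
KIB = 1024
GIB = 1024 ** 3


def reserved_pebs(device_bytes, peb_bytes, reserve_percent=1):
    """Return (total PEBs, PEBs reserved for bad block handling).

    The reserve is rounded up to a whole PEB (an assumption).
    """
    total = device_bytes // peb_bytes
    reserved = -(-total * reserve_percent // 100)  # ceiling division
    return total, reserved


total, reserved = reserved_pebs(1 * GIB, 128 * KIB)
print(total, reserved, reserved * 128 * KIB / (1024 * 1024))
```

Under these assumptions a 1GiB device has 8192 PEBs, of which 82 (about 10MiB) are held back for bad block handling.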
Write Back Support
Unlike JFFS2, UBIFS supports write-back operation (enabled by default). This has performance benefits, as it allows the system to delay I/O accesses so that they are not on the critical path. However, it changes the way the underlying file system behaves from an application developer's perspective: data that has been written is not necessarily on flash until it has been explicitly synced. System-level programmers are encouraged to read the documentation and this email post.
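One common way for applications to cope with write-back is to write a temporary file, fsync it, and rename it over the original. The sketch below shows that general pattern; the helper name atomic_write is hypothetical, and nothing here is specific to UBIFS or taken from the XO code base.

```python
import os
import tempfile


def atomic_write(path, data: bytes):
    """Replace path with data so a power cut leaves the old or new file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # force the data out of the page cache
        os.replace(tmp_path, path)   # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory so the rename itself is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Without the fsync before the rename, a write-back file system may commit the rename before the data, which can leave an empty or truncated file after power loss.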
Compression
UBIFS, like JFFS2, supports LZO and ZLIB compression. Unlike JFFS2, compression can be enabled and disabled on a per-inode basis via the "chattr +c" and "chattr -c" shell commands, or by reading the inode flags with the FS_IOC_GETFLAGS ioctl and writing them back with FS_IOC_SETFLAGS.
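A minimal sketch of toggling the per-inode compression flag from a program follows. It assumes Linux on x86 or ARM (the generic ioctl number encoding), reads the flags with FS_IOC_GETFLAGS, and writes them back with FS_IOC_SETFLAGS. The helper names are hypothetical; the constants mirror linux/fs.h.

```python
import array
import fcntl
import os
import struct

FS_COMPR_FL = 0x00000004  # "compress file" inode flag from <linux/fs.h>


def _ioc(direction, nr):
    """Build an 'f' ioctl number with a long-sized argument (generic ABI)."""
    size = struct.calcsize("l")
    return (direction << 30) | (size << 16) | (ord("f") << 8) | nr


FS_IOC_GETFLAGS = _ioc(2, 1)  # _IOR('f', 1, long)
FS_IOC_SETFLAGS = _ioc(1, 2)  # _IOW('f', 2, long)


def with_compression(flags, enabled):
    """Return flags with the compression bit set or cleared."""
    return flags | FS_COMPR_FL if enabled else flags & ~FS_COMPR_FL


def set_compression(path, enabled):
    """Read-modify-write the inode flags of path."""
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = array.array("i", [0])  # the kernel treats the flags as an int
        fcntl.ioctl(fd, FS_IOC_GETFLAGS, buf)
        buf[0] = with_compression(buf[0], enabled)
        fcntl.ioctl(fd, FS_IOC_SETFLAGS, buf)
    finally:
        os.close(fd)
```

For example, calling set_compression with a path to an mp3 file and False would be the programmatic equivalent of "chattr -c" on that file. The call fails with an OSError on file systems that do not implement these ioctls.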
UBIFS Testing
Even with the known issues in JFFS2, changing the underlying file system on the XO requires much testing before it can be deployed. Specifically, the following areas need to be looked at:
- Performance
  - How does the file system perform under our usage scenarios? This includes raw I/O performance as well as CPU overhead, memory footprint, and boot time. How do these numbers change across various UBI and UBIFS options such as write-back, compression, % of PEBs reserved for bad block handling, etc.?
- Compliance
  - Run LTP and other compliance tests to ensure that UBIFS behaves as higher levels of the software stack expect.
- Reliability
  - Run full life cycle I/O simulations and other stress tests to see how well it holds up. How do UBI, UBIFS, and the rest of the stack behave once the device starts degrading? How do they behave on power failure?
- Power Draw
  - How does running our usage patterns and the above tests on UBIFS compare to JFFS2 in terms of power draw?
UBIFS Impact
If it is determined that UBIFS can indeed replace JFFS2, the XO software stack will need several changes:
- Partitioning
  - The current XO laptop uses the whole MTD device as a single JFFS2 partition, but this will have to change. OFW does not support UBIFS, so the files it needs for boot must live in a JFFS2 or ROMFS partition.
- File System Layout
  - The current file system layout will have to change as documented in Early boot, and tools that update file system content, such as olpc-update, will need to be made aware of the new partition scheme. In addition, the method for handling alternate boot for both the XO and OFW will have to change.
- Build
  - The build system will need to be able to build UBIFS images.
- Upgrade tools
  - We may need to provide a way to transition an existing JFFS2-based install to UBIFS when updating from 8.2.0 to 9.1.0. We may not want to do this, as we will lose JFFS2 wear leveling information unless we can translate it into something that UBI can use. We need to analyze the pros and cons of both options.
Next Steps
- Make the minimal changes required to get bootable UBI images out of the build system.
- Flesh out test items and start executing tests to make an informed decision on whether to switch.
- Outline the full set of software stack changes in detail and implement them if we decide to switch.