UBIFS initial experiments
Introduction
On this page I document the steps I took to create a release 8.2.0 image running on top of UBIFS, for initial experimentation with that file system, and point out some initial findings and questions that have come up. If you are just interested in downloading and running the final image, skip to Image Download and Installation.
Flash Layout
Open Firmware does not support reading from UBIFS, so to deal with this the flash is partitioned into a 32MiB JFFS2 partition, with the remainder of the space left over for use by UBI. OFW loads the kernel and initrd from the JFFS2 partition, and the initrd handles mounting the UBIFS root. Note that one flash erase block is dedicated to the RedBoot partition table ("FIS directory" in RedBoot speak).
MTD Partition | Location |
---|---|
FIS directory | 0x00000000-0x00020000 (128KiB) |
boot | 0x00020000-0x02020000 (32MiB) |
system | 0x02020000-0x3ffc0000 (991.625MiB) |
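The partition sizes can be derived directly from the hex offsets. This short Python sketch computes them, treating each end offset as exclusive and assuming the 128KiB PEB size used throughout this page; `partitions` and `partition_size` are names made up for the example.

```python
# Derive partition sizes from the start/end offsets in the table above.
# End offsets are treated as exclusive (the start of the next partition).
KIB = 1024
MIB = 1024 * KIB
PEB_SIZE = 128 * KIB  # XO NAND physical erase block size

partitions = {
    "FIS directory": (0x00000000, 0x00020000),
    "boot": (0x00020000, 0x02020000),
    "system": (0x02020000, 0x3FFC0000),
}


def partition_size(start: int, end: int) -> int:
    """Size in bytes of a partition covering [start, end)."""
    return end - start


for name, (start, end) in partitions.items():
    size = partition_size(start, end)
    print(f"{name}: {size / MIB:.3f} MiB = {size // PEB_SIZE} PEBs")
```

It prints 32.000 MiB (256 PEBs) for boot and 991.625 MiB (7933 PEBs) for system, which is the PEB count used as P in the usable size calculation below.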
JFFS2 Partition
The JFFS2 partition contains a boot/ directory with three files: the olpc.fth OFW boot script, the vmlinuz compressed kernel binary, and the olpcrd.img ramdisk image. The partition image was created with the following command:
/usr/sbin/mkfs.jffs2 -n lzo -e 128KiB -r boot/ -o boot_jffs2.img
olpc.fth
The olpc.fth script was modified as follows to pass the proper boot parameters to the kernel and initrd:
@@ -77,7 +77,7 @@
 then
 " nand" dn-buf count sindex 0>= if
-   " root=mtd0 rootfstype=jffs2"
+   " ubi.mtd=system root=ubi0:rootfs rootfstype=ubifs"
 else
    " root=LABEL=OLPCRoot rootfstype=ext3"
 then
The parameter "ubi.mtd=system" tells the UBI layer to attach to the MTD device named "system". This creates a new UBI device, ubi0, which contains the root filesystem volume called "rootfs". The "root=ubi0:rootfs" option tells the kernel to mount this volume. (This parameter is actually unused, as the initrd handles mounting the root filesystem.)
olpcrd
The olpcrd is identical to that in the 8.2 release except for the following change to initutil.py:
@@ -134,11 +134,12 @@
         # when partitioned, expect bootpath like:
         # /pci/nandflash@c:root,\boot\vmlinuz//jffs2-file-system:\boot\vmlinuz
         if p is None: p = 0 # unpartitioned by default
-        if type(p) is int:
-            dev = 'mtd%d' % p
-        else:
-            dev = 'mtd:%s' % p
-        extra = ['-t','jffs2']
+        # if type(p) is int:
+        #     dev = 'mtd%d' % p
+        # else:
+        #     dev = 'mtd:%s' % p
+        dev = 'ubi0:rootfs'
+        extra = ['-t','ubifs']
     else: # we're running under emulation
         # these modules only needed if we're running in qemu
         from stat import S_IFBLK
Note that this is not a permanent solution, as we probably want to be able to boot the same release on both JFFS2 and UBIFS layouts.
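One possible shape for a non-hard-coded version is to choose the root device and file system type from a flag. This is a hypothetical Python 3 sketch, not code from initutil.py: `nand_root_args` and `wants_ubifs` are invented names, and keying off "rootfstype=ubifs" (which the modified olpc.fth passes) is an assumption; the real initrd would read the command line from /proc/cmdline.

```python
from typing import List, Optional, Tuple, Union


def wants_ubifs(cmdline: str) -> bool:
    """True if the kernel command line asks for a UBIFS root.

    Assumption: olpc.fth passes rootfstype=ubifs for the UBI layout,
    as in the diff above.
    """
    return "rootfstype=ubifs" in cmdline.split()


def nand_root_args(p: Optional[Union[int, str]],
                   use_ubifs: bool) -> Tuple[str, List[str]]:
    """Return (device, extra mount arguments) for the NAND root fs."""
    if use_ubifs:
        # Volume "rootfs" on UBI device 0, attached via ubi.mtd=system.
        return "ubi0:rootfs", ["-t", "ubifs"]
    if p is None:
        p = 0  # unpartitioned by default
    if isinstance(p, int):
        dev = "mtd%d" % p
    else:
        dev = "mtd:%s" % p
    return dev, ["-t", "jffs2"]


# Example: the command line produced by the modified olpc.fth.
cmdline = "ubi.mtd=system root=ubi0:rootfs rootfstype=ubifs"
print(nand_root_args(None, wants_ubifs(cmdline)))  # ('ubi0:rootfs', ['-t', 'ubifs'])
```

The JFFS2 branch keeps the original partition-number and partition-name handling unchanged, so the same initrd would still work on the existing layout.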
Kernel
The kernel used is available here. It consists of the OLPC kernel used for the 8.2 release merged with the linux-2.6.25 UBI backport tree. The kernel is built with UBI and UBIFS linked in rather than as modules, as we are moving away from modules for required features to reduce boot time (see LWN article).
UBI Partition
The UBI system partition contains a single UBI volume named rootfs that covers the full partition minus UBI overhead.
The UBIFS image is created via the following command:
/usr/local/bin/mkfs.ubifs -m 2KiB -e 124KiB -x lzo -c 7849 -d system/ -o system_ubifs.img
Where:
- -m 2KiB
- The minimum I/O size of the underlying UBI and MTD devices. In our case, we are running the flash with no sub-page writes, so this is a 2KiB page.
- -e 124KiB
- Logical erase block (LEB) size: UBI requires 2 minimum I/O units out of each physical erase block (PEB) for overhead: one for the erase counter (EC) header and one for the volume ID (VID) header. The PEB size of the XO flash is 128KiB, so each LEB has 124KiB available for data.
- -x lzo
- Use LZO compression
- -c 7849
- The maximum size, in LEBs, of this file system. See calculation below for how this number is determined.
- -d system
- Use the contents of the system/ directory to generate the initial file system image. In this case the system/ directory contains the contents of the 767 JFFS2 image.
- -o system_ubifs.img
- Output file
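To make the link between the flash geometry and these options explicit, the following Python sketch derives the -m and -e values from the page and PEB sizes and assembles the same argument list. `mkfs_ubifs_args`, `leb_size`, and `kib` are hypothetical helpers written for this page, not part of mtd-utils; the -c value is simply passed in from the usable size calculation below.

```python
KIB = 1024


def leb_size(peb_size: int, min_io: int) -> int:
    """Usable bytes per LEB without sub-page writes.

    UBI spends one min I/O unit on the EC header and one on the
    VID header at the start of every PEB.
    """
    return peb_size - 2 * min_io


def kib(n: int) -> str:
    """Format a KiB-aligned byte count as 'NKiB'."""
    if n % KIB:
        raise ValueError(f"{n} is not a multiple of 1KiB")
    return f"{n // KIB}KiB"


def mkfs_ubifs_args(src_dir: str, out_img: str, peb_size: int,
                    min_io: int, max_lebs: int,
                    compressor: str = "lzo") -> list:
    """Build an argv list matching the mkfs.ubifs invocation above."""
    return ["mkfs.ubifs",
            "-m", kib(min_io),
            "-e", kib(leb_size(peb_size, min_io)),
            "-x", compressor,
            "-c", str(max_lebs),
            "-d", src_dir,
            "-o", out_img]


args = mkfs_ubifs_args("system/", "system_ubifs.img", 128 * KIB, 2 * KIB, 7849)
print(" ".join(args))
# mkfs.ubifs -m 2KiB -e 124KiB -x lzo -c 7849 -d system/ -o system_ubifs.img
```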
The output of the above command, system_ubifs.img is fed into the ubinize program to wrap it into a UBI image:
/usr/local/bin/ubinize -o system_ubi.img -m 2KiB -p 128KiB -s 2KiB ubinize.cfg
Where:
- -o system_ubi.img
- Output file
- -m 2KiB
- Minimum flash I/O size (one 2KiB page)
- -p 128KiB
- Physical erase block size
- -s 2KiB
- Minimum I/O size used for UBI headers. Since we do not do sub-page writes, this is the same as -m.
- ubinize.cfg
- Configuration file
The configuration file contents:
# Section header
[rootfs]
# Volume mode (other option is static)
mode=ubi
# Source image
image=system_ubifs.img
# Volume ID in UBI image
vol_id=0
# Volume size
vol_size=973312KiB
# Allow for dynamic resize
vol_type=dynamic
# Volume name
vol_name=rootfs
# Autoresize volume at first mount
vol_flags=autoresize
The UBIFS image starts out at 266MiB, and the "autoresize" flag tells the kernel to expand the volume (and the file system on top of it) to fill all 973312KiB of usable flash.
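If a build script needs to produce this configuration rather than keep a hand-written copy, Python's configparser can emit the same key=value lines. This is a sketch: `ubinize_cfg` is a hypothetical helper, and the comments from the hand-written file are not reproduced.

```python
import configparser
import io


def ubinize_cfg(image: str, vol_size_kib: int) -> str:
    """Render a single-volume ubinize.cfg like the one above."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep key names exactly as written
    cfg["rootfs"] = {
        "mode": "ubi",
        "image": image,
        "vol_id": "0",
        "vol_size": f"{vol_size_kib}KiB",
        "vol_type": "dynamic",
        "vol_name": "rootfs",
        "vol_flags": "autoresize",  # grow the volume at first attach
    }
    buf = io.StringIO()
    # Write "key=value" without spaces, matching the hand-written file.
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()


print(ubinize_cfg("system_ubifs.img", 973312))
```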
Usable Size Calculation
As documented here, UBI reserves a certain amount of space for management and bad PEB handling operations. Specifically:
- 2 PEBs are used to store the UBI volume table;
- 1 PEB is reserved for wear-leveling purposes;
- 1 PEB is reserved for the atomic LEB change operation;
- a percentage of PEBs is reserved for handling bad PEBs (the default for NAND is 1%);
- UBI stores the erase counter (EC) and volume ID (VID) headers at the beginning of each PEB, and each header requires 1 min I/O unit.
To calculate the full overhead, we need the following values:
Symbol | Meaning | Value for XO test case |
---|---|---|
SP | PEB Size | 128KiB |
SL | LEB Size | 128KiB - 2 * 2KiB = 124 KiB |
P | Total number of PEBs on the MTD device | 991.625MiB / 128KiB = 7933 |
B | Number of PEBs reserved for bad PEB handling | 79 (1%) |
O | The overhead related to storing EC and VID headers in bytes, i.e. O = SP - SL | 4KiB |
UBI Overhead = (B + 4) * SP + O * (P - B - 4) = (79 + 4) * 128KiB + 4KiB * (7933 - 79 - 4) = 42024KiB = 328.3125 PEBs (rounded up to 329)
This leaves us with 7604 PEBs or 973312KiB available for user data.
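The overhead formula can be reproduced with the values from the table. This Python sketch assumes, as the table does, no sub-page writes and a 1% bad PEB reserve rounded down; it shows that the exact quotient is 328.3125 PEBs, which rounds up to the 329 PEBs behind the 7604 figure.

```python
from math import ceil

KIB = 1024
SP = 128 * KIB            # PEB size
SL = SP - 2 * (2 * KIB)   # LEB size: two 2KiB pages for EC + VID headers
O = SP - SL               # per-PEB header overhead (4KiB)
P = 7933                  # PEBs in the system partition
B = P // 100              # 1% reserved for bad PEBs, rounded down (79)

# 4 = 2 volume-table PEBs + 1 wear-leveling PEB + 1 atomic LEB change PEB
overhead = (B + 4) * SP + O * (P - B - 4)

# Round up: a partial PEB of overhead still costs a whole PEB.
overhead_pebs = ceil(overhead / SP)
usable_pebs = P - overhead_pebs

print(overhead // KIB, "KiB =", overhead / SP, "PEBs")        # 42024 KiB = 328.3125 PEBs
print(usable_pebs, "PEBs =", usable_pebs * SP // KIB, "KiB")  # 7604 PEBs = 973312 KiB
```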
Note that I used "-c 7849" in the mkfs.ubifs command line above to specify the maximum file system size, not "-c 7604". The reason is that mkfs.ubifs works in units of LEBs (124KiB), not PEBs (128KiB): 973312KiB / 124KiB ≈ 7849.3, which rounds down to 7849. I found this very confusing and it took a few re-readings of the examples to grok it. Also note that -c only sets the maximum file system size, so in practice it can correspond to more than 973312KiB: if the file system is installed on a UBI volume smaller than this maximum, UBIFS simply grows to fill the volume. To support a UBI root on both 1GiB and 4GiB devices, we should therefore be able to create a single UBI image, built with a -c value large enough for the larger device, that resizes automatically to the MTD device size.
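The PEB-to-LEB conversion for -c is a single integer division. The Python sketch below also cross-checks it against the overhead formula's own term P - B - 4, the number of PEBs left over for volumes, each of which holds exactly one LEB.

```python
usable_kib = 973312   # usable space from the PEB-based calculation above
leb_kib = 124         # LEB size passed to mkfs.ubifs with -e

# mkfs.ubifs -c counts LEBs, so convert and round down.
max_lebs = usable_kib // leb_kib
print(max_lebs)  # 7849

# Cross-check: every PEB that UBI does not reserve holds one LEB.
P, B = 7933, 79
print(P - B - 4)  # 7850
```

By this count 7850 LEBs are available, so -c 7849 leaves at most one 124KiB LEB unused; since -c is only an upper bound, a somewhat larger value would also work.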
Image Building
The JFFS2 and UBI image blobs, boot_jffs2.img and system_ubi.img, were fed into a modified version of Erik Garrison's olpc-image-builder script. The modified script was run as follows to generate the data.img and nand.img files for use with the OFW NAND FLASH Updater.
olpc-image-builder -d data.img -f nand.img -p "boot_jffs2.img boot 32MiB system_ubi.img system -1"
The resulting data.img file was edited to remove the "cleanmarkers" command for the UBI partition, as this command is specific to JFFS2 partitions. The final build tool used by OLPC to build the OFW NAND update images will need to be UBI-aware and know not to add this command.
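A UBI-aware build tool mainly needs to know each partition's file system type before deciding whether to emit "cleanmarkers". The olpc-image-builder code is not shown on this page, so this Python sketch is not its implementation: `parse_partition_spec`, `cleanmarker_partitions`, and the `fs_types` mapping are hypothetical, and the -p format is inferred from the command line above (the meaning of -1, presumably "the rest of the flash", is an assumption and unused here).

```python
from typing import Dict, List, Tuple


def parse_partition_spec(spec: str) -> List[Tuple[str, str, str]]:
    """Split a -p string into (image, partition name, size) triples."""
    fields = spec.split()
    if len(fields) % 3 != 0:
        raise ValueError("expected image/name/size triples")
    return [(fields[i], fields[i + 1], fields[i + 2])
            for i in range(0, len(fields), 3)]


def cleanmarker_partitions(spec: str, fs_types: Dict[str, str]) -> List[str]:
    """Partitions that should get a "cleanmarkers" command.

    Only JFFS2 partitions use cleanmarkers; UBI partitions are skipped.
    """
    return [name for _image, name, _size in parse_partition_spec(spec)
            if fs_types.get(name) == "jffs2"]


spec = "boot_jffs2.img boot 32MiB system_ubi.img system -1"
print(cleanmarker_partitions(spec, {"boot": "jffs2", "system": "ubi"}))  # ['boot']
```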
Initial Findings, Oddities, and Thoughts
- UBI is taking an extremely long time (~50s) to attach to the MTD device on the XO I am currently using for this testing. UBI attach time does scale linearly with flash size, but 50s seems wrong: according to this, it should only take about 2 seconds. At attach time I see the warning "UBI warning: ubi_eba_init_scan: cannot reserve enough PEBs for bad PEB handling, reserved 74, need 79", because my system partition has 5 bad PEBs, and this may be related. Update: This turned out to be caused by having UBI debug messages enabled, which spewed data to syslog on every access. Disabling them reduced mount time to < 2s.
- After initial boot, 'df' shows the device as only 822MiB in size. According to the FAQ and docs, UBIFS makes some conservative assumptions about free space, because it compresses data and may have write-back buffers queued up. However, I am seeing a decrease in total size, not in free space, so there is either a configuration issue or extra UBIFS overhead that I don't yet understand. Update: See http://lists.infradead.org/pipermail/linux-mtd/2008-October/023236.html and http://lists.laptop.org/pipermail/devel/2008-October/020304.html for ongoing discussion on this topic.
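- I made a mistake in my calculations above: I computed the overhead as 329.3125 PEBs and rounded it down to 329, when overhead should always be rounded up. Redoing the arithmetic, 42024KiB / 128KiB is actually 328.3125 PEBs, which rounds up to 329, so the 7604 usable PEB figure still holds. Either way, this did not seem to impact UBI's ability to attach to the device or mount the file system.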
- I was concerned about the status of bind mounts on top of UBIFS but everything seems to be working OK.
- By default, we're reserving 1% of the PEBs (79 PEBs, or about 10MiB) for handling bad blocks. As discussed in this email, it would be interesting to know what our usage patterns are and to simulate the expected lifecycle of the device to see whether this is too little or too much.
- If we can enable sub-page writes on our flash device, UBI could fit both the EC and VID headers into the first page of each PEB, recovering some extra space.
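To put a number on the sub-page point, this Python sketch compares LEB sizes with and without sub-page writes. It assumes 2KiB pages, 128KiB PEBs, and that with sub-page writes UBI places both headers in the first page; whether the XO's NAND controller and driver can actually do sub-page writes is not established here. The LEB count is the -c value used earlier.

```python
KIB = 1024
PEB = 128 * KIB
PAGE = 2 * KIB
MAX_LEBS = 7849   # the mkfs.ubifs -c value used above


def leb_size(sub_page_writes: bool) -> int:
    """LEB size for a 128KiB PEB with 2KiB pages.

    Without sub-page writes the EC and VID headers each take a full
    page; with sub-page writes (assumed) both fit in the first page.
    """
    header_pages = 1 if sub_page_writes else 2
    return PEB - header_pages * PAGE


extra = MAX_LEBS * (leb_size(True) - leb_size(False))
print(leb_size(False) // KIB, leb_size(True) // KIB)  # 124 126
print(extra // KIB, "KiB")                            # 15698 KiB
```

Under these assumptions the gain is 15698KiB, a little over 15MiB.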
Image Download and Installation
- Make sure your XO has security disabled.
- Make sure your XO is running the latest OFW. The best way to do this is to update it to 8.2.0.
- Download the following files to a USB stick:
http://dev.laptop.org/~dsaxena/ubi_test/data.img
http://dev.laptop.org/~dsaxena/ubi_test/nand.img
- Boot the laptop with the USB stick inserted and escape to the OFW prompt.
- Run:
ok dev nand : write-blocks write-pages ; dend
write-blocks isn't unique   # You can ignore this
ok update-nand u:\data.img
- At this point OFW will erase the flash and copy the contents of the nand.img file to flash. When complete you can simply reboot the system.