UBIFS initial experiments

From OLPC

Latest revision as of 00:53, 15 October 2008

Introduction

On this page I document the steps I took to create a release 8.2.0 image that runs on top of UBIFS, for initial experimentation with that file system, and I point out some initial findings and questions that have come up. If you just want to download and run the final image, jump down to Image Download and Installation.

Flash Layout

Open Firmware does not support reading from UBIFS, so to deal with this the flash is partitioned into a 32MiB JFFS2 partition, with the remainder of the space left over for use by UBI. OFW loads the kernel and initrd from the JFFS2 partition, and the initrd handles mounting the UBIFS root. Note that one flash erase block is dedicated to the RedBoot partition table ("FIS directory" in RedBoot speak).

MTD Partition    Location                   Size
FIS directory    0x00000000-0x00020000      128KiB
boot             0x00020000-0x02020000      32MiB
system           0x02020000-0x3ffc0000      991.625MiB
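As a quick sanity check on the table above, the sketch below recomputes each partition's size from its hex offsets and counts how many 128KiB PEBs it spans. It is a minimal illustration only; PARTITIONS and partition_sizes are hypothetical names, and the only inputs are the offsets listed in the table.

```python
KIB = 1024
MIB = 1024 * KIB
PEB_SIZE = 128 * KIB  # physical erase block size of the XO NAND flash

# (start, end) byte offsets copied from the partition table above.
PARTITIONS = {
    "FIS directory": (0x00000000, 0x00020000),
    "boot": (0x00020000, 0x02020000),
    "system": (0x02020000, 0x3FFC0000),
}


def partition_sizes(partitions):
    """Return {name: (size_in_bytes, size_in_pebs)} for each partition."""
    sizes = {}
    for name, (start, end) in partitions.items():
        # Partitions should begin and end on an erase block boundary.
        if start % PEB_SIZE or end % PEB_SIZE:
            raise ValueError(f"{name} is not erase-block aligned")
        size = end - start
        sizes[name] = (size, size // PEB_SIZE)
    return sizes


if __name__ == "__main__":
    for name, (size, pebs) in partition_sizes(PARTITIONS).items():
        print(f"{name:<14} {size / MIB:9.3f} MiB {pebs:6d} PEBs")
```

The system partition works out to 7933 PEBs, which is the value of P used in the usable size calculation further down.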

JFFS2 Partition

The JFFS2 partition simply contains a boot/ directory with three files: olpc.fth (the OFW boot script), vmlinuz (the compressed kernel binary), and olpcrd.img (the ramdisk image). The partition image was created with the following command:

/usr/sbin/mkfs.jffs2 -n lzo -e 128KiB -r boot/ -o boot_jffs2.img

olpc.fth

The olpc.fth script simply had the following modification applied to pass the proper boot parameters to the kernel and initrd:

@@ -77,7 +77,7 @@
    then

    " nand"  dn-buf count  sindex  0>=   if
-      " root=mtd0 rootfstype=jffs2"
+      " ubi.mtd=system root=ubi0:rootfs rootfstype=ubifs"
    else
       " root=LABEL=OLPCRoot rootfstype=ext3"
    then

The parameter "ubi.mtd=system" tells the UBI layer to attach to the named MTD device. This creates a new UBI device, ubi0, which contains the root filesystem volume, called "rootfs". The "root=ubi0:rootfs" option tells the kernel to mount this volume. (This parameter is actually unused, as the initrd handles the mounting of the root filesystem.)

olpcrd

The olpcrd is identical to that in the 8.2 release except for the following change to initutil.py:

@@ -134,11 +134,12 @@
                     # when partitioned, expect bootpath like:
                     # /pci/nandflash@c:root,\boot\vmlinuz//jffs2-file-system:\boot\vmlinuz
                     if p is None: p = 0 # unpartitioned by default
-                    if type(p) is int:
-                        dev = 'mtd%d' % p
-                    else:
-                        dev = 'mtd:%s' % p
-                    extra = ['-t','jffs2']
+                    # if type(p) is int:
+                    #    dev = 'mtd%d' % p
+                    # else:
+                    #    dev = 'mtd:%s' % p
+                    dev = 'ubi0:rootfs'
+                    extra = ['-t','ubifs']
             else: # we're running under emulation
                 # these modules only needed if we're running in qemu
                 from stat import S_IFBLK

Note that this is not a permanent solution, as we probably want to handle booting the same release on both JFFS2 and UBIFS layouts.
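One possible way to handle both layouts is to choose between UBIFS and JFFS2 based on the rootfstype= parameter that olpc.fth already passes to the kernel. The sketch below is hypothetical (choose_root_mount and parse_cmdline are not part of initutil.py); it assumes the initrd can read the kernel command line, for example from /proc/cmdline, and otherwise reuses the JFFS2 device naming from the original code.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {key: value}; bare flags map to None."""
    args = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        args[key] = value if sep else None
    return args


def choose_root_mount(cmdline, p=None):
    """Return (dev, extra_mount_args) for a NAND boot.

    Hypothetical helper: picks UBIFS when the kernel was booted with
    rootfstype=ubifs, otherwise falls back to the original JFFS2 logic
    (p is the partition parsed from the OFW bootpath).
    """
    args = parse_cmdline(cmdline)
    if args.get("rootfstype") == "ubifs":
        # root=ubi0:rootfs names UBI device ubi0, volume "rootfs".
        dev = args.get("root") or "ubi0:rootfs"
        return dev, ["-t", "ubifs"]
    if p is None:
        p = 0  # unpartitioned by default
    dev = "mtd%d" % p if isinstance(p, int) else "mtd:%s" % p
    return dev, ["-t", "jffs2"]
```

In the initrd this might be called as choose_root_mount(open("/proc/cmdline").read(), p), so the same olpcrd would work with either olpc.fth variant.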

Kernel

The kernel used is available here. It is the OLPC kernel used for the 8.2 release merged with the linux-2.6.25 UBI backport tree. The kernel is built with UBI and UBIFS compiled in rather than as modules, as we are moving away from modules for required features to reduce boot time (see LWN article).

UBI Partition

The UBI system partition contains a single UBI volume named rootfs that covers the full partition minus UBI's own overhead.

The UBIFS image is created via the following command:

/usr/local/bin/mkfs.ubifs -m 2KiB -e 124KiB -x lzo -c 7849 -d system/ -o system_ubifs.img

Where:

-m 2KiB
The minimum I/O size of the underlying UBI and MTD devices. In our case, we are running the flash with no sub-page writes, so this is a 2KiB page.
-e 124KiB
Logical Erase Block (LEB) size. UBI uses 2 minimum I/O units out of each Physical Erase Block (PEB) for overhead: 1 for the erase counter (EC) header and 1 for the volume ID (VID) header. The PEB size of the XO flash is 128KiB, so each LEB has 124KiB available for data.
-x lzo
Use LZO compression
-c 7849
The maximum size, in LEBs, of this file system. See calculation below for how this number is determined.
-d system
Use the contents of the system/ directory to generate the initial file system image. In this case the system directory holds the contents of the build 767 JFFS2 image.
-o system_ubifs.img
Output file

The output of the above command, system_ubifs.img is fed into the ubinize program to wrap it into a UBI image:

/usr/local/bin/ubinize -o system_ubi.img -m 2KiB -p 128KiB -s 2KiB ubinize.cfg

Where:

-o system_ubi.img
Output file
-m 2KiB
Minimum flash I/O size of 2KiB page
-p 128KiB
Physical erase block size of the flash
-s 2KiB
Minimum I/O size used for UBI headers. Since we do not do sub-page writes, this is the same as -m
ubinize.cfg
Configuration file

The configuration file contents:

# Section header
[rootfs]
# Image mode (ubi is the only mode ubinize supports)
mode=ubi
# Source image
image=system_ubifs.img
# Volume ID in UBI image
vol_id=0
# Volume size
vol_size=973312KiB
# Volume type: dynamic (read/write), as opposed to static (read-only)
vol_type=dynamic
# Volume name
vol_name=rootfs
# Autoresize volume at first mount
vol_flags=autoresize

The UBIFS image starts out at 266MiB and the "autoresize" flag tells the kernel to expand the volume (and the filesystem above) to fill up all 973312KiB of the usable flash.
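If the volume parameters need to be regenerated for other images, the configuration file can also be written programmatically. The following is a minimal sketch using Python's standard configparser module; make_ubinize_cfg is a hypothetical helper, and it assumes ubinize accepts the plain key=value INI output that configparser produces (the comments in the file above are not reproduced).

```python
import configparser
import io


def make_ubinize_cfg(image, vol_size_kib, vol_name="rootfs", vol_id=0):
    """Build ubinize.cfg text for one dynamic, auto-resizing UBI volume."""
    cfg = configparser.ConfigParser()
    cfg[vol_name] = {
        "mode": "ubi",                 # the only image mode ubinize supports
        "image": image,                # UBIFS image produced by mkfs.ubifs
        "vol_id": str(vol_id),
        "vol_size": f"{vol_size_kib}KiB",
        "vol_type": "dynamic",
        "vol_name": vol_name,
        "vol_flags": "autoresize",     # autoresize the volume at first mount
    }
    out = io.StringIO()
    # Match the key=value style (no spaces around '=') of the file above.
    cfg.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    print(make_ubinize_cfg("system_ubifs.img", 973312), end="")
```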

Usable Size Calculation

As documented here, UBI reserves a certain amount of space for management and bad PEB handling operations. Specifically:

  • 2 PEBs are used to store the UBI volume table;
  • 1 PEB is reserved for wear-leveling purposes;
  • 1 PEB is reserved for the atomic LEB change operation;
  • a percentage of PEBs is reserved for handling bad PEBs (the default for NAND is 1%);
  • UBI stores the erase counter (EC) and volume ID (VID) headers at the beginning of each PEB, and 1 min I/O unit is required for each of these.

To calculate the full overhead, we need the following values:

Symbol  Meaning                                            Value for XO test case
SP      PEB size                                           128KiB
SL      LEB size                                           128KiB - 2 * 2KiB = 124KiB
P       Total number of PEBs on the MTD device             991.625MiB / 128KiB = 7933
B       Number of PEBs reserved for bad PEB handling       79 (1%)
O       EC and VID header overhead in bytes, O = SP - SL   4KiB


UBI Overhead = (B + 4) * SP + O * (P - B - 4)
             = (79 + 4) * 128KiB + 4KiB * (7933 - 79 - 4)
             = 42024KiB
             = 328.3125 PEBs (round up to 329)

This leaves us with 7604 PEBs or 973312KiB available for user data.

Note that I used "-c 7849" in the above mkfs.ubifs command line to specify the maximum filesystem size, not "-c 7604". The reason for this is that mkfs.ubifs operates in terms of LEB size (124KiB), not PEB size (128KiB): 973312KiB / 124KiB = 7849 (rounded down). I found this very confusing and it took a few re-readings of the examples to grok it. Note that in reality this value can be larger than 7849, as it only tells UBI/UBIFS the maximum size the filesystem may grow to. If the file system is installed on a UBI volume smaller than this maximum, the filesystem is simply expanded to fill the volume. To support a UBI root on both 1GiB and 4GiB devices, we therefore simply need to create one UBI image, with -c large enough for the bigger device, that will resize automatically to the MTD device size.
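The arithmetic in this section can be reproduced with a short script. The sketch below is an illustration, not UBI's own code: it hard-codes the overhead formula and the 1% bad-PEB reservation used above, rounds the overhead up to whole PEBs, and then converts the usable space into the LEB count passed to mkfs.ubifs -c. All sizes are in KiB, and ubi_usable_space is a hypothetical name.

```python
import math


def ubi_usable_space(peb_count, peb_kib=128, min_io_kib=2, bad_peb_percent=1):
    """Apply the UBI overhead formula from this page; all sizes are in KiB."""
    leb_kib = peb_kib - 2 * min_io_kib             # SL: EC + VID header pages removed
    hdr_kib = peb_kib - leb_kib                    # O = SP - SL
    bad_rsvd = peb_count * bad_peb_percent // 100  # B, 1% of PEBs by default
    # 2 volume-table PEBs + 1 wear-leveling PEB + 1 atomic-LEB-change PEB = 4
    overhead_kib = (bad_rsvd + 4) * peb_kib + hdr_kib * (peb_count - bad_rsvd - 4)
    overhead_pebs = math.ceil(overhead_kib / peb_kib)  # overhead: round up
    usable_pebs = peb_count - overhead_pebs
    usable_kib = usable_pebs * peb_kib
    return {
        "leb_kib": leb_kib,
        "bad_rsvd": bad_rsvd,
        "overhead_kib": overhead_kib,
        "overhead_pebs_exact": overhead_kib / peb_kib,
        "overhead_pebs": overhead_pebs,
        "usable_pebs": usable_pebs,
        "usable_kib": usable_kib,
        # mkfs.ubifs -c counts LEBs (124KiB), not PEBs (128KiB).
        "max_leb_cnt": usable_kib // leb_kib,
    }


if __name__ == "__main__":
    system_kib = int(991.625 * 1024)  # size of the "system" partition
    for key, value in ubi_usable_space(system_kib // 128).items():
        print(f"{key:20} {value}")
```

Changing bad_peb_percent is one cheap way to see how the bad-block reservation discussed below trades off against usable space.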

Image Building

The JFFS2 and UBI image blobs, boot_jffs2.img and system_ubi.img, were fed into a modified version of Erik Garrison's olpc-image-builder script. The modified script was run as follows to generate the data.img and nand.img files for use with the OFW NAND FLASH Updater.

olpc-image-builder -d data.img -f nand.img -p "boot_jffs2.img boot 32MiB system_ubi.img system -1"
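The whole build can be scripted end to end. The sketch below simply strings together the commands given on this page, with paths and arguments copied as-is, and only prints them unless told otherwise. build_commands and run are hypothetical helpers; it assumes the modified olpc-image-builder is on the PATH and Python 3.8+ (for shlex.join). The manual removal of the "cleanmarkers" command from data.img is not automated here.

```python
import shlex
import subprocess


def build_commands():
    """The build steps from this page, in order, as argument lists."""
    return [
        ["/usr/sbin/mkfs.jffs2", "-n", "lzo", "-e", "128KiB",
         "-r", "boot/", "-o", "boot_jffs2.img"],
        ["/usr/local/bin/mkfs.ubifs", "-m", "2KiB", "-e", "124KiB", "-x", "lzo",
         "-c", "7849", "-d", "system/", "-o", "system_ubifs.img"],
        ["/usr/local/bin/ubinize", "-o", "system_ubi.img", "-m", "2KiB",
         "-p", "128KiB", "-s", "2KiB", "ubinize.cfg"],
        ["olpc-image-builder", "-d", "data.img", "-f", "nand.img",
         "-p", "boot_jffs2.img boot 32MiB system_ubi.img system -1"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failing step


if __name__ == "__main__":
    run(build_commands())  # pass dry_run=False to actually build the images
```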

The resulting data.img file was edited to remove the "cleanmarkers" command for the UBI partition, as this command is specific to JFFS2 partitions. The final build tool used by OLPC to build the OFW NAND update images will need to be UBI-aware and know not to add this command.

Initial Findings, Oddities, and Thoughts

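  • UBI was taking an extremely long time (~50s) to attach to the MTD device on the XO I am currently using for this testing. UBI attach time does scale linearly with flash size, but 50s seemed wrong: according to this, it should take only about 2 seconds. At attach time I see the warning "UBI warning: ubi_eba_init_scan: cannot reserve enough PEBs for bad PEB handling, reserved 74, need 79", because my system partition has 5 bad PEBs in it, and I suspected this might be related. Update: This turned out to be caused by having UBI debug messages enabled, which spewed data to syslog on every access. Disabling them reduced mount time to under 2s.
  • After initial boot, 'df' shows the device as only 822MiB in size. According to the FAQ and docs, UBIFS makes some conservative assumptions about free space, because it compresses data and may have write-back buffers queued up. However, I am seeing a decrease in total size, not in free space, so there is either a configuration issue or extra UBIFS overhead that I don't yet understand. Update: See http://lists.infradead.org/pipermail/linux-mtd/2008-October/023236.html and http://lists.laptop.org/pipermail/devel/2008-October/020304.html for ongoing discussion on this topic.
  • In my calculations above I originally wrote the overhead as 329.3125 PEBs and rounded it down to 329, when overhead should always be rounded up. The correct quotient is 42024KiB / 128KiB = 328.3125 PEBs, so rounding up still yields 329 PEBs and the 7604-PEB figure holds. Either way, this did not seem to impact UBI's ability to attach to the device or mount the filesystem.
  • I was concerned about the status of bind mounts on top of UBIFS, but everything seems to be working OK.
  • By default, we're reserving 1% of the PEBs (79 PEBs, or just under 10MiB) for handling bad blocks. As discussed in this email, it would be interesting to know what our usage patterns are and to simulate the expected lifecycle of the device to see whether this is too little or too much.
  • If we can enable sub-page writes on our flash device, we can recover some extra space from UBI.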
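To put a rough number on the sub-page write item above: UBI places the EC header at the start of each PEB and, when sub-page writes are available, can put the VID header in a sub-page of that same first page, so the data area would start after one 2KiB page instead of two. The estimate below assumes that behavior would apply to the XO flash (which is exactly the open question) and reuses the 7933 - 79 - 4 = 7850 data-carrying PEBs from the overhead formula; leb_size_kib and subpage_savings_kib are hypothetical names.

```python
PEB_KIB = 128
MIN_IO_KIB = 2              # one NAND page
DATA_PEBS = 7933 - 79 - 4   # P - B - 4 from the overhead calculation


def leb_size_kib(subpage_writes):
    """LEB size when the EC and VID headers take two pages, or share one."""
    header_pages = 1 if subpage_writes else 2
    return PEB_KIB - header_pages * MIN_IO_KIB


def subpage_savings_kib(data_pebs=DATA_PEBS):
    """Extra data space gained across all data-carrying PEBs."""
    return data_pebs * (leb_size_kib(True) - leb_size_kib(False))


if __name__ == "__main__":
    saved = subpage_savings_kib()
    print(f"LEB size: {leb_size_kib(False)}KiB -> {leb_size_kib(True)}KiB")
    print(f"recovered: {saved}KiB ({saved / 1024:.1f}MiB)")
```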

Image Download and Installation

  • Make sure your XO is running the latest OFW. The best way to do this is to update it to 8.2.0.
  • Download the following files to a USB stick:
 http://dev.laptop.org/~dsaxena/ubi_test/data.img
 http://dev.laptop.org/~dsaxena/ubi_test/nand.img
  • Boot the laptop with the USB stick inserted and escape to the OFW prompt.
  • Run:
ok dev nand   : write-blocks write-pages ;  dend
write-blocks isn't unique # You can ignore this
ok update-nand u:\data.img
  • At this point OFW will erase the flash and copy the contents of the nand.img file to flash. When complete you can simply reboot the system.