NAND Testing

Intro

The non-volatile storage subsystem of the XO has a limited design lifetime. It uses an ASIC (the CaFE) to provide an interface to a NAND Flash device. The CaFE is limited in the Flash page size it supports, making it unsuitable for future generations of NAND Flash devices. As part of the search for a replacement, OLPC is testing a variety of solutions to gauge their performance.

The goals of the storage subsystem testing are as follows:

  1. Evaluate the Flash wear leveling algorithms
  2. Evaluate the storage error rate of the devices
  3. Evaluate the relative access latency of the devices

Wear Leveling Algorithms Testing

A common flaw in early Flash wear leveling algorithms was that they leveled wear only across the remaining unused blocks. The test for this is to fill up most of the disk, then continue to write/erase repeatedly, forcing the write/erase cycles to use the small number of remaining free blocks.

Assume we fill all but 5 MB of the media (leaving 2.5K blocks). We can continue to write at approx. 250 blocks (0.5 MB) per second. Assuming no wear leveling, this should result in a write failure in approx. 200 thousand seconds (100K cycle lifetime). Assuming naively simple wear leveling, a failure should occur in around one million seconds (100K cycle lifetime), or slightly over a week.
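
The naive wear leveling figure above can be reproduced with a few lines of shell. This is only a sketch of the arithmetic, assuming writes rotate evenly over the free blocks; seconds_to_failure is a hypothetical helper name, not part of the test scripts:

```shell
#!/bin/sh
# Estimate seconds until the first block wears out under naive wear
# leveling, where writes rotate evenly over the free blocks only.
# Arguments: free blocks, W/E cycle lifetime, write rate in blocks/second.
seconds_to_failure() {
    echo $(( $1 * $2 / $3 ))
}

# Figures from the paragraph above: 2.5K free blocks, 100K cycles,
# 250 blocks per second.
secs=$(seconds_to_failure 2500 100000 250)
echo "$secs seconds, about $(( secs / 86400 )) days"
```

With these inputs the script reports 1000000 seconds, about 11 days.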

Assuming some percent withheld

Managed NAND devices (such as solid-state drives, SD cards, and newer single-chip NAND devices) typically set aside between 4 and 8% of the media for wear leveling and bad block replacement. This complicates the test somewhat, but is ameliorated by the reduced W/E cycle lifetime expected with newer NAND Flash devices.

Assume we fill all but 1MiB of the media (6% of 4GiB is roughly 250 MiB, leaving up to 251 MiB/125 KBlocks actually free). Assume a maximum write rate of 500 blocks per second. Assuming naively simple wear leveling, a failure should occur after 5K write cycles of all free blocks, or roughly 750M block writes. This will require around 18 days of continuous writing to trigger.

LBA Test as Implemented

In the case of the LBA-NAND parts, we fill all but 32 MB of the media (leaving 16K blocks). Assume the device withholds 6% of the blocks for wear leveling and bad block replacement (120K blocks). Assuming naive wear leveling, this should result in a write failure in approx. 136K x 5K or 680M block writes (1.4 TB written). We can continue to write at approx. 350 blocks (0.7 MB) per second, giving a time to failure of 22 days.

The current test program only writes in step 4.2, giving 20 MB/45 sec., or 220 blocks per second. This gives a time to failure of 35 days. At the same time, it performs storage error rate testing at 42K blocks/sec, checking the entire 4 GiB device 40 times a day.
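
Both LBA-NAND estimates follow from the same calculation, with the blocks withheld by the device added to the free pool. The sketch below assumes naive wear leveling over that combined pool; days_to_failure is a hypothetical helper name:

```shell
#!/bin/sh
# Days until failure under naive wear leveling: every block in the pool
# (free blocks plus blocks the device withholds) must absorb the full
# W/E cycle lifetime before the first block fails.
# Arguments: free blocks, withheld blocks, cycle lifetime, blocks/second.
days_to_failure() {
    total_writes=$(( ($1 + $2) * $3 ))
    echo $(( total_writes / $4 / 86400 ))
}

# LBA-NAND figures from the text: 16K free, 120K withheld, 5K cycles.
echo "at 350 blocks/s: $(days_to_failure 16000 120000 5000 350) days"
echo "at 220 blocks/s: $(days_to_failure 16000 120000 5000 220) days"
```

Integer division truncates, so this prints 22 days for the 350 blocks/s rate and 35 days for the 220 blocks/s rate.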

JFFS2 Test as Implemented

In the case of the XOs with raw NAND and a JFFS2 management layer, we fill all but around 32 MB of the media (leaving 16K blocks). While the device doesn't withhold any blocks for wear leveling, we expect better than naive wear leveling from JFFS2. Given a 1 GiB device (512K blocks) and a W/E lifetime of 100K cycles, we might expect that it will take 50 billion block write cycles (100 TB of data written) before we start seeing significant errors.

The current test writes 10K blocks/35 seconds, giving us an expected time of 5.5 years of testing before we see failure due to write fatigue.
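
The 5.5 year figure can be checked the same way. Because the result is fractional, this sketch uses awk rather than shell integer arithmetic; years_to_failure is a hypothetical helper name, and a 365-day year is assumed:

```shell
#!/bin/sh
# Years of continuous writing needed to reach a given number of block
# writes, when the measured rate is "blocks per interval of seconds".
# Arguments: total block writes, blocks per interval, interval in seconds.
years_to_failure() {
    awk -v writes="$1" -v blocks="$2" -v secs="$3" \
        'BEGIN { printf "%.1f\n", writes * secs / blocks / 86400 / 365 }'
}

# Figures from the text: 50 billion block writes at 10K blocks per 35 s.
years_to_failure 50000000000 10000 35
```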

Storage Error Rate Testing

There is concern that the error rate of MLC devices is not acceptable for use as the primary storage for Linux. As all of the devices being tested are MLC parts, we have an opportunity to evaluate the error rate of the devices.

If we assume that read errors dominate, we can test about 780 passes of an entire device per machine per week. This can be done in conjunction with other tests (cf. wear leveling), reducing the coverage/speed but not affecting the results.

Unfortunately, NAND manufacturers indicate that write disturbances are a larger problem than read errors, so error testing can't be this simple. I am proposing to verify the consistency of data stored on the vast majority of the media, while writing to the remainder of the media. Note that since there is at least one level of indirection between the test program and the media, it is difficult to simplify the consistency check to blocks possibly affected by a write disturb error.

Access Latency Testing

If the wear leveling algorithm is actually functioning, the latency required to terminate a write may vary widely. Information about this timing may be gathered as part of other tests.

Unfortunately, obtaining realistic timing requires that the disk be realistically fragmented...

Error Rate Assumptions

The stated write/erase cycle lifetime for the devices we are currently using in the XO is 100K cycles -- OLPC has not verified these claims.

The error rate for newer storage devices varies. Toshiba claims that its SLC parts have a 10K cycle lifetime, and its MLC parts have a 5K cycle lifetime.

Timing assumptions

Time estimates in this document are made using the following information, obtained by Mitch Bradley:

JFFS2 reads at between 5.6 and 12 MB/sec (data-dependent, note c), using 100% of the CPU (real time == system time).

Current tests show similar bandwidth (with similarly large variance!) --wad

LBA-NAND reads at 5.2 MB/sec, using <1% CPU (real time >> system time).

Current tests show closer to 4 MB/s --wad

JFFS2 writes at 760 kB/sec, using 100% CPU.

Current tests show closer to 0.9 MB/s, but again with large variance. --wad

LBA-NAND writes at 1.25 MB/sec, using <2% CPU.

Actual measurements seem closer to 0.7 MB/sec... --wad

Test Plan

The best laid schemes o' mice an' men... --Robert Burns

Samples under Test

These are the storage media and access methods currently being tested:

  • Control: This is a conventional SATA disk drive, on a desktop computer
  • JFFS2: Five laptops using existing raw NAND plus JFFS2 software Flash translation
  • LBA-NAND: We have eight laptops with a 4GB Toshiba LBA-NAND installed
  • SD cards: Four laptops are testing SanDisk Extreme III (Class 6) SD cards

We are actively working to get additional devices into the mix, such as:

  • UbiFS: The upgrade to JFFS2. Not in testing yet.
  • eMMC NAND: Basically an MMC card without the wrapper, available from multiple vendors.
  • IDE/NAND controllers: Available cheaply from at least two companies. Phison makes the SSD controller used in both the Acer Aspire and the Asus EEE.

Wear & Error Test

This will be a combined test which will try to test the wear leveling mechanism of the storage device, while also regularly checking for errors in accessing stored data.

The plan is:

  1. Run the test scripts from a storage device other than the one under test.
  2. Format as much of the media as possible as a single ext2 partition. The JFFS2 test case will use a JFFS2 partition, and the UBIFS test case will use a UBIFS partition.
  3. Create test data filling up all but 32MB of the partition. This test data will be pseudo-random in nature (white noise), and will be duplicated on the storage device. It has been suggested to instead record signatures of the test data. Since the data files are large (multiple media blocks in size), there is little danger of dual-failure (in both files) causing a comparison to give a false negative.
  4. Start a test script which continuously alternates between:
    1. Reading a file and its duplicate from the stored data, reporting any differences.
    2. Reading a file and its duplicate from the "hot" data, reporting any differences, then overwriting both files with new data.

The test software should log errors onto a storage device other than the device under test.

Step 4.1 is walking through a data set too large to fit into the kernel page cache. Naively done, however, Step 4.2 isn't effective if the kernel page cache is working, as the files being read were recently written to the storage media. The fix (available in newer kernels) is to flush the disk cache before comparing the files (see http://linux-mm.org/Drop_Caches):

echo 1 > /proc/sys/vm/drop_caches

This was properly added to version 1.2 of the test program.
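
A minimal sketch of the comparison in step 4.1 follows. It is not the actual test.sh: drop_caches, verify_pair and verify_set are hypothetical helper names, and the sketch assumes the matched sets live in two directories (such as the setA and setB directories used in the JFFS2 fill instructions). Mismatches go to standard error, which the caller is assumed to redirect to a log on a separate device.

```shell
#!/bin/sh
# Hypothetical sketch of one verification pass; not the actual test.sh.

# Ask the kernel to drop clean page-cache pages so that the comparison
# reads from the device. Needs root; skipped if the file is not writable.
drop_caches() {
    sync
    if [ -w /proc/sys/vm/drop_caches ]; then
        echo 1 > /proc/sys/vm/drop_caches
    fi
}

# Compare a file with its duplicate. On mismatch, report to stderr and
# return non-zero.
verify_pair() {
    if cmp -s "$1" "$2"; then
        return 0
    fi
    echo "ERROR: $1 and $2 differ" >&2
    return 1
}

# Compare every file in the first directory with the file of the same
# name in the second, and print the number of mismatches.
verify_set() {
    errors=0
    drop_caches
    for f in "$1"/*; do
        [ -e "$f" ] || continue
        verify_pair "$f" "$2/$(basename "$f")" || errors=$(( errors + 1 ))
    done
    echo "$errors"
}
```

An assumed invocation would be verify_set /nand/setA /nand/setB 2>> /usb/errors.log, where the log path is illustrative only.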

Testing

These are notes detailing the implementation of the testing on the different platforms.

Common

Some elements of the testing are common to most test platforms:

Test Scripts

In order to minimize the runtime support needed for the testing, both the test and initialization scripts are written in Bourne shell. Sources are available from the OLPC git repository.

The following scripts are provided:

  • test.sh - the script which actually performs the test
  • parselogs.py - the script which takes one or more logs and produces statistics
  • fill.sh - a script for filling a partition with matched sets of random data
  • fill_random.sh - another script for generating the random data
  • fill_jffs.sh - the script actually used to fill the JFFS2 devices
  • fill_cp.sh - the script actually used to fill the LBA-NAND devices

The following are necessary only on LBA-NAND test laptops:

  • boot - a directory containing the OS used for the LBA-NAND tests
  • setup.sh - a script for setting up LBA-NAND laptops (deprecated, as it is now /etc/init.d/rc.usbnandtest in the boot ramdisk)

Logging

In most cases, logging is done to an external USB device. In some systems under test (JFFS2 and UbiFS laptops), this is the only storage medium other than the device under test. USB logging was chosen over logging each laptop's serial console because of previous experience collecting and maintaining serial logs from tens of machines: the USB bus or serial/USB adapters would occasionally hiccup for unknown reasons and cause the logging to halt.

Logs may be processed using the parselogs.py script. It either takes a list of log files as arguments or processes all log files in the current directory if none are specified. It outputs statistical and error information aggregated from all log files processed.

Logs are being aggregated at http://dev.laptop.org/~wad/nand/. A summary of each machine's status is shown, with a link to individual log files. A summary aggregating all logs for a device type is also available.

Control

Coming soon, the destruction of a SATA drive through continuous writing...

JFFS2

There are five XOs at 1CC running the tests on top of JFFS2. Build 8.2-760 was freshly installed on the laptops using Open Firmware's copy-nand command.

I ran into a slight glitch as all five crashed overnight on 9/22 (about ten hours into the testing, according to the logs), three definitely with the same kernel error (#8615), one with a dark screen, and one with a white screen (not hardware). Three (JFFS1, JFFS2, and JFFS4) were restarted with a console serial port attached and being logged.

The second problem is that JFFS2 might start a test, but after a couple of hundred write/erase cycles, it runs out of disk space for further writes. On these machines, I have gradually been deleting read data as disk space decreases (is it being consumed by fragmentation?).

The current test rates are roughly 10 sec/test step 4.1, and 25 sec/test step 4.2. This translates into a 6.5 MByte/s read rate, and a 0.9 MByte/s write rate.

Laptop  Serial #     Test          Total Written (GiB)
JFFS1   CSN748003DB  Wear & Error  222
JFFS2   CSN74805706  Wear & Error  132
JFFS3   SHF80702F53  Wear & Error  175
JFFS4   SHF7250022F  Wear & Error  42
JFFS5   SHF725004D4  Wear & Error  27

Total Written refers to the total amount of data written to date to the storage device in an attempt to test wear levelling and W/E lifetime, in GiB. For the current tests, each pass is 0.02 GiB.

JFFS2 Setup Notes

If this is the first time, see the next section. If restarting a test, boot the laptop, and insert a USB key containing the test.sh script. Then simply type:

/usb/test.sh

A new logfile will automatically be created on the USB key (in /usb/logfile-xxxxx).

JFFS2 Initialization

Note: For these tests to have a valid effect, the storage device should not be re-formatted or re-initialized for the duration of the wear leveling test!

Install a fresh copy of release 8.2-760 from a USB key using Open Firmware:

copy-nand u:\os760.img

Boot, and insert a USB key containing several scripts:

  • fill_jffs.sh - a script for filling the NAND with random data
  • fill_random.sh - an alternative script for filling the disk
  • random - a directory containing over 400 MB of random data, in 32 MiB files (optional)
  • test.sh - a script for running the wear leveling and error checking test

If using an earlier OLPC build (say 656), you will have to install the cmp utility:

yum install diffutils

Create a link from the mount point for the USB key to /usb:

ln -s /media/<USB_KEY_NAME> /usb

Now you need to fill the NAND Flash partition ("/" on the stock XO build). This can be done using the same method used for LBA-NAND devices. If the random directory is provided on the USB key, type:

/usb/fill_jffs.sh

An alternative, slower approach to filling the NAND with data, which doesn't require pre-computed random data on the USB key, is to manually run:

mkdir /setA
cd /setA
/usb/fill_random.sh 11
cp -r /setA /setB

UbiFS

I will gladly add UbiFS equipped XOs to the test array, if someone provides a kernel and initrd supporting it. --wad

LBA

There are eight XOs at 1CC modified with a 4GB LBA-NAND part. Mitch Bradley has prepared a kernel that has the drivers for the LBA-NAND connected through the CaFE chip. He also has a BusyBox initrd which supports partitioning, ext2 formatting, and testing of the parts. We now have scripts that support the testing described above. Testing started 9/22/08.

The current test rates are 14-16 sec/test step 4.1 and 13-16 sec/test step 4.2. This translates into roughly a 4 MByte/s read rate, and 0.7 MByte/s write rate (this test version wrote 10MB, and did no read testing). This is verified by later tests with a 34 sec. mean time for step 4.2, when both reading back 20 MiB of data and writing 20 MiB of data.

Laptop  Serial #     Test          Total Written (GiB)
LBA1    CSN74700D03  Wear & Error  248
LBA2    CSN74702D30  Wear & Error  231
LBA3    SHF808021E4  Wear & Error  236
LBA4    CSN749013AF  Wear & Error  213
LBA5    CSN75001985  Wear & Error  223
LBA6    CSN74702A8E  Wear & Error  240
LBA7    CSN748040B6  Wear & Error  245
LBA8    CSN74900B3C  Wear & Error  203

Total Written refers to the total amount of data written to date to the storage device in an attempt to test wear levelling and W/E lifetime, in GiB. For the current tests, each pass is 0.02 GiB.

LBA-NAND Setup Notes

Boot with a USB stick containing two directories and two scripts:

  • boot - a directory containing the OS used for the LBA-NAND tests
  • random - a directory containing over 2GB of random data, in 32 MiB files
  • test.sh - a script for running the wear leveling and error checking test
  • fill_cp.sh - a script for filling the NAND with random data

After the laptop boots, type the following to mount the USB disk for the first time:

mount /usb

At this point, some dangerous-sounding error messages will appear. Ignore them. If this is the first time, see the next section. If restarting a test, now simply type:

/usb/test.sh

A new logfile will automatically be created on the USB key (in /usb).

fsck

Occasionally, the ext2 filesystem on the NAND device becomes corrupted. You can repair it using:

umount /nand
/sbin/fsck.ext2 /dev/lba1
mount /dev/lba1 /nand

LBA-NAND Initialization

Note: For these tests to have a valid effect, the storage device should not be re-formatted or re-initialized for the duration of the wear leveling test! Do not run fill_cp.sh unless you are starting the tests for the first time!

The /etc/init.d/rc.usbnandtest script attempts to mount the storage device at boot time. Unmount it with:

umount /nand

Repartition the storage device using:

fdisk /dev/lba

Delete any existing partitions, and create a single partition using all available space. Hit this series of keys: d <CR> n <CR> p <CR> 1 <CR> <CR> <CR> w <CR>.
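
The same key sequence can be fed to fdisk from a script instead of being typed. This is a sketch only: fdisk_keys is a hypothetical helper name, and it assumes the fdisk build in use reads its commands from standard input and that the device holds a single existing partition, as the key sequence above does.

```shell
#!/bin/sh
# Emit the key sequence from the text, one answer per line: delete the
# existing partition, create primary partition 1 with the default start
# and end, then write the table.
fdisk_keys() {
    printf '%s\n' d n p 1 '' '' w
}

# Assumed usage (destroys the partition table, so it is not run here):
#   fdisk_keys | fdisk /dev/lba
```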

Then format the device using:

mke2fs -m 0 /dev/lba1

Now you can mount it and start filling it:

mount /dev/lba1 /nand
/usb/fill_cp.sh

As the kernel provided doesn't include support for /dev/urandom, the method used was to provide the random data on a USB key. fill_cp.sh just copies it from /usb/random. The USB key was previously initialized with sufficient random data using the fill_random.sh command. This command takes a number of 32 MB random data files to generate as an argument (65 files is sufficient for 4 GiB devices):

mkdir /Volumes/USBKEY/random
cd /Volumes/USBKEY/random
~/NANDtest/fill_random.sh 65
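
For readers without access to the repository, the following sketch shows one way such a generator could work. It is not the actual fill_random.sh: the function name, the output file names, and the FILE_MB variable are assumptions made for illustration, and it requires a system that does provide /dev/urandom.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual fill_random.sh: write COUNT files
# of pseudo-random data into the current directory.
# FILE_MB (assumed knob, default 32) sets the size of each file in MiB.
fill_random() {
    count=$1
    size_mb=${FILE_MB:-32}
    i=1
    while [ "$i" -le "$count" ]; do
        # One file per iteration, read in 1 MiB chunks from /dev/urandom.
        dd if=/dev/urandom of="random.$i" bs=1048576 count="$size_mb" 2>/dev/null
        i=$(( i + 1 ))
    done
}
```

Calling fill_random 65 in an empty directory would then produce 65 files of 32 MiB each, matching the file count given above.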

SD Cards

There are four XOs at 1CC running the tests on a SanDisk Extreme III SD card. Build 8.2-760 was freshly installed on the laptops.

The current test rates are roughly 3.9 sec/test step 4.1, and 5 sec/test step 4.2. This translates roughly into a 17 MByte/s read rate, and a 5.7 MByte/s write rate.

Laptop  Serial #     Test          Total Written (GiB)
SAN1    SHF ?        Wear & Error
SAN2    SHF ?        Wear & Error
SAN3    SHF80600A54  Wear & Error  151
SAN4    CSN74902B22  Wear & Error

Total Written refers to the total amount of data written to date to the storage device in an attempt to test wear levelling and W/E lifetime, in GiB. For the current tests, each pass is 0.02 GiB.

SD Card Setup Notes

If this is the first time, see the next section. If restarting a test, boot the laptop, with a USB stick containing the test.sh script, and type:

/usb/test.sh

A new logfile will automatically be created on the USB key (in /usb/logfile-xxxxx).

SD Card Initialization

Note: For these tests to have a valid effect, the storage device should not be re-formatted or re-initialized for the duration of the wear leveling test!

Install a fresh copy of release 8.2-760 from a USB key using Open Firmware:

copy-nand u:\os760.img

Boot, and insert a USB key containing several scripts:

  • fill_jffs.sh - a script for filling the NAND with random data
  • fill_random.sh - an alternative script for filling the disk
  • test.sh - a script for running the wear leveling and error checking test
  • random - a directory containing over 400 MB of random data, in 32 MiB files (only needed for initialization, and optional even then)

Go to the Journal and unmount the SD card.

You will need to create a link from the mount point for the USB key to /usb:

ln -s /media/<USB_KEY_NAME> /usb

Repartition the storage device using:

fdisk /dev/mmcblk0

Delete any existing partitions, and create a single partition using all available space. Hit this series of keys: d <CR> n <CR> p <CR> 1 <CR> <CR> <CR> w <CR>. If you get an error while re-reading the device partition table, reboot at this point.

Then format the device using:

mke2fs -m 0 /dev/mmcblk0p1

Now, mount the device as /nand, and start filling it with random data:

mkdir /nand
mount /dev/mmcblk0p1 /nand
/usb/fill_cp.sh
umount /nand
rmdir /nand

Reboot, and create a link from the mount point for the SD card to /nand:

ln -s /media/<SD_CARD_NAME> /nand

You are ready to start the testing, with:

/usb/test.sh