Nandblaster for XO-1: Difference between revisions

(due to a bug q2f14 and later cannot complete nandblaster tasks)
 
(36 intermediate revisions by 8 users not shown)
Line 1: Line 1:
<noinclude>{{Google Translations}}{{TOCright}}
== OFW Multicast Wireless NAND Updater ==
[[Category:Network]]
[[Category:Firmware]]
[[Category:Update paths]]</noinclude>


== Quick Start ==
OFW Q2E19 [http://wiki.laptop.org/go/OLPC_Firmware_q2e19] has experimental support for receiving multicasted NAND FLASH updates via wireless.


=== XO-1 ===
It's in a fairly raw state at the moment (it needs performance, security, and user interface improvements), but it basically works.


How to install on one XO-1 and then clone to a set of XO-1s:
To use it, say


* install the build you want into one laptop, then turn it off, then get it to the [[Ok]] prompt without starting the operating system, then type ''nb-clone'' and press enter, ... it will start sending that build for the other laptops,
ok wifi ''myessid''
* on each of the other laptops, as fast or as slow as you like, turn them on, but with the four game keys held down, ... that laptop will start receiving the spray and install it.
ok mcastnand


Once all the laptops are installed, they can be rebooted, and the sending laptop can be shutdown with the power button.
The default value for ''myessid'' is OLPCOFW , so you could set your wireless AP to that SSID and omit the wifi command. That works with an open wireless network. You can supply authentication keys with one of:


== OFW Multicast Wireless NAND Updater (NANDblaster)==
ok wep HHHHHHHHHH \ 10 hex digits
ok wep HHH...HHH \ 26 hex digits
ok pmk HHH...HHHH \ 64 hex digits


[[OLPC Firmware q2e24]] up to [[OLPC Firmware q2f13]] can update the NAND FLASH via wireless. One XO-1 is the sender. Any number of other XO-1s can receive simultaneously. The sender data comes either from a NAND image file (USB or SD) or from the sending machine's own NAND FLASH (cloning). The NAND image can be partitioned or not, signed or unsigned. The receiving machine can be secure or unsecure.
PMK is the pairwise master key for WPA.


== Starting the Sender ==
On the server, you do this:


The sending machine must be unsecure so you can type commands at the ok prompt.
git clone git://git.infradead.org/mtd-utils.git
cd mtd-utils
make serve_image
./serve_image 239.255.1.2 12345 myimage.img 131072 100


=== NANDblasting an Unsigned NAND Image File ===
Replace "myimage.img" with the name of a NAND image file.


To send an unsigned NAND image from a file, put the NAND image file on a USB key as "'''/fs.img'''". Then you will need a so-called "control file"; a .plc file, which is a fairly new addition to the XO image ecology. It can be created by unleashing the image-digestor program on your intended image. Get it from:
The "100" parameter is the multicast data rate in Kbytes/sec. 100 is about the best you can do with garden-variety wireless access points, as
<pre>git clone git://dev.laptop.org/users/erik/image-digestor</pre>
it corresponds to a bit rate of about 1 MBit/sec, which is the built-in multicast rate limit for a lot of access points. If you can configure your access point to multicast faster than that, you can bump the rate parameter up to 500. If you go any faster than that, OFW will start to lose packets and the overall performance will suffer (improving this is high on my list).
Then run image-digestor.sh with your image as the only argument. At the time of writing, supplying a link won't work, because the script won't follow it. Rename the resulting file to fs.plc and put it in the root of the USB key, as in: "'''/fs.plc'''". Insert the USB key into the sending XO and type:


ok nb-update
If you don't have a fast-multicasting access point, you can test the 500 rate with unicasting: (XXX this is currently broken in q2e19; it fails with "Can't open net" message. It's fixed in q2e19c and later.)


=== NANDblasting a Signed NAND Image File ===
ok boot rom:mnandcast X.Y.Z.S,,X.Y.X.C 12345 /nandflash


To send a signed NAND image from a file, put the NAND image file on a USB key as "'''/fs.img'''". Put the signature bundle (the .zip that contains a control file and a signature file) on the USB key as "'''/fs.zip'''". Insert the USB key into the sending XO and type:
X.Y.Z.S is the server's IP address, X.Y.X.C is the client's.


ok nb-secure
./serve_image X.Y.Z.C 12345 myimage.img 131072 500


When sending from a file, the progress report is text.
== Performance ==


=== Cloning the Sender's NAND ===
At a rate of 100, it will take about 10 seconds per megabyte, so 100 MBytes would take 1000 seconds/16 minutes, and 400 MBytes would take about an hour. That clearly sucks. The limitation at this level is the access point; the solutions are either to spec better access points or, preferably, work out how to use an XO as the sender (will require that OFW learns how to listen to the mesh). Sending to the mesh with TTL=1 is essentially the same as multicast.


Where an image file is not available, it is possible though risky to clone the sender's own NAND FLASH:
At 500, it would take about 12 minutes. (Writing the FLASH takes a couple of minutes, so say 15 minutes for a typical OS image at 500). The limitation at the 500 level is OFW's USB Wlan driver, which does a lame job of bulk-in buffer management. With some work, I should be able to double that rate; 8 minutes for a 400 MByte image is my goal for the moment.


ok nb-clone
Keep in mind that you can have a lot of XOs doing this at once, so you can start a few hundred and by the time you have finished starting them all, the first ones will probably be finished. So maybe the 10 to 15 minute time range is acceptable.


When cloning, the sender displays a graphic progress report showing which of its NAND FLASH blocks is currently being sent.
== Security ==


(Why is it risky? If you have booted the operating system on the XO-1 after installing it but before using it with ''nb-clone'', there will be problems such as activity collaboration failure. See [[Imaging/Side_effects]] for how to fix. The fix is best done before ''nb-clone'' so that it does not need to be repeated on every laptop. Cloning already booted laptops is not supported by OLPCA's deployment support team. However, it sometimes works in limited testing.)
Currently there is none. The plan to have the server also periodically send a signature file to validate the .img file. It will be basically the same set of files that we currently use for secure filesystem updates via the 4-button salute.


== User interface ==
=== Stopping ===


You can stop the sender by typing the ESC key or by powering off. If you type any other key, the sender will pause and ask you if you want to stop. If you then type 'y', it will stop; other keys will resume the sending.
Could do something with the 4-button salute, but I'm also thinking of turning on the OFW GUI at some point. The 4-button approach has the advantage of being faster - turn on the power, hold the buttons, go to the next machine.

=== "No quiet channels" - Forcing the Channel ===

The sender chooses a wireless channel automatically by scanning to find the least-busy channel. If they are all busy, i.e. they all have received signal strength values exceeding some threshold, the sender tells you which one has the least signal strength and asks for confirmation before proceeding on that channel.

You can force it to use a specific channel by appending either "1", "6", or "11" to one of the commands above, for example:

ok nb-update6

If you force the channel, the sender won't check whether the channel is busy; it will just start sending.
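The channel choice described in this section can be sketched in a few lines of Python. This is only an illustration of the decision, not the firmware's code: the function name <code>choose_channel</code>, the <code>busy_threshold</code> parameter, and the use of dBm-style signal readings are all assumptions, since the text above does not say what threshold or units OFW uses.

```python
# Hypothetical sketch of the sender's channel choice; not taken from OFW.
CHANNELS = (1, 6, 11)  # the channel suffixes the send commands accept


def choose_channel(rssi_by_channel, busy_threshold, forced=None):
    """Return (channel, needs_confirmation).

    rssi_by_channel: dict mapping channel number to measured signal strength.
    busy_threshold:  assumed cutoff; a channel above it counts as busy.
    forced:          channel given on the command (e.g. 6 for nb-update6).
    """
    if forced is not None:
        if forced not in CHANNELS:
            raise ValueError("channel must be 1, 6, or 11")
        # A forced channel is used without checking whether it is busy.
        return forced, False

    # Otherwise pick the channel with the least signal strength.
    quietest = min(CHANNELS, key=lambda ch: rssi_by_channel[ch])
    # If every channel is busy, the operator must confirm before sending.
    all_busy = all(rssi_by_channel[ch] > busy_threshold for ch in CHANNELS)
    return quietest, all_busy


if __name__ == "__main__":
    scan = {1: -48, 6: -81, 11: -63}  # made-up readings
    print(choose_channel(scan, busy_threshold=-70))
```

The second element of the result stands in for the "No quiet channels" confirmation prompt.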

=== Changing the Redundancy ===

You can change the redundancy percentage before you execute one of the send commands. For example, to set the sender redundancy to 12% (the default is 20%), type:

ok d# 12 to redundancy

The redundancy controls the number of extra error correction packets that the sender sends for each erase block worth of source data.

== Starting the Receiver ==

=== ... with Game Buttons ===

Start reception by powering on then holding down the four game buttons above the power button (circle, square, check, X). That button combination runs the updater; the NANDblaster function is one of the update choices. The updater first looks for fs.zip on USB, then on SD, then NANDblaster, and finally tries to associate with an access point and update from an HTTP server.

This works for both secure and non-secure systems. If the system is secure, the sender must be sending a signed image; otherwise the receiver will stop before writing to its NAND, saying "Placement spec bad signature!".

=== ... from the OFW Command Line ===

If the receiver is not secure, you can start the NANDblaster by typing:

ok nb

The one reception command automatically handles all forms of sending - whatever the sender is sending, that's what the receiver will do.

=== Receiver Channel Selection ===

The receiver automatically scans the three wireless channels (1, 6, 11) to find a sender. If a NANDblaster sender is transmitting on one of those channels, the receiver locks onto it and begins reception. To force the receiver to use a specific channel, add "1", "6", or "11" after "nb", for example:

ok nb11

Forcing the channel is usually unnecessary; if there is a sender, the receiver will find it. You might want to force the channel if you have multiple senders on different channels, perhaps sending different images.

=== Receiver Progress Display ===

The receiver shows a map of the NAND eblocks, coloring them to indicate their current status. The color code is:

{| class="wikitable"
|-
! Color
! Status
! Meaning
|-
| <font color=gray>'''Gray'''</font>
| LEAVE_ALONE
| The NANDblaster won't change the block. This is the case for preexisting partition tables, preexisting bad block tables, and partitions that are not being changed during this update.
|-
| <font color=red>'''Red'''</font>
| BAD
| The block is bad as indicated by the NAND's bad block table.
|-
| <font color=black>'''Black'''</font>
| ERASED
| The NANDblaster has erased the block
|-
| <font color=#cccc00>'''Yellow'''</font>
| PENDING
| The NANDblaster expects to rewrite the block, but has not yet received any data for it.
|-
| <font color="lightblue">'''Light Blue'''</font>
| WILL_CLEAN
| The NANDblaster will erase and write a JFFS2 cleanmarker to the block after all of the data has been received.
|-
| <font color=magenta>'''Magenta'''</font>
| PARTIAL
| The NANDblaster has received some data for the block, but not yet a complete set.
|-
| <font color=cyan>'''Cyan'''</font>
| READY
| The NANDblaster has received a complete set of data for the block, but not yet decoded it.
|-
| <font color=green>'''Green'''</font>
| WRITTEN
| The NANDblaster has decoded the data and has written it to the block.
|-
| <font color=blue>'''Blue'''</font>
| CLEAN
| The NANDblaster has erased the block and written a JFFS2 cleanmarker.
|}
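To make the table easier to reason about, here is a rough Python model of the block states. The status names and colors come from the table; the <code>on_packet</code> helper and the idea that a block moves PENDING, PARTIAL, READY as packets arrive are inferred from the "Meaning" column, not taken from the firmware source. The figure of 99 packets per block is the one quoted in "How it works".

```python
from enum import Enum


class BlockStatus(Enum):
    """Status names from the table, each mapped to its display color."""
    LEAVE_ALONE = "gray"
    BAD = "red"
    ERASED = "black"
    PENDING = "yellow"
    WILL_CLEAN = "light blue"
    PARTIAL = "magenta"
    READY = "cyan"
    WRITTEN = "green"
    CLEAN = "blue"


def on_packet(status, packets_held, packets_needed=99):
    """Assumed status of a block after a packet for it has been stored.

    Only blocks that are waiting for data change; every other status
    (LEAVE_ALONE, BAD, WRITTEN, ...) is returned unchanged.
    """
    if status not in (BlockStatus.PENDING, BlockStatus.PARTIAL):
        return status
    if packets_held >= packets_needed:
        return BlockStatus.READY
    return BlockStatus.PARTIAL


if __name__ == "__main__":
    state = on_packet(BlockStatus.PENDING, packets_held=1)
    print(state.name, state.value)
```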

== Performance ==

The NANDblaster uses a lot of the wireless bandwidth, sending at between 1.2 and 2.4 MBytes/sec, depending on the source of the data. (With precomputed packet images, the NANDblaster can transmit faster than the Linux wireless driver's fastest measured throughput to date.) In a quiet RF environment, the receivers can accept the sender's data rate with few errors. It's common for the receiver to acquire a complete set of data in one sending pass. If it doesn't get all the data on the first pass, it usually gets the rest on the second pass.

At the typical data rate, one sending pass of a 250 MB image takes just over 3 minutes. After the receiver has a complete set, it takes another couple of minutes to decode the error correction, check the hashes, and write the data to its NAND FLASH.
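The pass time is simple arithmetic, sketched below in Python for anyone who wants to plug in their own image size. The function name <code>pass_seconds</code> is made up for this example. The text does not say whether the quoted 1.2 to 2.4 MBytes/sec includes the redundancy packets, so the <code>redundancy_pct</code> parameter is an assumption that lets you try it both ways.

```python
def pass_seconds(image_mb, rate_mb_per_s, redundancy_pct=0.0):
    """Estimated duration of one sending pass, in seconds.

    image_mb:       size of the NAND image in MBytes.
    rate_mb_per_s:  sender data rate in MBytes/sec.
    redundancy_pct: extra error-correction data, as a percentage of the
                    image size (assumption: 0 means the rate is payload only).
    """
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    sent_mb = image_mb * (1 + redundancy_pct / 100)
    return sent_mb / rate_mb_per_s


if __name__ == "__main__":
    for rate in (1.2, 2.4):
        minutes = pass_seconds(250, rate) / 60
        print(f"250 MB at {rate} MBytes/sec: {minutes:.1f} minutes per pass")
```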

You can run as many receivers as you want simultaneously, limited only by your ability to power them and place them within wireless range of the sender. The wireless traffic is strictly one way - sender to receiver. The receivers are passive - no beacons, no probes, no acknowledgments. The sender uses the mesh with TTL=1 so its packets aren't retransmitted by other mesh nodes that happen to be listening. Furthermore, the packets are at the Ethernet layer, with a special type code "XO", so devices that use standard protocols should ignore them.


== How it works ==
== How it works ==


The server sends data continuously. It divides each NAND eraseblock-sized chunk of the image into some number of packets (currently 47), then it creates half again as many packets (23) with redundant information using a Forward Error Correction scheme based on Vandermonde matrices. For each eraseblock there are 70 packets. If you have any 47 of those packets, you can recreate the erase block contents - it doesn't matter which ones you have, only that they are distinct. The server sends packet 0 for every erase block, then packet 1 for every erase block, and so on. When it gets to the end, it starts over.
The sender process sends data continuously. It divides each NAND eraseblock-sized chunk of the image into some number of packets (currently 99), then it creates some more packets with redundant information using a Forward Error Correction scheme based on Vandermonde matrices. The redundancy percentage defaults to 20, so there are 119 packets for each eraseblock. The receiver can reconstruct the erase block contents from any set of 99 distinct packets. You can change the redundancy level from the ok prompt with:

ok d# 10 to redundancy
ok nb-update

In my tests in a very quiet RF environment, I've seen good results with redundancy as low as 3%, but I expect that a higher number is a better default. Increasing the redundancy increases the probability that the receiver will get a complete set in one pass, but it slows down each pass.
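The relation between the redundancy percentage and the packet count can be sketched as follows. The 99 data packets and the 119 total at 20% come from the text above; rounding the extra packets up is an assumption chosen because it reproduces the 119 figure, and the firmware may round differently for other percentages.

```python
import math

DATA_PACKETS = 99  # data packets per eraseblock, as stated in the text


def packets_per_eraseblock(redundancy_pct, data_packets=DATA_PACKETS):
    """Total packets sent per eraseblock for a given redundancy percentage.

    Assumption: the number of extra packets is rounded up.
    """
    extra = math.ceil(data_packets * redundancy_pct / 100)
    return data_packets + extra


def pass_slowdown(redundancy_pct, data_packets=DATA_PACKETS):
    """How much longer a pass takes compared with sending no redundancy."""
    return packets_per_eraseblock(redundancy_pct, data_packets) / data_packets


if __name__ == "__main__":
    for pct in (3, 10, 12, 20):
        print(pct, packets_per_eraseblock(pct), round(pass_slowdown(pct), 3))
```

Lower redundancy shortens each pass, at the cost of a smaller margin for lost packets.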

The receiver can start anywhere in the sequence. It collects packets, storing them on the NAND FLASH with a bit of overflow into RAM, until it has 99 distinct packets for each eraseblock. Then it does the mathematical magic to reconstruct the data, writing the reconstruction back to the NAND. The math essentially amounts to solving a set of simultaneous equations.
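The "simultaneous equations" idea can be shown with a toy Vandermonde erasure code in Python. This is a sketch under loud assumptions, not the NANDblaster's code: it works on single symbols instead of whole packets, it uses arithmetic modulo the prime 257 for readability (practical FEC codes usually work in GF(2^8)), and the names <code>encode</code> and <code>decode</code> are invented here. What it does show is the key property: any k distinct coded symbols are enough to recover the k data symbols.

```python
P = 257  # small prime modulus, chosen only to keep the toy example readable


def encode(data, n):
    """Produce n coded symbols from k = len(data) data symbols.

    Symbol i is the polynomial with coefficients `data` evaluated at
    x = i + 1, i.e. row i of a Vandermonde matrix times the data vector.
    Requires n < P so that all evaluation points are distinct.
    """
    return [sum(d * pow(i + 1, j, P) for j, d in enumerate(data)) % P
            for i in range(n)]


def decode(received, k):
    """Recover the k data symbols from any k received symbols.

    received: dict {symbol_index: value}; extra entries are ignored.
    Solves the Vandermonde system by Gauss-Jordan elimination mod P.
    """
    items = sorted(received.items())[:k]
    if len(items) < k:
        raise ValueError("need at least k distinct symbols")
    # Augmented matrix: [x^0, x^1, ..., x^(k-1) | value] for each symbol.
    rows = [[pow(i + 1, j, P) for j in range(k)] + [y] for i, y in items]
    for col in range(k):
        pivot = next(r for r in range(col, k) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], P - 2, P)  # modular inverse (Fermat)
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(k):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
    return [rows[r][k] for r in range(k)]


if __name__ == "__main__":
    data = [10, 200, 33, 7]            # k = 4 data symbols
    coded = encode(data, 6)            # 2 redundant symbols
    survivors = {i: coded[i] for i in (1, 3, 4, 5)}  # symbols 0 and 2 lost
    print(decode(survivors, 4) == data)
```

The same structure scales to 99 data packets and 119 coded packets; only the matrix gets bigger.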

If the packet error rate is reasonably low (less than about 10%), there's a good chance that the receiver will get a complete set of 99 packets for every erase block in one server cycle. If not, the receiver keeps listening until it finally gets a complete set. I've tested this with error rates approaching 50%, and it still works, although it takes a long time.

== Partitions ==

If the NAND source data is partitioned, either by cloning a partitioned NAND or by using a control file specifying partitions, the receiver will either use an existing partition map or create a new one. The source partition information includes three items of information for each partition:

* Name - a string of up to 32 characters
* Suggested size - The number of blocks to allocate when creating a new partition table
* Used size - The number of blocks of data that are being sent for this partition

The receiver inspects its existing partition map. If, for each partition in the source information, there is an existing partition with the same name whose existing size is at least "Used size", the existing partition map will be used as-is. Otherwise a new partition map will be created (overwriting the old one if present), with "Suggested size" for the partition sizes.

If the existing partition map is re-used, any existing partitions that aren't mentioned in the source partition information are left alone. That can be used to preserve user data partitions while replacing the OS partition.
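The reuse-or-recreate rule can be written down as a short Python sketch. The three per-partition fields come from the list above; the class name <code>SourcePartition</code>, the function <code>plan_partition_map</code>, and the partition names in the example are hypothetical, and the real receiver works on the on-NAND partition map, not on a dictionary.

```python
from dataclasses import dataclass


@dataclass
class SourcePartition:
    name: str            # up to 32 characters
    suggested_size: int  # blocks to allocate if a new map is created
    used_size: int       # blocks of data being sent for this partition


def plan_partition_map(source, existing):
    """Return (reused, partition_map) for the receiver.

    source:   list of SourcePartition from the sender.
    existing: dict {name: size_in_blocks} of the receiver's current map,
              or None if it has no partition map.
    """
    existing = existing or {}
    # Reuse only if every source partition fits in a same-named partition.
    fits = all(p.name in existing and existing[p.name] >= p.used_size
               for p in source)
    if fits:
        # Partitions not mentioned by the sender are left alone.
        return True, dict(existing)
    # Otherwise build a new map from the suggested sizes.
    return False, {p.name: p.suggested_size for p in source}


if __name__ == "__main__":
    src = [SourcePartition("boot", 100, 60), SourcePartition("root", 2000, 1500)]
    print(plan_partition_map(src, {"boot": 80, "root": 1600, "home": 300}))
```

In the example, the "home" partition survives because the existing map is reused as-is.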

When you use "nb-clone" to send a partitioned image from the sender's NAND, it will send all the partitions. It's possible to restrict it so it only sends a subset of those partitions, but the user interface for that is clumsy at present and I don't want to publicize it until I can offer a better UI. If you need it right away, contact me (Mitch Bradley) and I'll work with you.

== Speedups ==


There's a way to make the sender go faster, by feeding it a file with pre-computed packet data. That feature works, but the instructions and use cases aren't ready yet. All told, the faster sending doesn't end up being a huge win, because it pushes the receiver to the point where it starts to drop a few packets, so the net throughput improvement is modest at best.
The receiver can start anywhere in the sequence. It collects packets, storing them on the NAND FLASH with a bit of overflow into RAM, until it has 47 distinct packets for each eraseblock. Then it does the mathematical magic to reconstruct the data, writing the reconstruction back to the NAND.


It's probably possible to shave a couple of minutes off the total receiver time by overlapping the decoding of fully-received blocks with packet reception. But that won't be trivial to implement.
If the packet error rate is reasonably low (less than about 10%), there's a good chance that the receiver will get a complete set of data in one server cycle (for each eraseblock, you only need 47 of the 70 possible packets). If not, the receiver keeps listening until it finally gets a complete set. I've tested this with error rates approaching 50%, and it still works, although it takes a long time (A 460 MByte image transmitted at basically 100 Kbytes/sec took 6 hours to receive. That was using a slow access point that was being fed with data at a rate it couldn't handle, so it had to drop a bunch of the packets.)


== Source Code ==
== Source Code ==


The source for the OFW version is currently at [http://dev.laptop.org/git?p=users/wmb/multicast-nand]
The source for the OFW version is currently at [http://dev.laptop.org/git/users/wmb/multicast-nand/]


The original source for the Linux version and for the server side code is at git://git.infradead.org/mtd-utils.git
The source for David Woodhouse's original Linux version is at git://git.infradead.org/mtd-utils.git . The OFW version has diverged a great deal from the original. The two no longer interoperate - I had to extend the protocol headers to handle partitions and security.


== Background ==
== Background ==
Line 79: Line 200:
<Mitch_Bradley> erikg: the issue is whether the XS network is discoverable via the essids mentioned
<Mitch_Bradley> erikg: the issue is whether the XS network is discoverable via the essids mentioned
<Mitch_Bradley> erikg: and also there were some problems with OFWs support of mesh-mode for the wlan
<Mitch_Bradley> erikg: and also there were some problems with OFWs support of mesh-mode for the wlan
<erikg> Mitch_Bradley: turning on a bunch of laptops and letting them sit takes considerably less manual effort than the usb-based (serial) approach. in peru it would help them to reduce the warehouse staff, most of whom do double-duty activating and physically distributing the laptops.
<erikg> Mitch_Bradley: turning on a bunch of laptops and letting them sit takes considerably less manual effort than the usb-based (serial) approach.
In peru it would help them to reduce the warehouse staff, most of whom do double-duty activating and physically distributing the laptops.
<Mitch_Bradley> erikg: OFW works well in access point mode, but how well it works in ad hoc mode is questionable
<Mitch_Bradley> erikg: OFW works well in access point mode, but how well it works in ad hoc mode is questionable
<erikg> Mitch_Bradley: in OFW wireless reflash the reflash is the same aside from the source of the data?
<erikg> Mitch_Bradley: in OFW wireless reflash the reflash is the same aside from the source of the data?
Line 112: Line 234:
<Mitch_Bradley> dwmw2: wasn't there some problem with some APs?
<Mitch_Bradley> dwmw2: wasn't there some problem with some APs?
<Mitch_Bradley> like, multicast being throttled to a ridiculously low data rate?
<Mitch_Bradley> like, multicast being throttled to a ridiculously low data rate?
<dwmw2> some APs only send multicast at the lowest possible rate (1Mb/s). Which sucks. When I first set it up I got an AP which lets you set the multicast rate (which was left in smithbone's custody and is around 1cc somewhere iirc).
<dwmw2> some APs only send multicast at the lowest possible rate (1Mb/s). Which sucks. When I first set it up I got an AP which
lets you set the multicast rate (which was left in smithbone's custody and is around 1cc somewhere iirc).
<dwmw2> more recently (and in Mongolia) I was just doing it over mesh not infra mode, with a libertas dongle in my shinybook
<dwmw2> more recently (and in Mongolia) I was just doing it over mesh not infra mode, with a libertas dongle in my shinybook
<dwmw2> remember to set the mesh ttl to 1 and the mesh broadcast rate to something sane (like 11Mb/s iirc)
<dwmw2> remember to set the mesh ttl to 1 and the mesh broadcast rate to something sane (like 11Mb/s iirc)
Line 122: Line 245:
<dwmw2> I think it was a Buffalo one
<dwmw2> I think it was a Buffalo one
<erikg> ok
<erikg> ok
<erikg> they've god 40k machines to flash here in peru and they're doing it by usb dongle
<erikg> they've got 40k machines to flash here in peru and they're doing it by usb dongle
<erikg> dwmw2: so what happens on the xo-side?
<erikg> dwmw2: so what happens on the xo-side?
<dwmw2> I went round the mall by the hotel in Shanghai, took photos of all the APs I could see, went back to the hotel and googled for their manuals.
<dwmw2> I went round the mall by the hotel in Shanghai, took photos of all the APs I could see, went back to the hotel and googled for their manuals.
Line 155: Line 278:
<erikg> i am seeing olpc.fth, vmlinuz, and initrd.img ... the script is in the initrd
<erikg> i am seeing olpc.fth, vmlinuz, and initrd.img ... the script is in the initrd
<erikg> are they still using this method?
<erikg> are they still using this method?
<Mitch_Bradley> they don't have dwmw2's shinybook
<Mitch_Bradley> they don't have dwmw2's shinybook
<dwmw2> they don't need it -- just a libertas dongle. But we never really polished it enough for them to be able to use it.
<dwmw2> they don't need it -- just a libertas dongle. But we never really polished it enough for them to be able to use it.
<erikg> hmmmm
<erikg> hmmmm
Line 165: Line 288:
<dwmw2> the hardest part was plugging all the damn things in for long enough to do the OF upgrade :)
<dwmw2> the hardest part was plugging all the damn things in for long enough to do the OF upgrade :)
<dwmw2> the other reason it's not an 'official' install method is because the tool doesn't check the signature of the images
<dwmw2> the other reason it's not an 'official' install method is because the tool doesn't check the signature of the images
<dwmw2> it wouldn't be hard to handle that (I think we talked about not writing the first block of the image until it was all checked and the signature passed; writing some marker there which told OF not to use it?)
<dwmw2> it wouldn't be hard to handle that (I think we talked about not writing the first block of the image until it was all
checked and the signature passed; writing some marker there which told OF not to use it?)
dwmw2> but we never did it. So we had a signed key with my XO-side stuff on it, which had a list of the serial numbers for Mongolia and would _only_ work on those laptops.
<dwmw2> but we never did it. So we had a signed key with my XO-side stuff on it, which had a list of the serial numbers for
Mongolia and would _only_ work on those laptops.
<dwmw2> but would let you install _anything_ on those laptops, of course.
<dwmw2> but would let you install _anything_ on those laptops, of course.
<erikg> dwmw2: i see
<erikg> dwmw2: i see
<dwmw2> if we want to be able to use multicast 'in anger', we should fix that -- it shouldn't be hard.
<dwmw2> if we want to be able to use multicast 'in anger', we should fix that -- it shouldn't be hard.
<dwmw2> I think writing a new node type which is unknown to any existing JFFS2 implementation and has the 'cannot mount if unknown' compatibility bitmask ought to be sufficient -- write that in the first block of the flash when we start to rx the image.
<dwmw2> I think writing a new node type which is unknown to any existing JFFS2 implementation and has the 'cannot mount if
unknown' compatibility bitmask ought to be sufficient -- write that in the first block of the flash when we start
to rx the image.
<dwmw2> then only remove it when the image is completely received and verified.
<dwmw2> then only remove it when the image is completely received and verified.
Mitch_Bradley> The OFW updater has support for such a feature. It uses a marker in the otherwise-unused portion of the OOB area
<Mitch_Bradley> The OFW updater has support for such a feature. It uses a marker in the otherwise-unused portion of the OOB area
<dwmw2> we could use that then, since marking it unusable in OFW should be perfectly sufficient.
<dwmw2> we could use that then, since marking it unusable in OFW should be perfectly sufficient.
</pre>
</pre>

== NANDBlaster for XO-1.5 ==

See [[Nandblaster_for_XO-1.5]]

Latest revision as of 22:50, 10 July 2013

Quick Start

XO-1

How to install on one XO-1 and then clone to a set of XO-1s:

  • install the build you want into one laptop, then turn it off, then get it to the Ok prompt without starting the operating system, then type nb-clone and press enter, ... it will start sending that build for the other laptops,
  • on each of the other laptops, as fast or as slow as you like, turn them on, but with the four game keys held down, ... that laptop will start receiving the spray and install it.

Once all the laptops are installed, they can be rebooted, and the sending laptop can be shutdown with the power button.

OFW Multicast Wireless NAND Updater (NANDblaster)

OLPC Firmware q2e24 up to OLPC Firmware q2f13 can update the NAND FLASH via wireless. One XO-1 is the sender. Any number of other XO-1s can receive simultaneously. The sender data comes either from a NAND image file (USB or SD) or from the sending machine's own NAND FLASH (cloning). The NAND image can be partitioned or not, signed or unsigned. The receiving machine can be secure or unsecure.

Starting the Sender

The sending machine must be unsecure so you can type commands at the ok prompt.

NANDblasting an Unsigned NAND Image File

To send an unsigned NAND image from a file, put the NAND image file on a USB key as "/fs.img". Then you will need a so-called "control file"; a .plc file, which is a fairly new addition to the XO image ecology. It can be created by unleashing the image-digestor program on your intended image. Get it from:

git clone git://dev.laptop.org/users/erik/image-digestor

Then run image-digestor.sh with your image as the only argument. At the time of writing, supplying a link won't work, because the script won't follow it. Rename the resulting file to fs.plc and put it in the root of the USB key, as in: "/fs.plc". Insert the USB key into the sending XO and type:

 ok nb-update

NANDblasting a Signed NAND Image File

To send a signed NAND image from a file, put the NAND image file on a USB key as "/fs.img". Put the signature bundle (the .zip that contains a control file and a signature file) on the USB key as "/fs.zip". Insert the USB key into the sending XO and type:

 ok nb-secure

When sending from a file, the progress report is text.

Cloning the Sender's NAND

Where an image file is not available, it is possible though risky to clone the sender's own NAND FLASH:

 ok nb-clone

When cloning, the sender displays a graphic progress report showing which of its NAND FLASH blocks is currently being sent.

(Why is it risky? If you have booted the operating system on the XO-1 after installing it but before using it with nb-clone, there will be problems such as activity collaboration failure. See Imaging/Side_effects for how to fix. The fix is best done before nb-clone so that it does not need to be repeated on every laptop. Cloning already booted laptops is not supported by OLPCA's deployment support team. However, it sometimes works in limited testing.)

Stopping

You can stop the sender by typing the ESC key or by powering off. If you type any other key, the sender will pause and ask you if you want to stop. If you then type 'y', it will stop; other keys will resume the sending.

"No quiet channels" - Forcing the Channel

The sender chooses a wireless channel automatically by scanning to find the least-busy channel. If they are all busy, i.e. they all have received signal strength values exceeding some threshold, the sender tells you which one has the least signal strength and asks for confirmation before proceeding on that channel.

You can force it to use a specific channel by appending either "1", "6", or "11" to one of the commands above, for example:

 ok nb-update6

If you force the channel, the sender won't check whether the channel is busy; it will just start sending.

Changing the Redundancy

You can change the redundancy percentage before you execute one of the send commands. For example, to set the sender redundancy to 12% (the default is 20%), type:

 ok d# 12 to redundancy

The redundancy controls the number of extra error correction packets that the sender sends for each erase block worth of source data.

Starting the Receiver

... with Game Buttons

Start reception by powering on then holding down the four game buttons above the power button (circle, square, check, X). That button combination runs the updater; the NANDblaster function is one of the update choices. The updater first looks for fs.zip on USB, then on SD, then NANDblaster, and finally tries to associate with an access point and update from an HTTP server.

This works for both secure and non-secure systems. If the system is secure, the sender must be sending a signed image; otherwise the receiver will stop before writing to its NAND, saying "Placement spec bad signature!".

... from the OFW Command Line

If the receiver is not secure, you can start the NANDblaster by typing:

 ok nb

The one reception command automatically handles all forms of sending - whatever the sender is sending, that's what the receiver will do.

Receiver Channel Selection

The receiver automatically scans the three wireless channels (1, 6, 11) to find a sender. If a NANDblaster sender is transmitting on one of those channels, the receiver locks onto it and begins reception. To force the receiver to use a specific channel, add "1", "6", or "11" after "nb", for example:

 ok nb11

Forcing the channel is usually unnecessary; if there is a sender, the receiver will find it. You might want to force the channel if you have multiple senders on different channels, perhaps sending different images.

Receiver Progress Display

The receiver shows a map of the NAND eblocks, coloring them to indicate their current status. The color code is:

Color | Status | Meaning
Gray | LEAVE_ALONE | The NANDblaster won't change the block. This is the case for preexisting partition tables, preexisting bad block tables, and partitions that are not being changed during this update.
Red | BAD | The block is bad as indicated by the NAND's bad block table.
Black | ERASED | The NANDblaster has erased the block
Yellow | PENDING | The NANDblaster expects to rewrite the block, but has not yet received any data for it.
Light Blue | WILL_CLEAN | The NANDblaster will erase and write a JFFS2 cleanmarker to the block after all of the data has been received.
Magenta | PARTIAL | The NANDblaster has received some data for the block, but not yet a complete set.
Cyan | READY | The NANDblaster has received a complete set of data for the block, but not yet decoded it.
Green | WRITTEN | The NANDblaster has decoded the data and has written it to the block.
Blue | CLEAN | The NANDblaster has erased the block and written a JFFS2 cleanmarker.

Performance

The NANDblaster uses a lot of the wireless bandwidth, sending at between 1.2 and 2.4 MBytes/sec, depending on the source of the data. (With precomputed packet images, the NANDblaster can transmit faster than the Linux wireless driver's fastest measured throughput to date.) In a quiet RF environment, the receivers can accept the sender's data rate with few errors. It's common for the receiver to acquire a complete set of data in one sending pass. If it doesn't get all the data on the first pass, it usually gets the rest on the second pass.

At the typical data rate, one sending pass of a 250 MB image takes just over 3 minutes. After the receiver has a complete set, it takes another couple of minutes to decode the error correction, check the hashes, and write the data to its NAND FLASH.
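
The pass time can be estimated with simple arithmetic. The sketch below is in Python and rests on two assumptions that the text does not confirm: that the quoted data rates count every byte on the air, including the redundant packets, and that header overhead is negligible. `pass_seconds` is a hypothetical helper name.

```python
def pass_seconds(image_mbytes: float, rate_mbytes_per_sec: float,
                 redundancy_percent: float = 20.0) -> float:
    """Estimated duration of one sending pass, in seconds."""
    # Redundant packets add redundancy_percent to the amount of data sent.
    on_air_mbytes = image_mbytes * (1 + redundancy_percent / 100)
    return on_air_mbytes / rate_mbytes_per_sec


# The two ends of the quoted 1.2 to 2.4 MBytes/sec range, for a 250 MB image.
for rate in (1.2, 2.4):
    minutes = pass_seconds(250, rate) / 60
    print(f"{rate} MBytes/sec: {minutes:.1f} minutes per pass")
```

Under these assumptions the two ends of the range bracket the "just over 3 minutes" figure quoted above.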

You can run as many receivers as you want simultaneously, limited only by your ability to power them and place them within wireless range of the sender. The wireless traffic is strictly one way - sender to receiver. The receivers are passive - no beacons, no probes, no acknowledgments. The sender uses the mesh with TTL=1 so its packets aren't retransmitted by other mesh nodes that happen to be listening. Furthermore, the packets are at the Ethernet layer, with a special type code "XO", so devices that use standard protocols should ignore them.

How it works

The sender process sends data continuously. It divides each NAND eraseblock-sized chunk of the image into a fixed number of packets (currently 99), then creates additional packets carrying redundant information, using a Forward Error Correction scheme based on Vandermonde matrices. The redundancy percentage defaults to 20, so there are 119 packets for each eraseblock (99 data packets plus 20 redundant ones). The receiver can reconstruct the eraseblock contents from any set of 99 distinct packets. You can change the redundancy level from the ok prompt with:

 ok d# 10 to redundancy
 ok nb-update

In my tests in a very quiet RF environment, I've seen good results with redundancy as low as 3%, but I expect that a higher number is a better default. Increasing the redundancy increases the probability that the receiver will get a complete set in one pass, but it slows down each pass.
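
The trade-off can be made concrete with a little arithmetic, shown here in Python. Rounding the redundant packet count up is an assumption; it reproduces the 119 packets quoted for the default of 20%, but the OFW code may round differently for other settings. `packets_per_eraseblock` is a hypothetical helper name.

```python
DATA_PACKETS = 99  # packets needed to reconstruct one eraseblock


def packets_per_eraseblock(redundancy_percent: int) -> int:
    """Total packets sent per eraseblock at a given redundancy level."""
    # Integer ceiling of DATA_PACKETS * redundancy_percent / 100 (assumed rounding).
    extra = (DATA_PACKETS * redundancy_percent + 99) // 100
    return DATA_PACKETS + extra


for pct in (3, 10, 20):
    total = packets_per_eraseblock(pct)
    # total - DATA_PACKETS is how many packets of one eraseblock can be
    # lost in a single pass while still leaving a complete set.
    print(f"redundancy {pct}%: {total} packets, tolerates {total - DATA_PACKETS} losses")
```

Each pass grows in proportion to the total packet count, which is why a higher redundancy level slows the pass down.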

The receiver can start anywhere in the sequence. It collects packets, storing them on the NAND FLASH with a bit of overflow into RAM, until it has 99 distinct packets for each eraseblock. Then it does the mathematical magic to reconstruct the data, writing the reconstruction back to the NAND. The math essentially amounts to solving a set of simultaneous equations.
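
To show what "solving a set of simultaneous equations" means here, the following toy sketch in Python encodes a few small packets with a Vandermonde matrix and recovers them from an arbitrary subset. It is not the NANDblaster code. It works modulo the prime 257 for readability, which is a choice made for this sketch rather than a detail from the source. It also transforms every packet, whereas the scheme described above sends the 99 data packets plus extra redundant ones.

```python
P = 257  # small prime field, chosen for readability in this sketch


def encode(data, n):
    """Produce n packets (index, symbols) from the k packets in data.

    Packet i is row i of a Vandermonde matrix, [x^0, x^1, ...] with
    x = i + 1, multiplied by the data, all modulo P.
    """
    k = len(data)
    width = len(data[0])
    packets = []
    for i in range(n):
        row = [pow(i + 1, j, P) for j in range(k)]
        symbols = [sum(row[j] * data[j][c] for j in range(k)) % P
                   for c in range(width)]
        packets.append((i, symbols))
    return packets


def decode(received, k):
    """Recover the k data packets from any k distinct received packets."""
    chosen = received[:k]
    # Augmented matrix: [Vandermonde row | packet symbols] per packet.
    m = [[pow(i + 1, j, P) for j in range(k)] + list(symbols)
         for i, symbols in chosen]
    for col in range(k):  # Gauss-Jordan elimination modulo P
        pivot = next(r for r in range(col, k) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        inv = pow(m[col][col], P - 2, P)  # modular inverse via Fermat
        m[col] = [v * inv % P for v in m[col]]
        for r in range(k):
            if r != col and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % P for a, b in zip(m[r], m[col])]
    return [row[k:] for row in m]


data = [[10, 20, 30], [40, 50, 60], [70, 80, 90], [1, 2, 3]]  # k = 4 packets
packets = encode(data, 6)                                     # 50% redundancy
survivors = [packets[5], packets[0], packets[3], packets[2]]  # two packets lost
assert decode(survivors, 4) == data
```

Because any square Vandermonde matrix built from distinct values is invertible, it does not matter which packets arrive or in what order, only that enough distinct ones do.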

If the packet error rate is reasonably low (less than about 10%), there's a good chance that the receiver will get a complete set of 99 packets for every eraseblock in one sending pass. If not, the receiver keeps listening until it finally gets a complete set. I've tested this with error rates approaching 50%, and it still works, although it takes a long time.
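
A simple model gives a feel for how the packet error rate affects a single pass. The Python sketch below assumes that every packet is lost independently with the same probability, which real RF interference does not guarantee, so treat the results as rough guidance only. The function names are hypothetical, and the defaults of 119 packets sent and 99 needed correspond to the default 20% redundancy.

```python
from math import comb


def p_block_complete(loss_rate: float, sent: int = 119, needed: int = 99) -> float:
    """Probability that one eraseblock gets a complete set in one pass."""
    arrive = 1.0 - loss_rate
    # Binomial tail: at least `needed` of the `sent` packets arrive.
    return sum(comb(sent, got) * arrive ** got * loss_rate ** (sent - got)
               for got in range(needed, sent + 1))


def p_all_blocks_complete(loss_rate: float, blocks: int,
                          sent: int = 119, needed: int = 99) -> float:
    """Probability that every one of `blocks` eraseblocks completes in one pass."""
    return p_block_complete(loss_rate, sent, needed) ** blocks
```

Blocks that are still incomplete after one pass are simply filled in on later passes, since the receiver keeps every distinct packet it has already stored.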

Partitions

If the NAND source data is partitioned, either by cloning a partitioned NAND or by using a control file specifying partitions, the receiver will either use an existing partition map or create a new one. The source partition information includes three items of information for each partition:

  • Name - a string of up to 32 characters
  • Suggested size - The number of blocks to allocate when creating a new partition table
  • Used size - The number of blocks of data that are being sent for this partition

The receiver inspects its existing partition map. If, for each partition in the source information, there is an existing partition with the same name whose existing size is at least "Used size", the existing partition map will be used as-is. Otherwise a new partition map will be created (overwriting the old one if present), with "Suggested size" for the partition sizes.

If the existing partition map is re-used, any existing partitions that aren't mentioned in the source partition information are left alone. That can be used to preserve user data partitions while replacing the OS partition.
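
The decision described above can be sketched in Python as follows. The data layout is an assumption made for illustration: the existing partition map is modeled as a dictionary from partition name to size in blocks, and `SourcePartition`, `can_reuse_map` and `plan_partition_map` are hypothetical names, not OFW words.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SourcePartition:
    name: str            # up to 32 characters
    suggested_size: int  # blocks to allocate in a new partition table
    used_size: int       # blocks of data being sent for this partition


def can_reuse_map(source: List[SourcePartition], existing: Dict[str, int]) -> bool:
    """True if every source partition fits an existing partition of the same name."""
    return all(p.name in existing and existing[p.name] >= p.used_size
               for p in source)


def plan_partition_map(source: List[SourcePartition],
                       existing: Dict[str, int]) -> Dict[str, int]:
    """Return the partition map (name -> size in blocks) the receiver ends up with."""
    if can_reuse_map(source, existing):
        # Map kept as-is, so partitions not mentioned by the sender survive.
        return dict(existing)
    # Otherwise a new map replaces the old one, using the suggested sizes.
    return {p.name: p.suggested_size for p in source}
```

For example, sending only a 600-block "os" partition to a machine whose map is `{"os": 800, "user": 1200}` leaves the map and the "user" partition untouched, whereas an existing "os" partition of only 500 blocks would force a new map.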

When you use "nb-clone" to send a partitioned image from the sender's NAND, it will send all the partitions. It's possible to restrict it so it only sends a subset of those partitions, but the user interface for that is clumsy at present and I don't want to publicize it until I can offer a better UI. If you need it right away, contact me (Mitch Bradley) and I'll work with you.

Speedups

There's a way to make the sender go faster, by feeding it a file with pre-computed packet data. That feature works, but the instructions and use cases aren't ready yet. All told, the faster sending doesn't end up being a huge win, because it pushes the receiver to the point where it starts to drop a few packets, so the net throughput improvement is modest at best.

It's probably possible to shave a couple of minutes off the total receiver time by overlapping the decoding of fully-received blocks with packet reception. But that won't be trivial to implement.

Source Code

The source for the OFW version is currently at [1]

The source for David Woodhouse's original Linux version is at git://git.infradead.org/mtd-utils.git . The OFW version has diverged a great deal from the original. The two no longer interoperate - I had to extend the protocol headers to handle partitions and security.

Background

David Woodhouse developed this scheme while at Quanta and also tested it in Mongolia. Mitch Bradley ported it to OFW and is the current owner.

The following captures an IRC discussion that contains important information about a multicast-distribution scheme for updating XOs wirelessly.

	<erikg>	Mitch_Bradley: XS-based wireless reflash is a *good* idea. i read some emails on the topic, but am not entirely clear on the issues which are involved.
<Mitch_Bradley>	erikg: XS-based reflash should work in principle. I tested reflashing over wireless on my home network with success.
<Mitch_Bradley>	erikg: the issue is whether the XS network is discoverable via the essids mentioned
<Mitch_Bradley>	erikg: and also there were some problems with OFWs support of mesh-mode for the wlan
	<erikg>	Mitch_Bradley: turning on a bunch of laptops and letting them sit takes considerably less manual effort than the usb-based (serial) approach.
                In peru it would help them to reduce the warehouse staff, most of whom do double-duty activating and physically distributing the laptops.
<Mitch_Bradley>	erikg: OFW works well in access point mode, but how well it works in ad hoc mode is questionable
	<erikg>	Mitch_Bradley: in OFW wireless reflash the reflash is the same aside from the source of the data?
	<erikg>	Mitch_Bradley: i'm only suggesting use in AP mode.
<Mitch_Bradley>	erikg: yes
	<erikg>	Mitch_Bradley: and can it use multicast?
	<erikg>	Mitch_Bradley: e.g. so there can be a bunch of laptops turned on which then get the data?
<Mitch_Bradley>	erikg: no, OFW doesn't have support for multicast yet
	* erikg	thought he saw an email from david woodhouse on the topic, but maybe his memory fails him
<Mitch_Bradley>	erikg: dwmw2 developed a multicast update scheme that includes forward error correction
	<erikg>	Mitch_Bradley: forward error correction?
<Mitch_Bradley>	erikg: the plan was for OFW to include support for that protocol at some point
<Mitch_Bradley>	erikg: if you do straight multicast , the probability of an individual client missing a packet is rather high
	<erikg>	Mitch_Bradley: right. the emails i read mentioned this.
<Mitch_Bradley>	erikg: so each client probably has to wait quite a few overall transmits before it has a complete set
<Mitch_Bradley>	erikg: if you send each packet with error correction codes, you can greatly reduce the probability of botched packets
	<erikg>	Mitch_Bradley: i see. so the ECC work and multicast work need to be done? or has dwmw2 already completed the multicast work?
<Mitch_Bradley>	erikg: dwmw2 has a working multicast scheme that requires booting a small ram-based Linux on the clients
	<erikg>	Mitch_Bradley: interesting. perhaps that would provide much more flexibility
<Mitch_Bradley>	erikg: so, while it would be nice to have it directly in OFW, that's not strictly necessary
<Mitch_Bradley>	erikg: OFW would be advantageous to reduce the network congestion from several laptops all trying to boot that ram-based Linux
	<dwmw2>	erikg: http://gallery.infradead.org/main.php?g2_itemId=1540
	<erikg>	Mitch_Bradley: i guess it depends on how big the ram-based linux is
	<dwmw2>	that's the machines being installed in Mongolia, all at once over wireless+multicast.
	<dwmw2>	http://david.woodhou.se/olpc-nandcast.tar.gz
	<dwmw2>	or something like that
	<dwmw2>	you don't need much in it
	<dwmw2>	just the recv_image program from the mtd-utils, basically. And a script which brings the network up and runs it.
	<erikg>	dwmw2: what's needed on the server side?
	<dwmw2>	send_image :)
	<dwmw2>	sorry, serve_image
<Mitch_Bradley>	dwmw2: wasn't there some problem with some APs?
<Mitch_Bradley>	like, multicast being throttled to a ridiculously low data rate?
	<dwmw2>	some APs only send multicast at the lowest possible rate (1Mb/s). Which sucks. When I first set it up I got an AP which
                lets you set the multicast rate (which was left in smithbone's custody and is around 1cc somewhere iirc).
	<dwmw2>	more recently (and in Mongolia) I was just doing it over mesh not infra mode, with a libertas dongle in my shinybook
	<dwmw2>	remember to set the mesh ttl to 1 and the mesh broadcast rate to something sane (like 11Mb/s iirc)
	<erikg>	dwmw2: because if you set it too high the error rate goes way up?
	<dwmw2>	because if you set it too low, you die of boredom
	<erikg>	dwmw2: haha
	<erikg>	dwmw2: would you please email me information about the brand of AP in question?
	<erikg>	brand/version
	<dwmw2>	I think it was a Buffalo one
	<erikg>	ok
	<erikg>	they've got 40k machines to flash here in peru and they're doing it by usb dongle
	<erikg>	dwmw2: so what happens on the xo-side?
	<dwmw2>	I went round the mall by the hotel in Shanghai, took photos of all the APs I could see, went back to the hotel and googled for their manuals.
	<dwmw2>	then went back and bought the one which let you configure the multicast rate
	<erikg>	hahaa
	<dwmw2>	sending from a libertas dongle is probably easier
	<erikg>	oh i mean via a usb flash memory
	<erikg>	dwmw2: what do you have to do on the xo-side? what's the procedure?
	<dwmw2>	for the XO side see that tarball.
	<dwmw2>	boot into an initrd with a script which brings the network up and runs the rx tool
	<dwmw2>	you only have to plug the usb key in for long enough for the kernel to start booting, then you can rip it out and move on to booting the next machine
	<erikg>	so you have to flash this onto all the machines
	<erikg>	right
	<dwmw2>	∀ machine, insert key, power on, wait a few seconds, remove key
	<erikg>	dwmw2: this appears to be lacking the server code
	<erikg>	or am i misunderstanding something?
	<dwmw2>	that's for the XO side.
	<erikg>	right
	<dwmw2>	the server code is in the mtd-utils
	<dwmw2>	git.infradead.org/mtd-utils.git
	<erikg>	serve_images
	<erikg>	i've git it before
	<erikg>	serve_image
	<dwmw2>	./serve_image ff0f::1234 1234 myimage.img 131072 1800
	<dwmw2>	or something like that
	<erikg>	cool!
	<dwmw2>	239.255.255.1 or something if you want Legacy IP instead of IPv6 (but then you have to set up Legacy IP addresses too, on the client side)
	<erikg>	do you know what other deployments are using this?
	<dwmw2>	I think I did that at one point -- my script in the tarball I showed you probably sets a 10.x.x.x address using the last three bytes of the MAC address?
	<erikg>	or is it just mongolia
	<dwmw2>	it was just Mongolia, while I was there.
	<erikg>	i am seeing olpc.fth, vmlinuz, and initrd.img ... the script is in the initrd
	<erikg>	are they still using this method?
<Mitch_Bradley>	they don't have dwmw2's shinybook
	<dwmw2>	they don't need it -- just a libertas dongle. But we never really polished it enough for them to be able to use it.
	<erikg>	hmmmm
	<dwmw2>	We _could_ start streaming the image from the servers we have in place, but we haven't
	<dwmw2>	the serve_image program just spews the image out (with 100% FEC) over and over again.
	<dwmw2>	it's the only thing that needs to send on the network.
	<dwmw2>	makes the network not very usable, when it's running full-stream. Even kills my Bluetooth mouse :)
	<dwmw2>	but installs a whole lot of laptops nice and fast.
	<dwmw2>	the hardest part was plugging all the damn things in for long enough to do the OF upgrade :)
	<dwmw2>	the other reason it's not an 'official' install method is because the tool doesn't check the signature of the images
	<dwmw2>	it wouldn't be hard to handle that (I think we talked about not writing the first block of the image until it was all
                checked and the signature passed; writing some marker there which told OF not to use it?)
	<dwmw2>	but we never did it. So we had a signed key with my XO-side stuff on it, which had a list of the serial numbers for
                Mongolia and would _only_ work on those laptops.
	<dwmw2>	but would let you install _anything_ on those laptops, of course.
	<erikg>	dwmw2: i see
	<dwmw2>	if we want to be able to use multicast 'in anger', we should fix that -- it shouldn't be hard.
	<dwmw2>	I think writing a new node type which is unknown to any existing JFFS2 implementation and has the 'cannot mount if
                unknown' compatibility bitmask ought to be sufficient -- write that in the first block of the flash when we start
                to rx the image.
	<dwmw2>	then only remove it when the image is completely received and verified.
<Mitch_Bradley>	The OFW updater has support for such a feature.  It uses a marker in the otherwise-unused portion of the OOB area
	<dwmw2>	we could use that then, since marking it unusable in OFW should be perfectly sufficient.

NANDblaster for XO-1.5

See Nandblaster_for_XO-1.5