Nandblaster for XO-1

From OLPC
Revision as of 16:54, 26 August 2008 by Wmb@firmworks.com (talk | contribs) (Multicast Update)

This page captures an IRC discussion that contains important information about a multicast-distribution scheme for updating XOs wirelessly.

	<erikg>	Mitch_Bradley: XS-based wireless reflash is a *good* idea. i read some emails on the topic, but am not entirely clear on the issues which are involved.
<Mitch_Bradley>	erikg: XS-based reflash should work in principle. I tested reflashing over wireless on my home network with success.
<Mitch_Bradley>	erikg: the issue is whether the XS network is discoverable via the essids mentioned
<Mitch_Bradley>	erikg: and also there were some problems with OFW's support of mesh mode for the wlan
	<erikg>	Mitch_Bradley: turning on a bunch of laptops and letting them sit takes considerably less manual effort than the usb-based (serial) approach. in peru it would help them to reduce the warehouse staff, most of whom do double-duty activating and physically distributing the laptops.
<Mitch_Bradley>	erikg: OFW works well in access point mode, but how well it works in ad hoc mode is questionable
	<erikg>	Mitch_Bradley: in OFW wireless reflash the reflash is the same aside from the source of the data?
	<erikg>	Mitch_Bradley: i'm only suggesting use in AP mode.
<Mitch_Bradley>	erikg: yes
	<erikg>	Mitch_Bradley: and can it use multicast?
	<erikg>	Mitch_Bradley: e.g. so there can be a bunch of laptops turned on which then get the data?
<Mitch_Bradley>	erikg: no, OFW doesn't have support for multicast yet
	* erikg	thought he saw an email from david woodhouse on the topic, but maybe his memory fails him
<Mitch_Bradley>	erikg: dwmw2 developed a multicast update scheme that includes forward error correction
	<erikg>	Mitch_Bradley: forward error correction?
<Mitch_Bradley>	erikg: the plan was for OFW to include support for that protocol at some point
<Mitch_Bradley>	erikg: if you do straight multicast, the probability of an individual client missing a packet is rather high
	<erikg>	Mitch_Bradley: right. the emails i read mentioned this.
<Mitch_Bradley>	erikg: so each client probably has to wait quite a few overall transmits before it has a complete set
<Mitch_Bradley>	erikg: if you send each packet with error correction codes, you can greatly reduce the probability of botched packets
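
The point about straight multicast can be made concrete with a rough calculation. The sketch below is not part of the discussion; it assumes every packet is lost independently with the same probability, and the function names and example numbers are purely illustrative. It estimates how many complete retransmissions of the image a single client would need before it is likely to hold every packet.

```python
def p_complete(num_packets: int, loss: float, passes: int) -> float:
    """Probability that one client holds every packet after `passes` full sends.

    Assumes each packet is lost independently with probability `loss`.
    """
    # A given packet is still missing only if it was lost on every pass.
    still_missing = loss ** passes
    return (1.0 - still_missing) ** num_packets


def passes_needed(num_packets: int, loss: float, target: float = 0.99) -> int:
    """Smallest number of passes that reaches `target` probability of a full set."""
    if not 0.0 <= loss < 1.0:
        raise ValueError("loss must be in [0, 1)")
    passes = 1
    while p_complete(num_packets, loss, passes) < target:
        passes += 1
    return passes


if __name__ == "__main__":
    # Illustrative numbers only: 100,000 packets, 5% independent loss.
    print(passes_needed(100_000, 0.05))
```

Under these assumptions even a modest loss rate forces several full passes, which is the waiting time that forward error correction is meant to cut down.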
	<erikg>	Mitch_Bradley: i see. so the ECC work and multicast work need to be done? or has dwmw2 already completed the multicast work?
<Mitch_Bradley>	erikg: dwmw2 has a working multicast scheme that requires booting a small ram-based Linux on the clients
	<erikg>	Mitch_Bradley: interesting. perhaps that would provide much more flexibility
<Mitch_Bradley>	erikg: so, while it would be nice to have it directly in OFW, that's not strictly necessary
<Mitch_Bradley>	erikg: OFW would be advantageous to reduce the network congestion from several laptops all trying to boot that ram-based Linux
	<dwmw2>	erikg: http://gallery.infradead.org/main.php?g2_itemId=1540
	<erikg>	Mitch_Bradley: i guess it depends on how big the ram-based linux is
	<dwmw2>	that's the machines being installed in Mongolia, all at once over wireless+multicast.
	<dwmw2>	http://david.woodhou.se/olpc-nandcast.tar.gz
	<dwmw2>	or something like that
	<dwmw2>	you don't need much in it
	<dwmw2>	just the recv_image program from the mtd-utils, basically. And a script which brings the network up and runs it.
	<erikg>	dwmw2: what's needed on the server side?
	<dwmw2>	send_image :)
	<dwmw2>	sorry, serve_image
<Mitch_Bradley>	dwmw2: wasn't there some problem with some APs?
<Mitch_Bradley>	like, multicast being throttled to a ridiculously low data rate?
	<dwmw2>	some APs only send multicast at the lowest possible rate (1Mb/s). Which sucks. When I first set it up I got an AP which lets you set the multicast rate (which was left in smithbone's custody and is around 1cc somewhere iirc).
	<dwmw2>	more recently (and in Mongolia) I was just doing it over mesh not infra mode, with a libertas dongle in my shinybook
	<dwmw2>	remember to set the mesh ttl to 1 and the mesh broadcast rate to something sane (like 11Mb/s iirc)
	<erikg>	dwmw2: because if you set it too high the error rate goes way up?
	<dwmw2>	because if you set it too low, you die of boredom
	<erikg>	dwmw2: haha
	<erikg>	dwmw2: would you please email me information about the brand of AP in question?
	<erikg>	brand/version
	<dwmw2>	I think it was a Buffalo one
	<erikg>	ok
	<erikg>	they've got 40k machines to flash here in peru and they're doing it by usb dongle
	<erikg>	dwmw2: so what happens on the xo-side?
	<dwmw2>	I went round the mall by the hotel in Shanghai, took photos of all the APs I could see, went back to the hotel and googled for their manuals.
	<dwmw2>	then went back and bought the one which let you configure the multicast rate
	<erikg>	hahaa
	<dwmw2>	sending from a libertas dongle is probably easier
	<erikg>	oh i mean via a usb flash memory
	<erikg>	dwmw2: what do you have to do on the xo-side? what's the procedure?
	<dwmw2>	for the XO side see that tarball.
	<dwmw2>	boot into an initrd with a script which brings the network up and runs the rx tool
	<dwmw2>	you only have to plug the usb key in for long enough for the kernel to start booting, then you can rip it out and move on to booting the next machine
	<erikg>	so you have to flash this onto all the machines
	<erikg>	right
	<dwmw2>	∀ machine, insert key, power on, wait a few seconds, remove key
	<erikg>	dwmw2: this appears to be lacking the server code
	<erikg>	or am i misunderstanding something?
	<dwmw2>	that's for the XO side.
	<erikg>	right
	<dwmw2>	the server code is in the mtd-utils
	<dwmw2>	git.infradead.org/mtd-utils.git
	<erikg>	serve_images
	<erikg>	i've git it before
	<erikg>	serve_image
	<dwmw2>	./serve_image ff0f::1234 1234 myimage.img 131072 1800
	<dwmw2>	or something like that
	<erikg>	cool!
	<dwmw2>	239.255.255.1 or something if you want Legacy IP instead of IPv6 (but then you have to set up Legacy IP addresses too, on the client side)
	<erikg>	do you know what other deployments are using this?
	<dwmw2>	I think I did that at one point -- my script in the tarball I showed you probably sets a 10.x.x.x address using the last three bytes of the MAC address?
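
The addressing idea dwmw2 recalls here (a 10.x.x.x address built from the last three bytes of the MAC address) might look like the sketch below. This is an illustration only: the actual script in the tarball may do it differently, and `mac_to_ipv4` is a hypothetical helper name.

```python
def mac_to_ipv4(mac: str) -> str:
    """Map a MAC address to 10.x.y.z using its last three bytes.

    Hypothetical helper; the real XO-side script may differ.
    """
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    # The last three bytes are the device-specific part of the MAC, so
    # machines from one vendor are unlikely to collide inside 10.0.0.0/8.
    return "10.{}.{}.{}".format(*octets[3:])


if __name__ == "__main__":
    print(mac_to_ipv4("00:17:c4:0a:1b:2c"))  # 10.10.27.44
```

The appeal of such a scheme is that each client gets a distinct address without a DHCP server. A real script would still need to avoid the all-zeros and all-ones host addresses.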
	<erikg>	or is it just mongolia
	<dwmw2>	it was just Mongolia, while I was there.
	<erikg>	i am seeing olpc.fth, vmlinuz, and initrd.img ... the script is in the initrd
	<erikg>	are they still using this method?
	<Mitch_Bradley>	they don't have dwmw2's shinybook
	<dwmw2>	they don't need it -- just a libertas dongle. But we never really polished it enough for them to be able to use it.
	<erikg>	hmmmm
	<dwmw2>	We _could_ start streaming the image from the servers we have in place, but we haven't
	<dwmw2>	the serve_image program just spews the image out (with 100% FEC) over and over again.
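
To see why the redundancy helps, here is a rough model. It assumes (this is not stated in the discussion) that "100% FEC" means each block of k data packets is sent together with k extra coded packets, that any k of the resulting 2k packets are enough to rebuild the block, and that losses are independent. The function name is illustrative.

```python
from math import comb


def p_block_recovered(k: int, n: int, loss: float) -> float:
    """Probability of receiving at least k of n packets with independent loss.

    Models an ideal erasure code in which any k of the n packets sent
    for a block are sufficient to reconstruct it (an assumption).
    """
    ok = 1.0 - loss
    return sum(comb(n, r) * ok**r * loss ** (n - r) for r in range(k, n + 1))


if __name__ == "__main__":
    k = 64  # illustrative block size, in packets
    for loss in (0.05, 0.20, 0.40):
        print(loss, p_block_recovered(k, 2 * k, loss))
```

In this model a client no longer needs any particular packet, only enough of them, so a block usually survives a single pass unless the loss rate approaches one half.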
	<dwmw2>	it's the only thing that needs to send on the network.
	<dwmw2>	makes the network not very usable, when it's running full-stream. Even kills my Bluetooth mouse :)
	<dwmw2>	but installs a whole lot of laptops nice and fast.
	<dwmw2>	the hardest part was plugging all the damn things in for long enough to do the OF upgrade :)
	<dwmw2>	the other reason it's not an 'official' install method is because the tool doesn't check the signature of the images
<dwmw2>	it wouldn't be hard to handle that (I think we talked about not writing the first block of the image until it was all checked and the signature passed; writing some marker there which told OF not to use it?)
<dwmw2>	but we never did it. So we had a signed key with my XO-side stuff on it, which had a list of the serial numbers for Mongolia and would _only_ work on those laptops.
	<dwmw2>	but would let you install _anything_ on those laptops, of course.
	<erikg>	dwmw2: i see
	<dwmw2>	if we want to be able to use multicast 'in anger', we should fix that -- it shouldn't be hard.
	<dwmw2>	I think writing a new node type which is unknown to any existing JFFS2 implementation and has the 'cannot mount if unknown' compatibility bitmask ought to be sufficient -- write that in the first block of the flash when we start to rx the image.
	<dwmw2>	then only remove it when the image is completely received and verified.
<Mitch_Bradley>	The OFW updater has support for such a feature.  It uses a marker in the otherwise-unused portion of the OOB area
	<dwmw2>	we could use that then, since marking it unusable in OFW should be perfectly sufficient.
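
The marker scheme discussed at the end can be sketched as a small simulation. Everything here is hypothetical: `FakeFlash`, `MARKER` and `install` are illustrative names, the flash is an in-memory stand-in, and a SHA-256 comparison stands in for the real signature check, which the discussion does not specify.

```python
import hashlib

MARKER = b"INCOMPLETE"  # hypothetical marker value


class FakeFlash:
    """In-memory stand-in for NAND; not a real MTD or OFW interface."""

    def __init__(self) -> None:
        self.marker = None  # set while an install is in progress
        self.blocks = {}    # block index -> data


def install(flash: FakeFlash, blocks: list, expected_sha256: str) -> bool:
    """Write the image, clearing the marker only if verification passes."""
    flash.marker = MARKER  # step 1: mark the flash unusable before writing
    digest = hashlib.sha256()
    for index, data in enumerate(blocks):
        flash.blocks[index] = data  # step 2: write blocks as they arrive
        digest.update(data)
    if digest.hexdigest() != expected_sha256:
        return False  # marker stays in place, so the image is never used
    flash.marker = None  # step 3: image verified, remove the marker
    return True


def bootable(flash: FakeFlash) -> bool:
    """What firmware would check: refuse any image that is still marked."""
    return flash.marker is None and bool(flash.blocks)
```

The ordering is the point of the design: because the marker is written first and removed last, a power cut or a failed check at any stage leaves the laptop refusing the partial or unverified image.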