Device tree upgrade considerations

From OLPC
Revision as of 11:45, 20 March 2013 by DanielDrake (talk | contribs)

The approach taken in current XO-1.75 releases is to include a fairly ugly XO-1.75 "board file" in the kernel, describing the XO-1.75 in fairly gruesome detail (memory addresses of component register spaces, GPIO numbers and their meanings, which hardware is included in the laptop, ...).

As we move to using the firmware-provided device tree to trigger kernel device detection (the accepted direction for Linux on ARM upstream) for both XO-1.75 and XO-4, we run into some new considerations that need attention.

The problem

Field upgrades for XO-1.75

We currently have XO-1.75s in the field with a somewhat minimal device tree in their firmware, not sufficient to drive kernel device detection. As we upgrade the kernel and firmware for device tree goodness, we hit some difficult scenarios. In the following descriptions, "old" refers to versions that are shipped in current releases (e.g. 11.3.1) and "new" refers to future versions that will be DT-driven.

New firmware, old kernel

At a glance, running a new firmware on top of an old software base may seem uncommon, but there are cases where it happens. For example, someone may install a new software release on an XO-1.75 and then downgrade to an old release. It also appears that the AU deployment goes to special lengths to upgrade the firmware before upgrading the OS (which was required for rolling out XO-1.5 nandblaster).

When combining a new, DT-improved firmware with an old release kernel, we should not expect the system to boot. This is because the new firmware boots with a new /chosen/bootpath value which is not recognised by the initramfs shipped with the old kernel.

This problem has been worked around. At boot time, the firmware now scans the initramfs and, if it identifies that the initramfs corresponds to an "old" kernel, it artificially modifies the /chosen/bootpath value passed to the system, thereby maintaining compatibility with the old kernel/OS.
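The workaround above can be sketched as a small decision function. This is only an illustration under stated assumptions: the page does not say how the firmware recognises an "old" initramfs, so the detector is passed in as a parameter, and the marker string and both bootpath values used here are hypothetical placeholders rather than real firmware behaviour.

```python
from typing import Callable


def select_bootpath(initramfs: bytes,
                    new_bootpath: str,
                    legacy_bootpath: str,
                    looks_old: Callable[[bytes], bool]) -> str:
    """Return the /chosen/bootpath value to pass to the kernel.

    If the initramfs looks like it belongs to an "old" kernel, hand over
    the legacy-format value so that the old initramfs can still parse it.
    """
    return legacy_bootpath if looks_old(initramfs) else new_bootpath


def contains_legacy_marker(initramfs: bytes) -> bool:
    # Hypothetical detector: the real firmware check is not described in
    # this page, so a made-up marker string stands in for it.
    return b"LEGACY-INITRAMFS" in initramfs
```

For example, select_bootpath(image, "NEW-FORMAT", "OLD-FORMAT", contains_legacy_marker) yields "OLD-FORMAT" only when the image contains the placeholder marker. Keeping the detector separate from the selection means the compatibility rule stays the same even if the way of recognising old images changes.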

Old firmware, new kernel

The issue here is that the old firmware does not present a good-enough device tree to the kernel, and the new kernel does not have the old/static XO-1.75 board definitions. The system won't boot - some corruption appears at the bottom of the screen, and nothing appears over serial.

This is a case we definitely have to care about. When system updates are done in the field, it is not guaranteed that electricity will be available upon the reboot in order to install the updated firmware. So we need to keep this case working.

We know this is an issue because we have pushed OS updates that depended on new firmware before (when we moved to using DMI for identifying x86 laptops), and we had to revert that upon realising the field difficulties.
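The XO-1.75 combinations discussed so far can be summarised as a small lookup table. The two mixed cases restate what is described above; the old/old and new/new entries are assumptions (the currently shipped pairing and the intended target, respectively), and the function name is purely illustrative.

```python
# Keys are (firmware, kernel), using "old" and "new" as defined on this page.
OUTCOMES = {
    ("old", "old"): "boots: currently shipped combination (assumed)",
    ("new", "old"): "boots: firmware rewrites /chosen/bootpath for old initramfs images",
    ("old", "new"): "does not boot: device tree too minimal and no board file in the kernel",
    ("new", "new"): "boots: device-tree-driven detection (intended target, assumed)",
}


def boot_outcome(firmware: str, kernel: str) -> str:
    """Look up the expected result for a firmware/kernel pairing."""
    return OUTCOMES[(firmware, kernel)]
```

The only failing entry is old firmware with a new kernel, which is the case the potential solutions below need to address.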

General device-tree changes in future

This problem could potentially extend to XO-4 too. Right now we are working under time pressure to define device tree nodes and structures describing all our hardware, and then writing kernel code to work with such a layout. But a great deal of this is not upstream in the kernel, and it is plausible that the layouts and definitions will change during the upstreaming process. This could lead to a situation similar to the one we have on the XO-1.75: an "old" kernel/firmware combination (the ones we're developing now) in this release, and a "new" kernel/firmware combination in a future release where everything is upstreamed in a somewhat incompatible way. Then we face the same old-kernel/new-firmware and new-kernel/old-firmware issues described above.

There may also be other reasons for the DT to change in future, but I'm hopeful that there won't be. During interactions with the devicetree list I have seen that once a DT structure gets upstream and documented in Documentation/devicetree, people are firm about keeping it working. For similar reasons they are also strict that any new kernel code that parses the DT has that interface documented in Documentation/devicetree. -DanielDrake 17:55, 23 August 2012 (UTC)

Potential solutions

Bail out

One option is for the firmware to somehow learn about which kernels it supports, and to refuse to boot (printing an informational message) when an unsupported kernel is found.

This is the least attractive option for the user/deployer (it leaves them with a non-working system). Internationalisation concerns would apply to the informational message, and it's also not clear how OFW would determine whether a kernel is supported or not.

Ship board files

The one nice thing about having the board file in the kernel is that it didn't create this critical bridge between firmware and kernel. If we were to use board files instead of DT we would have an easier ride with respect to the issues described above.

The downsides here are that we'd have to maintain these board files forever (they wouldn't go upstream), and that they are technically ugly.

Include DT in the kernel

Linux has some support for including the devicetree statically in the kernel image. This helps solve the above problems by not requiring the firmware version to be coupled to the kernel version.

The disadvantage is that we would have to copy the device tree from the firmware. This isn't great, but is at least better than having to maintain a board file (which would actually involve maintaining two separate formats, rather than just a copy).

As the XO-1.75 and XO-4 have different device trees, this would perhaps remove the possibility of having the same OS image for both XO-1.75 and XO-4. But it is not clear if that was ever a hard goal or how much benefit it would present to our users. If we are considering the option of including the DT in the kernel we may need to have that discussion now.

This may present some further complications. The initramfs reads /chosen/bootpath from the device tree to determine if we're booting from internal or external media, so this part of the device tree cannot be made static. Also, how do we handle variations of laptops: would we need different device trees for each configuration? (For example, the XO-4 will be available with or without touchscreen, and we will also be shipping a mixture of camera sensors.)
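To illustrate why /chosen/bootpath cannot be baked into a static tree, here is a minimal sketch that treats the device tree as nested dictionaries and overlays the firmware's /chosen properties onto a kernel-built-in copy. The dictionary representation and the overlay_chosen helper are assumptions for illustration only and do not correspond to an existing kernel mechanism. On a running Linux system the property is typically readable as a NUL-terminated string under /proc/device-tree/chosen/bootpath, which is what read_bootpath assumes.

```python
from pathlib import Path

# Usual location of the property on a running Linux system (assumed here).
DEFAULT_BOOTPATH_FILE = Path("/proc/device-tree/chosen/bootpath")


def decode_dt_string(raw: bytes) -> str:
    """Decode a device tree string property (bytes up to the first NUL)."""
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")


def read_bootpath(path=DEFAULT_BOOTPATH_FILE) -> str:
    """Read the boot path property from the exported device tree."""
    return decode_dt_string(Path(path).read_bytes())


def overlay_chosen(static_tree: dict, firmware_chosen: dict) -> dict:
    """Return a copy of the static tree with /chosen taken from firmware.

    Properties supplied by the firmware win; the static tree is not modified.
    """
    merged = dict(static_tree)
    merged["chosen"] = {**static_tree.get("chosen", {}), **firmware_chosen}
    return merged
```

With this split, the hardware description could come from the kernel image while per-boot values such as the boot path still come from the firmware. It does not address the question of hardware variants raised above.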

Other ideas mentioned include:

  • Ship a DT in the kernel, but use the real firmware one if it is correct
    • However, it's not clear how the kernel could measure if the firmware's DT is "correct" (which would mean both not too old, and potentially not too new)
  • Ship various versions of the DT in the firmware
    • However, this wouldn't solve the "old firmware, new kernel" situation, and it's not clear how the firmware would know which version of the DT to present.
    • Furthermore, the DT is quite intrinsic to the firmware itself; it is not something that gets bolted on at the last minute just to make Linux happy. So this would be a technically strange/ugly solution involving the firmware presenting a DT different from the one it actually used to init the system.
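The first idea in the list (ship a DT in the kernel, but prefer the firmware's when it is "correct") could look roughly like the sketch below. It assumes something the page explicitly leaves open: that the firmware DT carries a revision number the kernel can compare against a supported range. The "dt-revision" property name and the range bounds are hypothetical.

```python
def choose_device_tree(firmware_dt: dict, builtin_dt: dict,
                       min_rev: int, max_rev: int) -> dict:
    """Pick the firmware DT if its revision is in range, else the built-in one.

    "dt-revision" is a made-up property; no such versioning scheme is
    defined on this page.
    """
    rev = firmware_dt.get("dt-revision")
    if isinstance(rev, int) and min_rev <= rev <= max_rev:
        return firmware_dt
    # Missing, too old, or too new: fall back to the copy in the kernel.
    return builtin_dt
```

The range check covers both "not too old" and "not too new", but it only moves the problem: someone still has to define and maintain the revision numbers.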