Device tree upgrade considerations

From OLPC

The approach taken in current XO-1.75 releases is to include a fairly ugly XO-1.75 "board file" in the kernel, describing the XO-1.75 in fairly gruesome detail (memory addresses of component register spaces, GPIO numbers and their meanings, which hardware is included in the laptop, ...).

As we move to using the firmware-provided device tree to trigger kernel device detection (the accepted direction for Linux on ARM upstream) for both XO-1.75 and XO-4, we run into some new considerations that need attention.

The problem

Field upgrades for XO-1.75

We currently have XO-1.75s in the field with a somewhat minimal device tree in their firmware, not sufficient to drive kernel device detection. As we upgrade the kernel and firmware for device tree goodness, we hit some difficult scenarios. In the following descriptions, "old" refers to versions that are shipped in current releases (e.g. 11.3.1) and "new" refers to future versions that will be DT-driven.

New firmware, old kernel

When combining a new, DT-improved firmware with an old release kernel, the system will not boot. This is because the new firmware boots with a new /chosen/bootpath value which is not recognised by the initramfs shipped with the old kernel.
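To make the failure mode concrete, here is a minimal sketch of how an initramfs-style script might read the property and map it to a boot medium. It is not the actual OLPC initramfs code. It relies only on the fact that Linux exposes the device tree under /proc/device-tree and that string properties are NUL-terminated. The function names and the prefix table are hypothetical, and no real bootpath values are assumed.

```python
from pathlib import Path
from typing import Mapping, Optional

# Standard location where a running Linux kernel exposes the device tree.
DT_ROOT = Path("/proc/device-tree")


def read_dt_string(prop: str, root: Path = DT_ROOT) -> Optional[str]:
    """Return a string property from the device tree, or None if it is absent."""
    path = root / prop.lstrip("/")
    try:
        raw = path.read_bytes()
    except OSError:
        return None
    # Device tree strings are NUL-terminated; keep only the text before it.
    return raw.split(b"\0", 1)[0].decode("ascii", errors="replace")


def classify_bootpath(bootpath: Optional[str],
                      known: Mapping[str, str]) -> Optional[str]:
    """Map a bootpath to a media label using caller-supplied prefixes.

    `known` is a hypothetical table of prefix -> label. The function
    returns None when no prefix matches, which is the situation an old
    initramfs is in when new firmware reports a path it has never seen.
    """
    if bootpath is None:
        return None
    for prefix, label in known.items():
        if bootpath.startswith(prefix):
            return label
    return None
```

An initramfs written this way can only boot from paths that appear in its own table, so any change to the bootpath format on the firmware side has to be matched by an initramfs update.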

While at a glance the need to run a new firmware on top of an old software base may seem uncommon, there are cases where it arises. For example, someone might install a new software release on an XO-1.75 and then downgrade to an old release. It also appears that the AU deployment goes to special lengths to upgrade the firmware before upgrading the OS (though it is not completely clear why).

Old firmware, new kernel

The issue here is that the old firmware does not present a good-enough device tree to the kernel, and the new kernel does not have the old/static XO-1.75 board definitions. The system won't boot - some corruption appears at the bottom of the screen, and nothing appears over serial.

This is a case we definitely have to care about. When system updates are done in the field, it is not guaranteed that electricity will be available upon the reboot in order to install the updated firmware. So we need to keep this case working.

We know this is an issue because we've pushed OS updates dependent on new firmware before (when we moved to using DMI for identifying x86 laptops), and we then had to revert that change upon realising the difficulties it caused in the field.

General device-tree changes in future

This problem could potentially extend to XO-4 too. Right now we are working under time pressure to define device tree nodes and structures describing all our hardware, and then writing kernel code to work with such a layout. But a great deal of this is not upstream in the kernel, and it is plausible that the layouts and definitions will change during the upstreaming process. This could lead to a situation similar to the one we have on the XO-1.75: an "old" kernel/firmware combination (the ones we're developing now) in this release, and a "new" kernel/firmware combination in a future release where everything is upstreamed in a somewhat incompatible way. We would then face the same old-kernel/new-firmware and new-kernel/old-firmware issues described above.

Potential solutions

Bail out

One option is for the firmware to somehow learn about which kernels it supports, and to refuse to boot (printing an informational message) when an unsupported kernel is found.

This is the least attractive option for the user/deployer (it leaves them with a non-working system). Internationalisation concerns would apply to the informational message, and it's also not clear how OFW would determine whether a given kernel is supported.

Ship board files

The one nice thing about having the board file in the kernel is that it doesn't create this critical bridge between firmware and kernel. If we were to use board files instead of DT, we would have an easier ride with respect to the issues described above.

The downside here is that we'd have to maintain these board files forever (they wouldn't go upstream), and that they are technically ugly.

Include DT in the kernel

Linux has some support for including the devicetree statically in the kernel image. This helps solve the above problems by not requiring the firmware version to be coupled to the kernel version.
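One such mechanism in mainline ARM Linux is the CONFIG_ARM_APPENDED_DTB option, where a compiled device tree blob is concatenated onto the end of the zImage. Whether that is the mechanism that would be used here is an assumption. The sketch below illustrates the idea: it checks the blob's header (the flattened device tree format starts with the big-endian magic number 0xd00dfeed followed by the total size) and then appends it. The function names are made up for illustration.

```python
import struct

# Magic number at the start of every flattened device tree blob.
FDT_MAGIC = 0xD00DFEED


def dtb_total_size(blob: bytes) -> int:
    """Validate the start of an FDT header and return the size recorded in it."""
    if len(blob) < 8:
        raise ValueError("too short to be a device tree blob")
    # The first two header fields are big-endian 32-bit: magic, totalsize.
    magic, totalsize = struct.unpack(">II", blob[:8])
    if magic != FDT_MAGIC:
        raise ValueError(f"bad magic {magic:#x}")
    if totalsize > len(blob):
        raise ValueError("blob is shorter than its header claims")
    return totalsize


def append_dtb(zimage: bytes, dtb: bytes) -> bytes:
    """Return the kernel image with the device tree blob appended to it."""
    size = dtb_total_size(dtb)
    # Drop anything after the size recorded in the header.
    return zimage + dtb[:size]
```

With this approach the kernel image carries its own copy of the tree, so one image is tied to one board's device tree.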

The disadvantage is that we would have to copy the device tree from the firmware. This isn't great, but is at least better than having to maintain a board file (which would actually involve maintaining two separate formats, rather than just a copy).

As the XO-1.75 and XO-4 have different device trees, this would perhaps remove the possibility of having the same OS image for both XO-1.75 and XO-4. But it is not clear if that was ever a hard goal or how much benefit it would present to our users. If we are considering the option of including the DT in the kernel we may need to have that discussion now.

This may present some further complications. The initramfs reads /chosen/bootpath from the device tree to determine whether we're booting from internal or external media, so this part of the device tree cannot be made static. Also, what happens when we start shipping different variants of the laptops: would we need a different device tree for each configuration? (For example, the proposed camera sensor change.)

Other ideas mentioned include:

  • Ship a DT in the kernel, but use the real firmware one if it is correct
    • However, it's not clear how the kernel could measure if the firmware's DT is "correct" (which would mean both not too old, and potentially not too new)
  • Ship various versions of the DT in the firmware
    • However, this wouldn't solve the "old firmware, new kernel" situation, and it's not clear how the firmware would know which version of the DT to present.
    • Furthermore, the DT is quite intrinsic to the firmware itself; it is not something that gets bolted on at the last minute just to make Linux happy. So this would be a technically strange/ugly solution involving the firmware presenting a DT different from the one it actually used to init the system.
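
As an illustration of the first idea (ship a DT in the kernel, but prefer the firmware's if it is usable), the decision could be sketched as below. This assumes the firmware DT carries some integer layout revision that the kernel can compare against the range it understands. No such marker exists in the trees discussed here, so the revision number, the SupportedRange type and the function name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SupportedRange:
    """Inclusive range of (hypothetical) DT layout revisions a kernel understands."""
    oldest: int
    newest: int


def choose_device_tree(firmware_rev: Optional[int],
                       supported: SupportedRange) -> str:
    """Return "firmware" or "built-in" for a given firmware DT revision.

    firmware_rev is None when the firmware DT carries no revision marker,
    as would be the case for firmware that is already in the field.
    """
    if firmware_rev is None:
        return "built-in"
    # Reject trees that are too old as well as trees that are too new.
    if supported.oldest <= firmware_rev <= supported.newest:
        return "firmware"
    return "built-in"
```

For example, `choose_device_tree(None, SupportedRange(2, 3))` falls back to the built-in tree, which covers the "old firmware, new kernel" case. The "new firmware, old kernel" case is not helped, since old kernels would not contain this check.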