Watchdog

From OLPC

Latest revision as of 21:37, 19 February 2012

A watchdog timer is built into the XO-1.75 and XO-3, as part of the CPU.

What is a watchdog timer? See Watchdog timer on Wikipedia.


CForth

  • configures the watchdog clocks, making the watchdog ready to be used, but without turning on counting,
  • checks if the current restart of the SoC was caused by watchdog, and if so displays "watchdog restart" on the serial port, dumps the kernel message buffer (see the Epitaph HOWTO below), and sends a power cycle command to the embedded controller.

Open Firmware

  • normally does nothing with the watchdog,
  • can start the watchdog for testing,
  • can provide the watchdog function to any deployment application written to run under Open Firmware.

Linux

  • normally does nothing with the watchdog.

Runin

  • normally does nothing with the watchdog,
  • can start the watchdog for unattended hang detection, allowing a hang to be detected, logged, and then corrected.

See /runin/sdkit-arm/watchdog and /runin/sdkit-arm/watchdog.fth for more details.

Internals

The watchdog timer in the ARMADA 610 is a 16-bit counter configured for 256 Hz operation. The counter can be read or cleared. It counts up by one every 3.9ms.

d408.006c: |                               | | | | | | | | | | | | | | | | |   timers watchdog value register (read-only)
d408.0098: |                                                             |C|   timers watchdog counter reset register (write-only)

Another register contains a match value. This may be set to a value from 3.9 ms (one count) to about four minutes and 16 seconds (65535 counts at 256 Hz).

d408.0068: |                               | | | | | | | | | | | | | | | | |   timers watchdog match register
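
The relationship between a desired timeout and the match value can be sketched as below. This is only an illustration of the arithmetic, assuming the match register holds a 16-bit count of 256 Hz ticks as the diagram suggests; the function names are hypothetical and do not come from CForth, Open Firmware, or the runin scripts.

```python
WATCHDOG_HZ = 256    # counter rate stated in the text
MATCH_MAX = 0xFFFF   # largest 16-bit value (register width is an assumption)


def timeout_to_match(seconds):
    """Convert a timeout in seconds to a match count, rounded to the nearest tick."""
    counts = round(seconds * WATCHDOG_HZ)
    if not 1 <= counts <= MATCH_MAX:
        raise ValueError("timeout does not fit a 16-bit match value at 256 Hz")
    return counts


def match_to_seconds(counts):
    """Convert a match count back to seconds."""
    return counts / WATCHDOG_HZ
```

For example, `timeout_to_match(60)` gives 15360, and `match_to_seconds(1)` gives 0.00390625, which is the 3.9 ms step mentioned above.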

Yet another register can be used to turn counting on and off, and to configure the response to a match.

d408.0064: |                                                           |R|C|   timers watchdog match enable register 

The response can be one of:

  • send an interrupt (R=0),
  • restart the processor (R=1).

We use the restart response.

Lastly, another register shows whether the watchdog timer caused a reset.

d408.0070: |                                                             |r|   timers watchdog status register
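
How these registers work together can be illustrated with a small software model. This is a toy simulation that touches no hardware. The bit positions are read off the diagrams above (rightmost box taken as bit 0), the meaning of the C bit in the match enable register (counting on) is an assumption, and the class and function names are hypothetical.

```python
# Bit positions read off the register diagrams (rightmost box = bit 0).
ENABLE_COUNT = 1 << 0     # "C" in the match enable register (assumed: counting on)
ENABLE_RESTART = 1 << 1   # "R": 0 = interrupt on match, 1 = restart on match
STATUS_RESET = 1 << 0     # "r" in the status register


class WatchdogModel:
    """Toy model of the watchdog registers; not a driver."""

    def __init__(self, match, enable):
        self.match = match & 0xFFFF   # match register
        self.enable = enable          # match enable register
        self.value = 0                # value register
        self.status = 0               # status register
        self.interrupts = 0           # interrupts raised in R=0 mode

    def tick(self):
        """Advance one 1/256 s step; return True if the SoC would restart."""
        if not self.enable & ENABLE_COUNT:
            return False
        self.value = (self.value + 1) & 0xFFFF
        if self.value != self.match:
            return False
        if self.enable & ENABLE_RESTART:
            self.status |= STATUS_RESET
            return True
        self.interrupts += 1
        return False

    def reset_counter(self):
        """Model a write to the counter reset register."""
        self.value = 0


def run(dog, ticks, pet_every=None):
    """Run for `ticks` steps, clearing the counter every `pet_every` steps.

    Returns the step at which a restart would occur, or None.
    """
    for n in range(1, ticks + 1):
        if pet_every and n % pet_every == 0:
            dog.reset_counter()
        if dog.tick():
            return n
    return None
```

With a match value of 256 (one second) and the restart response selected, `run(dog, 1000, pet_every=100)` returns `None` because the counter is cleared before it reaches the match value, while `run(dog, 1000)` with no clearing returns 256 and leaves the status bit set, which is the condition CForth checks for after a restart.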

Epitaph HOWTO

CForth and the kernel now have the epitaph feature, which in the event of a watchdog restart will dump the kernel message buffer to the serial port.

This is used in test beds and interactive use like this:

  • set up a watchdog script (e.g. with runin, touch /runin/watchdog, or see the script at http://dev.laptop.org/~quozl/woof),
  • set up the serial port logging,
  • begin testing, (there will be no interesting serial port output),
  • if a hang occurs, examine the serial port log.

You can recognise the log data because it has <n> at the start of each line, which is how the kernel stores the messages in the kernel buffer.

The log may be incomplete because the memory cache is not flushed. There are three indicators of that:

  • the most recent messages may be emitted by CForth first; this is caused by the log_end variable in DRAM not being as current as the variable's last value in cache,
  • the most recent messages may have some old messages overlaid, usually a power-of-two number of bytes ranging from 8 to 32; this is caused by the text writes not reaching DRAM,
  • the most recent messages may have missing text segments; this is caused by the text writes not reaching DRAM, in conjunction with the CForth code that skips NULs instead of printing them.
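
When examining a captured serial port log, the kernel lines can be separated from other output by their <n> prefix. The following is a minimal sketch, assuming n is a single log level digit and that the log has been saved as text; the function name and the sample lines are made up for illustration.

```python
import re

# Kernel log lines begin with "<n>", where n is assumed to be one digit.
KERNEL_LINE = re.compile(r"^<(\d)>(.*)$")


def extract_epitaph(serial_log):
    """Return (level, text) pairs for lines that look like kernel log data."""
    entries = []
    for line in serial_log.splitlines():
        found = KERNEL_LINE.match(line)
        if found:
            entries.append((int(found.group(1)), found.group(2)))
    return entries


# Made-up sample, not real XO output.
sample = (
    "watchdog restart\n"
    "<6>example message one\n"
    "<4>example message two\n"
    "ok\n"
)
```

Here `extract_epitaph(sample)` keeps only the two prefixed lines. Because the dump may be out of order or partly overwritten, as described above, the result is best treated as a starting point for reading the log rather than a faithful record.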