TimeKeeping


Kernel Timekeeping

Internally, Linux maintains the time as jiffies and a jiffy offset. A jiffy is 1 clock tick. The jiffy offset is in some other unit, such as the CPU cycle period.

The kernel time is UTC. Converting to local time is a UI issue.

Most PCs have 2 crystals that can be used for timekeeping.

  • The crystal that feeds the CPU clock is usually available through some sort of cycle-counting register, for example TSC on Intel systems. When you want to know what time it is, you read that register, subtract the previous reading, scale by the speed of your CPU, add that to the time in memory, and return that time.
  • There is a 32 kHz crystal that drives the TOY (time-of-year) clock, aka hwclock. It's battery backed so your system knows the time when you first turn it on. That clock generates scheduler interrupts HZ times per second. (HZ is a kernel compile-time parameter.) On every scheduler interrupt, you bump the time in memory by X microseconds, where X is 1,000,000/HZ. That tick is typically coarse, so most implementations use the cycle counter to interpolate between ticks.
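The tick-plus-interpolation scheme described above can be sketched in a few lines of Python. This is only a toy model, not kernel code: the class name `InterpolatedClock`, its parameters, and the 500 MHz counter rate are hypothetical, and the counter readings are passed in by hand rather than read from real hardware.

```python
class InterpolatedClock:
    """Toy model: a time in memory bumped on ticks, interpolated by a cycle counter."""

    def __init__(self, start_time, start_cycles, cycles_per_second, hz):
        self.time_at_tick = start_time              # seconds, the "time in memory"
        self.cycles_at_tick = start_cycles          # counter reading at the last tick
        self.cycles_per_second = cycles_per_second  # calibrated counter rate (assumed)
        self.tick_seconds = 1.0 / hz                # amount added on every tick

    def on_tick(self, cycles_now):
        """Scheduler interrupt: bump the time in memory by one tick."""
        self.time_at_tick += self.tick_seconds
        self.cycles_at_tick = cycles_now

    def now(self, cycles_now):
        """Read the clock: time in memory plus scaled cycles since the last tick."""
        elapsed_cycles = cycles_now - self.cycles_at_tick
        return self.time_at_tick + elapsed_cycles / self.cycles_per_second


# Hypothetical 500 MHz counter with HZ=250.
clock = InterpolatedClock(start_time=1000.0, start_cycles=0,
                          cycles_per_second=500_000_000, hz=250)
print(clock.now(250_000_000))  # half a second of cycles after the start
```

If `cycles_per_second` is miscalibrated, every interpolated reading is scaled by the same wrong factor, which is why an inconsistent calibration shows up as a clock that runs slightly fast or slow.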

That's a rough description. Here are some details.

  • The XO kernel is currently using the TSC counter. Unfortunately, there is a bug (not XO specific) in the CPU calibration code. It doesn't get a consistent answer. It doesn't vary by much, but it's easy to notice if you are trying to keep good time, especially if you reboot frequently.
  • The crystals as shipped from the factory are not right on. They are typically off by a few parts per million (PPM). My XO is off by ~6 PPM. I have two PCs that are off by >100 PPM. 1 PPM is about 0.6 seconds per week. Most kernels have a fudge factor to correct for that. This makes a huge difference. (At least if you are a timekeeping nut.) NTP calls this parameter "drift".
    • That drift is temperature dependent. It's (very) roughly 1 PPM per degree C.
    • You can use tickadj to tweak it by hand. You can compute the value with the following recipe:
      • Reboot the system (to make sure things are clean)
      • Set the clock by hand.
      • Wait a week.
      • Compute the correction.
  • Missed interrupts cause problems if you are using the scheduler interrupts to bump the clock. That often happens on a regular PC if HZ is set to 1000, because a normal PC BIOS will often use the SMM/SMI mechanism to steal CPU time from Linux. 250 usually works. The XO does not suffer from this problem, and anyway it uses a tickless kernel.
  • Using the TSC for timekeeping gets complicated if the CPU speed is adjusted on the fly to save power or avoid overheating.
  • I don't know how turning off the CPU to save power is going to interact with timekeeping.
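The "compute the correction" step in the recipe above is just a ratio: how far the clock wandered divided by how long you waited. A minimal sketch follows; the function name `drift_ppm` and the 3.6 second measurement are made up for illustration, and turning the result into a tickadj setting is left out because it depends on your kernel.

```python
def drift_ppm(clock_error_seconds, elapsed_seconds):
    """Frequency error in parts per million; positive means the clock ran fast."""
    return clock_error_seconds / elapsed_seconds * 1_000_000


WEEK = 7 * 24 * 3600  # 604800 seconds

# Hypothetical measurement: the clock gained 3.6 seconds over one week.
print(round(drift_ppm(3.6, WEEK), 2))
```

With those numbers the result is a little under 6 PPM. The longer you wait, the less an error in setting or reading the clock by hand matters.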

Keeping your Time Correct

Even if you start with the correct time, your system clock will drift, just like your watch drifts.

There are several general approaches to keeping your system clock accurate enough. Note that "close enough" depends upon your needs.

  • You can set the time by hand whenever you notice that your clock is too far off.
  • You can use sntp (ntpdate is deprecated) to set the clock at boot time and then repeat occasionally (hours, days, or weeks) after that via a cron job. This is generally good enough for human interaction and has minimal system load.
  • The normal approach for most *nix systems is to run ntpd. It's included in most distributions. It's in the XO but not started at boot time. To turn it on:
 /sbin/chkconfig --add ntpd
 /sbin/chkconfig --level 35 ntpd on

but see below.

NTP Overview

NTP servers are organized into levels labeled stratum. Stratum 1 servers have atomic clocks or some other magic to tell them the time. Stratum 2 servers get their time from stratum 1 servers. ...

ntpd is a typical Unix daemon. It is normally started at boot time. It talks to several NTP servers, does lots of sanity checking, and adjusts the time on your local system. It also disciplines the local clock, that is, it corrects for any inaccuracy in your crystal so that your system keeps pretty good time without any adjustments.

That correction is called drift. It's how much your clock would drift without the correction. ntpd stores it on the disk (typically /var/lib/ntp/drift) and reads it back on restart. That greatly reduces the startup transient.

ntpd is both a client and a server. It gets its time from lower stratum servers and acts as a server to other ntpd clients.

ntpd adjusts the polling interval to minimize the load on the network and servers. It starts at 64 seconds and stretches out to 1024 seconds when things are stable.

ntpd assumes that the path delays to the servers are symmetric. It works better if the servers are close. (That's close in network performance rather than kilometers.)
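To see why the symmetry assumption matters, here is a sketch of the standard four-timestamp calculation an NTP client does for each exchange. The function name and the example numbers are illustrative; times are in milliseconds to keep the arithmetic easy to follow.

```python
def offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and round-trip delay from one exchange.

    t1: client sends request   (client clock)
    t2: server receives it     (server clock)
    t3: server sends reply     (server clock)
    t4: client receives reply  (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2  # server clock minus client clock
    delay = (t4 - t1) - (t3 - t2)         # round trip, minus server processing
    return offset, delay


# Hypothetical case: the server is really 500 ms ahead of the client.
# Symmetric path, 20 ms each way, 1 ms of server processing:
print(offset_and_delay(0, 520, 521, 41))
# Asymmetric path, 10 ms out and 30 ms back:
print(offset_and_delay(0, 510, 511, 41))
```

Both exchanges show the same 40 ms round trip, but the asymmetric one estimates the offset as 490 ms instead of 500 ms. The error is half the difference between the two one-way delays, and the client has no way to detect it from the timestamps alone. A nearby server keeps the total delay small, which bounds how large that error can be.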

Normally, ntpd makes small adjustments to the local clock by slewing it. The clock speed is adjusted by up to 500 PPM by temporarily making the ticks bigger or smaller or the CPU speed faster or slower. Most applications don't notice and it avoids troubles with time going backwards. (Some applications get really unhappy if time goes backwards.) If the change is too big (more than 128 ms by default), ntpd will step the clock. (If the correction is huge, more than 1000 seconds by default, ntpd will give up and exit.)
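A quick calculation shows why large corrections are stepped rather than slewed. This sketch assumes the clock is slewed at the full 500 PPM for the whole correction, which is the best case; the function name is made up for illustration.

```python
MAX_SLEW_PPM = 500  # maximum slew rate quoted in the text


def slew_seconds(offset_seconds, slew_ppm=MAX_SLEW_PPM):
    """Wall-clock time needed to remove an offset at a constant slew rate."""
    return abs(offset_seconds) * 1_000_000 / slew_ppm


for offset in (0.001, 0.1, 1.0):
    print(offset, "s offset takes about", round(slew_seconds(offset)), "s to slew")
```

At 500 PPM, every second of error takes 2000 seconds to remove, so a 100 ms offset needs over three minutes and a 1 second offset needs more than half an hour.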

Some schools, companies, and government agencies provide NTP servers as a public service. Some ISPs and colos provide them for their customers.

A typical configuration is for a company to have several NTP servers using the public servers and set up the rest of their systems to use the company servers. Using 3 servers will let ntpd ignore a broken server (aka falseticker). Using 4 servers will allow ntpd to ignore a broken server even if one of the 4 is down. (Two servers is a nasty case. If one is broken, you can't tell which one.)
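The server-count argument can be illustrated with a simple majority vote over offset intervals. This is a deliberately simplified sketch, not ntpd's actual selection algorithm: each server is assumed to report an offset with an error bound, and the function name, server names, and numbers are all hypothetical.

```python
def find_truechimers(readings):
    """readings: dict of server name -> (offset_ms, error_bound_ms).

    Returns the largest group of servers whose offset intervals all overlap,
    or an empty set if that group is not a strict majority.
    """
    intervals = {name: (off - err, off + err)
                 for name, (off, err) in readings.items()}
    best = set()
    # Any group of mutually overlapping intervals shares the lower edge
    # of one of its members, so testing each lower edge is enough.
    for lo, _ in intervals.values():
        group = {name for name, (a, b) in intervals.items() if a <= lo <= b}
        if len(group) > len(best):
            best = group
    return best if len(best) > len(readings) / 2 else set()


three = {"a": (10, 5), "b": (12, 5), "c": (500, 5)}  # "c" is a falseticker
two = {"a": (10, 5), "c": (500, 5)}                  # which one is wrong?

print(sorted(find_truechimers(three)))
print(sorted(find_truechimers(two)))
```

With three servers the two that agree outvote the bad one. With two servers that disagree there is no majority, so the function returns nothing, which is the nasty case described above.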

There are major problems with zillions of users overwhelming the major public servers. This is complicated by brain-damaged implementations. Anybody who is considering writing an NTP client should be sure to read NTP_server_misuse_and_abuse.

There is a pool project to set up lots of secondary servers and make them available via DNS. Most Linux and Unix distributions now come with ntp.conf set up to use the pool. The main problem with using the pool is that it doesn't magically pick good nearby servers. You are likely to get one with a long network delay.

For under $100 you can get a GPS receiver and run your own stratum 1 time server.

ntpd on XO

Bugs:

  • The CPU calibration factor doesn't get initialized correctly. (See above.)
  • ntpd gets confused if there are no network interfaces. NetworkManager does this. (The fix is simple. See me if you need it. Or just restart ntpd by hand: /sbin/service ntpd start)
  • ntpd expects to store the "drift" in the file system. XOs come set up to put that in RAM where it gets deleted on reboot. This means that ntpd takes several hours to stabilize each time you boot it.
  • ntpd wakes up every second to do a little bookkeeping. That's not compatible with serious power saving.


Hal Murray, Dec 22, 2007