Gen2 CPU Ideas

From OLPC
Revision as of 13:39, 30 January 2008 by Gnu (talk | contribs) (Generation 2 CPU design ideas)

Ideas for processors and system chip designs for 2nd Generation OLPC laptops. I encourage other engineers who have studied the current suspend/resume architecture and implementation to provide or link to accurate timing information about current suspend/resume and other architectural limitations, and make further suggestions.

Suspend/Resume

  • The Geode LX chip takes about 900ms to resume using Ship.1 kernels. Kernel optimization can clearly improve this, particularly by restarting USB asynchronously (and perhaps some other subsystems as well).
  • Gen2 CPU/system chips should be able to suspend power to the CPU much more quickly, and resume from powered-down state much more quickly. Particular areas where small changes could bring large improvements:
    • Clock generation / PLL: Time required on power-up to get the clock set right and then lock its phase-locked loop (PLL) is excessive (measured at: XX ms). In a fast start-up, the clocks should be stable within a few microseconds (not milliseconds). This probably requires keeping the PLL powered during suspend, with its output to the rest of the powered-down chip gated off. Engineer for shorter and more predictable times from cold startup as well.
    • Memory interface: In a fast start-up, memory clocks should be stable within a few microseconds (not milliseconds). The CPU should provide no impediments to beginning full speed memory read cycles as soon as the DRAM chips are specified to be brought out of low power internal self-refresh. If the DRAM chips require more than a few microseconds of leadtime, hardware should be able to provide that leadtime at the same time it triggers the resumption of power to the CPU chip, to overlap the CPU and memory power-up latencies.
    • Interrupt controller: Should be powered and fully functional during CPU power-down. If an interrupt that is not masked off is triggered during suspend, it should end the suspend and power up the CPU to take the interrupt.
    • High speed startup cache: A small internal memory in the CPU should be loadable with the data required for the CPU to resume (e.g. the reset interrupt vector and the initial instruction sequence). If set up by the software and enabled during suspend, it would let the CPU, when next powered up, completely avoid its normally extremely slow accesses to BIOS flash chips after reset. Sufficient startup code should be loaded into the CPU so that software can initialize the memory controller, CPU caches, and TLB cleanly. Further instruction fetching could then resume directly from external DRAM.
    • Integrated USB controller: Should be able to power this subsystem up and down independently of the other parts of the CPU. Should be able to operate in a quiescent state without accessing main memory at all. Thus, even if USB devices are plugged in to the system, the CPU should be able to quiesce any I/O to the USB devices, then suspend and power down the CPU, halting main memory accesses. The USB subsystem would continue powering and polling USB devices, using internal resources rather than main memory linked lists. If an interrupt condition arises (such as a device interrupt, or insertion or removal event), the interrupt, routed through the interrupt controller, could exit suspend.
    • Wireless controller: Should be able to power this subsystem up and down independently of the other parts of the CPU. Should be able to operate in a quiescent state without accessing main memory at all. Standard interrupt controls and masks should break the CPU out of suspend as needed. Best if not attached via USB (so USB subsystem need not remain powered up to keep it running). During CPU suspend, the wireless chip should likewise minimize its power usage and buffer and filter incoming packets without main memory accesses, as described for Gigabit Ethernet below.
    • Clocks and Timers: Several timers should be powered and fully functional during CPU power-down. One should be a trimmable accurate long-term realtime clock. Another should be a high resolution, short or long delay interrupt source. All of these timers should be able, through the interrupt controller, to awaken the CPU from suspend. The software should be able to notice that the CPU will not be needed for the next 100ms, set a timer for 98ms, power itself down using e.g. 8ms, be powered back up 90ms later by the timer interrupt, spend 2ms recovering from suspend, and put itself back to work.
    • Gigabit Ethernet: Should be able to power this subsystem up and down independently of the other parts of the CPU. Should be able to sense a cable connection and/or signalling from the other end of a cable while in extremely low power mode, causing an interrupt and possible resume. Software would then power up the rest of the Ethernet subsystem, enable negotiation of LAN speed, etc. During CPU suspend, the Ethernet chip should be able to minimize its power usage and operate without making main memory accesses, keeping its physical layer connection "live" while internally buffering at least the first packet that arrives. That packet would be matched against existing address masks and filters to determine whether to interrupt/wake the CPU. An additional one-byte equality comparison on a byte of received broadcast packets would permit a 256-fold reduction in spurious wakeups/interrupts for ARP broadcasts that are not addressed to this node (checking whether the packet type is ARP and, if so, avoiding an interrupt unless the low byte of the requested IP address matches ours).
    • Video generation: Should provide tighter integration between main video and DCON. Provide finer control of frame rate, permitting frames to be scanned out manually, one at a time, or at any integer frequency between 1 and 200 Hz. Should be able to change frame rate in the middle of a frame, so that a frame that was started at a slow (power-saving) rate can be sped up to scan the rest of the frame at maximum rate (after resuming and after a write to the frame buffer). Alternatively, should be able to reset the current frame so that it is cleanly abandoned mid-frame and a new frame is begun at a new rate, without causing display artifacts. This would give the software much lower and more predictable resume times for providing visible feedback for an action that caused a resume from suspend.
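
The timer arithmetic in the Clocks and Timers item above can be checked with a small sketch. The function name and parameters here are hypothetical, chosen only to mirror the numbers in that item (100ms idle window, 8ms to power down, 2ms to recover); it is not taken from any OLPC kernel code.

```python
def plan_suspend(idle_ms, suspend_ms, resume_ms):
    """Plan one suspend cycle for an idle window of idle_ms.

    Returns (timer_ms, powered_down_ms), or None if the window is too
    short for the CPU to spend any time powered down.
    All names are illustrative assumptions, not an existing API.
    """
    # Arm the timer so that the resume finishes exactly when the CPU is needed.
    timer_ms = idle_ms - resume_ms
    # Part of the timed interval is spent entering suspend; the rest is powered down.
    powered_down_ms = timer_ms - suspend_ms
    if powered_down_ms <= 0:
        return None
    return timer_ms, powered_down_ms


if __name__ == "__main__":
    print(plan_suspend(100, 8, 2))  # (98, 90)
```

With the example figures, the timer is set for 98ms and the CPU is powered down for 90ms of the 100ms window. The same calculation shows why shorter suspend and resume latencies matter: a 10ms window with the same latencies leaves no powered-down time at all.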
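
The ARP wake filter proposed in the Gigabit Ethernet item above could behave roughly as in the following sketch. It models only the extra comparison, in software, and assumes untagged Ethernet II frames carrying IPv4-over-Ethernet ARP, where the EtherType sits at byte offsets 12 and 13 and the last byte of the target IP address at offset 41. The function names are hypothetical, and frames the comparison cannot classify are passed on to the existing address filters.

```python
BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806
ARP_FRAME_LEN = 42            # 14-byte Ethernet header + 28-byte ARP body
TARGET_IP_LOW_BYTE = 41       # offset of the last byte of the target IP


def arp_filter_passes(frame: bytes, our_ip: bytes) -> bool:
    """Return True if the frame should still be allowed to wake the CPU.

    Only broadcast ARP frames are ever suppressed; everything else is
    left to the existing address masks and filters (modelled as True).
    """
    if frame[0:6] != BROADCAST_MAC or len(frame) < ARP_FRAME_LEN:
        return True
    if int.from_bytes(frame[12:14], "big") != ETHERTYPE_ARP:
        return True
    # The single one-byte comparison: low byte of the requested IP address.
    return frame[TARGET_IP_LOW_BYTE] == our_ip[3]


def make_arp_request(target_ip: bytes) -> bytes:
    """Build a minimal broadcast ARP request (zeroed sender fields) for testing."""
    eth = BROADCAST_MAC + bytes(6) + ETHERTYPE_ARP.to_bytes(2, "big")
    arp = (b"\x00\x01" + b"\x08\x00" + b"\x06\x04" + b"\x00\x01"
           + bytes(6) + bytes(4) + bytes(6) + target_ip)
    return eth + arp


if __name__ == "__main__":
    ours = bytes([10, 0, 0, 7])
    print(arp_filter_passes(make_arp_request(bytes([10, 0, 0, 7])), ours))  # True
    print(arp_filter_passes(make_arp_request(bytes([10, 0, 0, 8])), ours))  # False
```

Because only one byte is compared, a request for another address that shares our low byte (for example 10.0.9.7) still wakes the CPU. If requested addresses are spread evenly over the 256 possible low-byte values, about 1 in 256 foreign ARP broadcasts gets through, which is the 256-fold reduction mentioned above.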

Integration

High integration of the CPU, system control, memory controller, video, peripheral interfaces, and discrete components in a system-on-chip is desirable for reliability and cost reduction.

XO-1 Power Management Design Notes

Power Management

Power Management Scenarios

Measurement of Geode LX (XO-1) Resume Time

Measurement of Geode LX (XO-1) Suspend Time