Gen2 CPU Ideas
Ideas for processors and system chip designs for 2nd Generation OLPC laptops. I encourage other engineers who have studied the current suspend/resume architecture and implementation to provide or link to accurate timing information about current suspend/resume and other architectural limitations, and make further suggestions.
Suspend/Resume
- The Geode LX chip takes about 900ms to resume with Ship.1 kernels. Kernel optimization could clearly improve this, particularly by restarting USB asynchronously (and perhaps some other subsystems likewise).
- Gen2 CPU/system chips should be able to suspend power to the CPU much more quickly, and resume from powered-down state much more quickly. Particular areas where small changes could bring improvements:
- Clock generation / PLL: Time required on power-up to get the clock set right and then lock its phase-locked loop (PLL) is excessive (measured at: < 4 ms). In a fast start-up, the clocks should be stable within a few microseconds (not milliseconds). This probably requires keeping the PLL powered during suspend, with its output to the rest of the powered-down chip gated off. Engineer for shorter and more predictable times from cold startup as well.
- Memory interface: In a fast start-up, memory clocks should be stable within a few microseconds (not milliseconds). The CPU should provide no impediments to beginning full speed memory read cycles as soon as the DRAM chips are specified to be brought out of low power internal self-refresh. If the DRAM chips require more than a few microseconds of leadtime, hardware should be able to provide that leadtime at the same time it triggers the resumption of power to the CPU chip, to overlap the CPU and memory power-up latencies.
- Interrupt controller: Should be powered and fully functional during CPU power-down. If an interrupt is triggered, and not masked off, during suspend, it exits suspend, and powers up the CPU to take the interrupt.
- High speed startup cache: A small internal memory in the CPU should be loadable with the data required for the CPU to resume (e.g. the reset interrupt vector and initial instruction sequence). If set up by software and enabled during suspend, then when the CPU is next powered up, its normally extremely slow accesses to BIOS flash chips after reset would be completely avoided. Enough startup code should be loaded into this on-CPU memory that it can cleanly initialize the model-specific registers, memory controller, CPU caches, MMU and TLB. Further instruction fetching could then resume directly from external DRAM. This would reduce power-on resume time by less than 4 ms.
- Integrated USB controller: Should be able to power this subsystem up and down independently of the other parts of the CPU. Should be able to operate in a quiescent state without accessing main memory at all. Thus, even if USB devices are plugged in to the system, the CPU should be able to quiesce any I/O to the USB devices, then suspend and power down the CPU, halting main memory accesses. The USB subsystem would continue powering and polling USB devices, using internal resources rather than main memory linked lists. If an interrupt condition arises (such as a device interrupt, or insertion or removal event), the interrupt, routed through the interrupt controller, could exit suspend. This would enable clean suspension even when USB devices are plugged in. Should also be able to power down the entire interface for individual USB devices (e.g. a flash memory stick, when we have no immediate plans to read or write it).
- Wireless controller: Should be able to power this subsystem up and down independently of the other parts of the CPU. Should be able to operate in a quiescent state without accessing main memory at all. Standard interrupt controls and masks should break the CPU out of suspend as needed. Best if not attached via USB (so USB subsystem need not remain powered up to keep it running), though should not provide proprietary firmware with dangerous operations such as an ability to control a DMA. During CPU suspend, Ethernet chip should be able to minimize its power usage and operate without making main memory accesses, keeping its physical layer connection "live" while internally buffering at least the first packet that arrives. That packet would be matched against existing address masks and filters to determine whether to interrupt/wake the CPU. An additional two-byte equal comparison option for a byte of received broadcast packets would permit a 256-fold reduction in spurious wakeups/interrupts for ARP broadcasts that are not addressed to this node (comparing the packet type to ARP, and if true, avoiding an interrupt unless the low byte of the requested IP address matches ours).
- Clocks and Timers: Several timers should be powered and fully functional during CPU power-down. One should be a trimmable accurate long-term realtime clock. Another should be a high resolution, short or long delay interrupt source. All of these timers should be able, through the interrupt controller, to awaken the CPU from suspend. The software should be able to notice that the CPU will not be needed for the next 100ms, set a timer for 98ms, power itself down using e.g. 8ms, be powered back up 90ms later by the timer interrupt, spend 2ms recovering from suspend, and put itself back to work.
- Gigabit Ethernet: Should be able to power this subsystem up and down independently of the other parts of the CPU. Should be able to sense a cable connection and/or signalling from the other end of a cable while in extremely low power mode, causing an interrupt and possible resume. Software would then power up the rest of the Ethernet subsystem, enable negotiation of LAN speed, etc. During CPU suspend, Ethernet chip should be able to minimize its power usage and operate without making main memory accesses, keeping its physical layer connection "live" while internally buffering at least the first packet that arrives. That packet would be matched against existing address masks and filters to determine whether to interrupt/wake the CPU. An additional two-byte equal comparison option for a byte of received broadcast packets would permit a 256-fold reduction in spurious wakeups/interrupts for ARP broadcasts that are not addressed to this node (comparing the packet type to ARP, and if true, avoiding an interrupt unless the low byte of the requested IP address matches ours).
- Video generation: Should provide tighter integration between main video and DCON. Provide finer control of frame rate, permitting frames to be scanned out manually, one at a time; or at any integer frequency between 1 and 200 Hz. Able to change frame rate in the middle of a frame, so that a frame that was started at a slow (power-saving) rate can be sped-up to scan the rest of the frame at maximum rate (after resuming and after a write to the frame buffer). Alternatively, be able to reset the current frame so that it is cleanly abandoned mid-frame, and a new frame is begun at a new rate, without causing display artifacts. This would give the software much lower and more predictable resume-times for providing visible feedback from some action that resumed from suspension (saving up to 40ms).
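The ARP wakeup filter suggested in the wireless and Gigabit Ethernet items can be sketched in software. This is only an illustration of the comparison the hardware would make, not a description of any existing chip: the function and helper names are made up for this example, the frame layout assumed is plain Ethernet II carrying IPv4 ARP, and the existing address masks and filters are not modelled.

```python
ETHERTYPE_ARP = 0x0806
BROADCAST_MAC = b"\xff" * 6
# 14-byte Ethernet header + 24-byte offset of the target IP inside the ARP
# payload + 3 to reach the last (low) octet of that address.
ARP_TARGET_IP_LOW_BYTE = 14 + 24 + 3


def should_wake(frame: bytes, our_ip_low_byte: int) -> bool:
    """Hypothetical wake decision for one frame buffered during suspend."""
    if frame[0:6] != BROADCAST_MAC:
        return True  # not a broadcast: left to the existing filters (not modelled)
    if int.from_bytes(frame[12:14], "big") != ETHERTYPE_ARP:
        return True  # other broadcasts are outside the proposed comparison
    if len(frame) <= ARP_TARGET_IP_LOW_BYTE:
        return True  # truncated frame: let software look at it
    # Wake only if the low byte of the requested IP address matches ours.
    return frame[ARP_TARGET_IP_LOW_BYTE] == our_ip_low_byte


def arp_request(target_ip: bytes) -> bytes:
    """Build a broadcast ARP request (test helper; addresses are arbitrary)."""
    sender_mac = b"\x02\x00\x00\x00\x00\x01"
    eth = BROADCAST_MAC + sender_mac + ETHERTYPE_ARP.to_bytes(2, "big")
    arp = (b"\x00\x01" + b"\x08\x00" + b"\x06" + b"\x04" + b"\x00\x01"
           + sender_mac + bytes([10, 0, 0, 1])
           + b"\x00" * 6 + target_ip)
    return eth + arp


if __name__ == "__main__":
    print(should_wake(arp_request(bytes([10, 0, 0, 7])), 7))
    print(should_wake(arp_request(bytes([10, 0, 0, 8])), 7))
```

Because only the low byte is compared, a request for another node whose address happens to end in the same byte still wakes the CPU. The reduction is 256-fold only if the low bytes of requested addresses are evenly spread.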
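The timer arithmetic in the Clocks and Timers item (100ms idle, 98ms timer, 8ms to power down, 90ms off, 2ms to recover) can be written out as a small planning function. This is a minimal sketch; `plan_suspend` and its parameter names are hypothetical, and it assumes the timer is armed at the moment software decides to suspend.

```python
def plan_suspend(idle_ms, suspend_ms, resume_ms):
    """Return (timer_ms, powered_down_ms), or None if suspending cannot pay off.

    idle_ms    -- how long the CPU will not be needed
    suspend_ms -- time spent powering down
    resume_ms  -- time spent recovering after the timer interrupt
    """
    # The timer must fire early enough to leave room for the resume itself.
    timer_ms = idle_ms - resume_ms
    # Time actually spent powered down, after the suspend sequence completes.
    powered_down_ms = timer_ms - suspend_ms
    if powered_down_ms <= 0:
        return None
    return timer_ms, powered_down_ms


if __name__ == "__main__":
    print(plan_suspend(100, 8, 2))    # the example from the text
    print(plan_suspend(100, 8, 900))  # with a 900ms resume, as on Ship.1 kernels
```

With the figures from the text this gives a 98ms timer and 90ms powered down. With a 900ms resume, a 100ms idle window is far too short to be worth suspending for, which is why the resume latency matters so much here.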
Integration
High integration of the CPU, system control, memory controller, video, peripheral interfaces, and discrete components in a system-on-chip is desirable for reliability and cost reduction.
- Flash interface on-chip should not limit the throughput available from common Flash chips.
- SD interface (perhaps more than one) on chip should not limit the throughput available. 8-bit SD interface. Able to be powered on with CPU off (to avoid upsetting an SD chip) and cause interrupt/resume if an event occurs. Able to be turned off with CPU on, yet when in very low power mode, able to sense and signal insertion or removal of an SD card. Should support low pin count hard disk interface (interrupt improvements to standard SD/SDIO interface). This is the mass storage interface, so make it as fast as possible.
XO-1 Power Management Design Notes
Measurement of Geode LX (XO-1) Resume Time
The sum total of CPU clock PLL time, memory controller initialization, and slow BIOS accesses after reset is on the order of 4 ms, so reducing them to 0 would have only a small impact.
The "long pole" in the pre-OS time is the 20 ms time between power applied and when the CaFe chip responds to the first PCI config access. CaFe config accesses are necessary in order to reestablish the CaFe's base addresses and timing parameters - those timing parameters are outside the domain of the standard SDHCI programming model, so the portable OS driver doesn't know about them. The firmware attempts to mitigate that 20 ms time by overlapping it with the DCON "resync to video frame" time, also on the order of 20 ms (one video frame). It also turns on the USB power prior to the first CaFe config access, so the USB device settling time is overlapped with the CaFe and DCON times.
Since the CaFe ready-for-config delay starts when power is applied, making the FLASH-access+PLL+memory-init time go to zero would not decrease the startup time at all on the current system.
If I don't reinit the CaFe chip or the DCON, I can get from button-press to ok prompt on serial in about 4 ms. With CaFe reinit, button to serial ok is about 20 ms. With CaFe + DCON, button to screen ok takes between 20 and 40 ms, depending on the phase of the video frame relative to the power on.
(After a kernel-initiated suspend, the above times would be followed by the kernel resume time, which at 900ms obviously dominates the 20-40ms firmware resume time.)
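The "long pole" argument above can be checked with a toy model. It assumes, as the notes state, that the CaFe ready-for-config delay and the CPU-side initialization (PLL, memory controller, BIOS accesses) both start when power is applied, so the pre-OS time is set by the slower of the two. The function name is hypothetical, and the DCON resync and USB settling times are deliberately left out.

```python
def firmware_resume_ms(cpu_init_ms=4.0, cafe_ready_ms=20.0):
    """Toy estimate of button-to-serial-ok time, in milliseconds.

    Both delays run in parallel from power-on, so the longer one dominates.
    Pass cafe_ready_ms=0 to model skipping the CaFe reinit.
    """
    return max(cpu_init_ms, cafe_ready_ms)


if __name__ == "__main__":
    print(firmware_resume_ms())                   # current system
    print(firmware_resume_ms(cpu_init_ms=0.0))    # ideal PLL/memory/BIOS path
    print(firmware_resume_ms(cafe_ready_ms=0.0))  # no CaFe reinit
```

In this model, taking the CPU-side time to zero still leaves 20 ms, while dropping the CaFe reinit leaves the roughly 4 ms measured for button-press to serial ok.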
Measurement of Geode LX (XO-1) Suspend Time
Candidate Processors
This section (indeed, the whole page) is all about ideas -- not about deals with chip vendors, plans for products, or anything so concrete. Add your own ideas.
- X86
- AMD Geode family extensions (SSE1)
- Intel "Diamondville" (SSE1, SSE2, SSE3, 64-bit, virtualization, 2 cores, some (2?) threads per core)
- VIA Isaiah (SSE1, SSE2, SSE3, 64-bit, virtualization)
- non-X86
- Sun UltraSPARC T2 core. GPL'd hardware module implements 64-bit SPARC instruction set, memory controller, in 1x to 8x cores with up to 8 threads apiece. Designed for servers, we'd spin down the cores, threads and clock rate for low absolute power consumption. (64-bit, virtualization, 8 cores, 8 threads per core)