Geode LX

From OLPC
Revision as of 09:41, 19 August 2007 by NoiseEHC (talk | contribs) (started)

This page is about the AMD Geode LX processor, which is the CPU of the XO. If you are not optimizing programs in assembly, you can stop reading here. If you are an AMD engineer, feel free to correct any of my mistakes.

The numbers shown here are calculated from test program runs: test1 results, test2 results

Architecture: The Geode contains an in-order, pipelined integer unit (IU) with a tacked-on FPU. Neither the integer ALU (IALU) nor the FPU ALU (FALU) is fully pipelined, so every instruction takes at least as many clock cycles as the LX databook lists for it (in Intel terminology, the databook numbers are throughputs). The IU can issue at most 1 instruction per clock cycle. When it encounters an MMX instruction, it puts it into the FPU's MMX queue, which is 6 instructions deep. If the queue is full, the IU stalls. TBD: FPU queue, 3DNOW queue.
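The queue behaviour described above can be sketched as a toy cycle counter. This is only an illustrative model, not a description of the real hardware: the queue depth of 6 comes from the text, while the drain rate (`DRAIN_CYCLES`, one MMX instruction retired every 2 cycles) is an assumption borrowed from the databook throughput figure, and the names `simulate`, `tick` and `iu_stats` are made up for this example. Each character of the input stream is one instruction: `'M'` for MMX, anything else for an integer instruction that takes 1 cycle.

```c
#include <stddef.h>

/* QUEUE_DEPTH is taken from the text; DRAIN_CYCLES is an assumption. */
enum { QUEUE_DEPTH = 6, DRAIN_CYCLES = 2 };

struct iu_stats {
    unsigned cycles;   /* cycles until the IU has issued the whole stream */
    unsigned stalls;   /* cycles the IU spent waiting on a full MMX queue */
};

struct model {
    unsigned queued;   /* MMX instructions currently in the queue */
    unsigned progress; /* cycles already spent on the queue head */
    struct iu_stats stats;
};

/* Advance the model by one clock cycle. */
static void tick(struct model *m)
{
    m->stats.cycles++;
    if (m->queued > 0 && ++m->progress == DRAIN_CYCLES) {
        m->queued--;       /* the FPU retires the queue head */
        m->progress = 0;
    }
}

/* Issue a stream of instructions: 'M' = MMX, anything else = integer. */
struct iu_stats simulate(const char *stream)
{
    struct model m = {0, 0, {0, 0}};

    for (const char *p = stream; *p != '\0'; p++) {
        if (*p == 'M') {
            /* A full queue stalls the IU until a slot frees up. */
            while (m.queued == QUEUE_DEPTH) {
                m.stats.stalls++;
                tick(&m);
            }
            m.queued++;
        }
        tick(&m);          /* the IU issues one instruction per cycle */
    }
    return m.stats;
}
```

In this model, a short burst such as `simulate("MMMMMM")` or an alternating stream such as `simulate("MIMIMIMI")` never stalls, while a long run of back-to-back MMX instructions eventually fills the queue and costs the IU extra cycles.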

The FPU is an out-of-order execution unit. Every instruction (only MMX has been tested) takes 6 cycles to execute (latency), so the 2 cycles listed in the databook are throughputs in the undocumented FPU pipeline. When you move an MMX register into an integer register, the FPU pipeline can stall the IALU. In practice this means:
1. Certain algorithms can be faster if you implement them with integer instructions instead of MMX.
2. You have to interleave MMX and integer instructions to fully utilize the Geode. You can do this by scheduling up to 6 MMX instructions before the integer ones, or by simply alternating them.
3. You have to create at least 3 orthogonal (mutually independent) dependency chains.
TBD: FPU cycles, FPU 2 cycle waitstate, 3DNOW cycles, whether 3DNOW has the 2 cycle waitstate. TBDWGRM: FPU pipeline details
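The figure of 3 dependency chains is consistent with the measured numbers: with a 6 cycle latency and a 2 cycle throughput, ceil(6 / 2) = 3 independent operations must be in flight to keep the unit busy. The sketch below shows that arithmetic and the shape of a loop with three independent chains. It is written in C only to make the dependency structure visible; the function names are hypothetical, a compiler will not necessarily emit MMX for it, and on the real machine you would write the equivalent schedule in assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Independent chains needed to hide latency: ceil(latency / throughput). */
unsigned chains_needed(unsigned latency, unsigned throughput)
{
    return (latency + throughput - 1) / throughput;
}

/* Sum 16-bit values with three accumulators that never depend on each
 * other, so three additions can be in flight at the same time. */
uint32_t sum_three_chains(const uint16_t *v, size_t n)
{
    uint32_t a = 0, b = 0, c = 0;
    size_t i = 0;

    for (; i + 3 <= n; i += 3) {
        a += v[i];         /* chain 1 */
        b += v[i + 1];     /* chain 2 */
        c += v[i + 2];     /* chain 3 */
    }
    for (; i < n; i++)     /* leftover elements */
        a += v[i];

    return a + b + c;      /* the chains are combined only at the end */
}
```

A single accumulator would make every addition wait for the previous one, so each step would cost the full latency instead of the throughput.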

The IU has a branch predictor of an unspecified size.

TBD: To Be Done
TBDWGRM: To Be Done When I Get a Real Machine