Geode optimization effort
This page is a placeholder for discussing getting libraries (and possibly apps) built with Geode-specific optimizations.
The Trac bug for this work is <trac>118</trac>.
Summary Email from Brian Carnes, November 2007
It was great to get such helpful and varied responses to this inquiry.
The original request on the developers page that I was responding to asked for Geode optimizations in gcc. After digging deeper, reading the replies on this list, and getting involved with the Trac enhancement request behind this, I think I can summarize the current state as follows:
1. gcc (unreleased 4.3) has Geode optimizations we'd like to use, but it is not out yet, nor is the rest of it stable. [GCC 4.3 was released March 2008]
2. People have successfully experimented with backporting just the Geode gcc changes we want to gcc 4.1 or 4.2.
3. To leverage existing build systems and package maintenance, we will not rebuild all packages to be Geode-specific, at least at first.
4. There are big gains to be had: parts of the basic C library showed a 20% improvement with Geode-specific optimizations. (It is unclear whether this came from hand-coded asm in key routines, from compiler Geode optimizations overall, or from both, and which benchmark was used.)
5. Benchmarking of performance gains in real applications has not been done.
So it looks like our tasks are:
1. Standardize on a compiler release and set of backported Geode patches.
2. Benchmark the performance gains from Geode compiler optimizations and from hand-tuned assembly in key library routines.
3. Measure how these improvements affect real-world responsiveness (pick some easy-to-time metrics in common XO apps).
4. Decide for which shared libraries we should maintain our own Geode-specific builds (glibc certainly, others...).
5. Create a portable and robust build system for the above in the short term.
6. Work to get the Fedora project's Koji to build Geode-specific modules in the long run.
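As a rough illustration of the benchmarking tasks in the list above, here is a minimal timing sketch in Python. It is not tied to any existing OLPC tooling: `best_time`, `improvement_percent`, `baseline_copy` and `optimized_copy` are hypothetical names, and the two copy routines are only stand-ins for "generic build" and "Geode-tuned build" of the same routine. The idea is to time both variants the same way and report the gain as a percentage of the baseline, so that a figure like "20% improvement" has a stated meaning.

```python
import timeit


def best_time(func, repeat=3, number=50):
    """Return the best (lowest) per-call time in seconds over several runs.

    Taking the minimum reduces noise from other activity on the machine.
    """
    runs = timeit.repeat(func, repeat=repeat, number=number)
    return min(runs) / number


def improvement_percent(baseline_s, optimized_s):
    """Percentage reduction in run time relative to the baseline."""
    return 100.0 * (baseline_s - optimized_s) / baseline_s


# Hypothetical stand-ins for a routine built two different ways.
def baseline_copy():
    data = bytearray(4096)
    out = bytearray(len(data))
    for i in range(len(data)):  # byte-by-byte copy
        out[i] = data[i]
    return out


def optimized_copy():
    data = bytearray(4096)
    return bytearray(data)  # bulk copy


if __name__ == "__main__":
    base = best_time(baseline_copy)
    opt = best_time(optimized_copy)
    print("baseline:  %.6f s per call" % base)
    print("optimized: %.6f s per call" % opt)
    print("improvement: %.1f%%" % improvement_percent(base, opt))
```

For real measurements the stand-ins would be replaced by calls into the libraries under test, and the same harness run on an actual XO so the numbers reflect the Geode LX rather than a development machine.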
I'm intentionally overlooking some of the suggestions from the list to go rewrite existing applications to run better on the Geode, mostly because the above gives us more bang for our development-time-buck.
I think our profiling and benchmarking efforts above will shed lots of light on whether application-level tuning is warranted in the future. I imagine we can get quite far by just expanding what is built as Geode-specific libraries, targeting the number-crunching libraries like FFTW, BLAS, etc.
The Geode-specific 3DNow! instruction "pfrsqrtv" (for reciprocal square root) would be nice to get used in all applications, but that would involve setting up our own build system (Koji or otherwise), or getting the Fedora project to target the Geode as an i386 variant.
Since it seems Rob Savoye has already done great work in this area, I'll look to him to lend his insight into immediate next steps.
How should we coordinate moving forward? Get a separate mailing list for this effort, update a wiki page, update the trac ticket, have an IRC meeting?
Let me know.
Comments from Ed Borasky, 30 November 2007
- I just signed up for a Give One Get One on 28 November, so I will be getting a hardware unit whenever I turn up in the rotation.
- I do performance engineering for a living, including profiling.
- I wasn't aware that there are Geode-specific GCC optimizations beyond the Athlon (3DNow! and MMX). Can someone with an actual unit get me a /proc/cpuinfo listing?
- My guess is that ATLAS (Automatically Tuned Linear Algebra Software) already has close to optimal performance on BLAS/LAPACK on a Geode. If not, I doubt it would be difficult to tweak the Athlon code to run on a Geode. ATLAS is capable of figuring out the optimum cache management strategy and has assembly language kernels for processors it recognizes. I can ask on the ATLAS mailing list if they have anything for Geode already. In any event, on my Athlon T-Bird and Athlon-XP systems, ATLAS runs at pretty near full clock speed on even moderately-sized linear algebra problems.
- Are the AMD libraries distributable on an XO (license-wise)? Do they have anything for a Geode?
- Another thing you might want to look at is Vector Pascal. This lives at http://www.dcs.gla.ac.uk/~wpc/reports/compilers/compilerindex/Doc2.html. It's open source but requires a JRE. So you'd probably end up cross-compiling. There's a book describing the concepts and language as well -- SIMD Programming Manual for Linux and Windows (Springer Professional Computing). What I don't know is how easy it would be to interface code compiled by Vector Pascal with Python -- I'm not a Pythonista.
Only the recent Geode NX is Athlon-based. The Geode GX and Geode LX are descended from the Cyrix MediaGX (there was a buy-out). The XO uses the Geode LX. Here is what /proc/cpuinfo has to say:
processor : 0
vendor_id : AuthenticAMD
cpu family : 5
model : 10
model name : Geode(TM) Integrated Processor by AMD PCS
stepping : 2
cpu MHz : 431.243
cache size : 128 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu de pse tsc msr cx8 sep pge cmov clflush mmx mmxext 3dnowext 3dnow
bogomips : 863.54
clflush size : 32
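To check a listing like the one above for the instruction-set extensions discussed on this page, a small parser is enough. The sketch below is only an illustration: `parse_cpuinfo` and `has_flags` are hypothetical helper names, the sample text is an excerpt of the XO listing quoted above, and the parser assumes a single-processor listing (later entries would overwrite earlier ones on a multi-CPU machine).

```python
# Excerpt of the XO's /proc/cpuinfo as quoted on this page.
SAMPLE_CPUINFO = """\
processor : 0
vendor_id : AuthenticAMD
cpu family : 5
model : 10
model name : Geode(TM) Integrated Processor by AMD PCS
flags : fpu de pse tsc msr cx8 sep pge cmov clflush mmx mmxext 3dnowext 3dnow
"""


def parse_cpuinfo(text):
    """Turn 'key : value' lines into a dict of stripped strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip blank lines and lines without a colon
            info[key.strip()] = value.strip()
    return info


def has_flags(info, wanted):
    """Map each wanted flag name to True/False based on the 'flags' field."""
    flags = set(info.get("flags", "").split())
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    info = parse_cpuinfo(SAMPLE_CPUINFO)
    print(info["model name"])
    for name, present in has_flags(info, ["mmx", "3dnow", "3dnowext", "sse"]).items():
        print("%-9s %s" % (name, "yes" if present else "no"))
```

On a real machine the text would come from reading `/proc/cpuinfo` instead of the embedded sample. For this listing the check reports MMX, 3DNow! and extended 3DNow! as present and SSE as absent, since `sse` does not appear in the flags line.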
- Trac <trac>118</trac>
- Geode optimized code
- patched GCC 4.2 (from 2007) with Geode experiment writeup: http://wiki.gnashdev.org/wiki/index.php/Building_OLPC_Tools
- original compiler request: Developers program
- Geode instruction set page
- Geode page
- GCC i386 options documentation
- Rob Savoye (?owner of the gnashdev.org effort, using latest gcc from svn?)
- Bernardo Innocenti (gcc geode backports to 4.2.1)
- Brian Carnes (me, wanting to help)
- Alexandre Oliva (would like to help)
- András Rafás (NoiseEHC, would like to finish the Geode asm docu)
- others?