See previous threads on this here:
http://lists.laptop.org/pipermail/sugar/2008-July/007471.html
Thread on SVG graphics performance here:
http://lists.sugarlabs.org/archive/sugar-devel/2008-December/010200.html
Suggestions from John Gilmore (e-mail here: http://lists.laptop.org/pipermail/devel/2008-December/021595.html)
File read write performance
- Putting a swap partition on an SD card and seeing what it does for performance
- Building an uncompressed JFFS2 filesystem (it's trivial with the tools used) and seeing what it does for performance
System memory usage optimization
- Running "prelink" to avoid dirtying pages for shared libraries (reducing memory pressure) and seeing what it does for performance
- Working on glibc and other popular libraries in the XO to reduce their dirty memory page footprint (it's huge and doesn't need to be)
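To see which libraries contribute most to a process's dirty page footprint, one option is to sum the `Private_Dirty` fields that Linux reports in `/proc/<pid>/smaps`. The following is a minimal sketch, not a tool from the threads above; the function names `private_dirty_by_mapping` and `report` are made up for illustration, and it assumes a kernel that exposes smaps.

```python
import re
from collections import defaultdict

# Header line of one mapping in /proc/<pid>/smaps:
# address range, permissions, offset, device, inode, optional pathname.
MAPPING_RE = re.compile(
    r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\d+\s*(.*)$"
)
DIRTY_RE = re.compile(r"^Private_Dirty:\s+(\d+)\s+kB$")


def private_dirty_by_mapping(smaps_text):
    """Return {mapping name: total Private_Dirty in kB} for smaps text."""
    totals = defaultdict(int)
    current = "[anon]"
    for line in smaps_text.splitlines():
        header = MAPPING_RE.match(line)
        if header:
            # Mappings without a pathname are grouped as anonymous memory.
            current = header.group(1).strip() or "[anon]"
            continue
        dirty = DIRTY_RE.match(line)
        if dirty:
            totals[current] += int(dirty.group(1))
    return dict(totals)


def report(pid="self", top=10):
    """Print the mappings with the most private dirty memory (Linux only)."""
    with open(f"/proc/{pid}/smaps") as f:
        totals = private_dirty_by_mapping(f.read())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    for name, kb in ranked[:top]:
        print(f"{kb:8d} kB  {name}")
```

Calling `report(pid)` before and after a change such as prelinking gives a rough per-library comparison; reading another user's process typically requires root.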
CPU cycle and process optimization
- Fixing bug #4680 in PyGTK+, which causes every multithreaded Python GTK+ program to uselessly poll ten times a second.
System level tests
Graphics performance
Related thread: http://lists.laptop.org/pipermail/devel/2008-December/thread.html#22027
- Test results (thanks Neil!):
Side-by-side Cairo graphics performance tests between a 2 GHz PC and the XO
http://screamingduck.com/Cruft/cairo_benchmark_XO.txt
http://screamingduck.com/Cruft/cairo_benchmark_2GHz_E2180.txt
http://screamingduck.com/Cruft/cairo_benchmark_XO_NoAccel.txt
http://wiki.laptop.org/go/Performance_tuning
Test data comparison
Thanks Jordan for data and code analysis below! (read wiki code for proper formatting)
Test Accel Noaccel Delta (Accel - Noaccel)
textpath-xlib-textpath 1562.60 1345.12 217.48
texturedtext-xlib-texturedtext 315.61 140.54 175.07
downsample-nearest-xlib-512x512-redsquar 106.37 33.25 73.12
downsample-bilinear-xlib-512x512-redsqua 96.57 35.22 61.35
downsample-bilinear-xlib-512x512-primros 83.36 34.81 48.56
downsample-nearest-xlib-512x512-lenna 78.18 29.83 48.35
downsample-bilinear-xlib-512x512-lenna 83.91 36.32 47.59
downsample-nearest-xlib-512x512-primrose 77.49 30.06 47.43
upsample-nearest-xlib-48x48-todo 86.23 60.14 26.09
upsample-bilinear-xlib-48x48-brokenlock 242.52 216.49 26.03
upsample-bilinear-xlib-48x48-script 237.69 211.70 25.98
upsample-bilinear-xlib-48x48-mail 234.40 208.43 25.97
upsample-bilinear-xlib-48x48-todo 239.85 213.94 25.91
upsample-nearest-xlib-48x48-script 81.67 57.02 24.65
upsample-nearest-xlib-48x48-mail 78.99 54.42 24.57
upsample-nearest-xlib-48x48-brokenlock 86.18 61.73 24.45
upsample-nearest-48x48-script 61.95 57.46 4.49
downsample-bilinear-512x512-redsquare 11.24 7.77 3.47
solidtext-xlib-solidtext 11.70 9.51 2.19
textpath-textpath 1081.14 1079.37 1.78
texturedtext-texturedtext 112.33 111.79 0.54
upsample-bilinear-48x48-todo 224.06 223.68 0.37
upsample-nearest-48x48-brokenlock 64.46 64.16 0.30
upsample-bilinear-48x48-brokenlock 226.51 226.25 0.26
downsample-nearest-512x512-redsquare 2.43 2.23 0.19
gradients-linear-gradients-linear 107.39 107.30 0.09
over-640x480-empty 15.68 15.61 0.07
over-640x480-opaque 20.19 20.12 0.07
add-640x480-opaque 20.77 20.73 0.04
upsample-nearest-48x48-todo 60.75 60.71 0.04
add-640x480-transparentshapes 20.79 20.78 0.02
add-640x480-shapes 20.76 20.74 0.02
multiple-clip-rectangles-multiple clip r 1.23 1.22 0.01
over-clipped-640x480-empty 0.95 0.94 0.01
over-640x480-text 23.51 23.51 0.01
downsample-bilinear-512x512-primrose 7.08 7.08 0.00
multiple-clip-rectangles-xlib-multiple c 0.15 0.15 0.00
over-clipped-640x480-opaque 1.22 1.22 0.00
downsample-bilinear-512x512-lenna 7.03 7.04 -0.01
over-clipped-640x480-shapes 1.23 1.24 -0.01
downsample-nearest-512x512-primrose 2.03 2.05 -0.02
downsample-nearest-512x512-lenna 2.03 2.05 -0.02
over-640x480-transparentshapes 58.66 58.68 -0.02
over-640x480-shapes 18.56 18.59 -0.03
upsample-nearest-48x48-mail 54.71 54.77 -0.07
add-640x480-text 20.70 20.77 -0.08
solidtext-solidtext 42.83 42.94 -0.10
add-640x480-empty 20.66 20.80 -0.13
upsample-bilinear-48x48-mail 217.81 219.44 -1.63
over-clipped-xlib-640x480-opaque 4.55 6.26 -1.71
upsample-bilinear-48x48-script 220.89 222.80 -1.92
over-clipped-xlib-640x480-empty 3.67 6.04 -2.38
lines-lines 426.79 429.16 -2.38
over-clipped-xlib-640x480-shapes 4.00 6.52 -2.51
curves-curves 224.55 236.08 -11.53
over-xlib-640x480-empty 29.88 48.30 -18.42
curves-xlib-curves 245.46 264.19 -18.73
gradients-linear-xlib-gradients-linear 132.35 151.62 -19.26
over-xlib-640x480-opaque 29.92 53.04 -23.12
add-xlib-640x480-transparentshapes 29.98 53.53 -23.54
add-xlib-640x480-opaque 29.97 53.54 -23.57
add-xlib-640x480-empty 29.93 53.61 -23.67
add-xlib-640x480-shapes 30.05 53.77 -23.72
add-xlib-640x480-text 29.75 53.59 -23.84
over-xlib-640x480-shapes 29.77 54.93 -25.16
over-xlib-640x480-text 29.83 57.75 -27.92
over-xlib-640x480-transparentshapes 29.76 91.67 -61.91
lines-xlib-lines 275.59 481.84 -206.25
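The comparison in the table can be reproduced from the two benchmark files with a few lines of script. This is a sketch only: it assumes rows in the layout shown above (name, accel, noaccel, delta), and that the values are times, so a positive difference means the accelerated path was slower. `parse_row` and `rank_by_delta` are illustrative names, and only three rows from the table are included.

```python
ROWS = """\
textpath-xlib-textpath 1562.60 1345.12 217.48
multiple-clip-rectangles-multiple clip r 1.23 1.22 0.01
lines-xlib-lines 275.59 481.84 -206.25
"""


def parse_row(line):
    """Split from the right, because a few test names contain spaces."""
    name, accel, noaccel, delta = line.rsplit(None, 3)
    return name, float(accel), float(noaccel), float(delta)


def rank_by_delta(text):
    """Return rows sorted by accel - noaccel, largest difference first."""
    rows = [parse_row(line) for line in text.splitlines() if line.strip()]
    return sorted(rows, key=lambda r: r[1] - r[2], reverse=True)


for name, accel, noaccel, listed in rank_by_delta(ROWS):
    print(f"{name:45s} {accel - noaccel:+9.2f}  (listed: {listed:+.2f})")
```

Recomputing the difference rather than trusting the listed column also catches rows where the two disagree by more than rounding.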
My first general observation is that the numbers are skewed by system activity. Recall that X runs in user space, so it is subject to preemption by the kernel. I think that the obviously high numbers in many of the results are due to NAND or wireless interrupts, for example:
6: 2261923 (5.25 ms)
7: 16690761 (38.73 ms)
8: 2306919 (5.35 ms)
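A spike like iteration 7 above can be separated from the steady-state samples with a median-based filter before averaging. A minimal sketch, assuming the value in parentheses is the per-iteration time in milliseconds; the 3x threshold is an arbitrary choice for illustration, not something from the thread.

```python
import re
from statistics import median

# Matches lines such as "7: 16690761 (38.73 ms)".
SAMPLE_RE = re.compile(r"^\s*(\d+):\s+(\d+)\s+\(([\d.]+)\s*ms\)")

SAMPLES = """\
6: 2261923 (5.25 ms)
7: 16690761 (38.73 ms)
8: 2306919 (5.35 ms)
"""


def parse_samples(text):
    """Return a list of (iteration, milliseconds) tuples."""
    out = []
    for line in text.splitlines():
        m = SAMPLE_RE.match(line)
        if m:
            out.append((int(m.group(1)), float(m.group(3))))
    return out


def split_outliers(samples, factor=3.0):
    """Split samples into (median, kept, dropped) by a multiple of the median."""
    med = median(ms for _, ms in samples)
    kept = [s for s in samples if s[1] <= factor * med]
    dropped = [s for s in samples if s[1] > factor * med]
    return med, kept, dropped


med, kept, dropped = split_outliers(parse_samples(SAMPLES))
print("median ms:", med)
print("kept:", kept)
print("dropped (likely preempted):", dropped)
```

The median is used instead of the mean because a single preempted iteration shifts the mean much more than it shifts the median.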
Three reasons why unaccel would be faster than accel:
- A bug in the accel code
- The accel path requires reading from video memory (which is very slow)
- The accel path doesn't punt to unaccel early enough.
Possible driver bug
textpath-xlib and texturedtext-xlib toss up a huge red flag. I am guessing we are seeing a bug in the driver.
As before, I encourage you to investigate which operations are heavily used. If you don't use textured text very much, then optimizing it would be heavy on the geek points, but not very useful in the long haul.
X optimization suggestions
From http://lists.laptop.org/pipermail/devel/2008-December/022036.html
The majority of the operations will probably be composite operations. You will want to instrument the three composite hooks in the X driver and their sub-functions: lx_check_composite, lx_prepare_composite, and lx_do_composite (in lx_exa.c).
lx_check_composite is the function where EXA checks to see if we are willing to do the operation at all - most of the acceleration rejects should happen here. lx_prepare_composite is where we store the information we need for the ensuing composite operation(s) - we can also bail out here, but there is an incremental cost in leading EXA further down the primrose path before rejecting it. lx_do_composite() obviously is where the operation happens. You will want to concentrate on these functions - instrument the code to figure out why we accept or reject an operation and why we take so long in rejecting certain operations. Profiling these functions may also help you figure out where we are spending our time.
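Once the hooks are instrumented, the output still has to be aggregated to see which rejections dominate. The sketch below assumes you have added your own debug print to each hook, emitting one line of the hypothetical form `hook result reason microseconds`. That format, the reason strings, and the `summarize` function are all invented here for illustration; the driver does not produce this output on its own.

```python
from collections import Counter, defaultdict

# Hypothetical log lines, in a format you would add to the driver yourself.
LOG = """\
lx_check_composite reject unsupported_op 12
lx_check_composite accept ok 9
lx_prepare_composite reject bad_mask 140
lx_check_composite reject unsupported_op 11
lx_do_composite accept ok 830
"""


def summarize(log_lines):
    """Count events and total time per (hook, result, reason)."""
    counts = Counter()
    usec = defaultdict(float)
    for line in log_lines:
        parts = line.split()
        # Skip anything that is not one of our four-field hook lines.
        if len(parts) != 4 or not parts[0].startswith("lx_"):
            continue
        hook, result, reason, elapsed = parts
        try:
            elapsed = float(elapsed)
        except ValueError:
            continue
        key = (hook, result, reason)
        counts[key] += 1
        usec[key] += elapsed
    return counts, usec


counts, usec = summarize(LOG.splitlines())
for key, n in counts.most_common():
    hook, result, reason = key
    print(f"{hook:22s} {result:7s} {reason:16s} n={n:3d} total={usec[key]:.0f} us")
```

Rejections that show up in `lx_prepare_composite` rather than `lx_check_composite` are the ones paying the extra cost described above, so they are a reasonable place to look first.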