Feature roadmap/General UI sluggishness: Difference between revisions

From OLPC
Latest revision as of 21:57, 31 December 2008

Feature subcategory: Performance
Requesters: Uruguay, Peru
Requirements
  • For all of the following, the times measured should apply when the XO is connected to a wireless AP and running Write with a file of less than 1 MB. This is used as a sample "state of the machine" definition. Other definitions of the state of the machine are welcome, and performance when the XO is doing more (e.g. more activities open or moving data over the wireless network) should not degrade precipitously.
  • The time between when the user interacts (e.g. clicks or presses a key) and when the result is visible on the screen should be less than 100 ms. Specific cases are listed below; where the absolute target above is not achievable, a target percentage improvement is given.
  • The following are examples where we think release 8.2 does not meet this requirement:
      • Must show or hide the Frame 80% faster than 8.2 does.
      • Must begin showing the results of a scroll operation in the Journal 50% faster than we do now, measured from the time the user clicks on the scroll bar until the image on the screen starts to move.
      • Must copy and paste to the clipboard, and show the clipboard icon in the Frame with the right type of object (text or image), 75% faster.
      • Must open a Journal detail page 50% faster.
      • Must show all icons when switching from one view to another (e.g. from Home to Neighborhood) 75% faster.
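The page does not define "N% faster". A minimal sketch of the two usual readings follows, since they give quite different targets. The helper names and the 400 ms baseline are hypothetical, not measurements from 8.2:

```python
# Two common readings of "N% faster"; the requirement does not say which is meant.
LATENCY_BUDGET_MS = 100.0  # requirement: interaction to visible result < 100 ms


def target_if_time_cut(baseline_ms: float, percent: float) -> float:
    """Reading A: elapsed time shrinks by `percent` (80% faster -> 0.2x the time)."""
    return baseline_ms * (1.0 - percent / 100.0)


def target_if_speed_up(baseline_ms: float, percent: float) -> float:
    """Reading B: the rate rises by `percent` (80% faster -> 1.8x the speed)."""
    return baseline_ms / (1.0 + percent / 100.0)


def within_budget(measured_ms: float) -> bool:
    """True if a measured interaction meets the absolute 100 ms target."""
    return measured_ms < LATENCY_BUDGET_MS


if __name__ == "__main__":
    baseline = 400.0  # hypothetical 8.2 time to show the Frame, in ms
    for label, rule in (("time cut", target_if_time_cut), ("speed-up", target_if_speed_up)):
        target = rule(baseline, 80)
        print(f"{label}: target {target:.1f} ms, within 100 ms: {within_budget(target)}")
```

With a 400 ms baseline, reading A asks for 80 ms (inside the 100 ms budget) while reading B asks for about 222 ms, so it may be worth stating which reading is intended.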
Specification: See previous threads on this topic:

http://lists.laptop.org/pipermail/sugar/2008-July/007471.html

Thread on SVG graphics performance here:
http://lists.sugarlabs.org/archive/sugar-devel/2008-December/010200.html

Suggestions from John Gilmore (e-mail here: http://lists.laptop.org/pipermail/devel/2008-December/021595.html)
File read/write performance

  • Putting a swap partition on an SD card and seeing what it does for performance
  • Building an uncompressed JFFS2 filesystem (it's trivial with the tools used) and seeing what it does for performance
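Comparing the options above needs a repeatable read measurement. A rough sketch follows; `read_throughput_mb_s` and `demo` are hypothetical helpers, not existing OLPC tools. Reading a file that is already in the page cache measures RAM rather than flash, so for real numbers the file should live on the filesystem under test and the cache should be dropped before each run (on Linux, as root: `sync; echo 3 > /proc/sys/vm/drop_caches`).

```python
import os
import tempfile
import time


def read_throughput_mb_s(path: str, block_size: int = 64 * 1024) -> float:
    """Read `path` sequentially once and return MB/s (1 MB = 2**20 bytes)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:  # unbuffered: one read() per block
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return (total / 2**20) / elapsed if elapsed > 0 else float("inf")


def demo(directory=None, size_mb: int = 4) -> float:
    """Write a scratch file in `directory` (e.g. a mounted SD card), then time reading it.

    The file was just written, so unless caches are dropped first this mostly
    measures the page cache rather than the storage device.
    """
    with tempfile.NamedTemporaryFile(dir=directory, delete=False) as tmp:
        tmp.write(os.urandom(size_mb * 2**20))
        path = tmp.name
    try:
        return read_throughput_mb_s(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(f"{demo():.1f} MB/s")
```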

System memory usage optimization

  • Running "prelink" to avoid dirtying pages for shared libraries (reducing memory pressure) and seeing what it does for performance
  • Working on glibc and other popular libraries in the XO to reduce their dirty memory page footprint (it's huge and doesn't need to be)
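To check whether prelink or library changes actually shrink the dirty-page footprint, one can sum the `Private_Dirty` field that Linux reports per mapping in `/proc/<pid>/smaps`, before and after the change. A minimal parsing sketch; the sample text is made up for illustration, and the helper names are hypothetical:

```python
import re
from collections import defaultdict

# Mapping header, e.g. "b7e00000-b7f2d000 r-xp 00000000 1f:01 200  /lib/libc.so.6"
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\d+\s*(.*)$")
# Per-mapping field, e.g. "Private_Dirty:        12 kB"
FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")

# Made-up excerpt in smaps format, for illustration only.
SAMPLE = """\
08048000-08050000 r-xp 00000000 1f:01 100        /usr/bin/python
Size:                 32 kB
Private_Dirty:         0 kB
b7e00000-b7f2d000 r-xp 00000000 1f:01 200        /lib/libc.so.6
Private_Dirty:        12 kB
b7f2d000-b7f30000 rw-p 0012c000 1f:01 200        /lib/libc.so.6
Private_Dirty:         8 kB
b7f30000-b7f40000 rw-p 00000000 00:00 0
Private_Dirty:        16 kB
"""


def private_dirty_kb(smaps_text: str) -> dict:
    """Sum Private_Dirty (kB) per mapped file; anonymous mappings go under "[anon]"."""
    totals = defaultdict(int)
    current = "[anon]"
    for line in smaps_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1).strip() or "[anon]"
            continue
        field = FIELD.match(line.strip())
        if field and field.group(1) == "Private_Dirty":
            totals[current] += int(field.group(2))
    return dict(totals)


def read_smaps(pid="self") -> str:
    """Return the smaps text of a live process (Linux only)."""
    with open(f"/proc/{pid}/smaps") as fh:
        return fh.read()


if __name__ == "__main__":
    for name, kb in sorted(private_dirty_kb(SAMPLE).items(), key=lambda kv: -kv[1]):
        print(f"{kb:6d} kB  {name}")
```

On a live XO, `private_dirty_kb(read_smaps(pid))` for a running activity gives per-library totals that can be compared before and after prelinking.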

CPU cycle and process optimization

  • Fixing <trac>4680</trac> in PyGTK+, which causes every multithreaded Python GTK+ program to uselessly poll ten times a second.
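The bug itself is in PyGTK, and the sketch below does not reproduce PyGTK's code. It only illustrates, with plain Python threads, why waking on a 100 ms timer to check for work is wasteful compared with blocking until work arrives: at 10 Hz an otherwise idle program wakes the CPU 36,000 times an hour for nothing. The function names are hypothetical:

```python
import threading
import time


def polling_wait(done: threading.Event, interval_s: float = 0.1) -> int:
    """Wake every `interval_s` (10 Hz by default) to check a flag; count the wakeups."""
    wakeups = 0
    while not done.is_set():
        time.sleep(interval_s)
        wakeups += 1
    return wakeups


def blocking_wait(done: threading.Event) -> int:
    """Sleep until signalled; the thread wakes exactly once."""
    done.wait()
    return 1


def run(waiter, delay_s: float) -> int:
    """Signal `done` from a timer thread after `delay_s` and return the waiter's wakeups."""
    done = threading.Event()
    timer = threading.Timer(delay_s, done.set)
    timer.start()
    try:
        return waiter(done)
    finally:
        timer.cancel()


if __name__ == "__main__":
    print("polling wakeups in 1 s:", run(polling_wait, 1.0))
    print("blocking wakeups in 1 s:", run(blocking_wait, 1.0))
```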

System level tests

Graphics performance
Related thread: http://lists.laptop.org/pipermail/devel/2008-December/thread.html#22027

  • Test results (thanks Neil!):

Side-by-side Cairo graphics performance tests between a 2 GHz PC and the XO:
http://screamingduck.com/Cruft/cairo_benchmark_XO.txt
http://screamingduck.com/Cruft/cairo_benchmark_2GHz_E2180.txt
http://screamingduck.com/Cruft/cairo_benchmark_XO_NoAccel.txt

Test data comparison

Thanks Jordan for the data and code analysis below!

Delta = Accel - Noaccel. Assuming the figures are per-test times (as the discussion below treats them), a positive Delta means the accelerated run was slower.

Test                                          Accel  Noaccel    Delta
textpath-xlib-textpath                      1562.60  1345.12   217.48
texturedtext-xlib-texturedtext               315.61   140.54   175.07
downsample-nearest-xlib-512x512-redsquar     106.37    33.25    73.12
downsample-bilinear-xlib-512x512-redsqua      96.57    35.22    61.35
downsample-bilinear-xlib-512x512-primros      83.36    34.81    48.56
downsample-nearest-xlib-512x512-lenna         78.18    29.83    48.35
downsample-bilinear-xlib-512x512-lenna        83.91    36.32    47.59
downsample-nearest-xlib-512x512-primrose      77.49    30.06    47.43
upsample-nearest-xlib-48x48-todo              86.23    60.14    26.09
upsample-bilinear-xlib-48x48-brokenlock      242.52   216.49    26.03
upsample-bilinear-xlib-48x48-script          237.69   211.70    25.98
upsample-bilinear-xlib-48x48-mail            234.40   208.43    25.97
upsample-bilinear-xlib-48x48-todo            239.85   213.94    25.91
upsample-nearest-xlib-48x48-script            81.67    57.02    24.65
upsample-nearest-xlib-48x48-mail              78.99    54.42    24.57
upsample-nearest-xlib-48x48-brokenlock        86.18    61.73    24.45
upsample-nearest-48x48-script                 61.95    57.46     4.49
downsample-bilinear-512x512-redsquare         11.24     7.77     3.47
solidtext-xlib-solidtext                      11.70     9.51     2.19
textpath-textpath                           1081.14  1079.37     1.78
texturedtext-texturedtext                    112.33   111.79     0.54
upsample-bilinear-48x48-todo                 224.06   223.68     0.37
upsample-nearest-48x48-brokenlock             64.46    64.16     0.30
upsample-bilinear-48x48-brokenlock           226.51   226.25     0.26
downsample-nearest-512x512-redsquare           2.43     2.23     0.19
gradients-linear-gradients-linear            107.39   107.30     0.09
over-640x480-empty                            15.68    15.61     0.07
over-640x480-opaque                           20.19    20.12     0.07
add-640x480-opaque                            20.77    20.73     0.04
upsample-nearest-48x48-todo                   60.75    60.71     0.04
add-640x480-transparentshapes                 20.79    20.78     0.02
add-640x480-shapes                            20.76    20.74     0.02
multiple-clip-rectangles-multiple clip r       1.23     1.22     0.01
over-clipped-640x480-empty                     0.95     0.94     0.01
over-640x480-text                             23.51    23.51     0.01
downsample-bilinear-512x512-primrose           7.08     7.08     0.00
multiple-clip-rectangles-xlib-multiple c       0.15     0.15     0.00
over-clipped-640x480-opaque                    1.22     1.22     0.00
downsample-bilinear-512x512-lenna              7.03     7.04    -0.01
over-clipped-640x480-shapes                    1.23     1.24    -0.01
downsample-nearest-512x512-primrose            2.03     2.05    -0.02
downsample-nearest-512x512-lenna               2.03     2.05    -0.02
over-640x480-transparentshapes                58.66    58.68    -0.02
over-640x480-shapes                           18.56    18.59    -0.03
upsample-nearest-48x48-mail                   54.71    54.77    -0.07
add-640x480-text                              20.70    20.77    -0.08
solidtext-solidtext                           42.83    42.94    -0.10
add-640x480-empty                             20.66    20.80    -0.13
upsample-bilinear-48x48-mail                 217.81   219.44    -1.63
over-clipped-xlib-640x480-opaque               4.55     6.26    -1.71
upsample-bilinear-48x48-script               220.89   222.80    -1.92
over-clipped-xlib-640x480-empty                3.67     6.04    -2.38
lines-lines                                  426.79   429.16    -2.38
over-clipped-xlib-640x480-shapes               4.00     6.52    -2.51
curves-curves                                224.55   236.08   -11.53
over-xlib-640x480-empty                       29.88    48.30   -18.42
curves-xlib-curves                           245.46   264.19   -18.73
gradients-linear-xlib-gradients-linear       132.35   151.62   -19.26
over-xlib-640x480-opaque                      29.92    53.04   -23.12
add-xlib-640x480-transparentshapes            29.98    53.53   -23.54
add-xlib-640x480-opaque                       29.97    53.54   -23.57
add-xlib-640x480-empty                        29.93    53.61   -23.67
add-xlib-640x480-shapes                       30.05    53.77   -23.72
add-xlib-640x480-text                         29.75    53.59   -23.84
over-xlib-640x480-shapes                      29.77    54.93   -25.16
over-xlib-640x480-text                        29.83    57.75   -27.92
over-xlib-640x480-transparentshapes           29.76    91.67   -61.91
lines-xlib-lines                             275.59   481.84  -206.25
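When more benchmark runs come in, it helps to regenerate this comparison rather than edit it by hand. A small sketch that recomputes Delta (Accel - Noaccel) plus the Accel/Noaccel ratio, and lists tests where the accelerated path is slower. It assumes, as the discussion below does, that the figures are times (lower is better). `ROWS` holds a few rows copied from the table; the helper names are hypothetical:

```python
from typing import Iterable, List, Tuple

# (test, accel, noaccel): a few rows copied from the table above.
ROWS = [
    ("textpath-xlib-textpath", 1562.60, 1345.12),
    ("texturedtext-xlib-texturedtext", 315.61, 140.54),
    ("over-640x480-empty", 15.68, 15.61),
    ("over-xlib-640x480-transparentshapes", 29.76, 91.67),
    ("lines-xlib-lines", 275.59, 481.84),
]


def deltas(rows: Iterable[Tuple[str, float, float]]) -> List[Tuple[str, float, float]]:
    """Return (test, accel - noaccel, accel / noaccel), largest delta first."""
    out = [(name, round(accel - noaccel, 2), accel / noaccel) for name, accel, noaccel in rows]
    return sorted(out, key=lambda row: row[1], reverse=True)


def accel_slower(rows, threshold: float = 1.0) -> List[str]:
    """Tests where the accelerated run took at least `threshold` longer."""
    return [name for name, delta, _ in deltas(rows) if delta >= threshold]


if __name__ == "__main__":
    for name, delta, ratio in deltas(ROWS):
        print(f"{name:40s} {delta:9.2f} {ratio:6.2f}x")
```

The ratio column matters because absolute deltas hide relative cost: textpath-xlib has the largest delta but is only about 1.16 times slower accelerated (1562.60 / 1345.12), while texturedtext-xlib is about 2.25 times slower (315.61 / 140.54).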

My first general observation is that the numbers are skewed by system activity: recall that X runs in user space, so it is subject to preemption by the kernel. I think the obviously high numbers in many of the results are due to NAND or wireless interrupts, for example:
6: 2261923 (5.25 ms)
7: 16690761 (38.73 ms)
8: 2306919 (5.35 ms)
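One way to keep such spikes from skewing a comparison is to compare the minimum or median iteration instead of the mean. A sketch using the three iterations quoted above; the 3x-median cutoff is an arbitrary choice and the helper names are hypothetical. In these lines, ticks divided by milliseconds agree (about 431,000 ticks per ms), so iteration 7 really did take about seven times longer rather than being a timer glitch.

```python
import re
from statistics import mean, median

# The three iterations quoted above: "iteration: ticks (milliseconds)".
SAMPLE = """\
6: 2261923 (5.25 ms)
7: 16690761 (38.73 ms)
8: 2306919 (5.35 ms)
"""

LINE = re.compile(r"^\s*(\d+):\s+(\d+)\s+\(([\d.]+) ms\)\s*$")


def parse(text: str):
    """Return [(iteration, ticks, ms)] for every line shaped like SAMPLE."""
    samples = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            samples.append((int(m.group(1)), int(m.group(2)), float(m.group(3))))
    return samples


def outliers(samples, factor: float = 3.0):
    """Iterations slower than `factor` times the median, e.g. hit by an interrupt."""
    med = median(ms for _, _, ms in samples)
    return [it for it, _, ms in samples if ms > factor * med]


def ticks_per_ms(samples) -> float:
    """Counter rate implied by the samples (total ticks / total ms)."""
    return sum(t for _, t, _ in samples) / sum(ms for _, _, ms in samples)


if __name__ == "__main__":
    s = parse(SAMPLE)
    times = [ms for _, _, ms in s]
    print("mean", round(mean(times), 2), "median", median(times), "min", min(times))
    print("outliers", outliers(s), "ticks/ms", round(ticks_per_ms(s)))
```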

Three reasons why the unaccelerated path could be faster than the accelerated one:

  1. A bug in the accel code.
  2. The accel path requires reading from video memory (which is very slow).
  3. The accel path doesn't punt to unaccel early enough.

Possible driver bug: textpath-xlib and texturedtext-xlib throw up a huge red flag; I am guessing we are seeing a bug in the driver.
As before, I encourage you to investigate which operations are heavily used. If you don't use textured text very much, optimizing it would score high on geek points but would not be very useful in the long haul.
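A sketch of that prioritization: weight each regression (positive Delta from the table above) by how often the operation is used. The usage counts below are invented placeholders, to be replaced with real counts from instrumenting the driver, and the score is only a ranking heuristic, since Delta is measured per benchmark test rather than per call:

```python
# Positive deltas (Accel - Noaccel) taken from the table above.
DELTA = {
    "texturedtext-xlib-texturedtext": 175.07,
    "downsample-nearest-xlib-512x512-redsquar": 73.12,
    "upsample-bilinear-xlib-48x48-mail": 25.97,
}

# HYPOTHETICAL usage counts (calls per minute of typical use); not measured.
USAGE = {
    "texturedtext-xlib-texturedtext": 1,
    "downsample-nearest-xlib-512x512-redsquar": 10,
    "upsample-bilinear-xlib-48x48-mail": 200,
}


def priorities(delta: dict, usage: dict):
    """Rank regressions by usage x slowdown; unused or non-regressing tests score 0."""
    scores = {name: usage.get(name, 0) * max(d, 0.0) for name, d in delta.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for name, score in priorities(DELTA, USAGE):
        print(f"{score:10.1f}  {name}")
```

With these made-up counts, texturedtext drops to the bottom despite having the largest slowdown of the three, which is the point made above.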

X optimization suggestions

From http://lists.laptop.org/pipermail/devel/2008-December/022036.html
The majority of the operations will probably be composite operations. You will want to instrument the three composite hooks in the X driver and their sub-functions: lx_check_composite, lx_prepare_composite, and lx_do_composite (in lx_exa.c).

lx_check_composite is the function where EXA checks to see if we are willing to do the operation at all - most of the acceleration rejects should happen here. lx_prepare_composite is where we store the information we need for the ensuing composite operation(s) - we can also bail out here, but there is an incremental cost in leading EXA further down the primrose path before rejecting it. lx_do_composite() obviously is where the operation happens. You will want to concentrate on these functions - instrument the code to figure out why we accept or reject an operation and why we take so long in rejecting certain operations. Profiling these functions may also help you figure out where we are spending our time.
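The real hooks are C functions in lx_exa.c, so the actual instrumentation would be C inside the driver. The bookkeeping itself is simple, and this Python model shows what to collect per hook: call count, reject count, total time, and time spent on rejected calls. `check_composite`, `prepare_composite` and `do_composite` here are hypothetical stand-ins, and the accepted format list is invented:

```python
import time
from collections import Counter
from functools import wraps

# Per-hook counters: calls, rejects (hook returned False), seconds, reject_seconds.
STATS = {key: Counter() for key in ("calls", "rejects", "seconds", "reject_seconds")}


def instrument(fn):
    """Wrap a hook to record how often it runs, rejects, and how long it takes."""
    name = fn.__name__

    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        STATS["calls"][name] += 1
        STATS["seconds"][name] += elapsed
        if result is False:  # a check/prepare hook declining the operation
            STATS["rejects"][name] += 1
            STATS["reject_seconds"][name] += elapsed
        return result

    return wrapper


@instrument
def check_composite(op, src_format):
    # Hypothetical acceptance rule standing in for lx_check_composite.
    return src_format in {"a8r8g8b8", "x8r8g8b8"}


@instrument
def prepare_composite(op, src_format):
    # Stand-in for lx_prepare_composite; a real one may also reject.
    return True


@instrument
def do_composite(width, height):
    # Stand-in for lx_do_composite; draws nothing here.
    return None


def simulate(formats):
    """Call the hooks in roughly the order EXA does: check, then prepare, then composite."""
    for counter in STATS.values():
        counter.clear()
    for fmt in formats:
        if check_composite("over", fmt) and prepare_composite("over", fmt):
            do_composite(64, 64)
    return STATS


if __name__ == "__main__":
    stats = simulate(["a8r8g8b8", "a8", "x8r8g8b8", "a1"])
    for name in ("check_composite", "prepare_composite", "do_composite"):
        print(name, stats["calls"][name], "calls,", stats["rejects"][name], "rejects")
```

Comparing `reject_seconds` for the check and prepare hooks shows whether rejections happen late and expensively, which is the "primrose path" cost described above.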

Owners: MarcoPesentiGritti, Erik, Gregorio
Priority: 2
Helps deployability? no
Target for 9.1? no