Feature roadmap/General UI sluggishness: Difference between revisions
No edit summary
m (use <trac> for bug, add <br> to make first '*' a bullet)
(6 intermediate revisions by one other user not shown)
Line 3:
|Feature subcategory=Performance
|Requesters=Uruguay, Peru
|Requirements=<br>
* For all of the following, the measured times should apply when the XO is connected to a wireless AP and running Write with a file of less than 1 MB. This serves as a sample "state of the machine" definition. Other definitions of the state of the machine are welcome, and performance should not degrade precipitously when the XO is doing more (e.g. more activities open or moving data over the wireless).
* The time between when the user interacts (e.g. clicks or enters a keystroke) and when the result is visible on the screen should be less than 100 ms. Specific cases are listed below; where that absolute number is not achievable, a target percentage improvement is listed.
Line 31:
'''CPU cycle and process optimization''' <br>
* Fixing <trac>4680</trac> in PyGTK+, which causes every multithreaded Python GTK+ program to uselessly poll ten times a second.
'''System level tests''' <br>
Line 44:
http://screamingduck.com/Cruft/cairo_benchmark_2GHz_E2180.txt <br>
http://screamingduck.com/Cruft/cairo_benchmark_XO_NoAccel.txt <br>
* Tools: <br>
http://wiki.laptop.org/go/Performance_tuning
== Test data comparison == |
Thanks Jordan for data and code analysis below! <br>
{| class="wikitable"
! Test !! Accel !! Noaccel !! Delta
|-
| textpath-xlib-textpath || 1562.60 || 1345.12 || 217.48
|-
| texturedtext-xlib-texturedtext || 315.61 || 140.54 || 175.07
|-
| downsample-nearest-xlib-512x512-redsquar || 106.37 || 33.25 || 73.12
|-
| downsample-bilinear-xlib-512x512-redsqua || 96.57 || 35.22 || 61.35
|-
| downsample-bilinear-xlib-512x512-primros || 83.36 || 34.81 || 48.56
|-
| downsample-nearest-xlib-512x512-lenna || 78.18 || 29.83 || 48.35
|-
| downsample-bilinear-xlib-512x512-lenna || 83.91 || 36.32 || 47.59
|-
| downsample-nearest-xlib-512x512-primrose || 77.49 || 30.06 || 47.43
|-
| upsample-nearest-xlib-48x48-todo || 86.23 || 60.14 || 26.09
|-
| upsample-bilinear-xlib-48x48-brokenlock || 242.52 || 216.49 || 26.03
|-
| upsample-bilinear-xlib-48x48-script || 237.69 || 211.70 || 25.98
|-
| upsample-bilinear-xlib-48x48-mail || 234.40 || 208.43 || 25.97
|-
| upsample-bilinear-xlib-48x48-todo || 239.85 || 213.94 || 25.91
|-
| upsample-nearest-xlib-48x48-script || 81.67 || 57.02 || 24.65
|-
| upsample-nearest-xlib-48x48-mail || 78.99 || 54.42 || 24.57
|-
| upsample-nearest-xlib-48x48-brokenlock || 86.18 || 61.73 || 24.45
|-
| upsample-nearest-48x48-script || 61.95 || 57.46 || 4.49
|-
| downsample-bilinear-512x512-redsquare || 11.24 || 7.77 || 3.47
|-
| solidtext-xlib-solidtext || 11.70 || 9.51 || 2.19
|-
| textpath-textpath || 1081.14 || 1079.37 || 1.78
|-
| texturedtext-texturedtext || 112.33 || 111.79 || 0.54
|-
| upsample-bilinear-48x48-todo || 224.06 || 223.68 || 0.37
|-
| upsample-nearest-48x48-brokenlock || 64.46 || 64.16 || 0.30
|-
| upsample-bilinear-48x48-brokenlock || 226.51 || 226.25 || 0.26
|-
| downsample-nearest-512x512-redsquare || 2.43 || 2.23 || 0.19
|-
| gradients-linear-gradients-linear || 107.39 || 107.30 || 0.09
|-
| over-640x480-empty || 15.68 || 15.61 || 0.07
|-
| over-640x480-opaque || 20.19 || 20.12 || 0.07
|-
| add-640x480-opaque || 20.77 || 20.73 || 0.04
|-
| upsample-nearest-48x48-todo || 60.75 || 60.71 || 0.04
|-
| add-640x480-transparentshapes || 20.79 || 20.78 || 0.02
|-
| add-640x480-shapes || 20.76 || 20.74 || 0.02
|-
| multiple-clip-rectangles-multiple clip r || 1.23 || 1.22 || 0.01
|-
| over-clipped-640x480-empty || 0.95 || 0.94 || 0.01
|-
| over-640x480-text || 23.51 || 23.51 || 0.01
|-
| downsample-bilinear-512x512-primrose || 7.08 || 7.08 || 0.00
|-
| multiple-clip-rectangles-xlib-multiple c || 0.15 || 0.15 || 0.00
|-
| over-clipped-640x480-opaque || 1.22 || 1.22 || 0.00
|-
| downsample-bilinear-512x512-lenna || 7.03 || 7.04 || -0.01
|-
| over-clipped-640x480-shapes || 1.23 || 1.24 || -0.01
|-
| downsample-nearest-512x512-primrose || 2.03 || 2.05 || -0.02
|-
| downsample-nearest-512x512-lenna || 2.03 || 2.05 || -0.02
|-
| over-640x480-transparentshapes || 58.66 || 58.68 || -0.02
|-
| over-640x480-shapes || 18.56 || 18.59 || -0.03
|-
| upsample-nearest-48x48-mail || 54.71 || 54.77 || -0.07
|-
| add-640x480-text || 20.70 || 20.77 || -0.08
|-
| solidtext-solidtext || 42.83 || 42.94 || -0.10
|-
| add-640x480-empty || 20.66 || 20.80 || -0.13
|-
| upsample-bilinear-48x48-mail || 217.81 || 219.44 || -1.63
|-
| over-clipped-xlib-640x480-opaque || 4.55 || 6.26 || -1.71
|-
| upsample-bilinear-48x48-script || 220.89 || 222.80 || -1.92
|-
| over-clipped-xlib-640x480-empty || 3.67 || 6.04 || -2.38
|-
| lines-lines || 426.79 || 429.16 || -2.38
|-
| over-clipped-xlib-640x480-shapes || 4.00 || 6.52 || -2.51
|-
| curves-curves || 224.55 || 236.08 || -11.53
|-
| over-xlib-640x480-empty || 29.88 || 48.30 || -18.42
|-
| curves-xlib-curves || 245.46 || 264.19 || -18.73
|-
| gradients-linear-xlib-gradients-linear || 132.35 || 151.62 || -19.26
|-
| over-xlib-640x480-opaque || 29.92 || 53.04 || -23.12
|-
| add-xlib-640x480-transparentshapes || 29.98 || 53.53 || -23.54
|-
| add-xlib-640x480-opaque || 29.97 || 53.54 || -23.57
|-
| add-xlib-640x480-empty || 29.93 || 53.61 || -23.67
|-
| add-xlib-640x480-shapes || 30.05 || 53.77 || -23.72
|-
| add-xlib-640x480-text || 29.75 || 53.59 || -23.84
|-
| over-xlib-640x480-shapes || 29.77 || 54.93 || -25.16
|-
| over-xlib-640x480-text || 29.83 || 57.75 || -27.92
|-
| over-xlib-640x480-transparentshapes || 29.76 || 91.67 || -61.91
|-
| lines-xlib-lines || 275.59 || 481.84 || -206.25
|}
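The Delta column is Accel minus Noaccel, so it favours tests with large absolute numbers. A rough Python sketch of ranking a few rows by both the delta and the Accel/Noaccel ratio follows. The four rows are copied from the comparison data; the ratio is an added illustration, not something Jordan computed.

```python
# A few rows copied from the comparison data: (test, accel, noaccel).
ROWS = [
    ("textpath-xlib-textpath", 1562.60, 1345.12),
    ("texturedtext-xlib-texturedtext", 315.61, 140.54),
    ("over-xlib-640x480-transparentshapes", 29.76, 91.67),
    ("lines-xlib-lines", 275.59, 481.84),
]


def rank_by_delta(rows):
    """Return (test, delta, ratio) tuples sorted by delta, largest first."""
    ranked = []
    for test, accel, noaccel in rows:
        delta = round(accel - noaccel, 2)   # same definition as the Delta column
        ratio = round(accel / noaccel, 2)   # added here: relative difference
        ranked.append((test, delta, ratio))
    return sorted(ranked, key=lambda row: row[1], reverse=True)


RANKED = rank_by_delta(ROWS)
for test, delta, ratio in RANKED:
    print(f"{test:40s} {delta:9.2f} {ratio:5.2f}x")
```

Seen this way, textpath-xlib has the largest delta but differs by only about 16%, while texturedtext-xlib more than doubles.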
My first general observation is that the numbers are skewed by system activity. Recall that X runs in user space, so it is subject to preemption by the kernel. I think the obviously high numbers in many of the results are due to NAND or wireless interrupts, for example:<br>
6: 2261923 (5.25 ms) <br>
7: 16690761 (38.73 ms) <br>
8: 2306919 (5.35 ms) <br>
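One way to keep such spikes from skewing a result is to compare each iteration against the median before averaging. The Python sketch below does that for the three iterations in the example. The factor of 3 is an arbitrary illustrative threshold, and nothing here is part of the benchmark itself.

```python
from statistics import median

# Per-iteration timings in ms, taken from the example (iterations 6 to 8).
SAMPLES_MS = {6: 5.25, 7: 38.73, 8: 5.35}


def split_outliers(samples, factor=3.0):
    """Separate samples larger than `factor` times the median from the rest.

    `factor` is an assumed, untuned threshold chosen for illustration.
    """
    limit = factor * median(samples.values())
    kept = {k: v for k, v in samples.items() if v <= limit}
    dropped = {k: v for k, v in samples.items() if v > limit}
    return kept, dropped


KEPT, DROPPED = split_outliers(SAMPLES_MS)
print("kept:", KEPT)
print("suspected preemption:", DROPPED)
```

Reporting the median (or the minimum) of the kept samples instead of the mean is another common way to reduce the effect of interrupts on a short benchmark run.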
Three reasons why unaccel would be faster than accel:
# A bug in the accel code.
# The accel path requires reading from video memory (which is very slow).
# The accel path doesn't punt to unaccel early enough.
'''Possible driver bug'''

textpath-xlib and texturedtext-xlib toss up a huge red flag - I am guessing we are probably seeing a bug in the driver. <br>
As before, I encourage you to investigate which operations are heavily used - if you don't use textured text very much, then optimizing it would be heavy on the geek points, but not very useful in the long haul.
==X optimization suggestions==
From http://lists.laptop.org/pipermail/devel/2008-December/022036.html <br>
The majority of the operations will probably be composite operations. You will want to instrument the three composite hooks in the X driver and their sub-functions: lx_check_composite, lx_prepare_composite, and lx_do_composite (in lx_exa.c).

lx_check_composite is the function where EXA checks to see if we are willing to do the operation at all - most of the acceleration rejects should happen here. lx_prepare_composite is where we store the information we need for the ensuing composite operation(s) - we can also bail out here, but there is an incremental cost in leading EXA further down the primrose path before rejecting it. lx_do_composite() obviously is where the operation happens. You will want to concentrate on these functions - instrument the code to figure out why we accept or reject an operation and why we take so long in rejecting certain operations. Profiling these functions may also help you figure out where we are spending our time.
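
Once the hooks are instrumented, the output still has to be summarized. The Python sketch below assumes each hook call has been logged as a (function, outcome, reason, microseconds) record; that record layout, the reasons and the timings are all hypothetical and only show the kind of tally that answers "why do we reject, and how long does rejecting take".

```python
from collections import defaultdict

# Hypothetical records, one per hook call:
# (function, outcome, reason, microseconds).
# Reasons and timings are invented for illustration; real values would come
# from whatever logging is added to lx_exa.c.
RECORDS = [
    ("lx_check_composite", "reject", "unsupported op", 4),
    ("lx_check_composite", "accept", "", 3),
    ("lx_prepare_composite", "reject", "source in video memory", 40),
    ("lx_prepare_composite", "accept", "", 12),
    ("lx_do_composite", "accept", "", 150),
    ("lx_check_composite", "reject", "unsupported op", 5),
]


def summarize(records):
    """Total call count and time for each (function, outcome, reason)."""
    stats = defaultdict(lambda: {"calls": 0, "total_us": 0})
    for func, outcome, reason, micros in records:
        entry = stats[(func, outcome, reason)]
        entry["calls"] += 1
        entry["total_us"] += micros
    return dict(stats)


SUMMARY = summarize(RECORDS)
for (func, outcome, reason), entry in sorted(SUMMARY.items()):
    label = f"{func} {outcome} {reason}".strip()
    print(f"{label:55s} calls={entry['calls']:3d} total_us={entry['total_us']}")
```

Rejects that show up under lx_prepare_composite rather than lx_check_composite, or rejects with a high total time, are the ones the advice above suggests looking at first.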
|Owners=MarcoPesentiGritti, Erik, Gregorio |
Latest revision as of 21:57, 31 December 2008
Feature subcategory | Is part of::Category:Performance
Requesters | {{#arraymap:Uruguay, Peru|,|x|Requested by::x}}
Requirements |
Specification | See previous threads on this here: http://lists.laptop.org/pipermail/sugar/2008-July/007471.html
Thread on SVG graphics performance here:
Suggestions from John Gilmore (e-mail here: http://lists.laptop.org/pipermail/devel/2008-December/021595.html)
System memory usage optimization
CPU cycle and process optimization
System level tests
Graphics performance
Side by side Cairo graphics performance tests between a 2 GHz PC and XO
Test data comparison
Possible driver bug
X optimization suggestions
Owners | {{#arraymap:MarcoPesentiGritti, Erik, Gregorio|,|x|Contact person::User:x}}
Priority | Priority::2
Helps deployability? | Helps deployability::no
Target for 9.1? | Target for 9.1::no