Feature roadmap/General UI sluggishness: Difference between revisions

Latest revision as of 21:57, 31 December 2008

Feature subcategory		Performance
Requesters		Uruguay, Peru
Requirements		
* For all of the following, the times measured should apply when the XO is connected to a wireless AP and running Write with a file of less than 1 MB. This is used as a sample "state of the machine" definition. Other definitions of state of the machine are welcome, and the performance when the XO is doing more (e.g. more activities open or moving data over the wireless link) should not degrade precipitously.
* The time between when the user interacts (e.g. clicks or enters a keystroke) and when the result is visible on the screen should be less than 100 ms. Specific cases are listed below, and where the absolute number above is not achievable, a target percentage improvement is listed.
The following are examples where we think release 8.2 does not meet this requirement:
* Must be 80% faster than in 8.2 to show or hide the Frame.
* Must begin showing scroll operation results in the Journal 50% faster than we do now, measured from the time the user clicks on the scroll bar until the image on the screen starts to move.
* Must copy and paste to the clipboard and show the clipboard icon in the Frame with the right type of object (text or image) 75% faster.
* Must open a Journal detail page 50% faster.
* Must show all icons when switching from one view to another (e.g. from Home to Neighborhood) 75% faster.
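The targets above can be checked mechanically once a baseline time for 8.2 is known. The following is a minimal sketch, not part of the original requirements. It assumes that "N% faster" means the elapsed time is cut by N% (so 80% faster than 500 ms is 100 ms), and the helper names (target_latency_ms, measure_ms, meets_requirement) are hypothetical. It also assumes a current Python 3 interpreter, and it times a callable only; it does not observe when the result actually becomes visible on screen.

```python
import time

LATENCY_BUDGET_MS = 100.0  # absolute target stated in the requirements


def target_latency_ms(baseline_ms, percent_faster):
    """Relative target, reading "N% faster" as an N% cut in elapsed time.

    This reading is an assumption; the requirements do not define the term.
    """
    if not 0 <= percent_faster < 100:
        raise ValueError("percent_faster must be in [0, 100)")
    return baseline_ms * (100.0 - percent_faster) / 100.0


def measure_ms(action, repeats=5):
    """Call `action` several times and return each elapsed time in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def meets_requirement(samples_ms, baseline_ms, percent_faster):
    """Pass if the worst sample fits the 100 ms budget or the relative target."""
    worst = max(samples_ms)
    relative_target = target_latency_ms(baseline_ms, percent_faster)
    return worst <= LATENCY_BUDGET_MS or worst <= relative_target
```

For example, meets_requirement(measure_ms(some_action), baseline_ms, 80) would check a Frame show/hide measurement, where some_action and baseline_ms stand in for a real test hook and a measured 8.2 time.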
Specification		
* See previous threads on this here: http://lists.laptop.org/pipermail/sugar/2008-July/007471.html
* Thread on SVG graphics performance here: http://lists.sugarlabs.org/archive/sugar-devel/2008-December/010200.html
Suggestions from John Gilmore (e-mail here: http://lists.laptop.org/pipermail/devel/2008-December/021595.html)
'''File read write performance'''
* Putting a swap partition on an SD card and seeing what it does for performance
* Building an un-compressed JFFS2 filesystem (it's trivial with the tools used) and seeing what it does for performance
'''System memory usage optimization'''
* Running "prelink" to avoid dirtying pages for shared libraries (reducing memory pressure) and seeing what it does for performance
* Working on glibc and other popular libraries in the XO to reduce their dirty memory page footprint (it's huge and doesn't need to be)
'''CPU cycle and process optimization'''
* Fixing <trac>4680</trac> in PyGTK+, which causes every multithreaded Python GTK+ program to uselessly poll ten times a second.
'''System level tests'''
* Replacing Sugar with Gnome, KDE, or other GUIs and seeing what it does for performance.
Suggestions from Erikg focused on Sugar optimization: http://lists.laptop.org/pipermail/sugar/2008-October/009393.html
'''Graphics performance'''
* Related thread: http://lists.laptop.org/pipermail/devel/2008-December/thread.html#22027
* Test results (thanks Neil!): side by side Cairo graphics performance tests between a 2 GHz PC and XO
** http://screamingduck.com/Cruft/cairo_benchmark_XO.txt
** http://screamingduck.com/Cruft/cairo_benchmark_2GHz_E2180.txt
** http://screamingduck.com/Cruft/cairo_benchmark_XO_NoAccel.txt
* Tools: Performance tuning lists tools and techniques
== Test data comparison ==
Thanks Jordan for data and code analysis below!
(read wiki code for proper formatting)
Test                                     Accel    Noaccel   Delta
------------------------------------------------------------------
textpath-xlib-textpath                   1562.60  1345.12  217.48
texturedtext-xlib-texturedtext           315.61   140.54   175.07
downsample-nearest-xlib-512x512-redsquar 106.37   33.25     73.12
downsample-bilinear-xlib-512x512-redsqua 96.57    35.22     61.35
downsample-bilinear-xlib-512x512-primros 83.36    34.81     48.56
downsample-nearest-xlib-512x512-lenna    78.18    29.83     48.35
downsample-bilinear-xlib-512x512-lenna   83.91    36.32     47.59
downsample-nearest-xlib-512x512-primrose 77.49    30.06     47.43
upsample-nearest-xlib-48x48-todo         86.23    60.14     26.09
upsample-bilinear-xlib-48x48-brokenlock  242.52   216.49    26.03
upsample-bilinear-xlib-48x48-script      237.69   211.70    25.98
upsample-bilinear-xlib-48x48-mail        234.40   208.43    25.97
upsample-bilinear-xlib-48x48-todo        239.85   213.94    25.91
upsample-nearest-xlib-48x48-script       81.67    57.02     24.65
upsample-nearest-xlib-48x48-mail         78.99    54.42     24.57
upsample-nearest-xlib-48x48-brokenlock   86.18    61.73     24.45
upsample-nearest-48x48-script            61.95    57.46      4.49
downsample-bilinear-512x512-redsquare    11.24    7.77       3.47
solidtext-xlib-solidtext                 11.70    9.51       2.19
textpath-textpath                        1081.14  1079.37    1.78
texturedtext-texturedtext                112.33   111.79     0.54
upsample-bilinear-48x48-todo             224.06   223.68     0.37
upsample-nearest-48x48-brokenlock        64.46    64.16      0.30
upsample-bilinear-48x48-brokenlock       226.51   226.25     0.26
downsample-nearest-512x512-redsquare     2.43     2.23       0.19
gradients-linear-gradients-linear        107.39   107.30     0.09
over-640x480-empty                       15.68    15.61      0.07
over-640x480-opaque                      20.19    20.12      0.07
add-640x480-opaque                       20.77    20.73      0.04
upsample-nearest-48x48-todo              60.75    60.71      0.04
add-640x480-transparentshapes            20.79    20.78      0.02
add-640x480-shapes                       20.76    20.74      0.02
multiple-clip-rectangles-multiple clip r 1.23     1.22       0.01
over-clipped-640x480-empty               0.95     0.94       0.01
over-640x480-text                        23.51    23.51      0.01
downsample-bilinear-512x512-primrose     7.08     7.08       0.00
multiple-clip-rectangles-xlib-multiple c 0.15     0.15       0.00
over-clipped-640x480-opaque              1.22     1.22       0.00
downsample-bilinear-512x512-lenna        7.03     7.04      -0.01
over-clipped-640x480-shapes              1.23     1.24      -0.01
downsample-nearest-512x512-primrose      2.03     2.05      -0.02
downsample-nearest-512x512-lenna         2.03     2.05      -0.02
over-640x480-transparentshapes           58.66    58.68     -0.02
over-640x480-shapes                      18.56    18.59     -0.03
upsample-nearest-48x48-mail              54.71    54.77     -0.07
add-640x480-text                         20.70    20.77     -0.08
solidtext-solidtext                      42.83    42.94     -0.10
add-640x480-empty                        20.66    20.80     -0.13
upsample-bilinear-48x48-mail             217.81   219.44    -1.63
over-clipped-xlib-640x480-opaque         4.55     6.26      -1.71
upsample-bilinear-48x48-script           220.89   222.80    -1.92
over-clipped-xlib-640x480-empty          3.67     6.04      -2.38
lines-lines                              426.79   429.16    -2.38
over-clipped-xlib-640x480-shapes         4.00     6.52      -2.51
curves-curves                            224.55   236.08   -11.53
over-xlib-640x480-empty                  29.88    48.30    -18.42
curves-xlib-curves                       245.46   264.19   -18.73
gradients-linear-xlib-gradients-linear   132.35   151.62   -19.26
over-xlib-640x480-opaque                 29.92    53.04    -23.12
add-xlib-640x480-transparentshapes       29.98    53.53    -23.54
add-xlib-640x480-opaque                  29.97    53.54    -23.57
add-xlib-640x480-empty                   29.93    53.61    -23.67
add-xlib-640x480-shapes                  30.05    53.77    -23.72
add-xlib-640x480-text                    29.75    53.59    -23.84
over-xlib-640x480-shapes                 29.77    54.93    -25.16
over-xlib-640x480-text                   29.83    57.75    -27.92
over-xlib-640x480-transparentshapes      29.76    91.67    -61.91
lines-xlib-lines                         275.59   481.84   -206.25

My first general observation is that the numbers are skewed due to system activity - recall that X runs in user space, so it is subject to being preempted by the kernel. I think that the obviously high numbers in many of the results are due to NAND or wireless interrupts (example):
6: 2261923 (5.25 ms)
7: 16690761 (38.73 ms)
8: 2306919 (5.35 ms)
Three reasons why unaccel would be faster than accel:
# A bug in the accel code
# The accel path requires reading from video memory (which is very slow)
# The accel path doesn't punt to unaccel early enough
'''Possible driver bug'''
textpath-xlib and texturedtext-xlib toss up a huge red flag - I am guessing we are probably seeing a bug in the driver.
As before, I encourage you to investigate which operations are heavily used - if you don't use textured text very much, then optimizing it would be heavy on the geek points, but not very useful in the long haul.
== X optimization suggestions ==
From http://lists.laptop.org/pipermail/devel/2008-December/022036.html
The majority of the operations will probably be composite operations. You will want to instrument the three composite hooks in the X driver and their sub-functions: lx_check_composite, lx_prepare_composite, and lx_do_composite (in lx_exa.c).
* lx_check_composite is the function where EXA checks to see if we are willing to do the operation at all - most of the acceleration rejects should happen here.
* lx_prepare_composite is where we store the information we need for the ensuing composite operation(s) - we can also bail out here, but there is an incremental cost in leading EXA further down the primrose path before rejecting it.
* lx_do_composite is where the operation itself happens.
You will want to concentrate on these functions - instrument the code to figure out why we accept or reject an operation and why we take so long in rejecting certain operations. Profiling these functions may also help you figure out where we are spending our time.
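To decide which tests deserve attention first, the comparison table can be processed with a short script. The sketch below is illustrative only and is not part of Jordan's analysis. It assumes the Accel and Noaccel columns are elapsed times, so a positive delta means the accelerated path is slower, which matches the red-flag comment on textpath-xlib and texturedtext-xlib. It also takes the median of repeated samples so that a single interrupt-induced spike, like the 38.73 ms sample quoted above, does not dominate. The function names and the 10.0 threshold are hypothetical.

```python
import statistics


def parse_row(line):
    """Split one benchmark row into (name, accel, noaccel).

    The last three fields are Accel, Noaccel and Delta; the test name is
    everything before them and may itself contain spaces.
    """
    parts = line.split()
    name = " ".join(parts[:-3])
    return name, float(parts[-3]), float(parts[-2])


def accel_regressions(lines, min_delta=10.0):
    """Return (name, delta) pairs where Accel exceeds Noaccel by at least min_delta."""
    found = []
    for line in lines:
        name, accel, noaccel = parse_row(line)
        delta = round(accel - noaccel, 2)
        if delta >= min_delta:
            found.append((name, delta))
    # Largest regression first.
    return sorted(found, key=lambda item: item[1], reverse=True)


def typical_ms(samples_ms):
    """Median of repeated samples; robust against a single large outlier."""
    return statistics.median(samples_ms)


rows = [
    "textpath-xlib-textpath                   1562.60  1345.12  217.48",
    "texturedtext-xlib-texturedtext           315.61   140.54   175.07",
    "multiple-clip-rectangles-multiple clip r 1.23     1.22       0.01",
    "lines-xlib-lines                         275.59   481.84   -206.25",
]
print(accel_regressions(rows))
print(typical_ms([5.25, 38.73, 5.35]))
```

Feeding the full table through accel_regressions gives a ranked list of cases where acceleration appears to hurt, which could then be weighed against how often Sugar actually uses each operation.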
Owners		MarcoPesentiGritti, Erik, Gregorio
Priority		2
Helps deployability?		no
Target for 9.1?		no

@@ Line 3: / Line 3: @@
 |Feature subcategory=Performance
 |Requesters=Uruguay, Peru
-|Requirements=
+|Requirements=<br>
 * For all of the following, the times measured should apply when the XO is connected to a wireless AP and running Write with a file of less than 1 MB. This is used as a sample "state of the machine" definition. Other definitions of state of the machine are welcome and the performance when the XO is doing more (e.g. more activities open or moving data over the Wireless) should not degrade precipitously.
 * The time between when the user interacts (e.g. clicks or enters a keystroke) and when the result is visible on the screen should be less than 100 ms. Specific cases are listed below, and where the absolute number above is not achievable, a target percentage improvement is listed.
@@ Line 31: / Line 31: @@
 '''CPU cycle and process optimization''' <br>
-*  Fixing bug #4680 in PyGTK+, which causes every multithreaded Python GTK+ program to uselessly poll ten times a second.
+*  Fixing <trac>4680</trac> in PyGTK+, which causes every multithreaded Python GTK+ program to uselessly poll ten times a second.
 '''System level tests''' <br>
@@ Line 44: / Line 44: @@
 http://screamingduck.com/Cruft/cairo_benchmark_2GHz_E2180.txt <br>
 http://screamingduck.com/Cruft/cairo_benchmark_XO_NoAccel.txt <br>
+* Tools: [[Performance tuning]] lists tools and techniques
-* Tools: <br>
-http://wiki.laptop.org/go/Performance_tuning
+== Test data comparison ==
+Thanks Jordan for data and code analysis below! (read wiki code for proper formatting) <br>
+Test                                     Accel    Noaccel   Delta<br>
+------------------------------------------------------------------<br>
+textpath-xlib-textpath                   1562.60  1345.12  217.48<br>
+texturedtext-xlib-texturedtext           315.61   140.54   175.07<br>
+downsample-nearest-xlib-512x512-redsquar 106.37   33.25     73.12<br>
+downsample-bilinear-xlib-512x512-redsqua 96.57    35.22     61.35<br>
+downsample-bilinear-xlib-512x512-primros 83.36    34.81     48.56<br>
+downsample-nearest-xlib-512x512-lenna    78.18    29.83     48.35<br>
+downsample-bilinear-xlib-512x512-lenna   83.91    36.32     47.59<br>
+downsample-nearest-xlib-512x512-primrose 77.49    30.06     47.43<br>
+upsample-nearest-xlib-48x48-todo         86.23    60.14     26.09<br>
+upsample-bilinear-xlib-48x48-brokenlock  242.52   216.49    26.03<br>
+upsample-bilinear-xlib-48x48-script      237.69   211.70    25.98<br>
+upsample-bilinear-xlib-48x48-mail        234.40   208.43    25.97<br>
+upsample-bilinear-xlib-48x48-todo        239.85   213.94    25.91<br>
+upsample-nearest-xlib-48x48-script       81.67    57.02     24.65<br>
+upsample-nearest-xlib-48x48-mail         78.99    54.42     24.57<br>
+upsample-nearest-xlib-48x48-brokenlock   86.18    61.73     24.45<br>
+upsample-nearest-48x48-script            61.95    57.46      4.49<br>
+downsample-bilinear-512x512-redsquare    11.24    7.77       3.47<br>
+solidtext-xlib-solidtext                 11.70    9.51       2.19<br>
+textpath-textpath                        1081.14  1079.37    1.78<br>
+texturedtext-texturedtext                112.33   111.79     0.54<br>
+upsample-bilinear-48x48-todo             224.06   223.68     0.37<br>
+upsample-nearest-48x48-brokenlock        64.46    64.16      0.30<br>
+upsample-bilinear-48x48-brokenlock       226.51   226.25     0.26<br>
+downsample-nearest-512x512-redsquare     2.43     2.23       0.19<br>
+gradients-linear-gradients-linear        107.39   107.30     0.09<br>
+over-640x480-empty                       15.68    15.61      0.07<br>
+over-640x480-opaque                      20.19    20.12      0.07<br>
+add-640x480-opaque                       20.77    20.73      0.04<br>
+upsample-nearest-48x48-todo              60.75    60.71      0.04<br>
+add-640x480-transparentshapes            20.79    20.78      0.02<br>
+add-640x480-shapes                       20.76    20.74      0.02<br>
+multiple-clip-rectangles-multiple clip r 1.23     1.22       0.01<br>
+over-clipped-640x480-empty               0.95     0.94       0.01<br>
+over-640x480-text                        23.51    23.51      0.01<br>
+downsample-bilinear-512x512-primrose     7.08     7.08       0.00<br>
+multiple-clip-rectangles-xlib-multiple c 0.15     0.15       0.00<br>
+over-clipped-640x480-opaque              1.22     1.22       0.00<br>
+downsample-bilinear-512x512-lenna        7.03     7.04      -0.01<br>
+over-clipped-640x480-shapes              1.23     1.24      -0.01<br>
+downsample-nearest-512x512-primrose      2.03     2.05      -0.02<br>
+downsample-nearest-512x512-lenna         2.03     2.05      -0.02<br>
+over-640x480-transparentshapes           58.66    58.68     -0.02<br>
+over-640x480-shapes                      18.56    18.59     -0.03<br>
+upsample-nearest-48x48-mail              54.71    54.77     -0.07<br>
+add-640x480-text                         20.70    20.77     -0.08<br>
+solidtext-solidtext                      42.83    42.94     -0.10<br>
+add-640x480-empty                        20.66    20.80     -0.13<br>
+upsample-bilinear-48x48-mail             217.81   219.44    -1.63<br>
+over-clipped-xlib-640x480-opaque         4.55     6.26      -1.71<br>
+upsample-bilinear-48x48-script           220.89   222.80    -1.92<br>
+over-clipped-xlib-640x480-empty          3.67     6.04      -2.38<br>
+lines-lines                              426.79   429.16    -2.38<br>
+over-clipped-xlib-640x480-shapes         4.00     6.52      -2.51<br>
+curves-curves                            224.55   236.08   -11.53<br>
+over-xlib-640x480-empty                  29.88    48.30    -18.42<br>
+curves-xlib-curves                       245.46   264.19   -18.73<br>
+gradients-linear-xlib-gradients-linear   132.35   151.62   -19.26<br>
+over-xlib-640x480-opaque                 29.92    53.04    -23.12<br>
+add-xlib-640x480-transparentshapes       29.98    53.53    -23.54<br>
+add-xlib-640x480-opaque                  29.97    53.54    -23.57<br>
+add-xlib-640x480-empty                   29.93    53.61    -23.67<br>
+add-xlib-640x480-shapes                  30.05    53.77    -23.72<br>
+add-xlib-640x480-text                    29.75    53.59    -23.84<br>
+over-xlib-640x480-shapes                 29.77    54.93    -25.16<br>
+over-xlib-640x480-text                   29.83    57.75    -27.92<br>
+over-xlib-640x480-transparentshapes      29.76    91.67    -61.91<br>
+lines-xlib-lines                         275.59   481.84   -206.25<br>
+<br>
+My first general observation is that the numbers are skewed due to system activity - recall that X runs in user space, so it is subject to be preempted by the kernel.  I think that the obviously high numbers in many of the results are due to NAND or wireless interrupts (example):<br>
+: 2261923 (5.25 ms) <br>
+: 16690761 (38.73 ms) <br>
+: 2306919 (5.35 ms) <br>
+Three reasons why unaccel would be faster than accel:
+# a bug in the accel code
+# The accel path requires reading from video memory (which is very slow)
+# The accel path doesn't punt to unaccel early enough.
+'''Possible driver bug'''
+textpath-xlib and texturedtext-xlib  toss up a huge red flag - I am guessing we are probably seeing a bug in the driver. <br>
+As before, I encourage you to investigate which operations are heavily used - if you don't use textured text very much, then optimizing it would be heavy on the geek points, but not very useful in the long haul.
+==X optimization suggestions==
+From http://lists.laptop.org/pipermail/devel/2008-December/022036.html <br>
+The majority of the operations will probably be composite operations. You will want to instrument the three composite hooks in the X driver and their sub-functions:  lx_check_composite, lx_prepare_composite, and lx_do_composite (in lx_exa.c).
+lx_check_composite is the function where EXA checks to see if we are willing to do the operation at all - most of the acceleration rejects should happen here. lx_prepare_composite is where we store the information we need for the ensuing composite operation(s) - we can also bail out here, but there is an incremental cost in leading EXA further down the primrose path before rejecting it.  lx_do_composite() obviously is where the operation happens.  You will want to concentrate on these functions - instrument the code to figure out why we accept or reject an operation and why we take so long in rejecting certain operations. Profiling these functions may also help you figure out where we are spending our time.
 |Owners=MarcoPesentiGritti, Erik, Gregorio
