User talk:Bluefoxicy/gcc optimizations

From OLPC

cachegrind

I ran cachegrind on povbench to see whether the cost can be attributed to I1 misses. Ignore the L2 miss rate (the simulated L2 is apparently only 64 bytes, per --L2=64,2,32); the L2 refs are simply the L1 misses (I1 + D1). I'm primarily interested in the I1 miss rate, not D1.

CFLAGS="-march=pentiumpro -O2"
valgrind --tool=cachegrind --I1=16384,4,32 --D1=16384,4,32 --L2=64,2,32 povbench/bin/povray benchmark.ini benchmark.pov
==1319== I   refs:      41,083,755,757
==1319== I1  misses:       182,725,637
==1319== L2i misses:       152,387,115
==1319== I1  miss rate:           0.44%
==1319== L2i miss rate:           0.37%
==1319== 
==1319== D   refs:      21,345,148,925  (15,955,324,459 rd + 5,389,824,466 wr)
==1319== D1  misses:       221,554,710  (   174,862,567 rd +    46,692,143 wr)
==1319== L2d misses:       221,064,369  (   174,419,792 rd +    46,644,577 wr)
==1319== D1  miss rate:            1.0% (           1.0%   +           0.8%  )
==1319== L2d miss rate:            1.0% (           1.0%   +           0.8%  )
==1319== 
==1319== L2 refs:          404,280,347  (   357,588,204 rd +    46,692,143 wr)
==1319== L2 misses:        373,451,484  (   326,806,907 rd +    46,644,577 wr)
==1319== L2 miss rate:             0.5% (           0.5%   +           0.8%  )
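
As a quick sanity check on the note that the L2 refs are just the L1 misses, the totals above can be added up. This is a minimal Python sketch with the counts copied by hand from the summary; the variable names are only illustrative.

```python
# Counts copied from the cachegrind summary above (-O2 run, pid 1319).
i1_misses = 182_725_637
d1_misses = 221_554_710
l2_refs = 404_280_347

l2i_misses = 152_387_115
l2d_misses = 221_064_369
l2_misses = 373_451_484

# Every L1 miss (instruction or data) becomes one L2 reference.
print(i1_misses + d1_misses == l2_refs)
# The L2 miss total is likewise the sum of its instruction and data parts.
print(l2i_misses + l2d_misses == l2_misses)
# Fraction of L2 references that also miss in the simulated L2.
print(f"{l2_misses / l2_refs:.1%}")
```

Both checks print True, and about 92% of the L2 references also miss, which is consistent with treating the L2 figures here as a restatement of the L1 misses rather than as a useful measurement.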

The run below is with -fno-tree-pre. I don't see the problem: the I1 miss rate increases by only 0.05 percentage points (0.44% to 0.49%), which I'd guess translates to about a 1.25% difference in I1, but I haven't done the math yet.

CFLAGS="-march=pentiumpro -O2 -fno-tree-pre"
valgrind --tool=cachegrind --I1=16384,4,32 --D1=16384,4,32 --L2=64,2,32 povbench/bin/povray benchmark.ini benchmark.pov
==11679== I   refs:      25,339,345,936
==11679== I1  misses:       126,151,035
==11679== L2i misses:       104,445,090
==11679== I1  miss rate:           0.49%
==11679== L2i miss rate:           0.41%
==11679== 
==11679== D   refs:      13,199,180,344  (9,804,287,156 rd + 3,394,893,188 wr)
==11679== D1  misses:       147,020,945  (  115,976,223 rd +    31,044,722 wr)
==11679== L2d misses:       146,782,461  (  115,759,617 rd +    31,022,844 wr)
==11679== D1  miss rate:            1.1% (          1.1%   +           0.9%  )
==11679== L2d miss rate:            1.1% (          1.1%   +           0.9%  )
==11679== 
==11679== L2 refs:          273,171,980  (  242,127,258 rd +    31,044,722 wr)
==11679== L2 misses:        251,227,551  (  220,204,707 rd +    31,022,844 wr)
==11679== L2 miss rate:             0.6% (          0.6%   +           0.9%  )
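
The math skipped above can be worked out directly from the two summaries. This is a minimal Python sketch, assuming the counts are copied correctly from the output; the helper name `miss_rate` is only illustrative.

```python
def miss_rate(misses: int, refs: int) -> float:
    """Return misses as a fraction of references."""
    return misses / refs

# -O2 (pid 1319)
base = miss_rate(182_725_637, 41_083_755_757)
# -O2 -fno-tree-pre (pid 11679)
no_pre = miss_rate(126_151_035, 25_339_345_936)

abs_diff = no_pre - base    # difference in miss rate, as a fraction
rel_diff = abs_diff / base  # change relative to the -O2 miss rate

print(f"I1 miss rate, -O2:           {base:.4%}")
print(f"I1 miss rate, -fno-tree-pre: {no_pre:.4%}")
print(f"difference: {abs_diff * 100:.3f} percentage points")
print(f"relative change: {rel_diff:.1%}")
```

On these numbers the gap is about 0.05 percentage points, which works out to roughly 12% relative to the -O2 miss rate, unless the 1.25% figure was meant to measure something else. Note that the two runs also differ a lot in total I refs (about 41.1 billion versus 25.3 billion), so the absolute I1 miss count is actually lower in the -fno-tree-pre run even though its miss rate is higher.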