Weighted Geometric Mean Selected for SPECviewperf™ Composite Numbers
by Bill Licea-Kane
At its February 1995 meeting in Salt Lake City, a subcommittee within the SPECopc℠ project group was given the task of recommending a method for deriving a single composite metric for each viewset running under the SPECviewperf™ benchmark. Composite numbers had been discussed by the SPECopc group for more than a year.
In May 1995, the SPECopc project group decided to adopt a weighted geometric mean as the single composite metric for each viewset.
The weighted geometric mean is given by the formula

$$\prod_{i=1}^{n} x_i^{w_i}$$

where "n" is the number of individual tests in a viewset, "x_i" is the result of test i in frames/second, and "w_i" is the weight of test i, expressed as a number between 0.0 and 1.0. (A test with a weight of "10.0%" has a "w" of 0.10. Note that the sum of the weights of the individual tests must equal 1.00.)
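As an illustration of the definition above, here is a minimal Python sketch. The function name `weighted_geometric_mean` and the three-test viewset are hypothetical, chosen only for the example; they are not part of SPECviewperf.

```python
import math

def weighted_geometric_mean(results, weights):
    """Return the weighted geometric mean of per-test results.

    results: frames/second for each test (all must be positive)
    weights: one weight per test, each between 0.0 and 1.0, summing to 1.0
    """
    if len(results) != len(weights):
        raise ValueError("results and weights must have the same length")
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1.0")
    if any(r <= 0 for r in results):
        raise ValueError("results must be positive")
    # Summing w * log(r) and exponentiating is equivalent to taking the
    # product of r ** w, and is less prone to overflow or underflow.
    return math.exp(sum(w * math.log(r) for r, w in zip(results, weights)))

# Hypothetical three-test viewset (assumed values, not a real viewset).
example_results = [20.0, 5.0, 80.0]   # frames/second
example_weights = [0.50, 0.40, 0.10]  # 50%, 40%, 10%
composite = weighted_geometric_mean(example_results, example_weights)
print(f"composite: {composite:.2f} frames/second")
```

Because the weights sum to 1.00, the composite stays in the same units as the individual test results (frames/second), and a higher value still means better performance.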
The weighted geometric mean of CDRS-03, for example, is expressed by the following formula:

[Formula: weighted geometric mean of the CDRS-03 tests]
Given this description, the weighted geometric mean of each viewset is the correct composite metric. This composite metric is a derived quantity that is exactly as if you ran the viewset tests for 100 seconds, where test 1 was run for 100 × weight_{1} seconds, test 2 for 100 × weight_{2} seconds, and so on.
The end result would be the number of frames rendered divided by the total time, which gives frames/second. It also has the desirable property of "bigger is better"; that is, the higher the number, the better the performance.
Given this description, the weighted harmonic mean would be as if you ran the viewset tests for 100 frames, where the first 100 × weight_{1} frames were drawn by test 1, the next 100 × weight_{2} frames were drawn by test 2, and so on. The 100 frames divided by the total time would be the weighted harmonic mean.
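The fixed-frame-count interpretation of the weighted harmonic mean can be checked numerically. The sketch below uses assumed frame rates and hypothetical function names; it simulates drawing 100 frames split by weight and compares the result with the closed-form weighted harmonic mean.

```python
def fps_for_fixed_frame_run(rates, weights, total_frames=100):
    """Frames/second when each test draws total_frames * weight frames."""
    # Time spent in each test = frames drawn by that test / its frame rate.
    total_time = sum(total_frames * w / r for r, w in zip(rates, weights))
    return total_frames / total_time

def weighted_harmonic_mean(rates, weights):
    """Closed-form weighted harmonic mean (weights must sum to 1.0)."""
    return 1.0 / sum(w / r for r, w in zip(rates, weights))

# Assumed frame rates for two equally weighted tests (illustrative only).
rates = [10.0, 40.0]    # frames/second
weights = [0.5, 0.5]

simulated = fps_for_fixed_frame_run(rates, weights)
closed_form = weighted_harmonic_mean(rates, weights)
print(simulated, closed_form)
```

With these values, 50 frames at 10 frames/second take 5 seconds and 50 frames at 40 frames/second take 1.25 seconds, so 100 frames in 6.25 seconds gives 16 frames/second, which matches the closed form.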
Since the weights for the viewsets were selected on percentage of time, not percentage of operations, we chose the weighted geometric mean over the weighted harmonic mean.
Consider for a moment a trivial example, where there are two tests, equally weighted in a viewset:

[Table: Test1 and Test2 results and weighted arithmetic means for System A, System B, and System C]
System B is 10 percent faster at Test1 than System A. System C is 10 percent faster at Test2 than System A. But look at the weighted arithmetic means. System B's weighted arithmetic mean is only 0.1 percent higher than System A's, while System C's weighted arithmetic mean is 10 percent higher. Even normalization doesn't help here.
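The effect described above can be reproduced with a short Python sketch. The frame rates below are assumed values, picked only so that the percentages roughly match those quoted in the text; they are not necessarily the numbers from the original table.

```python
import math

def weighted_arithmetic_mean(results, weights):
    return sum(w * r for r, w in zip(results, weights))

def weighted_geometric_mean(results, weights):
    return math.exp(sum(w * math.log(r) for r, w in zip(results, weights)))

weights = [0.5, 0.5]  # two equally weighted tests

# Assumed results in frames/second for [Test1, Test2] (hypothetical values).
systems = {
    "System A": [1.0, 100.0],
    "System B": [1.1, 100.0],  # 10 percent faster than A at Test1
    "System C": [1.0, 110.0],  # 10 percent faster than A at Test2
}

base = systems["System A"]
gains = {}
for name, results in systems.items():
    # Relative improvement of each composite over System A.
    am_gain = (weighted_arithmetic_mean(results, weights)
               / weighted_arithmetic_mean(base, weights) - 1)
    gm_gain = (weighted_geometric_mean(results, weights)
               / weighted_geometric_mean(base, weights) - 1)
    gains[name] = (am_gain, gm_gain)
    print(f"{name}: arithmetic {am_gain:+.1%}, geometric {gm_gain:+.1%}")
```

Under these assumed values, the weighted arithmetic mean rises by about 0.1 percent for System B and about 9.9 percent for System C, because the test with the larger raw numbers dominates. The weighted geometric mean rises by about 4.9 percent for both, treating a 10 percent gain on either test the same way.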
Since our weights were percentage of time and since the results from SPECviewperf are expressed in frames/sec, we were not obligated to normalize. Normalization introduces many issues of its own, starting with something as simple as how to select a reference system.
We invite readers to select two different systems whose results are published in this newsletter and to use each one as the reference system. You will discover quickly that the normalized weighted geometric means change only in absolute magnitude. If the weighted geometric mean of System B is 10 percent higher than System A, for example, the normalized weighted geometric mean of System B will be 10 percent higher than System A, no matter what reference system you choose.
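For readers who prefer to try this numerically first, the following sketch checks the invariance using made-up results. All systems, weights, and reference values here are hypothetical, and "normalized" is assumed to mean dividing each test result by the reference system's result for that test.

```python
import math

def weighted_geometric_mean(values, weights):
    return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)))

def normalized_composite(results, reference, weights):
    """Weighted geometric mean of per-test ratios to a reference system."""
    ratios = [r / ref for r, ref in zip(results, reference)]
    return weighted_geometric_mean(ratios, weights)

weights = [0.5, 0.3, 0.2]

# Hypothetical per-test results in frames/second.
system_a = [12.0, 30.0, 4.0]
system_b = [15.0, 27.0, 6.0]

# Two arbitrary, very different reference systems.
reference_1 = [10.0, 10.0, 10.0]
reference_2 = [3.0, 50.0, 8.0]

raw_ratio = (weighted_geometric_mean(system_b, weights)
             / weighted_geometric_mean(system_a, weights))
ratio_ref_1 = (normalized_composite(system_b, reference_1, weights)
               / normalized_composite(system_a, reference_1, weights))
ratio_ref_2 = (normalized_composite(system_b, reference_2, weights)
               / normalized_composite(system_a, reference_2, weights))

print(raw_ratio, ratio_ref_1, ratio_ref_2)
```

The three ratios agree up to floating-point rounding. Dividing every result by a reference value scales each system's composite by the same factor (the reciprocal of the reference system's own weighted geometric mean), so that factor cancels when two systems are compared.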
Please don't rely exclusively on any synthetic benchmark such as SPECviewperf. In the end, isn't actual application performance on an actual computer system what you are really attempting to find?
Bill Licea-Kane is chair of SPECopc and a member of SPEC's Board of Directors.
He can be reached by email at bill@ati.com.