Compiler Invocation:
C: cc
F90: f90
F77: f77
Base Tuning:
C: -fast -xopenmp -xalias_level=std -xipo=2
-xprefetch_level=3 -m64 -lmtmalloc -g
-xpagesize=4m -xprefetch=latx:4 -xprofile
F90: -fast -openmp -m64 -xipo=2 -autopar -fma=fused
-g -xpagesize=4m -xprefetch=latx:4 -xprofile
ONESTEP=yes
318.galgel_m portability flags: -e -fixed
Extra flags allowed for art:
330.art_m: -DINTS_PER_CACHELINE=16 -DDBLS_PER_CACHELINE=8
Peak Notes:
ONESTEP=yes
310.wupwise_m: -fast -openmp -xunroll=4 -autopar -m32
-xipo=2 -fma=fused -xpagesize=512k
-Qoption iropt -Athr,-Apf:l2subblock=256,-Apf:ipa=9
-xprefetch=latx:3 -Qoption iropt -Rloop_dist
-xprofile
312.swim_m: -fast -openmp -m64 -xipo=2 -autopar
-fma=fused -xpagesize=512k -xprefetch=latx:3
314.mgrid_m: -fast -openmp -xipo=2 -xprefetch_level=3
-m32 -xpagesize=512K -xprefetch=latx:4.8
-fma=fused -Qoption iropt -Apf:l2subblock=256
-xprofile
316.applu_m: -fast -xipo=2 -openmp -xautopar -m64
-fma=fused -xpagesize=4m -xprefetch=latx:2.8
-Qoption iropt -Rloop_dist -xunroll=3 -xprofile
318.galgel_m: -fast -openmp -xipo=2 -xprefetch=latx:1
-xlic_lib=sunperf -xprofile
RM_SOURCES=lapak.f90
320.equake_m: -fast -xopenmp -xprefetch_level=3
-xpagesize=64k -xprefetch=latx:2 -xipo=2
-lmtmalloc -W2,-Apf:l2subblock=256
-xprofile
324.apsi_m: -fast -openmp -m64 -xipo=2 -autopar
-fma=fused -xpagesize=4m -xprefetch=latx:3.4
-Qoption iropt -Rloop_dist -xprofile
326.gafort_m: -fast -openmp -xprefetch_level=3 -m64
-fma=fused -xprefetch=latx:0.5 -xprofile
328.fma3d_m: -fast -openmp -autopar -xipo=2 -fma=fused
-m32 -unroll=5 -xprefetch=latx:4 -lmtmalloc
330.art_m: -fast -xopenmp -xipo=2 -xprefetch_level=3
-m64 -xprefetch=latx:3 -xprofile
332.ammp_m: -fast -xipo=2 -xopenmp -xautopar
-xalias_level=strong -lm -xpagesize=512K -g
Alternate Source for Base and Peak:
328.fma3d_m: sqrt.init, avoids a potential race condition.
Available as SPEC OMP alternate source:
ompm2001-fma3dsqrtinit-20070912.tar.gz
Alternate Source for Peak:
312.swim_m: ompl.32 (available in benchmark)
316.applu_m: ompl.32 (available in benchmark)
320.equake_m: ompl.32 (available in benchmark)
328.fma3d_m: ompl.sqrt.init, avoids a potential race condition and
incorporates the ompl srcalt. Available as SPEC OMP alternate source:
ompm2001-fma3dsqrtinit-20070912.tar.gz
Feedback optimization (-xprofile) is done as follows,
unless otherwise noted:
fdo_pre0: rm -rf `pwd`/feedback.profile
PASS1: -xprofile=collect:./feedback
PASS2: -xprofile=use:./feedback
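
The feedback steps above can be sketched as a dry run in POSIX sh. This is only an illustration: it prints the commands instead of running them, and BASE_FLAGS, prog.c, and prog are hypothetical placeholders rather than names from the actual config file. The training run between the two passes is an assumption about how the profile gets written.

```shell
#!/bin/sh
# Dry-run sketch of the two-pass feedback build described in the notes.
# It only prints the commands; nothing is compiled or deleted.
# BASE_FLAGS and the file names prog.c / prog are hypothetical placeholders.

BASE_FLAGS="-fast -xopenmp"

fdo_commands() {
    src=$1
    exe=$2
    # fdo_pre0: clear any stale profile data before the first pass
    echo "rm -rf $(pwd)/feedback.profile"
    # PASS1: build an instrumented binary that records a profile
    echo "cc $BASE_FLAGS -xprofile=collect:./feedback -o $exe $src"
    # Training run (assumed): the instrumented binary writes ./feedback.profile
    echo "./$exe"
    # PASS2: rebuild using the collected profile
    echo "cc $BASE_FLAGS -xprofile=use:./feedback -o $exe $src"
}

fdo_commands prog.c prog
```

Both passes name the same ./feedback location, which is why fdo_pre0 removes feedback.profile first: a stale profile from an earlier build would otherwise be picked up.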
Base and Peak User Environment Settings:
unlimit stacksize (in /bin/csh)
setenv SUNW_MP_PROCBIND " 1 2 4 6 8 10 12 14
16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46
48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78
80 82 84 86 88 90 92 94 96 98 100 102 104 106 108
110 112 114 116 118 120 122 124 126 "
setenv SUNW_MP_THR_IDLE SPIN
setenv OMP_DYNAMIC FALSE
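
The SUNW_MP_PROCBIND list above follows a simple pattern (processor 1, then the even-numbered processors 2 through 126), so it could be generated rather than typed. A minimal sketch in POSIX sh, not the csh used in the notes; procbind_list is a hypothetical helper and not part of any Sun tool. The quoted value in the notes also carries a leading and trailing space, which this sketch omits.

```shell
#!/bin/sh
# Rebuild the binding list from the notes: processor 1 followed by the
# even-numbered processors 2 through 126.
# procbind_list is a hypothetical helper, not part of any Sun tool.

procbind_list() {
    list="1"
    cpu=2
    while [ "$cpu" -le 126 ]; do
        list="$list $cpu"
        cpu=$((cpu + 2))
    done
    echo "$list"
}

SUNW_MP_PROCBIND=$(procbind_list)
export SUNW_MP_PROCBIND

# Count the entries in the generated list
echo "$SUNW_MP_PROCBIND" | wc -w
```

The generated list has 64 entries, one per thread for the benchmarks that run with 64 threads.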
Additional Peak User Environment Settings:
OMP_NUM_THREADS settings per benchmark
310.wupwise_m 64
312.swim_m 64
314.mgrid_m 64
316.applu_m 64
318.galgel_m 64
320.equake_m 64
324.apsi_m 127
326.gafort_m 64
328.fma3d_m 64
330.art_m 32
332.ammp_m 127
SUNW_MP_PROCBIND was set per benchmark to distribute the work to as
many cpus and cores as possible. See config file for details.
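
The per-benchmark thread counts listed above can be expressed as a lookup. The following POSIX sh sketch is illustrative only: peak_threads is a hypothetical helper, and the actual config file may set these values differently. It does not attempt to reproduce the per-benchmark SUNW_MP_PROCBIND values, which the notes do not list.

```shell
#!/bin/sh
# Map a benchmark name to the peak OMP_NUM_THREADS value from the notes.
# peak_threads is a hypothetical helper, not taken from the config file.

peak_threads() {
    case "$1" in
        324.apsi_m|332.ammp_m)
            echo 127 ;;
        330.art_m)
            echo 32 ;;
        310.wupwise_m|312.swim_m|314.mgrid_m|316.applu_m|318.galgel_m|320.equake_m|326.gafort_m|328.fma3d_m)
            echo 64 ;;
        *)
            echo "unknown benchmark: $1" >&2
            return 1 ;;
    esac
}

# Example: select the thread count for 330.art_m
OMP_NUM_THREADS=$(peak_threads 330.art_m) && export OMP_NUM_THREADS
echo "$OMP_NUM_THREADS"   # prints 32
```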
For a description of Sun Studio 12 Compiler flags, portability flags
and system parameters used to generate this result, please refer to
the SUN-20080714-Studio-Solaris-sparc.txt file in the flags directory.
This result was measured on a Sun SPARC Enterprise M8000.
The Sun SPARC Enterprise M8000 and the Fujitsu SPARC Enterprise
M8000 are electrically equivalent.
"CMU" = CPU/Memory Unit; each holds 2 or 4 CPU chips.