| MPI2007 license: | 0005 | Test date: | Oct-2008 |
|---|---|---|---|
| Test sponsor: | IBM Corporation | Hardware Availability: | Nov-2008 |
| Tested by: | IBM Corporation | Software Availability: | Nov-2008 |

Results appear in the order in which they were run. Bold underlined text indicates a median measurement.

| Benchmark | Base Ranks | Base Seconds (run 1) | Base Ratio (run 1) | Base Seconds (run 2) | Base Ratio (run 2) | Base Seconds (run 3) | Base Ratio (run 3) | Peak Ranks | Peak Seconds (run 1) | Peak Ratio (run 1) | Peak Seconds (run 2) | Peak Ratio (run 2) | Peak Seconds (run 3) | Peak Ratio (run 3) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 104.milc | 32 | 860 | 1.82 | 871 | 1.80 | 869 | 1.80 | 32 | 860 | 1.82 | 871 | 1.80 | 869 | 1.80 |
| 107.leslie3d | 32 | 1681 | 3.11 | 1719 | 3.04 | 1728 | 3.02 | 32 | 1692 | 3.08 | 1694 | 3.08 | 1686 | 3.10 |
| 113.GemsFDTD | 32 | 1447 | 4.36 | 1445 | 4.36 | 1451 | 4.35 | 32 | 1447 | 4.36 | 1445 | 4.36 | 1451 | 4.35 |
| 115.fds4 | 32 | 873 | 2.23 | 898 | 2.17 | 873 | 2.24 | 32 | 863 | 2.26 | 859 | 2.27 | 863 | 2.26 |
| 121.pop2 | 32 | 1440 | 2.87 | 1436 | 2.88 | 1441 | 2.86 | 32 | 1440 | 2.87 | 1436 | 2.88 | 1441 | 2.86 |
| 122.tachyon | 32 | 2056 | 1.36 | 2059 | 1.36 | 2055 | 1.36 | 32 | 2015 | 1.39 | 2022 | 1.38 | 2016 | 1.39 |
| 126.lammps | 32 | 1183 | 2.46 | 1209 | 2.41 | 1194 | 2.44 | 32 | 1183 | 2.46 | 1209 | 2.41 | 1194 | 2.44 |
| 127.wrf2 | 32 | 2675 | 2.91 | 2670 | 2.92 | 2677 | 2.91 | 32 | 1774 | 4.39 | 1786 | 4.36 | 1771 | 4.40 |
| 128.GAPgeofem | 32 | 667 | 3.10 | 668 | 3.09 | 667 | 3.10 | 32 | 667 | 3.10 | 668 | 3.09 | 667 | 3.10 |
| 129.tera_tf | 32 | 2077 | 1.33 | 2077 | 1.33 | 2076 | 1.33 | 32 | 1510 | 1.83 | 1511 | 1.83 | 1501 | 1.84 |
| 130.socorro | 32 | 1165 | 3.28 | 1166 | 3.27 | 1166 | 3.27 | 32 | 451 | 8.46 | 447 | 8.53 | 456 | 8.37 |
| 132.zeusmp2 | 32 | 1273 | 2.44 | 1325 | 2.34 | 1288 | 2.41 | 32 | 1273 | 2.44 | 1325 | 2.34 | 1288 | 2.41 |
| 137.lu | 32 | 1607 | 2.29 | 1600 | 2.30 | 1604 | 2.29 | 32 | 1607 | 2.29 | 1600 | 2.30 | 1604 | 2.29 |
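
The table lists three timed runs per benchmark but no aggregate figure. As a rough sketch, assuming the usual SPEC convention that an overall metric is the geometric mean of each benchmark's median ratio, the rows above can be summarized as follows. The inputs are the rounded ratios printed in the table, so the result is only an approximation; the names `RATIOS` and `overall_score` are illustrative and are not part of the SPEC tools.

```python
import math
from statistics import median

# Ratios copied from the results table: (base runs 1-3, peak runs 1-3).
RATIOS = {
    "104.milc":      ((1.82, 1.80, 1.80), (1.82, 1.80, 1.80)),
    "107.leslie3d":  ((3.11, 3.04, 3.02), (3.08, 3.08, 3.10)),
    "113.GemsFDTD":  ((4.36, 4.36, 4.35), (4.36, 4.36, 4.35)),
    "115.fds4":      ((2.23, 2.17, 2.24), (2.26, 2.27, 2.26)),
    "121.pop2":      ((2.87, 2.88, 2.86), (2.87, 2.88, 2.86)),
    "122.tachyon":   ((1.36, 1.36, 1.36), (1.39, 1.38, 1.39)),
    "126.lammps":    ((2.46, 2.41, 2.44), (2.46, 2.41, 2.44)),
    "127.wrf2":      ((2.91, 2.92, 2.91), (4.39, 4.36, 4.40)),
    "128.GAPgeofem": ((3.10, 3.09, 3.10), (3.10, 3.09, 3.10)),
    "129.tera_tf":   ((1.33, 1.33, 1.33), (1.83, 1.83, 1.84)),
    "130.socorro":   ((3.28, 3.27, 3.27), (8.46, 8.53, 8.37)),
    "132.zeusmp2":   ((2.44, 2.34, 2.41), (2.44, 2.34, 2.41)),
    "137.lu":        ((2.29, 2.30, 2.29), (2.29, 2.30, 2.29)),
}


def overall_score(ratios, column):
    """Geometric mean of per-benchmark median ratios (column 0 = base, 1 = peak)."""
    medians = [median(runs[column]) for runs in ratios.values()]
    return math.exp(sum(math.log(m) for m in medians) / len(medians))


base_score = overall_score(RATIOS, 0)
peak_score = overall_score(RATIOS, 1)
print(f"approximate base score: {base_score:.2f}")
print(f"approximate peak score: {peak_score:.2f}")
```

Benchmarks whose peak columns repeat the base columns contribute identical medians to both scores, so any difference between the two comes from the six benchmarks with distinct peak runs.
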
| Hardware Summary | |
|---|---|
| Type of System: | Heterogeneous |
| Compute Nodes: | IBM System JS22 (combined compute, head, and file server node); IBM System JS22 (compute-only nodes) |
| Interconnects: | InfiniBand; Ethernet |
| File Server Node: | IBM System JS22 |
| Head Node: | IBM System JS22 |
| Total Compute Nodes: | 4 |
| Total Chips: | 8 |
| Total Cores: | 16 |
| Total Threads: | 32 |
| Total Memory: | 80 GB |
| Base Ranks Run: | 32 |
| Minimum Peak Ranks: | 32 |
| Maximum Peak Ranks: | 32 |
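
The summary totals can be cross-checked against the per-node figures given in the node descriptions of this report (one blade with 32 GB, three blades with 16 GB, each with 2 chips, 4 cores, and 2 threads per core). The following is a minimal sketch of that arithmetic; `NodeGroup` and `cluster_totals` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    count: int             # number of identical blades
    chips: int             # chips enabled per blade
    cores: int             # cores enabled per blade
    threads_per_core: int
    memory_gb: int         # memory per blade


# Per-node values as stated in the two node descriptions of this report.
NODE_GROUPS = [
    NodeGroup(count=1, chips=2, cores=4, threads_per_core=2, memory_gb=32),
    NodeGroup(count=3, chips=2, cores=4, threads_per_core=2, memory_gb=16),
]


def cluster_totals(groups):
    """Sum per-blade figures over all blades in the cluster."""
    return {
        "nodes": sum(g.count for g in groups),
        "chips": sum(g.count * g.chips for g in groups),
        "cores": sum(g.count * g.cores for g in groups),
        "threads": sum(g.count * g.cores * g.threads_per_core for g in groups),
        "memory_gb": sum(g.count * g.memory_gb for g in groups),
    }


totals = cluster_totals(NODE_GROUPS)
print(totals)
```

The thread total equals the 32 ranks used for both base and peak, which corresponds to one MPI rank per hardware thread.
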
| Software Summary | |
|---|---|
| C Compiler: | IBM XL C/C++ Enterprise Edition V9 for AIX, updated with the September 2008 fix level |
| C++ Compiler: | IBM XL C/C++ Enterprise Edition V9 for AIX, updated with the September 2008 fix level |
| Fortran Compiler: | IBM XL Fortran Enterprise Edition V11.1 for AIX, updated with the September 2008 fix level |
| Base Pointers: | 32-bit |
| Peak Pointers: | 32/64-bit |
| MPI Library: | IBM Parallel Environment for AIX, Version 5 Release 1 |
| Other MPI Info: | None |
| Pre-processors: | None |
| Other Software: | IBM Engineering and Scientific Subroutine Library (ESSL) for AIX Version 4 Release 3, updated with PTF Set 3 |

| Node Hardware (compute, head, and file server node) | |
|---|---|
| Number of nodes: | 1 |
| Uses of the node: | compute, head, fileserver |
| Vendor: | IBM Corporation |
| Model: | IBM System JS22 |
| CPU Name: | POWER6 |
| CPU(s) orderable: | 4 cores per blade |
| Chips enabled: | 2 |
| Cores enabled: | 4 |
| Cores per chip: | 2 |
| Threads per core: | 2 |
| CPU Characteristics: | |
| CPU MHz: | 4000 |
| Primary Cache: | 64 KB I + 64 KB D on chip per core |
| Secondary Cache: | 4 MB I+D on chip per core |
| L3 Cache: | None |
| Other Cache: | None |
| Memory: | 32 GB (4x8 GB) DDR2 500 MHz |
| Disk Subsystem: | 1x146 GB SAS 15K RPM |
| Other Hardware: | BladeCenter-H chassis; Voltaire 4X InfiniBand Pass-thru Module (P/N 43W4419) |
| Adapter: | 4X InfiniBand DDR Expansion Card (CFFh) for IBM BladeCenter (P/N 43W4423) |
| Number of Adapters: | 1 |
| Slot Type: | PCIe x8 Gen2 |
| Data Rate: | 4x DDR 20Gbps |
| Ports Used: | 1 |
| Interconnect Type: | InfiniBand |

| Node Software | |
|---|---|
| Adapter: | 4X InfiniBand DDR Expansion Card (CFFh) for IBM BladeCenter (P/N 43W4423) |
| Adapter Driver: | devices.pciex.b3157862.rte 6.1.2.0 |
| Adapter Firmware: | 2.3.0 |
| Operating System: | IBM AIX V6.1 with the 6100-02 Technology Level |
| Local File System: | AIX/JFS2 |
| Shared File System: | NFSv3 |
| System State: | Multi-user |
| Other Software: | None |

Blade[1] runs the following commands to compose the cluster:

    mkdev -c management -s infiniband -t icm
    /usr/sbin/mkiba -a 192.1.10.1 -m 255.255.255.0 -i ib0 -A iba0 -p 1 -P 0xFFFF -M 65532 -q 4000 -k off -Q 0x1E -S up
    startsrc -s ctcas
    preprpnode mpiblade1
    mkrpdomain mpiblades mpiblade1 mpiblade2 mpiblade3 mpiblade4
    startrpdomain mpiblades
    cd /usr/lpp/ppe.poe/samples/nrt
    make
    chmod 4755 nrt_api
    shutdown -rF
    su spec
    cd mpiblades.64ranks.load
    ../nrt_api -l

| Node Hardware (compute-only nodes) | |
|---|---|
| Number of nodes: | 3 |
| Uses of the node: | compute |
| Vendor: | IBM Corporation |
| Model: | IBM System JS22 |
| CPU Name: | POWER6 |
| CPU(s) orderable: | 4 cores per blade |
| Chips enabled: | 2 |
| Cores enabled: | 4 |
| Cores per chip: | 2 |
| Threads per core: | 2 |
| CPU Characteristics: | |
| CPU MHz: | 4000 |
| Primary Cache: | 64 KB I + 64 KB D on chip per core |
| Secondary Cache: | 4 MB I+D on chip per core |
| L3 Cache: | None |
| Other Cache: | None |
| Memory: | 16 GB (4x4 GB) DDR2 667 MHz |
| Disk Subsystem: | 1x146 GB SAS 15K RPM |
| Other Hardware: | BladeCenter-H chassis; Voltaire 4X InfiniBand Pass-thru Module (P/N 43W4419) |
| Adapter: | 4X InfiniBand DDR Expansion Card (CFFh) for IBM BladeCenter (P/N 43W4423) |
| Number of Adapters: | 1 |
| Slot Type: | PCIe x8 Gen2 |
| Data Rate: | 4x DDR 20Gbps |
| Ports Used: | 1 |
| Interconnect Type: | InfiniBand |

| Node Software | |
|---|---|
| Adapter: | 4X InfiniBand DDR Expansion Card (CFFh) for IBM BladeCenter (P/N 43W4423) |
| Adapter Driver: | devices.pciex.b3157862.rte 6.1.2.0 |
| Adapter Firmware: | 2.3.0 |
| Operating System: | IBM AIX V6.1 with the 6100-02 Technology Level |
| Local File System: | AIX/JFS2 |
| Shared File System: | NFSv3 |
| System State: | Multi-user |
| Other Software: | None |

Each blade runs the following commands to compose the cluster, where $CLUSTER_INDEX is 2-4 for Blade[2]-Blade[4]:

    mkdev -c management -s infiniband -t icm
    /usr/sbin/mkiba -a 192.1.10.$CLUSTER_INDEX -m 255.255.255.0 -i ib0 -A iba0 -p 1 -P 0xFFFF -M 65532 -q 4000 -k off -Q 0x1E -S up
    startsrc -s ctcas
    preprpnode mpiblade1
    cd /usr/lpp/ppe.poe/samples/nrt
    make
    chmod 4755 nrt_api
    shutdown -rF
    su spec
    cd mpiblades.64ranks.load
    ../nrt_api -l
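
To make the role of $CLUSTER_INDEX concrete, the sketch below expands the command list above for each compute blade. It only builds and prints the command strings and does not run anything; `blade_setup_commands` is a hypothetical helper, and the command text is copied from this report rather than checked against AIX documentation.

```python
def blade_setup_commands(cluster_index):
    """Return the per-blade setup commands with $CLUSTER_INDEX filled in."""
    if cluster_index not in (2, 3, 4):
        raise ValueError("this command list applies to Blade[2]-Blade[4] only")
    return [
        "mkdev -c management -s infiniband -t icm",
        # Only the InfiniBand interface address differs between blades.
        f"/usr/sbin/mkiba -a 192.1.10.{cluster_index} -m 255.255.255.0 "
        "-i ib0 -A iba0 -p 1 -P 0xFFFF -M 65532 -q 4000 -k off -Q 0x1E -S up",
        "startsrc -s ctcas",
        "preprpnode mpiblade1",
        "cd /usr/lpp/ppe.poe/samples/nrt",
        "make",
        "chmod 4755 nrt_api",
        "shutdown -rF",
        "su spec",
        "cd mpiblades.64ranks.load",
        "../nrt_api -l",
    ]


for index in (2, 3, 4):
    print(f"# Blade[{index}]")
    for command in blade_setup_commands(index):
        print(command)
```

The listing for Blade[1] differs in two ways: its address ends in `.1`, and it additionally runs `mkrpdomain` and `startrpdomain` to create and start the peer domain.
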

| Interconnect Hardware (InfiniBand) | |
|---|---|
| Vendor: | IBM Corporation |
| Model: | 4x DDR InfiniBand |
| Switch Model: | QLogic SilverStorm 9024 |
| Number of Switches: | 1 |
| Number of Ports: | 24 |
| Data Rate: | 4x DDR 20Gbps |
| Firmware: | 4.2.1.1.1 |
| Topology: | single switch |
| Primary Use: | MPI Communication |
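
The "4x DDR 20Gbps" figure is a signaling rate. As a small worked example, assuming the standard InfiniBand DDR parameters of 5 Gbit/s per lane and 8b/10b line encoding (neither is stated in this report), the usable data rate per port works out as follows.

```python
LANES = 4                    # "4x" link width
DDR_LANE_GBPS = 5            # assumed DDR signaling rate per lane, in Gbit/s
ENCODING_EFFICIENCY = 8 / 10 # assumed 8b/10b encoding: 8 data bits per 10 line bits

signaling_gbps = LANES * DDR_LANE_GBPS
data_gbps = signaling_gbps * ENCODING_EFFICIENCY

print(f"signaling rate: {signaling_gbps} Gbit/s")
print(f"data rate after encoding: {data_gbps:.0f} Gbit/s")
```
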

| Interconnect Hardware (Ethernet) | |
|---|---|
| Vendor: | IBM Corporation |
| Model: | 4-port Gigabit Ethernet |
| Switch Model: | IBM BladeCenter 4-port Gigabit Ethernet switch module (P/N 26K6483) |
| Number of Switches: | 1 |
| Number of Ports: | 18 |
| Data Rate: | 1Gbps |
| Firmware: | 1.08 |
| Topology: | single switch |
| Primary Use: | File system |

Blade[1], with 32 GB of memory and 32 GB of paging space, was used to compile the benchmarks.

The config file option 'submit' was used:

    submit = poe task_stride.2level.32+64rank 4 2 8 $ranks $command -procs $ranks -hostfile /spec/MapFiles/ib0hosts.8x.1-8
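
The submit line is a template in which `$ranks` and `$command` are filled in for each run. The sketch below mimics that substitution with Python's `string.Template` purely for illustration; the SPEC tools perform their own expansion, and the command `./benchmark_exe` is a hypothetical placeholder, not a value from this report.

```python
from string import Template

# Submit line copied from the report, without the leading "submit = ".
SUBMIT = Template(
    "poe task_stride.2level.32+64rank 4 2 8 $ranks $command "
    "-procs $ranks -hostfile /spec/MapFiles/ib0hosts.8x.1-8"
)

# ranks = 32 comes from the report; the command is a made-up placeholder.
expanded = SUBMIT.substitute(ranks=32, command="./benchmark_exe")
print(expanded)
```

Both occurrences of `$ranks` receive the same value, so the rank count passed to the wrapper script and the `-procs` argument given to `poe` stay in agreement.
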

Environment settings: all ulimits were set to unlimited, and the following values were used:

    ranks = 32
    CWD = /spec/mpi2007
    MEMORY_AFFINITY = MCM
    XLFRTEOPTS = intrinthds=1
    MP_PGMMODEL = spmd
    MP_MSG_API = mpi
    MP_DEVTYPE = ib
    MP_CLOCK_SOURCE = AIX
    MP_STDINMODE = none
    MP_SHARED_MEMORY = yes
    MP_SINGLE_THREAD = yes
    MP_EUILIB = us
    NRT_WINDOW_COUNT = 1
    MP_RESD = no
    MP_PULSE = 0
    ADAPTER_USE = shared
    EUIDEVICE = sn_single
    MP_CSS_INTERRUPT = no
    MP_BUFFER_MEM = 67108864
    MP_USE_BULK_XFER = yes
    MP_BULK_MIN_MSG_SIZE = 8192
    MP_EAGER_LIMIT = 65536
    MP_WAIT_MODE = yield
    MP_INFOLEVEL = 0
    MP_LABELIO = no
    MP_STDOUTMODE = unordered
    MP_PMDLOG = no
    NRT_JOB_KEY = 64
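
Three of the settings above are byte counts that are easier to read in binary units. The sketch below converts them; `SIZE_SETTINGS` and `human_size` are illustrative names. The short descriptions in the comments reflect the commonly documented meaning of these IBM Parallel Environment variables and should be treated as assumptions rather than statements made by this report.

```python
# Byte-valued settings copied from the environment list.
SIZE_SETTINGS = {
    "MP_BUFFER_MEM": 67108864,     # assumed: early-arrival message buffer size
    "MP_EAGER_LIMIT": 65536,       # assumed: largest message sent eagerly
    "MP_BULK_MIN_MSG_SIZE": 8192,  # assumed: smallest message using bulk transfer
}


def human_size(num_bytes):
    """Format an exact multiple of 1 MiB or 1 KiB in that unit, else in bytes."""
    for unit, factor in (("MiB", 1024 ** 2), ("KiB", 1024)):
        if num_bytes % factor == 0:
            return f"{num_bytes // factor} {unit}"
    return f"{num_bytes} B"


for name, value in SIZE_SETTINGS.items():
    print(f"{name} = {value} ({human_size(value)})")
```
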
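
| Compiler Invocation | |
|---|---|
| C benchmarks: | /usr/bin/mpcc_r |
| C++ benchmarks (126.lammps): | /usr/bin/mpCC_r |
| Fortran benchmarks: | /usr/bin/mpxlf95_r |
| Benchmarks using both Fortran and C: | /usr/bin/mpcc_r /usr/bin/mpxlf95_r |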

| Portability Flags | |
|---|---|
| 107.leslie3d: | -qfixed |
| 115.fds4: | -DSPEC_MPI_LC_NO_TRAILING_UNDERSCORE -qfixed |
| 121.pop2: | -DSPEC_MPI_AIX |
| 127.wrf2: | -DNOUNDERSCORE -DSPEC_MPI_AIX |
| 130.socorro: | -DSPEC_NO_UNDERSCORE -qcpluscmt |
| 132.zeusmp2: | -qfixed -DSPEC_SINGLE_UNDERSCORE |
| 137.lu: | -qfixed |

| Base Optimization Flags | |
|---|---|
| C benchmarks: | -bmaxdata:0x80000000 -O5 -D_ILS_MACROS -bdatapsize:64K -bstackpsize:64K -btextpsize:64K |
| C++ benchmarks (126.lammps): | -bmaxdata:0x80000000 -O5 |
| Fortran benchmarks: | -bmaxdata:0x80000000 -O4 -qstrict -qalias=nostd -qhot=level=0 -qsave -bdatapsize:64K -bstackpsize:64K -btextpsize:64K |
| Benchmarks using both Fortran and C: | -bmaxdata:0x80000000 -O5 -D_ILS_MACROS -bdatapsize:64K -bstackpsize:64K -btextpsize:64K -O4 -qstrict -qalias=nostd -qhot=level=0 -qsave |

| Peak Optimization Flags | |
|---|---|
| 104.milc: | basepeak = yes |
| 122.tachyon: | -O5 -lessl -D_ILS_MACROS -bdatapsize:64K -bstackpsize:64K -btextpsize:64K -q64 |
| 126.lammps: | basepeak = yes |
| 107.leslie3d: | -O5 -bdatapsize:64K -bstackpsize:64K -btextpsize:64K -bmaxdata:0x70000000 |
| 113.GemsFDTD: | basepeak = yes |
| 129.tera_tf: | -O5 -qessl -lessl -bdatapsize:64K -bstackpsize:64K -btextpsize:64K |
| 137.lu: | basepeak = yes |
| 115.fds4: | -O5 -lessl -D_ILS_MACROS -bdatapsize:64K -bstackpsize:64K -btextpsize:64K -qstrict -qalias=nostd -qhot=level=0 -qsave -q64 |
| 121.pop2: | basepeak = yes |
| 127.wrf2: | -O5 -bmaxdata:0x80000000 |
| 128.GAPgeofem: | basepeak = yes |
| 130.socorro: | -O5 -lessl -D_ILS_MACROS -bdatapsize:64K -bstackpsize:64K -btextpsize:64K -qessl -bmaxdata:0x80000000 |
| 132.zeusmp2: | basepeak = yes |
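
The `-bmaxdata` values in the flag lists are hexadecimal byte counts. As a brief worked example, assuming the 256 MB segment size of the AIX 32-bit address model (an assumption, not something stated in this report), the two values that appear translate as follows; `describe_maxdata` is a hypothetical helper.

```python
SEGMENT_BYTES = 0x10000000  # assumed 256 MiB segment size for 32-bit AIX processes
GIB = 1024 ** 3


def describe_maxdata(value):
    """Return (size in GiB, number of 256 MiB segments) for a -bmaxdata value."""
    return value / GIB, value // SEGMENT_BYTES


for flag in ("0x80000000", "0x70000000"):
    gib, segments = describe_maxdata(int(flag, 16))
    print(f"-bmaxdata:{flag} -> {gib} GiB, {segments} segments")
```

The benchmarks built with `-q64` carry no `-bmaxdata` flag in the lists above.
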

| Other Flags | |
|---|---|
| C benchmarks: | -w -qsuppress=1500-036 -qipa=noobject -qipa=threads |
| C++ benchmarks (126.lammps): | -w -qsuppress=1500-036 -qipa=noobject -qipa=threads |
| Fortran benchmarks: | -w -qsuppress=1500-036 -qsuppress=cmpmsg -qspillsize=32648 |
| Benchmarks using both Fortran and C: | -w -qsuppress=1500-036 -qipa=noobject -qipa=threads -qsuppress=cmpmsg -qspillsize=32648 |