ABSTRACT
This document describes the contents and arrangement of a SPEChpc 2021 result disclosure.
This document refers to the arrangement of fields in the HTML
format of a SPEChpc 2021 result disclosure, since the arrangement
of result fields may differ between the text, HTML, and
other report formats (CSV, PDF, PS, etc.).
Although the reports are formatted in a way that is intended to be self-explanatory,
readers may want a formal statement of the meaning of a field, or may have technical questions
about the information provided in the fields.
In addition, the SPEC website contains links from the fields of the published reports
to their descriptions in this document.
The contents of a result report come from three sources: output generated by the benchmark runs themselves, settings extracted from the configuration file that controls the building and running of the benchmarks, and descriptive fields filled in by the tester. These follow the conventions specified in the separate documents on the Run Rules, Config Files, and XML Flags Files. Reports published on the SPEC website have been peer-reviewed by the members of the SPEC/HPG committee and are expected to be correct in every detail.
(To check for possible updates to this document, please see http://www.spec.org/hpc2021/Docs/)
Selecting one of the following will take you to the detailed table of contents for that section or subsection:
2. Result and Configuration Summary
2.1 Results Header
2.3 Results Table
7. Errors
SPEChpc 2021 has 9 benchmarks (5 in C, 1 in C++, and 3 in Fortran), organized into 4 suites by workload size: Tiny, Small, Medium, and Large.
Application Name | Benchmark (Tiny) | Benchmark (Small) | Benchmark (Medium) | Benchmark (Large) | Language | Approximate LOC | Application Area |
---|---|---|---|---|---|---|---|
LBM D2Q37 | 505.lbm_t | 605.lbm_s | 705.lbm_m | 805.lbm_l | C | 9,000 | Computational Fluid Dynamics |
SOMA Offers Monte-Carlo Acceleration | 513.soma_t | 613.soma_s | Not included. | Not included. | C | 9,500 | Physics / Polymeric Systems |
Tealeaf | 518.tealeaf_t | 618.tealeaf_s | 718.tealeaf_m | 818.tealeaf_l | C | 5,400 | Physics / High Energy Physics |
Cloverleaf | 519.clvleaf_t | 619.clvleaf_s | 719.clvleaf_m | 819.clvleaf_l | Fortran | 12,500 | Physics / High Energy Physics |
Minisweep | 521.miniswp_t | 621.miniswp_s | Not included. | Not included. | C | 17,500 | Nuclear Engineering - Radiation Transport |
POT3D | 528.pot3d_t | 628.pot3d_s | 728.pot3d_m | 828.pot3d_l | Fortran | 495,000 (includes HDF5 library) | Solar Physics |
SPH-EXA | 532.sph_exa_t | 632.sph_exa_s | Not included. | Not included. | C++14 | 3,400 | Astrophysics and Cosmology |
HPGMG-FV | 534.hpgmgfv_t | 634.hpgmgfv_s | 734.hpgmgfv_m | 834.hpgmgfv_l | C | 16,700 | Cosmology, Astrophysics, Combustion |
miniWeather | 535.weather_t | 635.weather_s | 735.weather_m | 835.weather_l | Fortran | 1,100 | Weather |
System Vendor | The vendor of the system under test. |
---|---|
System Name | The name of the system under test. |
Hardware Availability | The date when all the hardware necessary to run the result is generally available. For example, if the CPU is available in Aug-2021, but the memory is not available until Oct-2021, then the hardware availability date is Oct-2021 (unless some other component pushes it out farther). |
Software Availability | The date when all the software necessary to run the result is generally available. For example, if the operating system is available in Aug-2021, but the compiler or other libraries are not available until Oct-2021, then the software availability date is Oct-2021 (unless some other component pushes it out farther). |
Test date | The date when the test was run. This value is recorded by the SPEC tools; the time reported by the system under test is recorded in the raw result file. |
Test sponsor | The name of the organization or individual that sponsored the test. Generally, this is the name of the license holder. |
Tested by | The name of the organization or individual that ran the test. If there are installations in multiple geographic locations, sometimes that will also be listed in this field. |
SPEChpc 2021_tny_peak | The geometric mean of 9 normalized ratios (one for each benchmark) when compiled with aggressive optimization for each benchmark in the Tiny Suite. |
---|---|
SPEChpc 2021_tny_base | The geometric mean of 9 normalized ratios (one for each benchmark) when compiled with conservative optimization for each benchmark in the Tiny Suite. |
SPEChpc 2021_sml_peak | The geometric mean of 9 normalized ratios (one for each benchmark) when compiled with aggressive optimization for each benchmark in the Small Suite. |
SPEChpc 2021_sml_base | The geometric mean of 9 normalized ratios (one for each benchmark) when compiled with conservative optimization for each benchmark in the Small Suite. |
SPEChpc 2021_med_peak | The geometric mean of 6 normalized ratios (one for each benchmark) when compiled with aggressive optimization for each benchmark in the Medium Suite. |
SPEChpc 2021_med_base | The geometric mean of 6 normalized ratios (one for each benchmark) when compiled with conservative optimization for each benchmark in the Medium Suite. |
SPEChpc 2021_lrg_peak | The geometric mean of 6 normalized ratios (one for each benchmark) when compiled with aggressive optimization for each benchmark in the Large Suite. |
SPEChpc 2021_lrg_base | The geometric mean of 6 normalized ratios (one for each benchmark) when compiled with conservative optimization for each benchmark in the Large Suite. |
More detailed information about the performance metrics may be found in section 4.3.1 of the SPEChpc 2021 Run and Reporting Rules.
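To make the computation concrete, here is a minimal sketch (in Python, with invented timings; a real suite uses the full set of benchmark ratios) of how a geometric mean of normalized ratios is formed:

```python
import math

# Hypothetical (reference_seconds, measured_seconds) pairs, one per
# benchmark in the suite -- all values here are illustrative only.
runs = [(2000.0, 250.0), (1800.0, 300.0), (2200.0, 275.0)]

# Each ratio normalizes the measured time against the reference platform.
ratios = [ref / measured for ref, measured in runs]

# The suite metric is the geometric mean of the per-benchmark ratios.
metric = math.prod(ratios) ** (1.0 / len(ratios))
print(f"overall metric = {metric:.2f}")
```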
Benchmark | The names of the benchmarks making up the SPEChpc 2021 suites. |
---|---|
Parallel Model | This column indicates the parallel model used by the benchmark: MPI alone, or MPI combined with OpenMP, OpenACC, or OpenMP target offload. |
Ranks | This column indicates the number of MPI ranks (processes) that were used during the running of the benchmark. |
Threads per rank | This column indicates the number of host threads (OpenMP or OpenACC) per rank that were used during the running of the benchmark. |
Seconds | The amount of elapsed (wall-clock) time in seconds that the benchmark took to run, from job submission to job completion. |
Ratio | The run time on the reference platform divided by the measured run time; higher ratios indicate better performance. |
Identifying the Median results:
For a reportable SPEChpc 2021 run, at least two iterations of each benchmark are run, and the median of the runs (lower of middle two, if even) is selected to be part of the overall metric. In output formats that support it, the medians in the result table are underlined in bold. The ".txt" report will mark each median score with an asterisk "*".
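A small sketch (Python; the timings are invented) of the selection rule as stated above, including the lower-of-the-middle-two case for an even number of runs:

```python
def select_median(values):
    """Select the median per the rule above: with an even number of
    runs, take the lower of the two middle values."""
    ordered = sorted(values)
    return ordered[(len(ordered) - 1) // 2]  # lower middle when even

print(select_median([102.0, 98.5, 100.3]))  # 3 runs -> 100.3
print(select_median([102.0, 98.5]))         # 2 runs -> 98.5 (lower of the two)
```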
Each iteration in the SPEChpc 2021 benchmark suites will run each benchmark once, in order. For example, given benchmarks "110.aaa", "120.bbb", and "130.ccc", here's what you might see as the benchmarks were run if they were part of each suite:
SPEChpc 2021
    Running (#1) 110.aaa ref base oct09a default
    Running (#1) 120.bbb ref base oct09a default
    Running (#1) 130.ccc ref base oct09a default
    Running (#2) 110.aaa ref base oct09a default
    Running (#2) 120.bbb ref base oct09a default
    Running (#2) 130.ccc ref base oct09a default
    Running (#3) 110.aaa ref base oct09a default
    Running (#3) 120.bbb ref base oct09a default
    Running (#3) 130.ccc ref base oct09a default
When you read the results table from a run, the results are listed in the order in which they were run, in column-major order. In other words, if you are interested in the base scores as they were produced, start at the top of the left-hand column and read down it, then read the middle column, then the right column.
If the benchmarks were run with both base and peak tuning, all base runs were completed before starting peak.
Collective hardware details across the whole system. Run rules relating to these items can be found in section 4.2 of the SPEChpc 2021 Run and Reporting Rules.
Type of System | Description of the system being benchmarked: SMP, Homogeneous Cluster or Heterogeneous Cluster. |
---|---|
Compute Node | These systems are used as compute nodes during the run of the benchmark. |
Interconnects | These devices are used for the interconnects during the benchmark run. |
Compute Nodes Used | Number of compute nodes used to execute the benchmark. |
Total Chips | The total number of chips in the compute nodes available to execute the benchmark. |
Total Cores | The total number of cores in the compute nodes available to execute the benchmark. |
Total Threads | The total number of threads in the compute nodes available to execute the benchmark. |
Total Memory | The total amount of memory in all of the compute nodes available to execute the benchmark. |
Information on how the benchmark binaries are constructed. Run rules relating to these items can be found in section 4.2 of the SPEChpc 2021 Run and Reporting Rules.
Compiler | The names and versions of compilers used to generate the result. |
---|---|
MPI Library | The names and versions of MPI Libraries used to generate the result. |
Other MPI Information | Any performance-relevant MPI information used to generate the result. |
Other Software | Any performance-relevant non-compiler software used, including third-party libraries, accelerators, etc. |
Base Parallel Model | The parallel model used in base. |
Base Ranks | The number of MPI ranks used to execute the benchmark in the base optimization runs. |
Base Threads per Rank | The number of host threads per rank used to execute the benchmark in the base optimization runs. |
Peak Parallel Models | The list of parallel models used in peak. |
Minimum Peak Ranks | The smallest number of ranks used to execute the benchmark runs using peak optimizations. |
Maximum Peak Ranks | The largest number of ranks used to execute the peak version of the benchmark. |
Minimum Peak Threads per Rank | The smallest number of host threads per rank used to execute the benchmark runs using peak optimizations. |
Maximum Peak Threads per Rank | The largest number of host threads per rank used to execute the peak version of the benchmark. |
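The values reported in these fields normally follow from the tester's config file or runhpc invocation. A sketch only (the config name and counts are invented; option names are as used by SPEChpc's runhpc):

```
runhpc --config=mytest --ranks=64 --threads=4 --reportable tiny
```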
As part of SPEC/HPG's follow-on SPEChpc weak-scaling suite (currently under development), internal timers were added to the codes to measure MPI initialization overhead, application initialization overhead, core computation time, and residual time. For weak scaling, the core compute time will be used to determine a throughput "Figure of Merit" (FOM) measuring units of work completed over time.
For the current strong-scaled suites, SPEC/HPG decided to include this measurement as an option, since it may help in understanding scaling behavior.
The internal timing information may only be used for academic and research purposes, or as a derived value per SPEC's Fair Use Rules.
Reporting of the internal timing is disabled by default. To enable it, either add "showtimer=1" to your config file, use the runhpc --showtimer=1 option, or edit the resulting "raw" (.rsf) file to change the "showtimer" field to 1 and use the rawformat utility to reformat the reports.
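For example (a sketch of the three methods named above; file names are illustrative):

```
# 1. In the config file:
showtimer = 1

# 2. Or on the runhpc command line:
runhpc --showtimer=1 ...

# 3. Or edit the "showtimer" field in the raw file to 1, then reformat:
rawformat mytest.rsf
```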
Reported Time | The time in seconds reported by the SPEC tools, which is used in computing the benchmark ratio. Same as the Seconds field in the results table. |
---|---|
Start-up overhead | Reported Time less Application Time. Captures the overhead time for node allocation, scheduler overhead, MPI start-up time, etc. |
Initialization Time | Time the application spends initializing data, reading input files, performing domain decomposition, etc. |
Core Compute Time | Time the application spends performing its core computation. This time includes MPI communication. |
Residual Time | Remaining application time not captured under initialization or core compute. Includes items such as verification of results or saving output data files. |
Application Time | Time measured between MPI_Init and MPI_Finalize. Note that this time is measured by the tools and shown in the log files, but it is not included in the Internal Timer Table, since it is the sum of the Initialization, Core Compute, and Residual Times. |
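The relationships among these fields can be summarized with a small sketch (Python; all timings invented):

```python
# Invented example values, in seconds.
reported_time = 310.0  # Reported Time, as measured by the SPEC tools
init_time     = 12.0   # Initialization Time
core_compute  = 280.0  # Core Compute Time
residual_time = 8.0    # Residual Time

# Application Time (MPI_Init to MPI_Finalize) is the sum of the three
# internal phases, which is why it is not a separate table row.
application_time = init_time + core_compute + residual_time  # 300.0

# Start-up overhead is the Reported Time less the Application Time.
startup_overhead = reported_time - application_time          # 10.0
print(application_time, startup_overhead)
```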
SPEChpc 2021 is capable of running on large heterogeneous clusters containing different kinds of nodes linked by different kinds of interconnects. The report format contains a separate section for each kind of node and each kind of interconnect. Section 4.2 of the Run Rules document describes what information is to be provided by the tester.
For example, an SMP will consist of one node and no interconnect. Homogeneous cluster systems will typically consist of one kind of compute node and one or two kinds of interconnects. There will also often be a file server node. It is possible that the node and interconnect components are available from their respective vendors but no vendor sells the configured system as a whole; in this case the report is intended to provide enough detail to reconstruct an equivalent system with equivalent performance.
Description of the hardware configuration of the node.
Number of nodes | The number of nodes of this type in the system. |
---|---|
Uses of the Node | The purpose of this type of node: compute node, file server, head node, etc. |
Vendor | The manufacturer of this kind of node. |
Model | The model name of this kind of node. |
CPU Name | A manufacturer-determined formal name of the processor used in this node type. |
CPU(s) orderable | The number of CPUs that can be ordered in this kind of node. |
Chip(s)/CPU(s) enabled | The number of Chips (CPUs) that were enabled and active in the node during the benchmark run. |
Core(s) enabled | The number of cores that were enabled and active in the node during the benchmark run. |
Cores per Chip | The number of cores in each chip that were enabled and active in the node during the benchmark run. |
Threads per Core | The number of threads in each core that were enabled and active in the node during the benchmark run. |
CPU Characteristics | Technical characteristics to help identify the processor type used in the node. |
CPU MHz | The clock frequency of the CPU used in the node, expressed in megahertz. |
Primary Cache | Description (size and organization) of the CPU's primary cache. This cache is also referred to as "L1 cache". |
Secondary Cache | Description (size and organization) of the CPU's secondary cache. This cache is also referred to as "L2 cache". |
L3 Cache | Description (size and organization) of the CPU's tertiary, or "Level 3" cache. |
Other Cache | Description (size and organization) of any other levels of cache memory. |
Memory | Description of the system main memory configuration. End-user options that affect performance, such as the arrangement of memory modules, interleaving, latency, etc., are documented here. |
Disk Subsystem | A description of the disk subsystem (size, type, and RAID level if any) of the storage used to hold the benchmark tree during the run. |
Other Hardware | Any additional equipment added to improve performance. |
Accelerator Model | The model name of the accelerator(s). |
Accelerator Count | The number of accelerators of each model. |
Accelerator Vendor | The company/vendor of the accelerator. |
Accelerator Type | The type of accelerator. Possible values include, but are not limited to: GPU, APU, CPU, FPGA, etc. |
Accelerator Connection | How the accelerator is connected to the system. Possible descriptions include, but are not limited to: PCIe, integrated, etc. |
Accelerator ECC Enabled | Whether the accelerator uses ECC for its memory. |
Accelerator Description | Further description of the accelerator. |
Adapter Card(s) | There will be one of these groups of entries for each network adapter -- aka Host Channel Adapter (HCA) or Network Interface Card (NIC) -- used to connect to an interconnect to carry MPI or file server traffic. This field contains this adapter's vendor and model name. |
Number of Adapters | How many of these adapters attach to the node. |
Adapter Slot Type | The type of slot used to attach the adapter card to the node. |
Data Rate | The per-port, nominal data transfer rate of the adapter. |
Ports Used | The number of ports used to run the benchmark on the adapter (especially for those which have multiple ports available). |
Interconnect Type | In general terms, the type of interconnect (Ethernet, InfiniBand, etc.) attached to this adapter. |
Software configuration of the node.
Adapter Driver | The driver type and level for this adapter. |
---|---|
Adapter Firmware | The adapter firmware type and level for this device. |
Operating System | The operating system name and version. If there are patches applied that affect performance, they must be disclosed in the Notes. |
Local File System | The type of the file system local to each compute node. |
Shared File System | The type of the file system used to contain the run directories. |
System State | The state (sometimes called "run level") of the system while the benchmarks were being run. Generally, this is "single user", "multi-user", "default", etc. |
Other Software | Any performance-relevant non-compiler software used, including third-party libraries, accelerators, etc. |
Accelerator Driver | The name and version of the software driver used to control the accelerator. |
Description of the configuration of the interconnect.
Vendor | The manufacturer(s) of this interconnect. |
---|---|
Model | The model name(s) of the interconnect as a whole, or components of it -- not including the switch model, which is the next field. |
Switch Model(s) | The model and manufacturer of the switching element(s) of this interconnect. There may be more than one kind declared. |
Number of switches | The number of switches of this type in the interconnect. |
Ports per switch | The number of ports per switch available for carrying the type of traffic noted in the "Primary Use" field. |
Data Rate | The per-port, nominal data transfer rate of the switch(es). |
Firmware | The Firmware type and level for the switch(es). |
Topology | Description of the arrangement of switches and links in the interconnect. |
Primary Use | The kind of data traffic carried by the interconnect: MPI, file server, etc. |
This section describes how the benchmarks are compiled. The HTML and PDF reports contain links from the settings that are listed, to the descriptions of those settings in the XML flags file report.
Much of this information is derived from compilation rules written in the config file and interpreted according to rules specified in the XML flags file; free-form notes may be added by the tester. A section only appears if the corresponding flags are used (for example, peak optimization flags); otherwise it is not printed. Section 2 of the SPEChpc 2021 Run and Reporting Rules document gives rules on how these items can be used in reportable runs.
Base & Peak Unknown Flags | This section lists flags, used in the base or peak compiles, that were not recognized by the report generation. Results with unknown flags are marked "invalid" and may not be published. Most likely the flagsurl parameter was not set correctly, or details need to be added to the XML flags file. The "invalid" marking may be removed by reformatting the result using a flags file that describes all of the unknown flags. |
---|---|
Base & Peak Forbidden Flags | This section lists flags, used in the base or peak compiles, that are designated as forbidden in the XML flags file for the benchmark or the platform. Results with forbidden flags are marked "invalid" and may not be published. |
Base & Peak Compiler Invocation | This section describes how the compilers are invoked: whether any special paths had to be used, which flags were passed, etc. |
Base & Peak Portability Flags | This section describes the portability settings that are used to build the benchmarks. Optimization settings are not listed here. |
Base & Peak Optimization Flags | This section describes the optimization settings that are used to build the benchmark binaries for the base and peak runs. |
Base & Peak Other Flags | This section describes the other settings that are used to build or run the benchmark binaries for the base and peak runs; these are classified as neither portability nor optimization settings. |
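For example, unknown-flag markings are normally addressed by pointing the config file at a flags file that describes every flag used. A sketch (the URL is illustrative; the numbered suffix allows multiple flags files to be named):

```
# In the config file:
flagsurl000 = http://www.spec.org/hpc2021/flags/MyVendor-MyPlatform.xml
```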
Notes/Tuning Information | Tester's free-form notes. |
---|---|
Compiler Notes | Tester's notes about any compiler-specific information (example: special paths, setup scripts, and so forth.) |
Submit Notes | Tester's notes about how the config file submit option was used to assign processes to processors. |
Portability Notes | Tester's notes about portability options and flags used to build the benchmarks. |
Base Tuning Notes | Tester's notes about base optimization options and flags used to build the benchmarks. |
Peak Tuning Notes | Tester's notes about peak optimization options and flags used to build the benchmarks. |
Operating System Notes | Tester's notes about changes to the default operating system state and other OS tuning. |
Platform Notes | Tester's notes about changes to the default hardware state and other non-OS tuning. |
Component Notes | Tester's notes about components needed to build a particular system (for User-Built systems). |
General Notes | Tester's notes about anything not covered in the other notes sections. |
Compiler Version Notes | This section is automatically generated. It contains output from CC_VERSION_OPTION (and FC_VERSION_OPTION and CXX_VERSION_OPTION). |
This section is automatically inserted by the benchmark tools when there are errors present that prevent the result from being a valid reportable result.
Copyright © 2022 Standard Performance Evaluation Corporation
All Rights Reserved