ABSTRACT
This document describes the contents and arrangement of a SPEChpc 2021 result disclosure.
This document refers to the arrangement of fields in the HTML
format of a SPEChpc 2021 result disclosure, since the arrangement
of result fields may differ between the text, HTML, and
other report formats (CSV, PDF, PS, etc.).
Although the reports are formatted in a way that is intended to be self-explanatory,
readers may want a formal statement of the meaning of a field, or may have technical questions
about the information provided in the fields.
In addition, the SPEC website contains links from the fields of the published reports
to their descriptions in this document.
The contents of a result report come from three sources: output generated by the benchmark runs themselves, settings extracted from the configuration file that controls the building and running of the benchmarks, and descriptive fields filled in by the tester. These follow the conventions specified in the separate documents on the Run Rules, Config Files, and XML Flags Files. Reports published on the SPEC website have been peer-reviewed by the members of the SPEC/HPG committee and are expected to be correct in every detail.
(To check for possible updates to this document, please see http://www.spec.org/hpc2021/Docs/)
Selecting one of the following will take you to the detailed table of contents for that section or subsection:
2. Result and Configuration Summary
2.1 Results Header
2.3 Results Table
7. Errors
SPEChpc 2021 has 9 benchmarks (5 in C, 1 in C++, and 3 in Fortran), organized into 4 suites by workload size: Tiny, Small, Medium, and Large.
Application Name | Benchmark (Tiny) | Benchmark (Small) | Benchmark (Medium) | Benchmark (Large) | Language | Approximate LOC | Application Area |
---|---|---|---|---|---|---|---|
LBM D2Q37 | 505.lbm_t | 605.lbm_s | 705.lbm_m | 805.lbm_l | C | 9,000 | Computational Fluid Dynamics |
SOMA Offers Monte-Carlo Acceleration | 513.soma_t | 613.soma_s | Not included. | Not included. | C | 9,500 | Physics / Polymeric Systems |
Tealeaf | 518.tealeaf_t | 618.tealeaf_s | 718.tealeaf_m | 818.tealeaf_l | C | 5,400 | Physics / High Energy Physics |
Cloverleaf | 519.clvleaf_t | 619.clvleaf_s | 719.clvleaf_m | 819.clvleaf_l | Fortran | 12,500 | Physics / High Energy Physics |
Minisweep | 521.miniswp_t | 621.miniswp_s | Not included. | Not included. | C | 17,500 | Nuclear Engineering - Radiation Transport |
POT3D | 528.pot3d_t | 628.pot3d_s | 728.pot3d_m | 828.pot3d_l | Fortran | 495,000 (includes HDF5 library) | Solar Physics |
SPH-EXA | 532.sph_exa_t | 632.sph_exa_s | Not included. | Not included. | C++14 | 3,400 | Astrophysics and Cosmology |
HPGMG-FV | 534.hpgmgfv_t | 634.hpgmgfv_s | 734.hpgmgfv_m | 834.hpgmgfv_l | C | 16,700 | Cosmology, Astrophysics, Combustion |
miniWeather | 535.weather_t | 635.weather_s | 735.weather_m | 835.weather_l | Fortran | 1,100 | Weather |
System Vendor | The vendor of the system under test. |
---|---|
System Name | The name of the system under test. |
Hardware Availability | The date when all the hardware necessary to run the result is generally available. For example, if the CPU is available in Aug-2021, but the memory is not available until Oct-2021, then the hardware availability date is Oct-2021 (unless some other component pushes it out farther). |
Software Availability | The date when all the software necessary to run the result is generally available. For example, if the operating system is available in Aug-2021, but the compiler or other libraries are not available until Oct-2021, then the software availability date is Oct-2021 (unless some other component pushes it out farther). |
Test date | The date when the test was run. This value is recorded by the SPEC tools; the time reported by the system under test is recorded in the raw result file. |
Test sponsor | The name of the organization or individual that sponsored the test. Generally, this is the name of the license holder. |
Tested by | The name of the organization or individual that ran the test. If there are installations in multiple geographic locations, sometimes that will also be listed in this field. |
SPEChpc 2021_tny_peak | The geometric mean of 9 normalized ratios (one for each benchmark) when compiled with aggressive optimization for each benchmark in the Tiny Suite. |
---|---|
SPEChpc 2021_tny_base | The geometric mean of 9 normalized ratios (one for each benchmark) when compiled with conservative optimization for each benchmark in the Tiny Suite. |
SPEChpc 2021_sml_peak | The geometric mean of 9 normalized ratios (one for each benchmark) when compiled with aggressive optimization for each benchmark in the Small Suite. |
SPEChpc 2021_sml_base | The geometric mean of 9 normalized ratios (one for each benchmark) when compiled with conservative optimization for each benchmark in the Small Suite. |
SPEChpc 2021_med_peak | The geometric mean of 6 normalized ratios (one for each benchmark) when compiled with aggressive optimization for each benchmark in the Medium Suite. |
SPEChpc 2021_med_base | The geometric mean of 6 normalized ratios (one for each benchmark) when compiled with conservative optimization for each benchmark in the Medium Suite. |
SPEChpc 2021_lrg_peak | The geometric mean of 6 normalized ratios (one for each benchmark) when compiled with aggressive optimization for each benchmark in the Large Suite. |
SPEChpc 2021_lrg_base | The geometric mean of 6 normalized ratios (one for each benchmark) when compiled with conservative optimization for each benchmark in the Large Suite. |
More detailed information about the performance metrics may be found in section 4.3.1 of the SPEChpc 2021 Run and Reporting Rules.
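To make the computation concrete, here is a minimal sketch (in Python, with invented timings; a real suite uses the full set of benchmark ratios) of how a geometric mean of normalized ratios is formed:

```python
import math

# Hypothetical (reference_seconds, measured_seconds) pairs, one per
# benchmark in the suite -- all values here are illustrative only.
runs = [(2000.0, 250.0), (1800.0, 300.0), (2200.0, 275.0)]

# Each ratio normalizes the measured time against the reference platform.
ratios = [ref / measured for ref, measured in runs]

# The suite metric is the geometric mean of the per-benchmark ratios.
metric = math.prod(ratios) ** (1.0 / len(ratios))
print(f"overall metric = {metric:.2f}")
```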
Benchmark | The names of the benchmarks making up the SPEChpc 2021 suites. |
---|---|
Parallel Model | This column indicates the parallel model used by the benchmark: MPI alone, or MPI combined with OpenMP, OpenACC, or OpenMP target offload. |
Ranks | This column indicates the number of MPI ranks (processes) that were used during the running of the benchmark. |
Threads per rank | This column indicates the number of host threads (OpenMP or OpenACC) per rank that were used during the running of the benchmark. |
Seconds | The amount of elapsed (wall-clock) time in seconds that the benchmark took to run, from job submission to job completion. |
Ratio | The run time on the reference platform divided by the measured run time; higher ratios indicate better performance. |
Identifying the Median results:
For a reportable SPEChpc 2021 run, at least two iterations of each benchmark are run, and the median of the runs (lower of middle two, if even) is selected to be part of the overall metric. In output formats that support it, the medians in the result table are underlined in bold. The ".txt" report will mark each median score with an asterisk "*".
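A small sketch (Python; the timings are invented) of the selection rule as stated above, including the lower-of-the-middle-two case for an even number of runs:

```python
def select_median(values):
    """Select the median per the rule above: with an even number of
    runs, take the lower of the two middle values."""
    ordered = sorted(values)
    return ordered[(len(ordered) - 1) // 2]  # lower middle when even

print(select_median([102.0, 98.5, 100.3]))  # 3 runs -> 100.3
print(select_median([102.0, 98.5]))         # 2 runs -> 98.5 (lower of the two)
```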
Each iteration in the SPEChpc 2021 benchmark suites will run each benchmark once, in order. For example, given benchmarks "110.aaa", "120.bbb", and "130.ccc", here's what you might see as the benchmarks were run if they were part of each suite:
SPEChpc 2021
    Running (#1) 110.aaa ref base oct09a default
    Running (#1) 120.bbb ref base oct09a default
    Running (#1) 130.ccc ref base oct09a default
    Running (#2) 110.aaa ref base oct09a default
    Running (#2) 120.bbb ref base oct09a default
    Running (#2) 130.ccc ref base oct09a default
    Running (#3) 110.aaa ref base oct09a default
    Running (#3) 120.bbb ref base oct09a default
    Running (#3) 130.ccc ref base oct09a default
When you read the results table from a run, the results are listed in the order in which they were run, in column-major order. In other words, if you are interested in the base scores as they were produced, start at the top of the left-hand column and read down it, then read the middle column, then the right column.
If the benchmarks were run with both base and peak tuning, all base runs were completed before starting peak.
Collective hardware details across the whole system. Run rules relating to these items can be found in section 4.2 of the SPEChpc 2021 Run and Reporting Rules.
Type of System | Description of the system being benchmarked: SMP, Homogeneous Cluster or Heterogeneous Cluster. |
---|---|
Compute Node | These systems are used as compute nodes during the run of the benchmark. |
Interconnects | These devices are used for the interconnects during the benchmark run. |
Compute Nodes Used | Number of compute nodes used to execute the benchmark. |
Total Chips | The total number of chips in the compute nodes available to execute the benchmark. |
Total Cores | The total number of cores in the compute nodes available to execute the benchmark. |
Total Threads | The total number of threads in the compute nodes available to execute the benchmark. |
Total Memory | The total amount of memory in all of the compute nodes available to execute the benchmark. |
Information on how the benchmark binaries are constructed. Run rules relating to these items can be found in section 4.2 of the SPEChpc 2021 Run and Reporting Rules.
Compiler | The names and versions of compilers used to generate the result. |
---|---|
MPI Library | The names and versions of MPI Libraries used to generate the result. |
Other MPI Information | Any performance-relevant MPI information used to generate the result. |
Other Software | Any performance-relevant non-compiler software used, including third-party libraries, accelerators, etc. |
Base Parallel Model | The parallel model used in base. |
Base Ranks | The number of MPI ranks used to execute the benchmark in the base optimization runs. |
Base Threads per Rank | The number of host threads per rank used to execute the benchmark in the base optimization runs. |
Peak Parallel Models | The list of parallel models used in peak. |
Minimum Peak Ranks | The smallest number of ranks used to execute the benchmark runs using peak optimizations. |
Maximum Peak Ranks | The largest number of ranks used to execute the peak version of the benchmark. |
Minimum Peak Threads per Rank | The smallest number of host threads per rank used to execute the benchmark runs using peak optimizations. |
Maximum Peak Threads per Rank | The largest number of host threads per rank used to execute the peak version of the benchmark. |
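The values reported in these fields normally follow from the tester's config file or runhpc invocation. A sketch only (the config name and counts are invented; option names are as used by SPEChpc's runhpc):

```
runhpc --config=mytest --ranks=64 --threads=4 --reportable tiny
```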
As part of SPEC/HPG's follow-on SPEChpc weak-scaling suite (currently under development), internal timers were added to the codes to measure MPI initialization overhead, application initialization overhead, core computation time, and residual time. For weak scaling, the core compute time will be used to determine a throughput "Figure of Merit" (FOM) measuring units of work completed over time.
For the current strong-scaled suites, SPEC/HPG decided to include this measurement as an option, since it may help in understanding scaling behavior.
The internal timing information may only be used for academic and research purposes, or as a derived value per SPEC's Fair Use Rules.
Reporting of the internal timing is disabled by default. To enable it, either add "showtimer=1" to your config file, use the runhpc --showtimer=1 option, or edit the resulting "raw" (.rsf) file to change the "showtimer" field to 1 and use the rawformat utility to reformat the reports.
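For example (a sketch of the three methods named above; file names are illustrative):

```
# 1. In the config file:
showtimer = 1

# 2. Or on the runhpc command line:
runhpc --showtimer=1 ...

# 3. Or edit the "showtimer" field in the raw file to 1, then reformat:
rawformat mytest.rsf
```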
Reported Time | The time in seconds reported by the SPEC tools, which is used in computing the benchmark ratio. Same as the Seconds field in the results table. |
---|---|
Start-up overhead | Reported Time less Application Time. Captures the overhead time for node allocation, scheduler overhead, MPI start-up time, etc. |
Initialization Time | Time the application spends initializing data, reading input files, performing domain decomposition, etc. |
Core Compute Time | Time the application spends performing its core computation. This time includes MPI communication. |
Residual Time | Remaining application time not captured under initialization or core compute. Includes items such as verification of results or saving output data files. |
Application Time | Time measured between MPI_Init and MPI_Finalize. Note that this time is measured by the tools and shown in the log files, but it is not included in the Internal Timer Table, since it is the sum of the Initialization, Core Compute, and Residual Times. |
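The relationships among these fields can be summarized with a small sketch (Python; all timings invented):

```python
# Invented example values, in seconds.
reported_time = 310.0  # Reported Time, as measured by the SPEC tools
init_time     = 12.0   # Initialization Time
core_compute  = 280.0  # Core Compute Time
residual_time = 8.0    # Residual Time

# Application Time (MPI_Init to MPI_Finalize) is the sum of the three
# internal phases, which is why it is not a separate table row.
application_time = init_time + core_compute + residual_time  # 300.0

# Start-up overhead is the Reported Time less the Application Time.
startup_overhead = reported_time - application_time          # 10.0
print(application_time, startup_overhead)
```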
SPEChpc 2021 is capable of running on large heterogeneous clusters containing different kinds of nodes linked by different kinds of interconnects. The report format contains a separate section for each kind of node and each kind of interconnect. Section 4.2 of the Run Rules document describes what information is to be provided by the tester.
For example, an SMP will consist of one node and no interconnect. Homogeneous cluster systems will typically consist of one kind of compute node and one or two kinds of interconnects. There will also often be a file server node. It is possible that the node and interconnect components are available from their respective vendors but no vendor sells the configured system as a whole; in this case the report is intended to provide enough detail to reconstruct an equivalent system with equivalent performance.
Description of the hardware configuration of the node.
Number of nodes | The number of nodes of this type in the system. |
---|---|
Uses of the Node | The purpose of this type of node: compute node, file server, head node, etc. |
Vendor | The manufacturer of this kind of node. |
Model | The model name of this kind of node. |
CPU Name | A manufacturer-determined formal name of the processor used in this node type. |
CPU(s) orderable | The number of CPUs that can be ordered in this kind of node. |
Chip(s)/CPU(s) enabled | The number of Chips (CPUs) that were enabled and active in the node during the benchmark run. |
Core(s) enabled | The number of cores that were enabled and active in the node during the benchmark run. |
Cores per Chip | The number of cores in each chip that were enabled and active in the node during the benchmark run. |
Threads per Core | The number of threads in each core that were enabled and active in the node during the benchmark run. |
CPU Characteristics | Technical characteristics to help identify the processor type used in the node. |
CPU MHz | The clock frequency of the CPU used in the node, expressed in megahertz. |
Primary Cache | Description (size and organization) of the CPU's primary cache. This cache is also referred to as "L1 cache". |
Secondary Cache | Description (size and organization) of the CPU's secondary cache. This cache is also referred to as "L2 cache". |
L3 Cache | Description (size and organization) of the CPU's tertiary, or "Level 3" cache. |
Other Cache | Description (size and organization) of any other levels of cache memory. |
Memory | Description of the system main memory configuration. End-user options that affect performance, such as the arrangement of memory modules, interleaving, latency, etc., are documented here. |
Disk Subsystem | A description of the disk subsystem (size, type, and RAID level if any) of the storage used to hold the benchmark tree during the run. |
Other Hardware | Any additional equipment added to improve performance. |
Accelerator Model | The model name of the accelerator(s). |
Accelerator Count | The number of accelerators of each model. |
Accelerator Vendor | The company/vendor of the accelerator. |
Accelerator Type | The type of accelerator. Possible values include, but are not limited to: GPU, APU, CPU, FPGA, etc. |
Accelerator Connection | How the accelerator is connected to the system. Possible descriptions include, but are not limited to: PCIe, integrated, etc. |
Accelerator ECC Enabled | Whether the accelerator uses ECC for its memory. |
Accelerator Description | Further description of the accelerator. |
Adapter Card(s) | There will be one of these groups of entries for each network adapter -- aka Host Channel Adapter (HCA) or Network Interface Card (NIC) -- used to connect to an interconnect to carry MPI or file server traffic. This field contains this adapter's vendor and model name. |
Number of Adapters | How many of these adapters attach to the node. |
Adapter Slot Type | The type of slot used to attach the adapter card to the node. |
Data Rate | The per-port, nominal data transfer rate of the adapter. |
Ports Used | The number of ports used to run the benchmark on the adapter (especially for those which have multiple ports available). |
Interconnect Type | In general terms, the type of interconnect (Ethernet, InfiniBand, etc.) attached to this adapter. |
Software configuration of the node.
Adapter Driver | The driver type and level for this adapter. |
---|---|
Adapter Firmware | The adapter firmware type and level for this device. |
Operating System | The operating system name and version. If there are patches applied that affect performance, they must be disclosed in the Notes. |
Local File System | The type of the file system local to each compute node. |
Shared File System | The type of the file system used to contain the run directories. |
System State | The state (sometimes called "run level") of the system while the benchmarks were being run. Generally, this is "single user", "multi-user", "default", etc. |
Other Software | Any performance-relevant non-compiler software used, including third-party libraries, accelerators, etc. |
Accelerator Driver | The name and version of the software driver used to control the accelerator. |
Description of the configuration of the interconnect.
Vendor | The manufacturer(s) of this interconnect. |
---|---|
Model | The model name(s) of the interconnect as a whole, or components of it -- not including the switch model, which is the next field. |
Switch Model(s) | The model and manufacturer of the switching element(s) of this interconnect. There may be more than one kind declared. |
Number of switches | The number of switches of this type in the interconnect. |
Ports per switch | The number of ports per switch available for carrying the type of traffic noted in the "Primary Use" field. |
Data Rate | The per-port, nominal data transfer rate of the switch(es). |
Firmware | The Firmware type and level for the switch(es). |
Topology | Description of the arrangement of switches and links in the interconnect. |
Primary Use | The kind of data traffic carried by the interconnect: MPI, file server, etc. |
This section describes how the benchmarks are compiled. The HTML and PDF reports contain links from the settings that are listed, to the descriptions of those settings in the XML flags file report.
Much of this information is derived from compilation rules written in the config file and interpreted according to rules specified in the XML flags file; free-form notes may be added by the tester. A section only appears if the corresponding flags are used (for example, peak optimization flags); otherwise it is not printed. Section 2 of the SPEChpc 2021 Run and Reporting Rules document gives rules on how these items can be used in reportable runs.
Base & Peak Unknown Flags | This section lists flags, used in the base or peak compiles, that were not recognized by the report generation. Results with unknown flags are marked "invalid" and may not be published. Most likely the flagsurl parameter was not set correctly, or details need to be added to the XML flags file. The "invalid" marking may be removed by reformatting the result using a flags file that describes all of the unknown flags. |
---|---|
Base & Peak Forbidden Flags | This section lists flags, used in the base or peak compiles, that are designated as forbidden in the XML flags file for the benchmark or the platform. Results with forbidden flags are marked "invalid" and may not be published. |
Base & Peak Compiler Invocation | This section describes how the compilers are invoked: whether any special paths had to be used, which flags were passed, etc. |
Base & Peak Portability Flags | This section describes the portability settings that are used to build the benchmarks. Optimization settings are not listed here. |
Base & Peak Optimization Flags | This section describes the optimization settings that are used to build the benchmark binaries for the base and peak runs. |
Base & Peak Other Flags | This section describes the other settings that are used to build or run the benchmark binaries for the base and peak runs; these are classified as neither portability nor optimization settings. |
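For example, unknown-flag markings are normally addressed by pointing the config file at a flags file that describes every flag used. A sketch (the URL is illustrative; the numbered suffix allows multiple flags files to be named):

```
# In the config file:
flagsurl000 = http://www.spec.org/hpc2021/flags/MyVendor-MyPlatform.xml
```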
Notes/Tuning Information | Tester's free-form notes. |
---|---|
Compiler Notes | Tester's notes about any compiler-specific information (example: special paths, setup scripts, and so forth.) |
Submit Notes | Tester's notes about how the config file submit option was used to assign processes to processors. |
Portability Notes | Tester's notes about portability options and flags used to build the benchmarks. |
Base Tuning Notes | Tester's notes about base optimization options and flags used to build the benchmarks. |
Peak Tuning Notes | Tester's notes about peak optimization options and flags used to build the benchmarks. |
Operating System Notes | Tester's notes about changes to the default operating system state and other OS tuning. |
Platform Notes | Tester's notes about changes to the default hardware state and other non-OS tuning. |
Component Notes | Tester's notes about components needed to build a particular system (for User-Built systems). |
General Notes | Tester's notes about anything not covered in the other notes sections. |
Compiler Version Notes | This section is automatically generated. It contains output from CC_VERSION_OPTION (and FC_VERSION_OPTION and CXX_VERSION_OPTION). |
This section is automatically inserted by the benchmark tools when there are errors present that prevent the result from being a valid reportable result.
Copyright © 2022 Standard Performance Evaluation Corporation
All Rights Reserved