SPEChpc™ 2021 Frequently Asked Questions (FAQ)

This document answers frequently asked technical questions about SPEChpc 2021. The latest version of this document may be found at http://www.spec.org/hpg/hpc2021/Docs/faq.html.

If you are looking for the list of known problems with SPEChpc, please see http://www.spec.org/hpg/hpc2021/Docs/known-problems.html.

Contents

Requirements

Require.01 How much memory do I need?

Require.02 Does this work with Windows?

Require.03 What software do I need?

Require.04 How many ranks/nodes do I need?

Installation

Install.01 ./install.sh: /bin/sh: bad interpreter: Permission denied

Install.02 The DVD drive is on system A, but I want to install on system B. What do I do?

Install.03 Do I need to be root?

Install.04 Can I install SPEChpc 2021 on a read-only file system?

Install.05 Install fails with Error re-homing the benchmark tools, libnsl.so.1

runhpc

runhpc.01 Can't locate strict.pm

runhpc.02 specperl: bad interpreter: No such file or directory

runhpc.03 Do I need to be root?

Building benchmarks

Build.01 Why is it rebuilding the benchmarks?

Build.02 The directives aren't tuned for my architecture. What can I do?

Setting up

Setup.01 hash doesn't match after copy

Setup.02 Copying executable failed

Running benchmarks

Run.01 Why does this benchmark take so long to run?

Run.02 Why was there this cryptic message from the operating system?

Run.03 What happens with the compilation of accelerator code?

Run.04 Can I run on a 32-bit system?

Run.05 My runtimes vary quite a lot. Is there a way to fix it?

Run.06 How do I run on a particular accelerator device?

Miscompares

Miscompare.01 I got a message about a miscompare

Miscompare.02 The benchmark took less than 1 second

Miscompare.03 The .mis file says "short"

Miscompare.04 My compiler is generating bad code!

Miscompare.05 The code is bad even with low optimization!

Miscompare.06 The .mis file is just a bunch of numbers.

Results reporting

Results.01 It's hard to cut/paste into my spreadsheet

Results.02 What is a "flags file"? What does Unknown Flags mean?

Results.03 Submission Check -> FAILED

Results.04 Why does the report have an (*) that says ...

Power

Power.01 Where are the power metrics?

Requirements

Require.01 q. How much memory do I need?

a. The system requirements may be found in the document system-requirements.html. Currently, the expected minimum amount of memory is 60 GB for the Tiny workload, 480 GB for the Small workload, 4 TB for the Medium workload, and 14.5 TB for the Large workload. The exact memory usage varies by benchmark and with the number of ranks and threads.

Require.02 q. Does this work with Windows?

a. The SPEChpc suite has been tested on a number of platforms, but Windows is not one of them. Because SPEChpc shares components with the SPEC CPU benchmarks, it is possible, though very unlikely, that it works on Windows. Windows is not a supported operating system, so if you buy this benchmark expecting it to run on Windows, SPEC will not be able to support you.

Require.03 q. What software do I need?

a. The system requirements may be found in the document system-requirements.html. You will need an MPI installation configured for the system and compiler you wish to use.
If you want to test the OpenACC model, you will need a compiler that supports OpenACC.
If you want to test the OpenMP model, you will need a compiler that supports OpenMP.
OpenMP is supported in two different configurations. "Threaded" uses common OpenMP directives that are meant for node-level parallelism on the host CPU. OpenMP "Target" includes additional directives based on the OpenMP 5.0 standard and is intended for target offload to an accelerator device. The "target" device could also be the host CPU; however, in some cases the way the OpenMP directives are applied could bias performance towards either the CPU or an accelerator.

Require.04 q. How many ranks/nodes do I need?

a. SPEChpc has no specific limits on the number of MPI ranks or nodes required to run the benchmarks. The main limitation is the amount of memory required by the benchmarks (60 GB for Tiny, 480 GB for Small, 4 TB for Medium, 14.5 TB for Large). However, the Tiny workload is mainly intended for use on a single node (though multiple nodes can be used as well), and the reference result (run on TU-Dresden's Taurus system) uses 24 ranks on a single node. The Small workload is meant for a denser compute node or a few nodes; the Small reference result uses 240 ranks on 10 nodes. The Medium workload is intended for medium-sized clusters but could be run on a single node if you have enough memory; the Medium reference result uses 2040 ranks on 85 nodes. The Large workload is intended for large clusters, with the reference result using 9600 ranks on 400 nodes.
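
As a rough illustration of the memory minimums in this answer, the following shell sketch reports which workloads fit in a given amount of memory. The function name workloads_that_fit is hypothetical, the input is assumed to be the total memory available to the job in GB, and 1 TB is assumed to equal 1024 GB (the text does not say which convention applies). Actual usage varies by benchmark, ranks, and threads, so treat the result as a first-pass filter only.

```shell
#!/bin/sh
# Print the SPEChpc workloads whose stated minimum memory fits in MEM_GB.
# Assumption: 1 TB = 1024 GB, so 4 TB = 4096 GB and 14.5 TB = 14848 GB.
workloads_that_fit() {
    mem_gb=$1
    fit=""
    [ "$mem_gb" -ge 60 ]    && fit="$fit tiny"
    [ "$mem_gb" -ge 480 ]   && fit="$fit small"
    [ "$mem_gb" -ge 4096 ]  && fit="$fit medium"
    [ "$mem_gb" -ge 14848 ] && fit="$fit large"
    # Trim the leading space; print "none" if nothing fits.
    fit=${fit# }
    echo "${fit:-none}"
}

# Example: a job with 512 GB of memory in total.
workloads_that_fit 512
```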

Installation

Install.01 q. Why am I getting a message such as "./install.sh: /bin/sh: bad interpreter: Permission denied"?

a. If you are installing from a DVD you created, check to be sure that your operating system allows programs to be executed from the DVD. For example, some Linux man pages for mount suggest setting the properties for the CD or DVD drive in /etc/fstab to "/dev/cdrom /cd iso9660 ro,user,noauto,unhide", which is notably missing the property "exec". Add exec to that list in /etc/fstab, or add it to your mount command. Notice that the sample Linux mount command in install-guide-unix.html does include exec.

Perhaps install.sh lacks permission to run because you tried to copy all the files from the DVD, in order to move them to another system. If so, please don't do that. There's an easier way. See the next question.

Install.02 q. The DVD drive is on system A, but I want to install on system B. What do I do?

a. The installation guides have an appendix just for you, which describes installing from the network or from a tarfile. See Appendix 1 in install-guide-unix.html.

Install.03 q. Do I need to be root?

a. Occasionally, users of Unix systems have asked whether it is necessary to elevate privileges, or to become 'root', prior to installing or running SPEChpc.

SPEC recommends (*) that you do not become root, because: (1) To the best of SPEC's knowledge, no component of SPEChpc needs to modify system directories, nor does any component need to call privileged system interfaces. (2) Therefore, if it appears that there is some reason why you need to be root, the cause is likely to be outside the SPEC toolset - for example, disk protections, or quota limits. (3) For safe benchmarking, it is better to avoid being root, for the same reason that it is a good idea to wear seat belts in a car: accidents happen, humans make mistakes. For example, if you accidentally type:

kill 1

when you meant to say:

kill %1

then you will be very grateful if you are not privileged at that moment.

(*) This is only a recommendation, not a requirement nor a rule.

Install.04 q. Can I install SPEChpc on a read-only file system?

a. Yes. This is a new feature in SPEChpc. However, when you install the benchmarks on a read-only file system, you must include output_root in your config file with a path to a writable location. Also, because your config file will not be located in the installation directory's config directory, runhpc will not be able to find it unless you provide the full path of the config file on the runhpc command line: runhpc -c /path/to/myconfig.cfg ...
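
Before starting a run from a read-only installation, it may help to confirm that the location you plan to use for output_root really is writable. The helper below is a minimal sketch; the name check_output_root is hypothetical and /tmp is only a placeholder for whatever path you put in your config file.

```shell
#!/bin/sh
# check_output_root DIR
# Succeeds if DIR exists and the current user can write to it.
check_output_root() {
    dir=$1
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "ok: $dir is writable"
    else
        echo "error: $dir is missing or not writable" >&2
        return 1
    fi
}

# Placeholder path; substitute the directory named by output_root.
check_output_root /tmp || echo "choose a different output_root"
```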

Install.05 q. Install fails with Error re-homing the benchmark tools, libnsl.so.1

a. If you encounter an Error re-homing the benchmark tools message during installation, it's likely that specperl is failing because the 'libnsl.so.1' library it depends on is not installed on your system.

'libnsl.so.1' has been deprecated on newer Linux distributions; however, the SPEC tools use it in order to maintain compatibility with older ones. Most newer distributions still install libnsl.so.1 for compatibility (often as a link to libnsl.so.2), but RHEL 8.1 does not by default.

To work around this, either install libnsl.so.1 or use the 'linux-x86_64-rhel8' toolset, which links against libnsl.so.2. Toolsets linked against libnsl.so.2 are not currently provided for ARM or POWER ISA.
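
One way to see whether libnsl.so.1 is visible to the dynamic linker is to search the linker cache. The sketch below assumes a glibc-based Linux system where ldconfig -p lists the cache; the helper name has_libnsl1 is hypothetical, and a library installed outside the cache (for example, found only through LD_LIBRARY_PATH) would not be detected this way.

```shell
#!/bin/sh
# has_libnsl1: reads an "ldconfig -p" style listing on stdin and
# succeeds if it mentions libnsl.so.1.
has_libnsl1() {
    grep -q 'libnsl\.so\.1'
}

if command -v ldconfig >/dev/null 2>&1 && ldconfig -p 2>/dev/null | has_libnsl1; then
    echo "libnsl.so.1 found in the linker cache"
else
    echo "libnsl.so.1 not found in the linker cache (or ldconfig is unavailable)"
fi
```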

runhpc

runhpc.01 q. When I say runhpc, why does it say Can't locate strict.pm? For example:

Can't locate strict.pm in @INC (@INC contains: .) at runhpc line 28.
BEGIN failed--compilation aborted at runhpc line 28.

a. You can't use runhpc if its path is not set correctly. On Unix, Linux, or Mac OS X, you should source shrc or cshrc, as described in Install Guide section 6.

runhpc.02 q. Why am I getting messages about specperl: bad interpreter? For example:

bash: /hpc2021/bin/runhpc: /hpc2021/bin/specperl: bad interpreter: No such file or directory

a. Did you move the directory where runhpc was installed? If so, you can probably put everything to rights, just by going to the new top of the directory tree and typing "bin/relocate".

For example, the following unwise sequence of events is repaired after completion of the final line.

Top of SPEC benchmark tree is '/hpc2021'
Everything looks okay.  cd to /hpc2021, source the shrc file and have at it!
$ cd /hpc2021
$ . ./shrc
$ cd ..
$ mv hpc2021 hpc2021.new
$ runhpc -h | head
bash: runhpc: command not found
$ cd hpc2021.new/
$ . ./shrc
$ runhpc --help | head
bash: /hpc2021.new/bin/runhpc: /hpc2021/bin/specperl: bad interpreter: No such file or directory
$ bin/relocate

runhpc.03 q. Do I need to be root?

a. Regarding the root account, the answer for runhpc is the same as the answer to Install.03, above.

Building benchmarks

Build.01 q. Why is it rebuilding the benchmarks?

a. You changed something, and the tools thought that it might affect the generated binaries. See the section about automatic rebuilds in the config.html document.

Build.02 q. The directives aren't tuned for my architecture. What can I do?

a. During development, SPEC/HPG made every effort to provide performance-portable versions of the OpenMP and OpenACC directives. It's the main reason why we provide two OpenMP versions: the "Thread" version (pmodel=OMP) is tuned for the host, and the "Target" version (pmodel=TGT) is tuned more for accelerators. However, given the wide variety of architectures, the many OpenMP directives that could be applied, and the differing levels of compiler support for those directives, the SPEChpc benchmarks' use of the directives may not be ideally tuned for every architecture.

To help, SPEChpc does allow limited changes to the directives for Peak runs. See Run Rule 2.4.5 Directive Modifications. For example, the current source codes use the OpenMP "SIMD" directive in only a few spots. You may find that your compiler and architecture need this directive added to take full advantage of SIMD vectorization.

Why only allow changes in Peak?

This is mainly due to the underlying philosophy behind Base and Peak. Base looks at what the performance is if the user knows little about the system, compiler, or codes. It's most useful for comparing general performance between systems. Peak looks for what the optimal performance would be if more is known about the system, compilers, and individual benchmarks. Thus Peak allows for a finer level of tuning.

More information about Base and Peak can be found in Run Rule Section 1.4.

Setting up

Setup.01 q. What does hash doesn't match after copy mean?

I got this strange, difficult to reproduce message:
    hash doesn't match after copy ... in copy_file (1 try total)! Sleeping 2 seconds...
followed by several more tries and sleeps. Why?

a. During benchmark setup, certain files are checked. If they don't match what they are expected to, you might see this message. Check:

If the condition persists, try turning up the verbosity level. Look at the files with other tools; do they exist? Can you see differences? Try a different disk and controller. And, check for the specific instance of this message described in the next question.

Setup.02 q. Why does it say ERROR: Copying executable failed?

I got this strange, difficult to reproduce message:
    ERROR: Copying executable to run directory FAILED
or
    ERROR: Copying executable from build dir to exe dir FAILED!
along with the bit about hashes not matching from the previous question. Why?

a. Perhaps you have attempted to build the same benchmark twice in two simultaneous jobs.

On most operating systems, the SPEC tools don't mind concurrent jobs. They use your operating system's locking facilities to write the correct outputs to the correct files, even if you fire off many runhpc commands at the same time.

But there's one case of simultaneous building that is difficult for the tools to defend against: please don't try to build the very same executable from two different jobs at the same time. Notice that if you say something like this:

$ tail myconfig.cfg
605.lbm_s=peak:
basepeak=yes
$ runhpc --config myconfig --size test --tune base 605.lbm_s &
$ runhpc --config myconfig --size test --tune peak 605.lbm_s &

then you are trying to build the same benchmark twice in two different jobs, because of the presence of basepeak=yes. Please don't try to do that.

Running benchmarks

Run.01 q. Why does this benchmark suite take so long to run?

a. Please understand that the suite has been designed to do many things and be useful for at least several years. Benchmarks that seem slow today probably will not seem slow at the end of life of the suite. In addition, benchmarks can be compute-intensive or memory-intensive. Especially when they are memory-intensive, please check with the compiler vendor whether any memory-policy-related flags need to be turned on to maximize performance.

Also, for SPEChpc, we needed to use workloads large enough to allow for scaling to larger rank counts as well as accelerators. So what may take a while on one or two nodes using only MPI could be much faster on dozens of nodes or when using accelerators.

Run.02 q. Why was there this cryptic message from the operating system?

a. If you are getting cryptic, hard-to-reproduce, unpredictable error messages from your system, one possible reason may be that the benchmarks consume substantial resources of several types. If an OS runs out of some resource - for example, pagefile space, or process heap space - it might not give you a very clear message. Instead, you might see only a very brief message, or a dialog box with a hex error code in it. Please see the hints and suggestions in the section about resources in system-requirements.html.

Run.04 q. Can I run on a 32-bit system?

a. The benchmarks have been tested extensively as 64-bit binaries on a range of systems, but it is highly unlikely that they could be run as 32-bit binaries due to memory constraints. All codes presume a 64-bit ABI.

Run.05 q. My runtimes vary quite a lot. Is there a way to fix it?

a. This usually happens on multi-socket systems when your host process runs on a different socket from your accelerator. Try pinning the process and threads (via submit) to the right socket using the NUMA tool of your choice.
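
As one possible way to do the pinning, the sketch below builds a numactl prefix for each local rank. It assumes numactl is the NUMA tool in use, that local ranks are spread in equal blocks across the NUMA nodes, and that each rank's accelerator is attached to the NUMA node chosen this way; none of that is guaranteed on your system, so check the topology first. The function name pin_prefix is hypothetical.

```shell
#!/bin/sh
# pin_prefix LOCAL_RANK RANKS_PER_NODE NUMA_NODES
# Prints a numactl prefix that binds CPUs and memory to one NUMA node.
# Assumption: ranks are divided into equal blocks, one block per NUMA node.
pin_prefix() {
    local_rank=$1
    ranks_per_node=$2
    numa_nodes=$3
    node=$(( local_rank * numa_nodes / ranks_per_node ))
    echo "numactl --cpunodebind=$node --membind=$node"
}

# Example: 8 ranks per compute node on a 2-socket system.
for r in 0 3 4 7; do
    echo "local rank $r: $(pin_prefix "$r" 8 2)"
done
```

A launcher wrapper would typically read the local rank from an environment variable set by the MPI launcher (for example, OMPI_COMM_WORLD_LOCAL_RANK with Open MPI) and prepend the printed prefix to the benchmark command.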

Run.06 q. How do I run on a particular accelerator device?

a. When using OpenACC or OpenMP with target offload, the benchmarks include code that automatically assigns MPI ranks to the available accelerator devices on a node, using the following logic:

  1. Determine the local rank number.
  2. Query the appropriate API call to determine the number of accelerator devices on the node.
  3. Use a mod operation (local_rank mod num_devices) to get the device number to use. If there are more ranks than available devices, multiple ranks will be assigned to the same device.
  4. Assign the rank to the device via an API call.

To change which devices are used and the order they are assigned, you will need to use an appropriate method from the accelerator vendor.

For example, if using an NVIDIA device, you could set the environment variable CUDA_VISIBLE_DEVICES to a different order than the default. Setting "CUDA_VISIBLE_DEVICES=3,1,2,0" remaps the device numbering so that local rank 0 uses device 3, rank 1 uses device 1, rank 2 uses device 2, and rank 3 uses device 0.
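
The assignment logic and the effect of reordering the visible devices can be mimicked in a few lines of shell. This is a sketch of the arithmetic only, not the benchmarks' actual code; the function name device_for_rank is hypothetical, and the device list is passed as an argument rather than read from the environment.

```shell
#!/bin/sh
# device_for_rank LOCAL_RANK VISIBLE_LIST
# VISIBLE_LIST is a comma-separated device order such as "3,1,2,0".
# Prints the physical device selected by (LOCAL_RANK mod number of devices).
device_for_rank() {
    local_rank=$1
    old_ifs=$IFS
    IFS=,
    set -- $2            # split the list into positional parameters
    IFS=$old_ifs
    num_devices=$#
    if [ "$num_devices" -eq 0 ]; then
        echo "error: empty device list" >&2
        return 1
    fi
    index=$(( local_rank % num_devices ))
    shift "$index"
    echo "$1"
}

# Five local ranks sharing four devices in the order 3,1,2,0:
# the fifth rank wraps around to the first listed device.
for r in 0 1 2 3 4; do
    echo "local rank $r -> device $(device_for_rank "$r" "3,1,2,0")"
done
```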

Miscompares

Miscompare.01 q. I got a message about a miscompare. The tools said something like:

Running Benchmarks
  Running 350.md ref base 12.3 default 
/spec/accel/bin/specinvoke -d /spec/accel/benchspec/ACCEL/350.md/run/run_base_ref_12.3.0000 
-e speccmds.err -o speccmds.stdout -f speccmds.cmd -C -q
/spec/accel/bin/specinvoke -E -d /spec/accel/benchspec/ACCEL/350.md/run/run_base_ref_12.3.0000 
-c 1 -e compare.err -o compare.stdout -f compare.cmd -k

*** Miscompare of md.log.01228060000; for details see
    /spec/accel/benchspec/ACCEL/350.md/run/run_base_ref_12.3.0000/md.log.01228060000.mis
Error: 1x350.md
Producing Raw Reports
mach: default
  ext: 12.3
    size: ref
      set: openacc

Why did it say that? What's the problem?

a. We don't know. Many things can cause a benchmark to miscompare, so we really can't tell you exactly what's wrong based only on the fact that a miscompare occurred.

But don't panic.

Please notice, if you read the message carefully, that there's a suggestion of a very specific file to look in. It may be a little hard to read if you have a narrow terminal window, as in the example above, but if you look carefully you'll see that it says:

*** Miscompare of md.log.01228060000; for details see
    /spec/accel/benchspec/ACCEL/350.md/run/run_base_ref_12.3.0000/md.log.01228060000.mis

Now is the time to look inside that file. Simply doing so may provide a clue as to the nature of your problem.

On Unix systems, change your current directory to the run directory using the path mentioned in the message, for example:

cd /spec/accel/benchspec/ACCEL/350.md/run/run_base_ref_12.3.0000

Then, have a look at the file that was mentioned, using your favorite text editor. If the file does not exist, then check your paths, and check to see whether you have run out of disk space.
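
If you prefer the command line to a text editor, a small helper can gather the first clues in one pass. The sketch below prints the first few lines of every non-empty .mis and .err file in a run directory and then reports free disk space; the function name show_clues is hypothetical, and the directory defaults to the current one.

```shell
#!/bin/sh
# show_clues RUN_DIR
# Prints the start of each non-empty .mis and .err file, then disk usage.
show_clues() {
    run_dir=$1
    for f in "$run_dir"/*.mis "$run_dir"/*.err; do
        # Skip unmatched patterns and empty files.
        [ -s "$f" ] || continue
        echo "== $f"
        head -n 5 "$f"
    done
    # A full disk is one reason the .mis file may be missing.
    df -h "$run_dir"
}

# Run against the directory given as the first argument, or the current one.
show_clues "${1:-.}"
```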

Miscompare.02 q. The benchmark ran, but it took less than 1 second and there was a miscompare. Help!

a. If the benchmark took less than 1 second to execute, it didn't execute properly. There should be one or more .err files in the run directory which will contain some clues about why the benchmark failed to run. Common causes include libraries that were used for compilation but not available during the run, executables that crash with access violations or other exceptions, and permissions problems. See also the suggestions in the next question.

Miscompare.03 q. I looked in the .mis file and it said something like:

   'lbm_s.log.01228060000' short

What does "short" mean?

a. If a line like the above is the only line in the .mis file, it means that the benchmark failed to produce any output. In this case, the corresponding error file (look for files with .err extensions in the run directory) may have a clue. In this case, it was Segmentation Fault - core dumped. For problems like this, the first things to examine are the portability flags used to build the benchmark.

Have a look at the sample config files in $SPEC/config. If you constructed your own config file based on one of those, maybe you picked a starting point that was not really appropriate. Have a look at other samples in that directory. Check at www.spec.org/hpc/hpc2021 to see if there have been any result submissions using the platform that you are trying to test. If so, compare your portability flags to the ones in the config files for those results.

If the portability flags are okay, your compiler may be generating bad code.

Miscompare.04 q. My compiler is generating bad code! Help!

a. Try reducing the optimization that the compiler is doing. Instructions for doing this will vary from compiler to compiler, so it's best to ask your compiler vendor for advice if you can't figure out how to do it for yourself.

Miscompare.05 q. My compiler is generating bad code with low or no optimization! Help!

a. If you're using a beta compiler, try dropping down to the last released version, or get a newer copy of the beta. If you're using a version of GCC that shipped with your OS, you may want to try getting a "vanilla" (no patches) version and building it yourself.

Miscompare.06 q. I looked in the .mis file and it was just full of a bunch of numbers.

a. In this case, the benchmark is probably running, but it's not generating answers that are within the tolerances set. See the suggestions for how to deal with compilers that generate bad code in the previous two questions. In particular, you might see if there is a way to encourage your compiler to be careful about optimization of floating point expressions.

Results reporting

Results.01 q. It's hard to cut/paste into my spreadsheet

a. Please don't do that. With SPEChpc, there's a handy .csv format file right next to the other result formats on the index page. Or, you can go up to the top of your browser and change the .pdf (or .whichever) to .csv

Results.02 q. What is a "flags file"? What does the message Unknown Flags mean in a report?

a. SPEChpc provides benchmarks in source code form, which are compiled under control of SPEC's toolset. Compilation flags (such as -O5 or -unroll) are detected and reported by the tools with the help of flag description files. Therefore, to do a complete run, you need to (1) point to an existing flags file (easy) or (2) modify an existing flags file (slightly harder) or (3) write one from scratch (definitely harder).

  1. Find an existing flags file by noticing the address of the .xml file at the bottom of any result published at www.spec.org/hpg/hpc2021. You can use the --flagsurl switch to point your own runhpc command at that file, or you can reference it from your config file with the flagsurl option. For example:
       runhpc --config=myamdconfig --flagsurl=http://www.spec.org/hpg/hpc2021/flags/amd2021_flags.xml tiny
  2. You can download the .xml flags file referenced at the bottom of any published result at www.spec.org/hpg/hpc2021. Warning: clicking on the .xml link may just confuse your web browser; it's probably better to use whatever methods your browser provides to download a file without viewing it - for example, control-click in Safari, right click in Firefox. Then, look at it with a text editor.
  3. You can write your own flags file by following the instructions in flag-description.html.

Notice that you do not need to re-run your tests if the only problem was Unknown flags. You can just use runhpc --rawformat --flagsurl

Results.03 q. What's all this about Submission Check -> FAILED littering my log file and my screen?

At the end of my run, why did it print something like this?

format: Submission Check -> FAILED.  Found the following errors:
        - The "hw_memory" field is invalid.
            It must contain leading digits, followed by a space,
            and a standard unit abbreviation.  Acceptable
            abbreviations are KB, MB, GB, and TB.
           The current value is "20480 Megabytes".

a. A complete, reportable result has various information filled in for readers. These fields are listed in the table of contents for config.html. If you wish to submit a result to SPEC for publication at www.spec.org/hpg/hpc2021, these fields not only have to be filled in; they also have to follow certain formats. Although you are not required to submit your result to SPEC, for convenience the tools try to tell you as much as they can about how the result should be improved if you were to submit it. In the above example, the tools would stop complaining if the field hw_memory said something like "20 GB" instead of "20480 Megabytes".

Notice that you can repair minor formatting problems such as these without doing a re-run of your tests. You are allowed to edit the rawfile, as described in utility.html.
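
The format rule quoted in the error message (leading digits, a space, then KB, MB, GB, or TB) can be checked ahead of time with a regular expression. The sketch below is an approximation based only on that message; the function name valid_hw_memory is hypothetical, and the real submission checker may accept or reject values that this pattern does not.

```shell
#!/bin/sh
# valid_hw_memory VALUE
# Succeeds if VALUE is digits, one space, and one of KB, MB, GB, TB.
valid_hw_memory() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+ (KB|MB|GB|TB)$'
}

for value in "20 GB" "20480 Megabytes"; do
    if valid_hw_memory "$value"; then
        echo "ok: $value"
    else
        echo "bad format: $value"
    fi
done
```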

Results.04 q. Why does the report have an (*) that says ...

The report has a line that says

(*) Indicates compiler flags found in non-compiler variables

What does this mean, how do I make it go away?

a. There are potentially a number of errors that will show up like this. They usually mean that you have a conflict of some kind between flags file specifications and how you ran. If you specify a compiler flag that is listed as portability but put it in a config file variable for optimization, the reporter will notice this and warn you about a potential problem. Unfortunately, many of these kinds of problems require a rerun to make everything report nicely. Sometimes you can get lucky and fix your flags file to make the error go away. So the first thing to look at is your flags file. If that isn't the issue and you have a config file issue, you will need to rerun to make the error go away.

Power

Power.01 q. Where are the power metrics?

a. The power measurement component that was available with SPEC ACCEL and SPEC OMP2012 is not available with SPEChpc. The SPEC tools still include this support, but since power measurement is currently limited to a single node, the SPEC/HPG committee decided not to include it as a SPEChpc metric.


Copyright 2014-2021 Standard Performance Evaluation Corporation

All Rights Reserved