gcc
GNU Compiler Collection Flags
SPECrate runs might use one of these methods to bind processes to specific processors, depending on the config file.
Linux systems: the numactl command is commonly used. Here is a brief guide to understanding the specific
command which will be found in the config file:
- syntax: numactl [--interleave=nodes] [--preferred=node] [--physcpubind=cpus] [--cpunodebind=nodes]
[--membind=nodes] [--localalloc] command args ...
- numactl runs processes with a specific NUMA scheduling or memory placement policy. The policy is set for a
command and inherited by all of its children.
- --localalloc instructs numactl to keep a process's memory on the local node, while -m (--membind) specifies the
node(s) on which to place a process's memory.
- --physcpubind specifies which core(s) to bind the process to. In this case, copy 0 is bound to processor 0,
and so on.
- For full details on using numactl, please refer to your Linux documentation, man numactl
Solaris systems: The pbind command is commonly used, via
submit=echo 'pbind -b...' > dobmk; sh dobmk
The specific command may be found in the config file; here is a brief guide to understanding that command:
- submit= causes the SPEC tools to use this line when submitting jobs.
- echo ...> dobmk causes the generated commands to be written to a file, namely
dobmk.
- pbind -b causes this copy's processes to be bound to the CPU specified by the expression that
follows it. See the config file used in the run for the exact syntax, which tends to be cumbersome because of
the need to carefully quote parts of the expression. When all expressions are evaluated, the jobs are typically
distributed evenly across the system, with each chip running the same number of jobs as all other chips, and each
core running the same number of jobs as all other cores.
The pbind expression may include various elements from the SPEC toolset and from standard Unix commands, such
as:
- $BIND: a reference to a value from the bind line, a line of the form
"bind = n n n n", where each "n" is a processor number. See http://www.spec.org/cpu2017/Docs/config.html#bind
for details on this feature.
- $$: the current process id
- $SPECCOPYNUM: the SPEC tools-assigned number for this copy of the benchmark.
- psrinfo: find out what processors are available
- grep on-line: search the psrinfo output for information regarding on-line cpus
- expr: Calculate simple arithmetic expressions. For example, the effect of binding jobs to a
(quote-resolved) expression such as:
expr ( $SPECCOPYNUM / 4 ) * 8 + ( $SPECCOPYNUM % 4 )
would be to send the jobs to processors whose numbers are:
0,1,2,3, 8,9,10,11, 16,17,18,19 ...
- awk...print \$1: Pick out the line corresponding to this copy of the benchmark and use the CPU
number mentioned at the start of this line.
- sh dobmk actually runs the benchmark.
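The arithmetic in the expr example above can be checked with a small sketch. The function name cpu_for_copy is hypothetical, and the constants 4 and 8 are taken from that example only; the config file used in a run may use a different expression.

```c
#include <assert.h>

/* Hypothetical helper: maps a SPEC copy number to a processor number,
 * mirroring the expr example (copies are taken in groups of 4, and the
 * processor number advances by 8 from one group to the next). */
int cpu_for_copy(int copy_num)
{
    assert(copy_num >= 0);   /* copy numbers start at 0 */
    return (copy_num / 4) * 8 + (copy_num % 4);
}
```

With this mapping, copies 0 through 3 land on processors 0 through 3, copy 4 lands on processor 8, and copy 8 lands on processor 16.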
]]>
One or more of the following may have been used in the run. If so, it will be listed in the notes sections. Here
is a brief guide to understanding them:
LD_LIBRARY_PATH=<directories> (set via config file preENV)
LD_LIBRARY_PATH controls the search order for libraries. Often, it can be defaulted. Sometimes, it is
explicitly set (as documented in the notes in the submission), in order to ensure that the correct versions of
libraries are picked up.
OMP_STACKSIZE=N (set via config file preENV)
Set the stack size for subordinate threads.
ulimit -s N
ulimit -s unlimited
'ulimit' is a Unix command, entered prior to the run. It sets the stack size for the main process, either
to N kbytes or to no limit.
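As a rough illustration, a process can inspect the stack limit it inherited from 'ulimit -s' through the POSIX getrlimit() call. The helper name stack_limit_kbytes is hypothetical and is not part of the SPEC tools.

```c
#include <sys/resource.h>

/* Hypothetical helper: stores the soft stack limit in *kbytes, or -1 in
 * *kbytes when the limit is unlimited (RLIM_INFINITY).
 * Returns 0 on success and -1 if getrlimit() fails. */
int stack_limit_kbytes(long long *kbytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        *kbytes = -1;
    else
        *kbytes = (long long)(rl.rlim_cur / 1024);
    return 0;
}
```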
]]>
No special commands are needed for feedback-directed optimization, other than the compiler profile flags.
]]>
Flag descriptions for GCC, the GNU Compiler Collection
Note: The GNU Compiler Collection provides a wide array of compiler options, described in detail and readily
available at
https://gcc.gnu.org/onlinedocs/gcc/Option-Index.html#Option-Index and https://gcc.gnu.org/onlinedocs/gfortran/. This SPEC CPU flags file
contains excerpts from and brief summaries of portions of that documentation.
SPEC's modifications are:
Copyright (C) 2006-2017 Standard Performance Evaluation Corporation
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License,
Version 1.3 or any later version published by the Free Software Foundation; with the Invariant Sections being "Funding Free
Software", the Front-Cover Texts being (a) (see below), and with the Back-Cover Texts being (b) (see below). A copy of the
license is included in your SPEC CPU kit at $SPEC/Docs/licenses/FDL.v1.3 and on the web at http://www.spec.org/cpu2017/Docs/licenses/FDL.v1.3.
A copy of "Funding Free Software" is on your SPEC CPU kit at $SPEC/Docs/licenses/FundingFreeSW and on the web at http://www.spec.org/cpu2017/Docs/licenses/FundingFreeSW.
(a) The FSF's Front-Cover Text is:
A GNU Manual
(b) The FSF's Back-Cover Text is:
You have freedom to copy and modify this GNU Manual, like GNU software. Copies published by the Free
Software Foundation raise funds for GNU development.
]]>
/path/to/{gcc|g++|gfortran}
Invokes the GNU C compiler.
]]>
Invokes the GNU Fortran compiler.
]]>
g++
Invokes the GNU C++ compiler.
]]>
Ensure that there are no surprises if the benchmarks are run in an environment where file system metadata uses 64 bits.
Do not rely on language constraints to derive bounds for the number of iterations of a loop.
Use big-endian representation for unformatted files. This is important when reading 521.wrf_r, 621.wrf_s, and 628.pop2_s
data files that were originally generated in big-endian format.
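To illustrate what big-endian representation means at the byte level, the sketch below assembles a 32-bit value from four bytes stored most-significant byte first. It is only an illustration with a hypothetical helper name; it does not show how the Fortran runtime performs the conversion.

```c
#include <stdint.h>

/* Hypothetical helper: interprets p[0..3] as a big-endian 32-bit value.
 * The result is the same on little-endian and big-endian hosts. */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```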
]]>
Enables a range of optimizations that provide faster, though sometimes less precise, mathematical operations.
]]>
Allows source code in traditional (fixed-column) Fortran layout.
]]>
Tells GCC to use the GNU semantics for "inline" functions, that is, the behavior prior to the C99 standard.
This switch may resolve duplicate symbol errors, as noted in the 502.gcc_r benchmark description.
]]>
This option runs the standard link-time optimizer.
When invoked with source code, it generates GIMPLE (one of GCC’s internal representations) and writes it to special ELF sections in the object file.
When the object files are linked together, all the function bodies are read from these ELF sections and instantiated as if they had been part of the same translation unit.
]]>
Enable Link Time Optimization
Enable handling of OpenMP directives and generate parallel code.
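A minimal sketch of an OpenMP directive in C follows. The function name sum_to is hypothetical. When OpenMP handling is enabled (GCC's -fopenmp), the loop may be split across threads; when it is not, the pragma is ignored and the loop runs serially with the same result.

```c
/* Hypothetical example: sums the integers 1..n.
 * The reduction clause gives each thread a private partial sum and
 * combines the partial sums when the loop finishes. */
long sum_to(int n)
{
    long total = 0;

    #pragma omp parallel for reduction(+:total)
    for (int i = 1; i <= n; i++)
        total += i;

    return total;
}
```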
-fplugin=/path/to/plugin
Loads the plugin code in the named file, which is assumed to be a shared object that the compiler opens with dlopen.
]]>
Enables prefetching of arrays used in loops.
]]>
Instruments code to collect information for profile-driven feedback.
Information is collected regarding both code paths and data values.
]]>
Applies information from a profile run in order to improve optimization.
Several optimizations are improved when profile data is available, including branch probabilities, loop peeling, and loop
unrolling.
]]>
Let the type "char" be signed, like "signed char".
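The signedness of plain char is otherwise chosen by the platform, which is why code that depends on it may need a flag. The sketch below, with hypothetical function names, shows how a program can query the choice and how a cast avoids depending on it.

```c
#include <limits.h>

/* Reports whether plain char is signed in this compilation:
 * CHAR_MIN is negative for a signed char and 0 for an unsigned one. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Reads a char as a value in the range 0..UCHAR_MAX regardless of
 * whether plain char is signed or unsigned. */
int byte_value(char c)
{
    return (unsigned char)c;
}
```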
]]>
Disable optimizations for floating-point arithmetic that ignore the signedness of zero.
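A short sketch of why the sign of zero matters, assuming IEEE 754 arithmetic in the default rounding mode and hypothetical function names: x + 0.0 cannot be simplified to x, because adding +0.0 to -0.0 yields +0.0.

```c
#include <math.h>

/* Adds +0.0 to x. For x == -0.0 the IEEE 754 result is +0.0, so a
 * compiler that honors signed zeros must not replace this with "x". */
double add_positive_zero(double x)
{
    return x + 0.0;
}

/* Reports whether the sign bit of x is set; true for -0.0. */
int is_negative(double x)
{
    return signbit(x) != 0;
}
```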
]]>
-fstack-arrays, -fno-stack-arrays
Enabled: Put all local arrays, even those of unknown size, onto stack memory.
The -fno- form disables the behavior.
]]>
The language standards set aliasing requirements: programmers are expected to follow conventions so that the
compiler can keep track of memory. If a program violates the requirements (for example, using pointer arithmetic),
it may crash, or (worse) wrong answers may be silently produced.
Unfortunately, the aliasing requirements from the standards are not always well understood.
Sometimes, the aliasing requirements are understood and nevertheless intentionally violated by smart programmers who
know what they are doing, such as the programmer responsible for the inner workings of Perl storage allocation and
variable handling.
The -fno-strict-aliasing switch instructs the optimizer that it must not assume that the aliasing
requirements from the standard are met by the current program. You will probably need it for 500.perlbench_r and
600.perlbench_s. Note that this is an optimization switch, not a portability switch. When running
SPECint2017_rate_base or SPECint2017_speed_base, you must use the same optimization switches for all the C modules
in base; see
http://www.spec.org/cpu2017/Docs/runrules.html#BaseFlags
and
http://www.spec.org/cpu2017/Docs/runrules.html#MustValidate.
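As an illustration of the aliasing requirements, the sketch below reads the bit pattern of a float without violating them by copying bytes with memcpy. The function name is hypothetical, and the example assumes that float is a 32-bit IEEE 754 type.

```c
#include <stdint.h>
#include <string.h>

/* Returns the bit pattern of f.
 * The common shortcut  *(uint32_t *)&f  accesses a float object through
 * an incompatible pointer type, which the aliasing rules do not allow;
 * copying the bytes with memcpy is the conforming alternative. */
uint32_t float_bits(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```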
]]>
There is a group of GCC optimizations invoked via -ftree-vectorize and related flags, as
described at
https://gcc.gnu.org/projects/tree-ssa/vectorization.html. During testing of SPEC CPU2017, for some versions of
GCC on some chips, some benchmarks did not get correct answers when the vectorizer was enabled. These problems were
difficult to isolate, and it is possible that later versions of the compiler might not encounter them.
You can turn off loop vectorization with -fno-tree-loop-vectorize. Note that this is an optimization
switch, not a portability switch. If it is needed, then in base you must use it consistently. See:
http://www.spec.org/cpu2017/Docs/runrules.html#BaseFlags and
http://www.spec.org/cpu2017/Docs/runrules.html#MustValidate.
]]>
Attempts to decompose loops in order to run them on multiple processors.
]]>
Do not transform names of entities specified in the Fortran source file by appending underscores to them.
Tells the optimizer to unroll all loops.
]]>
Tells the optimizer to unroll loops whose number of iterations can be determined at compile time or upon entry to the loop.
]]>
Apply unroll and jam transformations on feasible loops.
In a loop nest this unrolls the outer loop by some factor and fuses the resulting multiple inner loops.
This flag is enabled by default at -O3. It is also enabled by -fprofile-use and -fauto-profile.
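The transformation can be sketched by hand on a small matrix-vector product. The names and sizes below are hypothetical, and the second function only approximates what the compiler might generate: the outer loop is unrolled by 2 and the two resulting inner loops are fused into one.

```c
#define ROWS 4   /* even, so the unrolled version needs no cleanup loop */
#define COLS 3

/* Original loop nest: y[i] = sum over j of a[i][j] * x[j]. */
void matvec(int a[ROWS][COLS], const int x[COLS], int y[ROWS])
{
    for (int i = 0; i < ROWS; i++) {
        y[i] = 0;
        for (int j = 0; j < COLS; j++)
            y[i] += a[i][j] * x[j];
    }
}

/* Unroll and jam by hand: two rows are processed per outer iteration,
 * and each x[j] is reused for both rows inside the fused inner loop. */
void matvec_unroll_jam(int a[ROWS][COLS], const int x[COLS], int y[ROWS])
{
    for (int i = 0; i < ROWS; i += 2) {
        y[i] = 0;
        y[i + 1] = 0;
        for (int j = 0; j < COLS; j++) {
            y[i]     += a[i][j]     * x[j];
            y[i + 1] += a[i + 1][j] * x[j];
        }
    }
}
```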
]]>
This is a code-generation option.
]]>
This is a linker option: do not produce a dynamically linked position-independent executable.
]]>
Omit the frame pointer in functions that don’t need one.
This avoids the instructions to save, set up and restore the frame pointer; on many targets it also makes an extra register available.
]]>
Perform loop interchange outside of graphite.
This flag can improve cache performance on loop nests and allows further loop optimizations, like vectorization, to take place.
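A hand-written sketch of loop interchange on a row-major C array follows; the names and sizes are hypothetical. Both functions compute the same sum, but the second walks memory consecutively in its inner loop, which is the access pattern that tends to be friendlier to caches.

```c
#define NROWS 3
#define NCOLS 4

/* Before interchange: the inner loop moves down a column, so successive
 * accesses are NCOLS elements apart in memory. */
long sum_column_order(int a[NROWS][NCOLS])
{
    long total = 0;

    for (int j = 0; j < NCOLS; j++)
        for (int i = 0; i < NROWS; i++)
            total += a[i][j];
    return total;
}

/* After interchange: the inner loop moves along a row, so successive
 * accesses touch adjacent elements. */
long sum_row_order(int a[NROWS][NCOLS])
{
    long total = 0;

    for (int i = 0; i < NROWS; i++)
        for (int j = 0; j < NCOLS; j++)
            total += a[i][j];
    return total;
}
```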
]]>
Assume that the current compilation unit represents the whole program being compiled.
All public functions and variables with the exception of main and those merged by attribute externally_visible become static functions and in effect are optimized more aggressively by interprocedural optimizers.
]]>
The switch -funsafe-math-optimizations allows the compiler to make certain(*) aggressive assumptions, such as
disregarding the programmer's intended order of operations. The run rules allow such re-ordering
http://www.spec.org/cpu2017/Docs/runrules.html#reordering. The rules also point out that you must get answers
that pass SPEC's validation requirements. In some cases, that will mean that some optimizations must be turned off.
-fno-unsafe-math-optimizations turns off these(*) optimizations. You may need to use this flag in order to get
certain benchmarks to validate. Note that this is an optimization switch, not a portability switch. If it is
needed, then in base you will need to use it consistently. See:
http://www.spec.org/cpu2017/Docs/runrules.html#BaseFlags and
http://www.spec.org/cpu2017/Docs/runrules.html#MustValidate.
(*) Much more detail about which optimizations are affected is available in the GCC documentation.
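Why the order of operations can change an answer is sketched below, assuming IEEE 754 double arithmetic and a compilation that does not re-order the additions. The function names are hypothetical.

```c
/* Evaluates (a + b) + c exactly as written. */
double sum_left_first(double a, double b, double c)
{
    return (a + b) + c;
}

/* Evaluates a + (b + c) exactly as written. */
double sum_right_first(double a, double b, double c)
{
    return a + (b + c);
}
```

With a = 1e20, b = -1e20 and c = 1.0, the first form yields 1.0, while the second yields 0.0 because 1.0 is lost when it is added to -1e20. A compiler that is allowed to re-associate may turn one form into the other.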
]]>
Let the type "char" be unsigned, like "unsigned char".
Note: this particular portability flag is included for 526.blender_r per the recommendation in its documentation - see
http://www.spec.org/cpu2017/Docs/benchmarks/526.blender_r.html.
]]>
-g
Produce debugging information.
-L/path
Add the specified path to the list of paths that the linker will
search for archive libraries and control scripts.
Link with libjemalloc, a fast, arena-based memory allocator.
-m32
Compiles for a 32-bit (ILP32) data model.
]]>
-m64
Compiles for a 64-bit (LP64) data model.
]]>
-mabi=ilp32, -mabi=lp64
Generate code for the specified data model, ilp32 or lp64. With ilp32, int, long int and pointer are 32-bit; with
lp64, int is 32-bit, but long int and pointer are 64-bit.
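A small sketch, with a hypothetical function name, of how a program can report which data model it was compiled for by checking the sizes of int, long and pointers:

```c
/* Returns a name for the data model of the current compilation, based
 * on the sizes described above; any other combination is "other". */
const char *data_model_name(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";
    return "other";
}
```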
]]>
-march=core2,
-march=athlon,
-march=armv8.2-a+lse,
-march=native...
On x86 systems, allows use of instructions that require the listed architecture.
On arm systems, specifies the name of the target architecture and, optionally, one or more feature modifiers.
This option has the form -march=arch{+[no]feature}*
]]>
Generate code for processors that include the AVX extensions.
]]>
-mcpu=core2, -mcpu=niagara4, ...
On SPARC systems, mcpu sets the available instruction set.
On x86 systems, mcpu is a deprecated synonym for mtune.
]]>
Generate code to take advantage of fused multiply-add
]]>
-mrecip, -mrecip=all, -mrecip=sqrt, ...
-mrecip
This option enables use of "RCPSS" and "RSQRTSS" instructions (and
their vectorized variants "RCPPS" and "RSQRTPS") with an additional
Newton-Raphson step to increase precision instead of "DIVSS" and
"SQRTSS" (and their vectorized variants) for single-precision
floating-point arguments. These instructions are generated only when
-funsafe-math-optimizations is enabled together with
-finite-math-only and -fno-trapping-math.
-mrecip=opt
This option controls which reciprocal estimate instructions may be
used. opt is a comma-separated list of options, which may be
preceded by a ! to invert the option:
all
Enable all estimate instructions.
default
Enable the default instructions, equivalent to -mrecip.
none
Disable all estimate instructions, equivalent to -mno-recip.
div Enable the approximation for scalar division.
vec-div
Enable the approximation for vectorized division.
sqrt
Enable the approximation for scalar square root.
vec-sqrt
Enable the approximation for vectorized square root.
So, for example, -mrecip=all,!sqrt enables all of the reciprocal
approximations, except for square root.
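The Newton-Raphson step mentioned above can be sketched in plain C. The function name is hypothetical, and the caller-supplied estimate stands in for the result of a hardware estimate instruction such as RCPSS; this is not the code that the compiler emits.

```c
/* One Newton-Raphson step for the reciprocal of a:
 *     x1 = x0 * (2 - a * x0)
 * Each step roughly doubles the number of correct bits in the estimate. */
float refine_reciprocal(float a, float estimate)
{
    return estimate * (2.0f - a * estimate);
}
```

For example, starting from the rough estimate 0.3 for 1/3, one step gives about 0.33 and a second step gives about 0.3333.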
]]>
-msse2, -msse4.2...
Allows use of instructions that require the SIMD units of the indicated type.
]]>
-mtune=niagara4, -mtune=athlon...
Tunes code based on the timing characteristics of the listed processor.
]]>
Generate code to take advantage of version 3 of the SPARC Visual Instruction Set extensions
]]>
Enable all optimizations of -O3 plus optimizations that are not valid for standard-compliant programs, such as re-ordering
operations without regard to parentheses.
Many more details are available in the GCC documentation.
]]>
-O1, -O2, -O3
Increases optimization levels: the higher the number, the more optimization is done. Higher levels of optimization may
require additional compilation time, in the hopes of reducing execution time. At -O, basic optimizations are performed,
such as constant merging and elimination of dead code. At -O2, additional optimizations are added, such as common
subexpression elimination and strict aliasing. At -O3, even more optimizations are performed, such as function inlining and
vectorization.
Many more details are available in the GCC documentation.
]]>
Same as -O1
Link the C++ library statically.
]]>
Sets the language dialect to include syntax from the C99 standard, such as bool and other features used in CPU2017
benchmarks.
]]>
-std=c++03
Sets the language dialect to include syntax from the 1998 ISO C++ standard plus the 2003 technical corrigendum.
]]>
-std=gnu90
Sets the language dialect to ISO C90 plus GNU extensions.
]]>
-std=gnu90
Portability flag: selects the GNU dialect of ISO C90.
]]>
-std=f2003
Sets the language dialect to include syntax from the Fortran 2003 standard.
]]>
Enables warnings.
]]>
Remove unused functions from the generated executable. Without this flag, on Mac OS X, you are likely to encounter duplicate
symbols when linking 502.gcc_r or 602.gcc_s.
Note that this is an optimization
switch, not a portability switch. If it is needed, then in base you must use it consistently. See:
http://www.spec.org/cpu2017/Docs/runrules.html#BaseFlags and
http://www.spec.org/cpu2017/Docs/runrules.html#MustValidate.
]]>
-Wl,-rpath,/path/to/lib
Add the specified directory to the runtime library search path used
when linking an ELF executable with shared objects.
-Wl,-stack_size,0xnnn
Add the linker flag that requests a large stack. This flag is likely to be important only to one or
two of the floating point speed benchmarks. In accordance with the rules for Base, it is set for
all of fpspeed in base. See:
http://www.spec.org/cpu2017/Docs/runrules.html#BaseFlags.
]]>
-Wl,-z common-page-size=<n>
Set the requested page size for the program to <n>, where <n> is one of the available sizes for your system,
for example 2M, 4M, or 1G.
]]>
Do not warn about functions defined with a return type that defaults to "int" or which return something other than what they were declared to.
]]>
-z muldefs
Allows links to proceed even if there are multiple definitions of some symbols.
This switch may resolve duplicate symbol errors, as noted in the 502.gcc_r benchmark description.
]]>
-pipe
Performance improvement: use pipes rather than temporary files for communication between the various stages of compilation.
]]>
-fpermissive
C++ portability: downgrade some diagnostics about nonconformant code from errors to warnings.
]]>