Sun flags file for SPEC benchmark suite SPEC OMP2001

This file is for flags used with the Sun, Intel and Opteron based systems.

Flags described below are for the compilers:
    Sun Studio 10
    Sun Studio 11
    Sun Studio 12
    Sun Studio 12 update 1
    Sun Studio Express 11/08

And for the OS:
    Solaris 10
    OpenSolaris 2008.05
    OpenSolaris 2008.11
    OpenSolaris 2009.06

Revised 5 January 2010

----------------------------------------------------------------------------
Compiler flags
----------------------------------------------------------------------------

Flag    Description
----    -----------

-aligncommon=
    Align common block elements on byte boundaries.

-D
    Set definition for preprocessor.

-dalign
    Selects generation of faster double word load/store instructions, and
    alignment of double and quad data on their natural boundaries in common
    blocks.

-depend=yes
    Selects dependence analysis to better optimize DO loops.

-e
    Accept extended (132 character) input source lines (FORTRAN).

-fast
    A convenience option that selects a set of optimizations for
    performance. It chooses the following switches, which are defined
    elsewhere in this page:

    (C)
        -fns
        -fsimple=2
        -fsingle
        -nofstore (x86 only)
        -xalias_level=basic
        -xbuiltin=%all
        -xlibmil
        -xlibmopt
        -xmemalign=8s (SPARC only)
        -xO5
        -xprefetch=auto,explicit (SPARC only)
        -xregs=frameptr (x86 only)
        -xtarget=native

    (Fortran)
        -xtarget=native
        -xO5
        -depend=yes
        -xlibmil
        -fsimple=2
        -dalign
        -xlibmopt
        -pad=local (SPARC only)
        -xvector=lib (SPARC only)
        -fns
        -fround=nearest (SPARC only)
        -ftrap=common
        -nofstore (x86 only)
        -xregs=frameptr (x86 only)

-fixed
    Accept fixed-format input source files (FORTRAN).

-fns[=no]
    Select (turn off) non-standard floating point mode. This flag causes
    the nonstandard floating point mode to be enabled when a program begins
    execution. By default, the nonstandard floating point mode will not be
    enabled automatically.
    Warning: When nonstandard mode is enabled, floating point arithmetic
    may produce results that do not conform to the requirements of the
    IEEE 754 standard. See the Numerical Computation Guide for more
    information (see docs.sun.com).

-fsimple=2
    Selects aggressive floating-point optimizations. This option might be
    unsuited for programs requiring strict IEEE 754 standards compliance.

-fsingle
    (-Xt and -Xs modes only) Causes the compiler to evaluate float
    expressions as single precision, rather than double precision. (This
    option has no effect if the compiler is used in either -Xa or -Xc
    modes, as float expressions are already evaluated as single precision.)

-fstore
    Force precision of floating-point expressions.

-ftrap=t
    Sets the IEEE 754 trapping modes that are established at program
    initialization. t is a comma-separated list that consists of one or
    more of the following: %all, %none, common, [no%]invalid,
    [no%]overflow, [no%]underflow, [no%]division, [no%]inexact.
    Processing is left-to-right. The default is -ftrap=%none.

        common - invalid, division by zero, and overflow.
        %none  - the default, turns off all trapping modes.

    Do not use this option for programs that depend on IEEE standard
    exception handling; you can get different numerical results, premature
    program termination, or unexpected SIGFPE signals.

-lm
    Link with the math library.

-lmopt
    Chooses the math library that is optimized for speed.

-lmtmalloc
    Uses the fast concurrent malloc library suitable for multi-threaded
    applications.

-lmvec
    Link with the vector math library.

-m32
    Compile for 32-bit executables.

-m64
    Compile for 64-bit executables.

-nofstore
    Cancels forcing expressions to have the precision of the result.

-pad=local
    Local padding to improve use of cache.

-Qoption
    Pass option list to the compiler phase (Fortran, C++):

        f90comp - Fortran first pass
        iropt   - Global optimizer
        ube     - Code generator

-qoption
    Same as -Qoption; the q is not case sensitive.

-Qoption f90comp -hoist_expensive,-hoist_trivial
    Enables additional loop invariant code motion, hoisting operations out
    of loops.

-Qoption iropt -Aujam:inner=g
    Increase the probability that small-trip-count inner loops will be
    fully unrolled.

-Qoption iropt -xprefetch_level[=1|2|3]
    -xprefetch_level=1 enables automatic generation of prefetch
    instructions. -xprefetch_level=2 enables additional generation beyond
    level 1, and -xprefetch_level=3 enables additional generation beyond
    level 2.

-Qoption ube -fsimple=3
    Allow the optimizer to use x87 hardware instructions for sine, cosine,
    and rsqrt. The precision and rounding effects are determined by the
    underlying hardware implementation, rather than by standard IEEE 754
    semantics.

-Qoption ube -sched_first_pass=1
    Causes the backend to perform instruction scheduling for the generated
    code.

-Wu,
    Pass option list to the code generator (C).

-Wu,-sched_first_pass=1
    Causes the backend to perform instruction scheduling for the generated
    code.

-xautopar
    Enables automatic compiler parallelization.

-xalias_level=
    Allows the compiler to perform type-based alias analysis at the given
    alias level (C).

        basic  - assume ISO C9X aliasing rules for basic types only.
        layout - assume memory references involving the same basic types
                 do not alias each other.
        std    - assume ISO C9X aliasing rules.
        strong - assume all pointers are type safe (strongly typed).

-xarch=isa
    Limits the code generated by the compiler to the instructions of the
    specified instruction set architecture.

        generic - Compile using the instruction set common to most
                  processors.
        amd64   - Compile for 64-bit Solaris x86 platforms.
        sse2a   - Compile for SSE2 instructions with AMD extensions.

-xbuiltin=%all
    Substitute intrinsic functions or inline system functions where
    profitable for performance.

-Xc
    Assume strict ANSI C conformance.

-xcrossfile[=n]
    Enable optimization and inlining across source files, n={0|1}. The
    default is -xcrossfile=0, which specifies that no cross file
    optimizations are performed. -xcrossfile is equivalent to
    -xcrossfile=1. Normally, the scope of the compiler's analysis is
    limited to each separate file on the command line. With -xcrossfile,
    the compiler analyzes all the files named on the command line as if
    they had been concatenated into a single source file.

-xdepend[=no]
    Analyze loops for data dependencies. "no" disables this option.

-xipo[=n]
    Enable optimization and inlining across source files, n={0|1|2}. At
    -xipo=2, the compiler performs interprocedural aliasing analysis as
    well as optimization of memory allocation and layout to improve cache
    performance.

-xlibmil
    Selects inlining of certain math library routines.

-xlibmopt
    Selects linking the optimized math library.

-xlic_lib=sunperf
    Link in the Sun supplied performance libraries.

-xmodel=[a]
    Enables the compiler to create 64-bit shared objects for the Solaris
    x86 platforms. -xmodel=medium generates code for the medium model, in
    which no assumptions are made about the range of symbolic references to
    data sections.

-xO1
    Does basic local optimization (peephole).

-xO2
    -xO1 plus more local and global optimizations.

-xO3
    Besides what -xO2 does, it optimizes references or definitions for
    external variables. Loop unrolling and software pipelining are also
    performed.

-xO4
    -xO3 plus function inlining.

-xO5
    Besides what -xO4 does, it enables speculative code motion.

-xopenmp[=]
    Enable OpenMP language extension ={noopt|parallel|none}. If you specify
    -xopenmp, but do not include a value, the compiler assumes
    -xopenmp=parallel.

        parallel - Enables recognition of OpenMP pragmas. The optimization
                   level under -xopenmp=parallel is -xO3.
                   The compiler changes the optimization level to -xO3 if
                   necessary and issues a warning.

-xpad=local
    Add padding between adjacent local variables.

-xpagesize=
    Set the preferred page size for running the program.

-xpagesize_stack=
    Set the preferred page size for the stack for running the program.

-xprefetch_level[=n]
    Controls the aggressiveness of the -xprefetch=auto option (n={1|2|3}).
    The compiler becomes more aggressive, that is, it introduces more
    prefetches, with each higher level of -xprefetch_level.

-xprefetch[=val[,val]]
    Enable prefetch instructions on those architectures that support
    prefetch.

        [no%]auto     - [Disable] Enable automatic generation of prefetch
                        instructions.
        [no%]explicit - [Disable] Enable explicit prefetch macros.
        yes           - -xprefetch=yes is the same as
                        -xprefetch=auto,explicit.
        no            - -xprefetch=no is the same as
                        -xprefetch=no%auto,no%explicit.
        latx:n.n      - Adjust the compiler's assumed prefetch-to-load and
                        prefetch-to-store latencies by the specified
                        factor.

    Defaults: If -xprefetch is not specified, -xprefetch=no%auto,explicit
    is assumed. If only -xprefetch is specified, -xprefetch=auto,explicit
    is assumed.

-xprofile
    Use the profile feature; shorthand for the process described below.

-xprofile=

    Collect data for a profile or use a profile to optimize.
    ={{collect,use}[:name],tcov}

        collect[:name] - Collects and saves execution frequency for later
                         use by the optimizer with -xprofile=use. The
                         compiler generates code to measure statement
                         execution-frequency.
        use[:name]     - Uses execution frequency data to optimize
                         strategically. The name is the name of the
                         executable that is being analyzed.

-xreduction
    Analyze loops for reductions such as dot products, maximum and minimum
    finding.

-xregs=
    Specify the usage of optional registers.

        frameptr - (x86 only) allow compilers to use the frame-pointer
                   register.

-xtarget=
    Sets the hardware target. If the program is intended to run on a
    different target than the compilation machine, follow the -fast with
    the appropriate -xtarget= option.

        native  - optimize for the host platform.
        nehalem - optimize for the Nehalem architecture.

-xvector=simd
    Automatic generation of the vector SIMD instructions.

-xvector=lib
    Selects the vectorized math library.

-xvector=yes
    Selects the vectorized math library.

-xprofile=

    Collect or optimize with runtime profiling data.
    The value must be collect[:nm], use[:nm], or tcov. At runtime, a
    program compiled with -xprofile=collect:nm will create the subdirectory
    nm.profile to hold the runtime feedback information. nm is an optional
    name.

-xprofile=collect
    Collect profile data for feedback directed optimizations.

-xprofile=use
    Use data collected for profile feedback.

----------------------------------------------------------------------------
Operating System
----------------------------------------------------------------------------

Environment Variables    Description
---------------------    -----------

LD_PRELOAD=mpss.so.1
    Allow use of the mpss.so.1 shared object, which provides a means by
    which preferred stack and/or heap page sizes can be selected. Once
    preloaded, the mpss.so.1 shared object reads environment variables
    MPSSHEAP and MPSSSTACK to determine any preferred page sizes.

MPSSHEAP=
    Specify the preferred page size for heap. The specified page size is
    applied to all created processes.

MPSSSTACK=
    Specify the preferred page size for stack. The specified page size is
    applied to all created processes.

OMP_DYNAMIC
    Enables (TRUE) or disables (FALSE) dynamic adjustment of the number of
    threads available for execution of parallel regions.

OMP_NUM_THREADS
    Sets the number of threads to use during execution, unless that number
    is explicitly changed by calling the OMP_SET_NUM_THREADS subroutine.

SUNW_MP_PROCBIND
    This environment variable can be used to bind threads of an OpenMP
    program to virtual processors on the running system. Performance can be
    enhanced with processor binding, but performance degradation will occur
    if multiple threads are bound to the same virtual processor.
    The value for SUNW_MP_PROCBIND can be:

        TRUE/true   - use all virtual processors, bound in round-robin
                      fashion starting with 0
        FALSE/false - no binding is done (default)
        n           - start binding at virtual processor n in a
                      round-robin fashion
        list of integers separated by one or more spaces
                    - use the listed virtual processors, bound in a
                      round-robin fashion
        range of integers
                    - use the listed range of virtual processors, bound in
                      a round-robin fashion

    Please go to docs.sun.com and see the "OpenMP API User's Guide" for
    more information on binding.

STACKSIZE
    A default stacksize of 4 MB (for 32-bit programs) and 8 MB (for 64-bit
    programs) is used for additional threads created in an OpenMP program.
    The environment variable STACKSIZE can be used to set it to a different
    value. For example, setenv STACKSIZE 2048 creates threads with a
    stacksize of 2 MB each.

OMP_NESTED
    Enables or disables nested parallelism. Value is either TRUE or FALSE.

SUNW_MP_THR_IDLE=SPIN
    Controls the end-of-task status of each helper thread executing the
    parallel part of a program. You can set the value to spin, sleep ns, or
    sleep nms. The default is SPIN: the thread spins (or busy-waits) after
    completing a parallel task until a new parallel task arrives. Choosing
    SLEEP time specifies the amount of time a helper thread should
    spin-wait after completing a parallel task. If, while a thread is
    spinning, a new task arrives for the thread, the thread executes the
    new task immediately. Otherwise, the thread goes to sleep and is
    awakened when a new task arrives. time may be specified in seconds,
    (ns) or just (n), or milliseconds, (nms). SLEEP with no argument puts
    the thread to sleep immediately after completing a parallel task.
    SLEEP, SLEEP (0), SLEEP (0s), and SLEEP (0ms) are all equivalent.

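The round-robin binding described for SUNW_MP_PROCBIND can be pictured with a small model. The following Python sketch is illustrative only and is not the OpenMP runtime's actual logic: the function name procbind_map, its parameters, the "lo-hi" spelling of a range, and the wrap-around behaviour for a single integer n are all assumptions made for this example.

```python
def procbind_map(value, num_threads, num_procs):
    """Model which virtual processor each thread would be bound to.

    Hypothetical helper, not part of any Sun library. Returns one entry
    per thread id: a processor number, or None when no binding is done.
    """
    text = value.strip()
    if text.lower() == "false":
        return [None] * num_threads            # no binding (the default)
    if text.lower() == "true":
        procs = list(range(num_procs))         # all processors, from 0
    elif "-" in text:                          # assumed "lo-hi" range form
        lo, hi = (int(part) for part in text.split("-"))
        procs = list(range(lo, hi + 1))
    elif len(text.split()) > 1:                # space-separated list
        procs = [int(part) for part in text.split()]
    else:                                      # single integer n
        start = int(text)                      # assumed to wrap around
        procs = [(start + i) % num_procs for i in range(num_procs)]
    # Threads are handed out to the chosen processors in round-robin order.
    return [procs[t % len(procs)] for t in range(num_threads)]


if __name__ == "__main__":
    print(procbind_map("TRUE", 6, 4))    # [0, 1, 2, 3, 0, 1]
    print(procbind_map("0 2 4", 4, 8))   # [0, 2, 4, 0]
```

In both printed cases more threads are requested than processors are listed, so two threads land on the same virtual processor, which is the situation the SUNW_MP_PROCBIND entry warns will degrade performance.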
- - - - - - - - - - - - - - - - - - - - - - - - -

Shell Variables    Description
---------------    -----------

ulimit -s unlimited
    Set size of stack segment to unlimited.

----------------------------------------------------------------------------
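To confirm that the stack limit set by ulimit -s is the one a process actually inherits, the limit can be read back from inside the process. The sketch below is a minimal illustration using Python's standard resource module (available on Unix-like systems only); the function name stack_limit_description is hypothetical.

```python
import resource


def stack_limit_description():
    """Return the (soft, hard) stack size limits as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)

    def show(limit):
        # RLIM_INFINITY is how the OS reports an unlimited stack segment.
        if limit == resource.RLIM_INFINITY:
            return "unlimited"
        return f"{limit // 1024} KB"   # limits are reported in bytes

    return show(soft), show(hard)


if __name__ == "__main__":
    soft, hard = stack_limit_description()
    print("soft stack limit:", soft)
    print("hard stack limit:", hard)
```

Run from a shell in which ulimit -s unlimited succeeded, the soft limit is expected to be reported as unlimited.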