Flag Disclosure for Running SPEC HPC2002 on SUN HPC3500 and SUN E450
with SUN Forte Developer 7 C and Fortran Compilers

Purdue University
Last Revised: Nov. 05, 2003


Sun Forte Developer 7 C Compiler Options

-fast
     Selects a set of baseline options for optimizing benchmark
     applications. These optimizations may alter the behavior of
     programs from that defined by the ISO C and IEEE standards.
     Modules compiled with -fast must also be linked with -fast.

     -fast is a macro option that can be used as a starting point
     for tuning an executable for maximum runtime performance. Its
     expansion can change from one release of the compiler to the
     next and is specific to the target platform. We suggest that
     you use the -# option to examine the expansion of -fast, and
     incorporate the appropriate options of -fast into the ongoing
     process of tuning the executable.

     The -fast option is unsuitable for programs that are intended
     to run on a different target than the compilation machine. In
     such cases, follow -fast with the appropriate -xtarget option.
     For example:

          % cc -fast -xtarget=ultra

     For C modules depending on exception handling specified by
     SVID, follow -fast by -xnolibmil:

          % cc -fast -xnolibmil

     The -fast option acts like a macro expansion on the command
     line. Therefore, you can override any of the expanded options
     by following -fast with the desired option. If you combine
     -fast with other options, the last specification applies.

     In previous releases, the -fast macro option included -fnonstd,
     but now it includes -fns instead.

     These options are turned on for -fast:

          -dalign (x86)
          -fns
          -fsimple=2 (SPARC)
          -fsingle
          -ftrap=%none
          -nofstore (x86)
          -xalias_level=basic (SPARC)
          -xarch
          -xbuiltin=%all
          -xdepend
          -xlibmil
          -xmemalign=8s (SPARC)
          -xO5
          -xprefetch=auto,explicit (SPARC)

-dalign
     (SPARC) -dalign is equivalent to -xmemalign=8s. For more
     information, see -xmemalign.

-fns[=[yes|no]]
     Select SPARC nonstandard floating point (SPARC only).
     Select the SPARC nonstandard floating-point mode.

     The default, -fns=no, is SPARC standard floating-point mode.
     Optional use of =yes or =no provides a way of toggling the -fns
     flag following some other macro flag that includes -fns, such
     as -fast.

     -fns is the same as -fns=yes.
     -fns=yes selects nonstandard floating point.
     -fns=no selects standard floating point.

     This flag causes the nonstandard floating-point mode to be
     enabled when a program begins execution. By default, the
     nonstandard floating-point mode is not enabled automatically.

     On some SPARC systems, the nonstandard floating-point mode
     disables "gradual underflow", causing tiny results to be
     flushed to zero rather than producing subnormal numbers. It
     also causes subnormal operands to be silently replaced by zero.
     On those SPARC systems that do not support gradual underflow
     and subnormal numbers in hardware, use of this option can
     significantly improve the performance of some programs.

     Warning: When nonstandard mode is enabled, floating-point
     arithmetic may produce results that do not conform to the
     requirements of the IEEE 754 standard. See the Numerical
     Computation Guide for more information.

     This option is effective only on SPARC systems and only if used
     when compiling the main program. On x86 systems, the option is
     ignored.

-fsimple[=n]
     Allows the optimizer to make simplifying assumptions concerning
     floating-point arithmetic. If n is present, it must be 0, 1,
     or 2.

     -fsimple=2
          Permits aggressive floating-point optimizations that may
          cause many programs to produce different numeric results
          due to changes in rounding. For example, -fsimple=2
          permits the optimizer to attempt to replace all
          computations of x/y in a given loop with x*z, where x/y is
          guaranteed to be evaluated at least once in the loop,
          z=1/y, and the values of y and z are known to have
          constant values during execution of the loop.
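The x/y to x*z rewrite described above can be sketched by hand in C. This is only an illustration of why the two forms may round differently, not the optimizer's actual output. The function names scale_div, scale_recip, and count_mismatches are hypothetical and do not come from the Sun documentation. The sketch assumes IEEE 754 double arithmetic and a build without -fsimple=2 (or any similar relaxed floating-point option), so that the two forms stay distinct.

```c
#include <assert.h>
#include <stddef.h>

/* Reference form: one division per element, as written in the source. */
void scale_div(const double *x, double *out, size_t n, double y)
{
    size_t i;
    assert(x != NULL && out != NULL);
    for (i = 0; i < n; i++)
        out[i] = x[i] / y;
}

/* Hand-written form of the rewrite: hoist z = 1/y out of the loop,
 * then multiply. The reciprocal is rounded once, and each product is
 * rounded again, so a result can differ from the division above. */
void scale_recip(const double *x, double *out, size_t n, double y)
{
    size_t i;
    double z;
    assert(x != NULL && out != NULL);
    z = 1.0 / y;                 /* y is loop-invariant, so z is too */
    for (i = 0; i < n; i++)
        out[i] = x[i] * z;
}

/* Count the elements for which the two forms give different values. */
size_t count_mismatches(const double *x, size_t n, double y)
{
    size_t i, diff = 0;
    double z;
    assert(x != NULL);
    z = 1.0 / y;
    for (i = 0; i < n; i++) {
        if (x[i] / y != x[i] * z)
            diff++;
    }
    return diff;
}
```

For example, with x = 49.0 and y = 49.0 the division form yields exactly 1.0, while the reciprocal form yields a value slightly below 1.0, because 1/49 is rounded before the multiplication. With y = 2.0 the two forms agree, since 1/2 is exactly representable.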
          Even with -fsimple=2, the optimizer still is not permitted
          to introduce a floating point exception in a program that
          otherwise produces none.

-fsingle
     (-Xt and -Xs modes only) Causes the compiler to evaluate float
     expressions as single precision, rather than double precision.
     (This option has no effect if the compiler is used in either
     -Xa or -Xc modes, as float expressions are already evaluated as
     single precision.)

-ftrap[=t[,t...]]
     Sets the IEEE 754 trapping mode in effect at startup.

     t is a comma-separated list that consists of one or more of the
     following: %all, %none, common, [no%]invalid, [no%]overflow,
     [no%]underflow, [no%]division, [no%]inexact.

     The default is -ftrap=%none.

     This option sets the IEEE 754 trapping modes that are
     established at program initialization. Processing is
     left-to-right. The common exceptions, by definition, are
     invalid, division by zero, and overflow.

     Example: -ftrap=%all,no%inexact means set all traps, except
     inexact.

     The meanings are the same as for the ieee_flags subroutine,
     except that:

     o  %all turns on all the trapping modes.

     o  %none, the default, turns off all trapping modes.

     o  A no% prefix turns off that specific trapping mode.

     If you compile one routine with -ftrap=t, compile all routines
     of the program with the same -ftrap=t option; otherwise, you
     can get unexpected results.

-xalias_level=basic
     The compiler assumes that memory references that involve
     different C basic types do not alias each other. The compiler
     also assumes that references to all other types can alias each
     other as well as any C basic type. The compiler assumes that
     references using char * can alias any other type.

-xarch=native
     Set the parameters for the best performance on the host
     environment. This is the default for the -fast option. The
     compiler chooses the appropriate setting for the current system
     processor it is running on.

-xarch=v9
     Compile for the SPARC-V9 ISA.
     Enables the compiler to generate code for good performance on
     the V9 SPARC architecture. The resulting .o object files are in
     ELF64 format and can only be linked with other SPARC-V9 object
     files in the same format. The resulting executable can only be
     run on an UltraSPARC processor running a 64-bit enabled Solaris
     operating environment with the 64-bit kernel. -xarch=v9 is only
     available when compiling in a 64-bit enabled Solaris
     environment.

-xbuiltin[=a]
     Use the -xbuiltin[=(%all|%none)] option when you want to
     improve the optimization of code that calls standard library
     functions. Many standard library functions, such as the ones
     defined in math.h and stdio.h, are commonly used by various
     programs. This option lets the compiler substitute intrinsic
     functions or inline system functions where profitable for
     performance.

     a stands for (%all|%none).

     Note: -xbuiltin only inlines global functions defined in system
     header files, never static functions defined by the user.

     The first default of this option is -xbuiltin=%none, which
     means no functions from the standard libraries are substituted
     or inlined. The first default applies when you do not specify
     -xbuiltin.

     The second default of this option is -xbuiltin=%all, which
     means the compiler substitutes intrinsics or inlines standard
     library functions as it determines the optimization benefit.
     The second default applies when you specify -xbuiltin but do
     not provide an argument.

     If you compile with -fast, then -xbuiltin is set to %all.

-xdepend
     (SPARC) Analyzes loops for inter-iteration data dependencies
     and does loop restructuring. Loop restructuring includes loop
     interchange, loop fusion, scalar replacement, and elimination
     of "dead" array assignments.

     If optimization is not at -xO3 or higher, optimization is
     raised to -xO3 and a warning is issued.

     Dependency analysis is included in -xautopar or -xparallel. The
     dependency analysis is done at compile time.

     Dependency analysis may help on single-processor systems.
     However, if you try -xdepend on single-processor systems, you
     should not use either -xautopar or -xexplicitpar. If either of
     them is on, the -xdepend optimization is done for
     multiple-processor systems.

-xlibmil
     Inlines some library routines for faster execution. This option
     selects the appropriate assembly language inline templates for
     the floating-point option and platform for your system.

     -xlibmil inlines a function regardless of any specification of
     the function as part of the -xinline flag.

-xmemalign[=ab]
     Specify maximum assumed memory alignment and behavior of
     misaligned data accesses.

     For memory accesses where the alignment is determinable at
     compile time, the compiler will generate the appropriate
     load/store instruction sequence for that alignment of data.

     For memory accesses where the alignment cannot be determined at
     compile time, the compiler must assume an alignment to generate
     the needed load/store sequence.

     The -xmemalign flag allows the user to specify the maximum
     memory alignment of data to be assumed by the compiler in these
     indeterminable situations. It also specifies the error behavior
     to be followed at run-time when a misaligned memory access does
     take place.

     VALUES:

     If a value is specified, it must consist of two parts: a
     numerical alignment value, a, and an alphabetic behavior
     flag, b.

     Allowed values for alignment, a, are:

          1    Assume at most 1 byte alignment.
          2    Assume at most 2 byte alignment.
          4    Assume at most 4 byte alignment.
          8    Assume at most 8 byte alignment.
          16   Assume at most 16 byte alignment.

     Allowed values for behavior, b, are:

          i    Interpret access and continue execution.
          s    Raise signal SIGBUS.
          f    Raise signal SIGBUS for alignments less than or equal
               to 4, otherwise interpret access and continue
               execution.

-xO1
     Does basic local optimization (peephole).

-xO2
     Does basic local and global optimization.
     This includes induction variable elimination, local and global
     common subexpression elimination, algebraic simplification,
     copy propagation, constant propagation, loop-invariant
     optimization, register allocation, basic block merging, tail
     recursion elimination, dead code elimination, tail call
     elimination and complex expression expansion.

     This level does not optimize references or definitions for
     external or indirect variables. -O and -xO2 are equivalent.

-xO3
     In addition to optimizations performed at the -xO2 level, also
     optimizes references and definitions for external variables.
     This level does not trace the effects of pointer assignments.
     When compiling either device drivers that are not properly
     protected by volatile, or programs that modify external
     variables from within signal handlers, use -xO2. In general,
     this level, and -xO4, usually result in the minimum code size
     when used with the -xspace option.

-xO4
     Does automatic inlining of functions contained in the same file
     in addition to performing -xO3 optimizations. This automatic
     inlining usually improves execution speed, but sometimes makes
     it worse. In general, this level results in increased code size
     unless combined with -xspace.

-xO5
     Does the highest level of optimization, suitable only for the
     small fraction of a program that uses the largest fraction of
     computer time. Uses optimization algorithms that take more
     compilation time or that do not have as high a certainty of
     improving execution time. Optimization at this level is more
     likely to improve performance if it is done with profile
     feedback. See -xprofile=collect|use.

-xprefetch[=val[,val]]
     (SPARC) Enable prefetch instructions on those architectures
     that support prefetch, such as UltraSPARC II. (-xarch=v8plus,
     v9plusa, v9, or v9a)

     Explicit prefetching should only be used under special
     circumstances that are supported by measurements.
     val must be one of the following:

          auto          Enable automatic generation of prefetch
                        instructions
          no%auto       Disable automatic generation
          explicit      Enable explicit prefetch macros
          no%explicit   Disable explicit prefetch macros

-xopenmp[=i]
     where i is one of parallel, stubs, or none.

     If you specify -xopenmp but do not include a value, the
     compiler assumes -xopenmp=parallel. If you do not specify
     -xopenmp, the compiler assumes -xopenmp=none.

     -xopenmp=parallel enables recognition of OpenMP pragmas and
     applies to SPARC only. The optimization level under
     -xopenmp=parallel is -xO3. The compiler issues a warning if the
     optimization level of your program is changed from a lower
     level to -xO3. -xopenmp=parallel predefines the _OPENMP
     preprocessor token.

-xunroll=n
     Synonym for -unroll=n

-unroll=n
     Enable unrolling of DO loops n times where possible. n is a
     positive integer.


Sun Forte Developer 7 Fortran Compiler Options

-fast
     Select options that optimize execution performance.

     -fast provides high performance for certain benchmark
     applications. However, the particular choice of options may or
     may not be appropriate for your application. Use -fast as a
     good starting point for compiling your application for best
     performance, but additional tuning may still be required. If
     your program behaves improperly when compiled with -fast, look
     closely at the individual options that make up -fast and invoke
     only those appropriate to your program that preserve correct
     behavior.

     Note also that a program compiled with -fast may show good
     performance and accurate results with some data sets, but not
     with others. Avoid compiling with -fast those programs that
     depend on particular properties of floating-point arithmetic.

     -fast selects the following options:

     o  -xtarget=native sets the hardware target. If the program is
        intended to run on a different target than the compilation
        machine, follow the -fast with the appropriate -xtarget=
        option. For example:

             f95 -fast -xtarget=ultra ...
     o  -O5 selects optimization level 5.

     o  -libmil selects inlining of certain math library routines.

     o  -fsimple=2 selects aggressive floating-point optimizations.
        This option may be unsuited for programs requiring strict
        IEEE 754 standards compliance.

     o  -dalign selects generation of faster double word load/store
        instructions, and alignment of double and quad data on their
        natural boundaries in common blocks. Using this option may
        generate nonstandard Fortran data alignment.

     o  -xlibmopt selects linking the optimized math library.

     o  -depend selects dependence analysis to better optimize DO
        loops.

     o  -fns selects faster (but nonstandard) handling of
        floating-point arithmetic exceptions and gradual underflow.

     o  -ftrap=common selects trapping on common floating-point
        exceptions (this is the default for f95).

     o  -pad=local selects local padding to improve use of cache.

     o  -xvector=yes selects the vectorized math library.

     o  -xprefetch=yes selects automatic generation of prefetch
        instructions on platforms that support it.

     o  -xprefetch_level=2 sets the default prefetch level.

     Note that this selection of component option flags is subject
     to change with each release of the compiler. For details on the
     options set by -fast, see the Fortran User's Guide.

-xtypemap=spec
     Specify default data mappings.

     This option provides a flexible way to specify the byte sizes
     for default data types.

     The syntax of the string spec is:

          type:bits,type:bits,...

     The allowable data types are REAL, DOUBLE, INTEGER. The data
     sizes accepted are 32, 64, and 128.

     This option applies to all variables declared without explicit
     byte sizes, as in REAL XYZ.

     The allowable combinations are: real:32 or real:64, double:64
     or double:128, integer:32 or integer:64.

     A useful mapping is:

          -xtypemap=real:64,double:64,integer:64

     which maps REAL and DOUBLE to 8 bytes, but does not promote
     DOUBLE PRECISION to QUAD PRECISION.
     Note also that INTEGER and LOGICAL are treated the same, and
     COMPLEX is mapped as two REAL data elements. Also, DOUBLE
     COMPLEX will be treated the way DOUBLE is mapped.

-xunroll=n
     Synonym for -unroll=n

-unroll=n
     Enable unrolling of DO loops n times where possible. n is a
     positive integer.
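
The effect of unrolling can be sketched by hand. The example below is written in C rather than Fortran to stay in one language, and it shows only the general shape of a loop unrolled four times with a cleanup loop for the leftover iterations. It is an illustration, not the code that -unroll=4 generates, and the function names sum_rolled and sum_unrolled4 are hypothetical. Integer data is used so that both forms give identical results regardless of the order of the additions.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled loop: one addition and one loop test per element. */
long sum_rolled(const int *a, size_t n)
{
    long total = 0;
    size_t i;
    assert(a != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* The same loop unrolled four times by hand. */
long sum_unrolled4(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;
    assert(a != NULL || n == 0);

    /* Main body: four elements per trip, so one loop test per four adds. */
    for (; i + 4 <= n; i += 4)
        total += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];

    /* Cleanup loop: the 0 to 3 elements left when n is not a multiple of 4. */
    for (; i < n; i++)
        total += a[i];

    return total;
}
```

Both functions return the same sum for any n. The unrolled form trades larger code for fewer loop tests and more independent work per iteration, which is the trade-off the unroll factor n controls.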