Flag Disclosure for Running SPEC HPC2002 on SUN E450
with SUN Forte 6 update 2 C and Fortran Compilers

Purdue University
Last Revised: Nov. 05, 2003


mpicc:
    /package/sun_compilers/6.2/sparc_sunos5.8/SUNWspro/WS6U2/bin/cc -fast -xO4 -xO3 -DUSE_STDARG -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_UNISTD_H=1 -DHAVE_STDARG_H=1 -DUSE_STDARG=1 -DMALLOC_RET_VOID=1 -L/home/yara/re/paramnt/tools/mpi/mpich/ch_shmem/lib -lmpich -lnsl -lrt -lthread -lnsl -laio

mpif90:
    /package/sun_compilers/6.2/sparc_sunos5.8/SUNWspro/WS6U2/bin/f90 -L/home/yara/re/paramnt/tools/mpi/mpich/ch_shmem/lib -lmpichf90 -lmpich -lnsl -lrt -lthread -lnsl -laio

MPICH 1.2.4 with ch_shmem device (shared-memory implementations):
    ./configure --with-device=ch_shmem
    make


SUN Forte 6 update 2 C and Fortran Compiler Options

-D
    Set definition for preprocessor.

-dalign
    Assume double-type data is double aligned.

-dn
    Specify static binding.

-e
    Accept extended (132 character) input source lines (Fortran).

-fast
    This is a convenience option for selecting a set of optimizations
    for performance, and it chooses:
      o The -native best machine characteristics option
        (-xarch=native, -xchip=native, -xcache=native)
      o Optimization level: -xO5
      o A set of inline expansion templates (-libmil)
      o The -fsimple=2 option
      o The -dalign option
      o The -xalias_level=basic option (C only)
      o The -xlibmopt option
      o The -xdepend option (Fortran only)
      o The -xprefetch option (Fortran only)
      o Options to turn off all trapping (-fns -ftrap=%none)

-fixed
    Accept fixed-format input source files (Fortran).

-fns
    Select non-standard floating point mode. This flag causes the
    nonstandard floating point mode to be enabled when a program begins
    execution. By default, the nonstandard floating point mode will not
    be enabled automatically. On some SPARC systems, the nonstandard
    floating point mode disables "gradual underflow", causing tiny
    results to be flushed to zero rather than producing subnormal numbers.
    It also causes subnormal operands to be silently replaced by zero.
    On those SPARC systems that do not support gradual underflow and
    subnormal numbers in hardware, use of this option can significantly
    improve the performance of some programs.

    Warning: When nonstandard mode is enabled, floating point arithmetic
    may produce results that do not conform to the requirements of the
    IEEE 754 standard. See the Numerical Computation Guide for more
    information.

-fsimple=0
    Permits no simplifying assumptions. Preserves strict IEEE 754
    conformance.

-fsimple=1
    With -fsimple=1, the optimizer can assume the following:
      o The IEEE 754 default rounding/trapping modes do not change
        after process initialization.
      o Computations producing no visible result other than potential
        floating-point exceptions may be deleted.
      o Computations with Infinity or NaNs as operands need not
        propagate NaNs to their results. For example, x*0 may be
        replaced by 0.
      o Computations do not depend on sign of zero.

-fsimple=2
    Permits aggressive floating point optimizations that may cause
    programs to produce different numeric results due to changes in
    rounding. Even with -fsimple=2, the optimizer still is not permitted
    to introduce a floating point exception in a program that otherwise
    produces none.

-fsimple[=n]
    Allows the compiler to make simplifying assumptions concerning
    floating-point arithmetic.

-ftrap=t
    Sets the IEEE 754 trapping mode in effect at startup. t is a
    comma-separated list that consists of one or more of the following:
    %all, %none, common, [no%]invalid, [no%]overflow, [no%]underflow,
    [no%]division, [no%]inexact. The default is -ftrap=%none.
    This option sets the IEEE 754 trapping modes that are established at
    program initialization. Processing is left-to-right. The common
    exceptions, by definition, are invalid, division by zero, and
    overflow.
      o %none, the default, turns off all trapping modes.
    Do not use this option for programs that depend on IEEE standard
    exception handling; you can get different numerical results,
    premature program termination, or unexpected SIGFPE signals.

-inline=%auto
    Enables automatic inlining.

-libmil
    Use inline expansion templates for libm.

-lm
    Link with math library.

-native
    Select native machine characteristics for optimization.

-openmp
    Accept the OpenMP directives.

-Qoption
    Pass flags along to compiler phase:
        f90comp: Fortran first pass
        iropt:   Global optimizer
        cg:      Code generator

-Qoption cg
    See -Wc, below.

-Qoption f90comp -array_pad_rows,
    Enable padding of f90 arrays by n.

-Qoption f90comp -expansion
    Enable f90 array expansion.

-Qoption iropt +ansi_alias
    Assume (more restrictive) ANSI C semantics for pointer aliasing.

-Qoption iropt
    See -W2, below.

-Qoption iropt -Adata_access
    Enable optimizations based on data access patterns.

-Qoption iropt bmerge
    Enable branch merge optimizations.

-Qoption iropt -O4+algassoc
    Enable floating point reassociation.

-Qoption iropt -O4+bcopy
    Allows replacing copy and memset loops with library calls.

-Qoption iropt -O4+scalarrep
    Disable scalar replacement optimization.

-stackvar
    Allocate routine local variables on stack (Fortran).

-W,
    Pass flags along to compiler phase:
        2: global optimizer
        c: code generator

-W2,-Abopt
    Enable aggressive optimizations of all branches.

-W2,-Adata_access
    Enable optimizations based on data access patterns.

-W2,-Aheap
    Allows the compiler to recognize malloc-like memory allocation
    functions.

-W2,-Aunroll
    Enables outer-loop unrolling.

-W2,-crit
    Enable optimization of critical control paths.

-W2,-Ma
    Enable inlining of routines with frame size up to n.

-W2,-Mm
    Maximum module increase limit for inlining.

-W2,-Mp
    Procedures with entry counts equal or greater than n become
    candidates for inlining.

-W2,-Mr
    Maximum code increase due to inlining is limited to n triples.

-W2,-Ms
    Maximum level of recursive inlining.

-W2,-Mt
    The maximum size of a routine body eligible for inlining is limited
    to n triples.

-W2,-O4+ansi_alias
    Assume (more restrictive) ANSI C semantics for pointer aliasing.

-W2,-O4+restrict
    This tells the compiler to assume that different pointer-type formal
    parameters point to their own memory locations (C restricted
    pointers).

-W2,-O4+restrict_g
    This tells the compiler to assume that different global pointer
    variables point to their own memory locations.

-W2,-reroll=1
    Turns on loop rerolling.

-W2,-whole
    Do whole program optimizations.

-Wc,-Qdepgraph-early_cross_call=1
    Enable early cross-call instruction scheduling.

-Wc,-Qgsched-T4
    Sets the aggressiveness of the trace formation.

-Wc,-Qgsched-trace_late=1
    Turns on the late trace scheduler.

-Wc,-Qgsched-trace_spec_load=1
    Turns on the conversion of loads to non-faulting loads inside the
    trace.

-Wc,-Qiselect-funcalign=
    Do function entry alignment at n-byte boundaries.

-Wc,-Qpeep-Sh0
    Disables the max live base registers algorithm for sethi hoisting.

-Xa
    Assume ANSI C conformance, allow K & R extensions. (default mode)

-xalias_level=
    Allows compiler to perform type-based alias analysis at the given
    alias level.
        basic:  assume ISO C9X aliasing rules for basic types only.
        std:    assume ISO C9X aliasing rules.
        strong: assume all pointers are type safe (strongly typed).

-xarch=
    Limit the set of instructions the compiler may use.

-Xc
    Assume strict ANSI C conformance.

-xcache=
    Defines the cache properties for use by the optimizer. c must be one
    of the following:
        native (set parameters for the host environment)
        s1/l1/a1
        s1/l1/a1:s2/l2/a2
        s1/l1/a1:s2/l2/a2:s3/l3/a3
    The si/li/ai are defined as follows:
        si: The size of the data cache at level i, in kilobytes.
        li: The line size of the data cache at level i, in bytes.
        ai: The associativity of the data cache at level i.

-xchip=
    Specifies the target processor for use by the optimizer.
    c must be one of: generic, generic64, native, native64, old, super,
    super2, micro, micro2, hyper, hyper2, powerup, ultra, ultra2,
    ultra2i, ultra3, 386, 486, pentium, pentium_pro, 603, 604.

-xcrossfile
    Enable cross-file inlining.

-xdepend
    Analyze loops for data dependencies.

-xO1
    Does basic local optimization (peephole).

-xO2
    -xO1 plus more local and global optimizations.

-xO3
    Besides what -xO2 does, it optimizes references or definitions for
    external variables. Loop unrolling and software pipelining are also
    performed.

-xO4
    -xO3 plus function inlining.

-xO5
    Besides what -xO4 does, it enables speculative code motion.

-xopenmp
    Accept the OpenMP directives.

-xpad=common[:]
    Pad common block variables, for better use of cache. n specifies the
    amount of padding to apply. If no parameter is specified then the
    compiler selects one automatically.

-xpad=local[:]
    Pad local variables only, for better use of cache. n specifies the
    amount of padding to apply. If no parameter is specified then the
    compiler selects one automatically.

-xparallel
    Use parallel processing to improve performance.

-xprefetch
    Enable generation of prefetch instructions.

-xprofile=collect
    Collect profile data for feedback directed optimizations.

-xprofile=use
    Use data collected for profile feedback.

-xreduction
    Parallelize loops containing reductions.

-xregs=syst
    Allows use of the system reserved registers %g6 and %g7, and %g5 if
    not already allowed by -xarch value.

-xrestrict[=f1,...,f2,%all,%none]
    Treat pointer-valued function parameters as restricted pointers.
    The default is %none. Specifying -xrestrict is equivalent to
    specifying -xrestrict=%all.

-xsafe=mem
    Enables the use of non-faulting loads when used in conjunction with
    -xarch=v8plus. Assumes that no memory-based traps will occur.

-Xt
    Assume K & R conformance, allow ANSI C.

-xvector
    Enable vectorization of loops with calls to math routines.

-xunroll=n
    Synonym for -unroll=n.

-unroll=n
    Enable unrolling of DO loops n times where possible.
    n is a positive integer.

-xtypemap=spec
    Specify default data mappings. This option provides a flexible way
    to specify the byte sizes for default data types. The syntax of the
    string spec is:
        type:bits,type:bits,...
    The allowable data types are REAL, DOUBLE, INTEGER. The data sizes
    accepted are 32, 64, and 128.

-xlibmopt
    This chooses the math library that is optimized for speed.
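
Illustration (not part of the original disclosure): the -fns and
-fsimple=1 descriptions above name three IEEE 754 behaviors that those
options allow to change: gradual underflow, NaN propagation through
x*0, and the sign of zero. The C sketch below probes each one at run
time. All function names and the FLAG_DEMO_MAIN macro are hypothetical
and chosen only for this example. Built without those options, each
probe is expected to report the IEEE-conforming behavior; built with
them, the results may differ, although the compiler is not obliged to
apply any particular simplification.

```c
#include <float.h>
#include <stdio.h>

/* Under strict IEEE 754, x * 0.0 must stay NaN when x is NaN.
   -fsimple=1 permits replacing the product with 0. */
double times_zero(double x)
{
    return x * 0.0;
}

/* Under strict IEEE 754 (round-to-nearest), -0.0 + 0.0 is +0.0, so
   x + 0.0 may not be simplified to x. If the sign of zero is ignored,
   the compiler may return x unchanged. */
double plus_zero(double x)
{
    return x + 0.0;
}

/* Returns 1 if halving the smallest normal double gives a nonzero
   subnormal (gradual underflow), 0 if the result is flushed to zero
   (the nonstandard mode that -fns may enable on some SPARC systems). */
int gradual_underflow_active(void)
{
    volatile double tiny = DBL_MIN;   /* volatile keeps this at run time */
    volatile double half = tiny / 2.0;
    return half != 0.0;
}

/* Returns 1 if NaN * 0 is still NaN. */
int nan_survives_times_zero(void)
{
    volatile double zero = 0.0;
    volatile double not_a_number = zero / zero;   /* NaN, made at run time */
    double r = times_zero(not_a_number);
    return r != r;                                /* true only for NaN */
}

/* Returns 1 if -0.0 + 0.0 produced +0.0, as IEEE 754 requires. */
int plus_zero_normalizes_negative_zero(void)
{
    volatile double neg_zero = -0.0;
    volatile double r = plus_zero(neg_zero);
    return 1.0 / r > 0.0;             /* 1/+0 is +Inf, 1/-0 is -Inf */
}

#ifdef FLAG_DEMO_MAIN   /* hypothetical macro: define it to build a driver */
int main(void)
{
    printf("gradual underflow active:   %d\n", gradual_underflow_active());
    printf("NaN survives x*0:           %d\n", nan_survives_times_zero());
    printf("-0.0 + 0.0 yields +0.0:     %d\n",
           plus_zero_normalizes_negative_zero());
    return 0;
}
#endif
```

To try it, compile the file once with default options and once with the
options under study, defining FLAG_DEMO_MAIN in both builds, and compare
the three printed values.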