----------------------------------------------------------------------------
Hewlett-Packard Company
SPEC CPU2000 FLAG DESCRIPTIONS
- Portland Group International (PGI) FORTRAN COMPILERS 6.0-8
- hp-20061003-PGI60-Windows.txt
----------------------------------------------------------------------------
Description of compiler flags for PGI Compiler 6.0
----------------------------------------------------------------------------

The optimization levels and their meanings are as follows:

-O0
    Creates a basic block for each Fortran statement. Neither scheduling
    nor global optimization is done.

-O1
    Schedules within basic blocks and performs some register allocations,
    but does no global optimization.

-O2
    Performs all level-one optimizations, and also performs global scalar
    optimizations such as induction variable elimination and loop
    invariant movement.

-O3
    Specifies aggressive global optimizations. This level performs all
    level-one and level-two optimizations and enables more aggressive
    hoisting and scalar replacement optimizations that may or may not be
    profitable.

-fast
    Equivalent to "-O2 -Munroll=c:1 -Mnoframe -Mlre"

-fastsse
    Equivalent to "-fast -Mscalarsse -Mvect=sse -Mcache_align -Mflushz"

-Munix
    (Windows NT only) Use UNIX argument passing and symbol naming
    conventions.

-Mcache_align
    Align unconstrained data objects of size greater than or equal to
    16 bytes on cache-line boundaries. An unconstrained object is a
    variable or array that is not a member of an aggregate structure or
    common block, is not allocatable, and is not an automatic array.
    Note: To effect cache-line alignment of stack-based local variables,
    the main program or function must be compiled with -Mcache_align.

-Mfixed
    Process Fortran source using fixed form specifications. The -Mfree
    option specifies free form formatting. By default, files with a .f
    or .F extension use fixed form formatting.

-Mflushz
    Set SSE MXCSR register to flush-to-zero mode.

-M[no]ipa[=option[,option,...]] (-Mnoipa default)
    Enable and specify options for InterProcedural Analysis (IPA). This
    also sets the optimization level to a minimum of 2; see -O. If no
    option list is specified, then it is equivalent to -Mipa=const. The
    options are:
    [no]align (noalign default)
        Enable [disable] recognition when pointer targets are all
        cache-line aligned, allowing better SSE code generation.
    [no]arg (noarg default)
        Remove [don't remove] arguments replaced by -Mipa=ptr,const.
        -Mipa=noarg implies -Mipa=nolocalarg.
    [no]const (const default)
        Enable [disable] propagation of constants across procedure calls.
    [no]f90ptr (nof90ptr default)
        Enable [disable] Fortran 90 pointer disambiguation across
        procedure calls.
    fast
        Chooses generally optimal -Mipa flags for the target platform;
        use pgf90 -Mipa -help to see the equivalent options.
    force
        Force all objects to recompile regardless of whether IPA
        information has changed.
    [no]globals (noglobals default)
        Analyze [don't analyze] which globals are modified by procedure
        calls.
    inline:n
        Determine additional functions to inline, allowing up to n levels
        of inlining.
    ipofile
        Save IPA information in a .ipo file instead of the default of
        appending the information to the object file.
    [no]keepobj (keepobj default)
        Keep [don't keep] the optimized object files, using file name
        mangling, to reduce recompile time in subsequent application
        builds.
    [no]libinline (nolibinline default)
        Allow [don't allow] inlining from routines in libraries;
        -Mipa=libinline implies -Mipa=inline.
    [no]libopt (nolibopt default)
        Allow [don't allow] recompiling and reoptimizing routines from
        libraries with IPA information.
    [no]localarg (nolocalarg default)
        Enable [disable] feature to externalize local variables to allow
        arguments to be replaced by -Mipa=ptr. -Mipa=localarg implies
        -Mipa=arg.
    main:func
        Specify a function to serve as a global entry point; may appear
        multiple times; disables linking.
    [no]ptr (noptr default)
        Enable [disable] pointer disambiguation across procedure calls.
    [no]pure (nopure default)
        Detect [don't detect] pure functions.
    required
        Return an error condition if IPA is inhibited for any reason,
        rather than the default behavior of linking without IPA
        optimization.
    safe:[function|library]
        Declares that the named function, or all functions in the named
        library, are safe; a safe procedure does not call back into the
        known procedures and does not change any known global variables.
        Without -Mipa=safe, any unknown procedures will cause IPA to
        fail.
    [no]safeall (nosafeall default)
        Declares that all unknown functions are safe [not safe]; see
        -Mipa=safe.
    [no]shape (noshape default)
        Perform [don't perform] Fortran 90 shape propagation.
    summary
        Only collect IPA summary information when compiling; this
        prevents IPA optimization of this file, but allows optimization
        for other files linked with this file.
    [no]vestigial (novestigial default)
        Remove [don't remove] functions that are not called.

-M[no]lre[=assoc|noassoc] (-Mnolre default)
    Enable [disable] loop-carried redundancy elimination. The assoc
    option allows expression reassociation, and the noassoc option
    disallows expression reassociation.

-M[no]frame (-Mnoframe default)
    Set up [don't set up] a true stack frame pointer for functions;
    -Mnoframe allows slightly more efficient operation when a stack frame
    is not needed, but some options override -Mnoframe.

-Mnosmart
    Don't run the Smart assembly re-write tool, which enables
    post-compilation linear assembly scheduling and optimization.

-M[no]scalarsse
    Utilize [don't use] SSE (Pentium 3, 4, AthlonXP/MP, Opteron) and SSE2
    (Pentium 4, Opteron) instructions to perform the operations coded.
    This requires the assembler to be capable of interpreting SSE/SSE2
    instructions. The default is -Mscalarsse for Opteron in 64-bit mode,
    and -Mnoscalarsse otherwise.

-M[no]unroll[=option[,option...]] (-Mnounroll default)
    Invoke [don't invoke] the loop unroller.
    This also sets the optimization level to a minimum of 2; see -O. The
    option is one of the following:
    c:m
        Instructs the compiler to completely unroll loops with a constant
        loop count less than or equal to m, a supplied constant. If this
        value is not supplied, the m count is set to 4.
    n:u
        Instructs the compiler to unroll u times a loop that is not
        completely unrolled or that has a non-constant loop count. If u
        is not supplied, the unroller computes the number of times a
        candidate loop is unrolled.
    -Mnounroll instructs the compiler not to unroll loops.

-Mpfi
    Generate profile feedback instrumentation; this includes extra code
    to collect run-time statistics to be used in a subsequent compile;
    -Mpfi must also appear when the program is linked. When the program
    is run, a profile feedback file pgfi.out will be generated; see
    -Mpfo.

-Mpfo
    Enable profile feedback optimizations; there must be a profile
    feedback file pgfi.out in the current directory, which contains the
    result of an execution of the program compiled with -Mpfi.

-Mvect[=option[,option,...]]
    Pass options to the internal vectorizer. This also sets the
    optimization level to a minimum of 2; see -O. If no option list is
    specified, then the following vector optimizations are used:
    assoc,cachesize:262144,nosse. The vect options are:
    [no]altcode:n (noaltcode default)
        Generate [don't generate] alternate scalar code for vectorized
        loops. If altcode is specified without arguments, the vectorizer
        determines an appropriate cutoff length and generates scalar code
        to be executed whenever the loop count is less than or equal to
        that length. If altcode:n is specified, the scalar altcode is
        executed whenever the loop count is less than or equal to n.
    [no]assoc (assoc default)
        Enable [disable] certain associativity conversions that can
        change the results of a computation due to floating point
        roundoff error differences.
        A typical optimization is to change the order of additions, which
        is mathematically correct, but can be computationally different,
        due to roundoff error.
    cachesize:number (default=automatic)
        Instructs the vectorizer, when performing cache tiling
        optimizations, to assume a cache size of number.
    prefetch
        Use prefetch instructions in loops where profitable.
    [no]sse (nosse default)
        Use [don't use] SSE, SSE2, 3DNow!, and prefetch instructions in
        loops where possible.
    -Mnovect disables the vectorizer, and is the default.

Portability options for CPU2000:
-------------------------------

176.gcc:
    -Dalloca=_alloca : Use the built-in optimized alloca.
    -F10000000 : 176.gcc uses alloca, and this option tells the linker to
        pre-allocate n bytes of stack (here n is 10000000). The default
        stack allocation is not enough, and without this option 176.gcc
        crashes with a run-time error.

178.galgel:
    -Mfixed : Fixed-format F90 source code.

186.crafty:
    -DNT_i386 : Specifies a Windows NT Intel processor-based system,
        which makes the compiler use "long long" as the 64-bit variable
        type that 186.crafty needs.

253.perlbmk:
    -DSPEC_CPU2000_NTOS : Enables the code changes needed for the port to
        Windows.
    -DPERLDLL : On Windows, we need a single perl.exe instead of a
        perl.exe and perl.dll. This pre-define ensures that the changes
        necessary to build a single, UNIX-style executable are applied,
        avoiding the indirect calls that can cause a 10% performance
        degradation. This allows the Windows-based executable to be as
        close as possible to the Unix-based one.
    -MT : Use the static multi-threaded library; otherwise the benchmark
        will not compile.

254.gap:
    -DSYS_HAS_CALLOC_PROTO :
    -DSYS_HAS_MALLOC_PROTO : These two pre-defines indicate that calloc
        and malloc prototypes exist.

------------------------------------------------------
General Options and Libraries
------------------------------------------------------

RM_SOURCES=
    Tells the SPEC tools not to use a certain source file, normally
    because it will be replaced by a math library.

-lacml
    AMD Core Math Library, available from http://developer.amd.com
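
Worked example: expanding -fast and -fastsse

The -fast and -fastsse entries in the flag descriptions are defined purely as
equivalences to other flags, so a command line that uses them can be expanded
mechanically. The following is a minimal sketch of that expansion, not a
description of how the PGI driver processes these flags internally. The alias
table is copied from the two entries in this file; the names ALIASES and
expand_flags are hypothetical and exist only for this illustration.

```python
# Alias table copied from the -fast and -fastsse entries in this file.
# ALIASES and expand_flags are illustrative names, not part of any PGI tool.
ALIASES = {
    "-fast": ["-O2", "-Munroll=c:1", "-Mnoframe", "-Mlre"],
    "-fastsse": ["-fast", "-Mscalarsse", "-Mvect=sse", "-Mcache_align",
                 "-Mflushz"],
}


def expand_flags(flags):
    """Return the flag list with each alias replaced by its listed equivalent.

    Expansion is recursive because -fastsse is itself defined in terms of
    -fast. Flags that are not aliases are passed through unchanged.
    """
    expanded = []
    for flag in flags:
        if flag in ALIASES:
            expanded.extend(expand_flags(ALIASES[flag]))
        else:
            expanded.append(flag)
    return expanded


if __name__ == "__main__":
    # -Mipa=fast is not an alias in the table, so it is passed through.
    print(" ".join(expand_flags(["-fastsse", "-Mipa=fast"])))
```

Under these assumptions, a line that uses -fastsse implies -O2 together with
the unrolling, frame, LRE, SSE, alignment and flush-to-zero flags listed in
its definition.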
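
Worked example: why reassociation can change results

The assoc option of -Mvect (and of -Mlre) permits reordering of additions,
which is mathematically valid but can change the rounded result. The sketch
below illustrates the effect in Python with IEEE double precision; it is not
compiler output. The two-partial-sum ordering is only an assumption about
what a reordered reduction might look like, and the function names and input
values are chosen for the illustration.

```python
def sum_left_to_right(values):
    """Add the values strictly in source order, as written in a scalar loop."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_two_partials(values):
    """Add even- and odd-indexed values into separate partial sums, then
    combine them. This is one possible reassociated order, assumed here for
    illustration. Expects an even number of values."""
    even = 0.0
    odd = 0.0
    for i in range(0, len(values), 2):
        even += values[i]
        odd += values[i + 1]
    return even + odd


if __name__ == "__main__":
    # The exact mathematical sum of these four values is 2.
    values = [1.0e16, 1.0, -1.0e16, 1.0]
    print(sum_left_to_right(values))  # prints 1.0
    print(sum_two_partials(values))   # prints 2.0
```

In source order the first 1.0 is lost when it is added to 1.0e16, so the two
orderings disagree. Differences of this kind are what noassoc is intended to
rule out.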