This is the flag description file for binaries compiled for AMD processors
with the gcc compiler version 3.3 from SuSE Linux Enterprise Server 8
Service Pack 3. It also includes flag descriptions for the PGI 5.1
compiler.

----------------------------------------------------------------------------
Flags for gcc 3.3 (from SLES8 SP3)
----------------------------------------------------------------------------

O0
`-O0'
     Do not optimize.  This is the default.

O3
`-O'
`-O1'
     Optimize.  Optimizing compilation takes somewhat more time, and a
     lot more memory for a large function.

     With `-O', the compiler tries to reduce code size and execution
     time, without performing any optimizations that take a great deal
     of compilation time.

     `-O' turns on the following optimization flags:
          -fdefer-pop
          -fmerge-constants
          -fthread-jumps
          -floop-optimize
          -fcrossjumping
          -fif-conversion
          -fif-conversion2
          -fdelayed-branch
          -fguess-branch-probability
          -fcprop-registers

     `-O' also turns on `-fomit-frame-pointer' on machines where doing
     so does not interfere with debugging.

`-O2'
     Optimize even more.  GCC performs nearly all supported
     optimizations that do not involve a space-speed tradeoff.  The
     compiler does not perform loop unrolling or function inlining when
     you specify `-O2'.  As compared to `-O', this option increases
     both compilation time and the performance of the generated code.

     `-O2' turns on all optimization flags specified by `-O'.  It also
     turns on the following optimization flags:
          -fforce-mem
          -foptimize-sibling-calls
          -fstrength-reduce
          -fcse-follow-jumps  -fcse-skip-blocks
          -frerun-cse-after-loop  -frerun-loop-opt
          -fgcse   -fgcse-lm   -fgcse-sm
          -fdelete-null-pointer-checks
          -fexpensive-optimizations
          -fregmove
          -fschedule-insns  -fschedule-insns2
          -fsched-interblock -fsched-spec
          -fcaller-saves
          -fpeephole2
          -freorder-blocks  -freorder-functions
          -fstrict-aliasing
          -falign-functions  -falign-jumps
          -falign-loops  -falign-labels

     Please note the warning under `-fgcse' about invoking `-O2' on
     programs that use computed gotos.

`-O3'
     Optimize yet more.  `-O3' turns on all optimizations specified by
     `-O2' and also turns on the `-finline-functions', `-fweb',
     `-funit-at-a-time', `-ftracer', `-funswitch-loops' and
     `-frename-registers' options.

-funroll-all-loops
`-funroll-all-loops'
     Unroll all loops, even if their number of iterations is uncertain
     when the loop is entered.  This usually makes programs run more
     slowly.  `-funroll-all-loops' implies the same options as
     `-funroll-loops'.

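     As a rough source-level picture of what unrolling means when the
     iteration count is unknown on entry, the sketch below unrolls a
     summation loop by four and adds a remainder loop for the last n % 4
     elements.  The factor of 4 and the function names are assumptions for
     illustration; GCC picks its own unroll factor and code shape.

```c
#include <stddef.h>

/* Plain loop: the trip count n is not known at compile time. */
long sum_plain(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Same loop unrolled by 4 (an illustrative factor). Because n is unknown
   on entry, a remainder loop handles the final 0 to 3 elements. */
long sum_unrolled4(const int *a, size_t n) {
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)   /* remainder loop */
        s += a[i];
    return s;
}
```

     The extra remainder loop and larger loop body are part of why
     unrolling every loop can make programs slower, as noted above.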
-fprofile-arcs/ -fbranch-probabilities
`-fprofile-arcs'
     Instrument "arcs" during compilation to generate coverage data or
     for profile-directed block ordering.  During execution the program
     records how many times each branch is executed and how many times
     it is taken.  When the compiled program exits it saves this data
     to a file called `AUXNAME.da' for each source file.  AUXNAME is
     generated from the name of the output file, if explicitly
     specified and it is not the final executable, otherwise it is the
     basename of the source file. In both cases any suffix is removed
     (e.g.  `foo.da' for input file `dir/foo.c', or `dir/foo.da' for
     output file specified as `-o dir/foo.o').
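     The AUXNAME rule can be written down as a small function.  The sketch
     below is a hypothetical helper, not GCC code; the name `da_file_name'
     and its parameters are assumptions.  It reproduces the two examples
     given above: `foo.da' for input `dir/foo.c', and `dir/foo.da' for
     output `-o dir/foo.o'.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of GCC) applying the AUXNAME rule above.
   - If output_file is given and is not the final executable, AUXNAME is
     the output file name, directory included.
   - Otherwise AUXNAME is the basename of the source file.
   In both cases the suffix is removed, then ".da" is appended.
   Returns buf, or NULL if buf is too small. */
char *da_file_name(const char *source, const char *output_file,
                   int output_is_executable, char *buf, size_t size) {
    const char *name;
    if (output_file != NULL && !output_is_executable) {
        name = output_file;
    } else {
        const char *sep = strrchr(source, '/');
        name = sep ? sep + 1 : source;
    }
    /* The suffix starts at the last '.' that follows the last '/'. */
    const char *dot = strrchr(name, '.');
    const char *last_sep = strrchr(name, '/');
    size_t len = (dot != NULL && (last_sep == NULL || dot > last_sep))
                     ? (size_t)(dot - name)
                     : strlen(name);
    int written = snprintf(buf, size, "%.*s.da", (int)len, name);
    if (written < 0 || (size_t)written >= size)
        return NULL;
    return buf;
}
```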

     For profile-directed block ordering, compile the program with
     `-fprofile-arcs' plus optimization and code generation options,
     generate the arc profile information by running the program on a
     selected workload, and then compile the program again with the same
     optimization and code generation options plus
     `-fbranch-probabilities' (see "Options That Control Optimization"
     in the GCC manual).

     With `-fprofile-arcs', for each function of your program GCC
     creates a program flow graph, then finds a spanning tree for the
     graph.  Only arcs that are not on the spanning tree have to be
     instrumented: the compiler adds code to count the number of times
     that these arcs are executed.  When an arc is the only exit or
     only entrance to a block, the instrumentation code can be added to
     the block; otherwise, a new basic block must be created to hold
     the instrumentation code.
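     A minimal sketch of the spanning-tree idea, assuming the flow graph is
     given as a list of arcs between numbered basic blocks and treating
     arcs as undirected when building the tree.  It uses union-find (a
     swapped-in technique for illustration) to pick tree arcs and marks
     every other arc as needing a counter; counts on tree arcs can then be
     recovered from the counted ones.  GCC's real pass also adds fake arcs
     and weighs arcs when choosing the tree, which is not modelled here.

```c
#include <stddef.h>

/* A flow-graph arc between basic blocks numbered 0..nblocks-1. */
struct arc { int from, to; };

/* Union-find root lookup with path halving. */
static int find_root(int *parent, int v) {
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];
        v = parent[v];
    }
    return v;
}

/* Sets needs_counter[i] to 1 for arcs not on the spanning tree and 0 for
   tree arcs; returns the number of arcs that need instrumentation.
   parent and needs_counter must have room for nblocks and narcs entries. */
size_t mark_instrumented_arcs(const struct arc *arcs, size_t narcs,
                              int nblocks, int *parent, int *needs_counter) {
    size_t count = 0;
    for (int v = 0; v < nblocks; v++)
        parent[v] = v;
    for (size_t i = 0; i < narcs; i++) {
        int a = find_root(parent, arcs[i].from);
        int b = find_root(parent, arcs[i].to);
        if (a != b) {            /* joins two components: tree arc */
            parent[a] = b;
            needs_counter[i] = 0;
        } else {                 /* would close a cycle: add a counter */
            needs_counter[i] = 1;
            count++;
        }
    }
    return count;
}
```

     For an if/else "diamond" (4 blocks, 4 arcs) only one arc needs a
     counter; in general a connected graph with V blocks and E arcs needs
     E - (V - 1) counters.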

-ffast-math
`-ffast-math'
     Sets `-fno-math-errno', `-funsafe-math-optimizations',
     `-fno-trapping-math', `-ffinite-math-only' and
     `-fno-signaling-nans'.

     This option causes the preprocessor macro `__FAST_MATH__' to be
     defined.

     This option should never be turned on by any `-O' option since it
     can result in incorrect output for programs which depend on an
     exact implementation of IEEE or ISO rules/specifications for math
     functions.

`-fno-math-errno'
     Do not set ERRNO after calling math functions that are executed
     with a single instruction, e.g., sqrt.  A program that relies on
     IEEE exceptions for math error handling may want to use this flag
     for speed while maintaining IEEE arithmetic compatibility.

     This option should never be turned on by any `-O' option since it
     can result in incorrect output for programs which depend on an
     exact implementation of IEEE or ISO rules/specifications for math
     functions.

     The default is `-fmath-errno'.

`-funsafe-math-optimizations'
     Allow optimizations for floating-point arithmetic that (a) assume
     that arguments and results are valid and (b) may violate IEEE or
     ANSI standards.  When used at link-time, it may include libraries
     or startup files that change the default FPU control word or other
     similar optimizations.

     This option should never be turned on by any `-O' option since it
     can result in incorrect output for programs which depend on an
     exact implementation of IEEE or ISO rules/specifications for math
     functions.

     The default is `-fno-unsafe-math-optimizations'.

`-ffinite-math-only'
     Allow optimizations for floating-point arithmetic that assume that
     arguments and results are not NaNs or +-Infs.

     This option should never be turned on by any `-O' option since it
     can result in incorrect output for programs which depend on an
     exact implementation of IEEE or ISO rules/specifications.

     The default is `-fno-finite-math-only'.
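     The following sketch shows the kind of code affected by
     `-ffinite-math-only' (which `-ffast-math' sets).  Under IEEE rules,
     `x != x' is true only for a NaN; once the compiler may assume NaNs
     never occur, it is free to fold the test to 0.  The sketch also checks
     the `__FAST_MATH__' macro that `-ffast-math' defines.  The function
     names are illustrative assumptions.

```c
#include <math.h>

/* True only for NaN under IEEE semantics. With -ffinite-math-only the
   compiler may assume x is never NaN and fold this to 0. */
int is_nan_by_self_compare(double x) {
    return x != x;
}

/* Reports whether this file was compiled with -ffast-math, which defines
   the __FAST_MATH__ preprocessor macro. */
int compiled_with_fast_math(void) {
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}
```

     A program that relies on detecting NaNs this way should not be built
     with `-ffast-math' or `-ffinite-math-only'.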

`-fno-trapping-math'
     Compile code assuming that floating-point operations cannot
     generate user-visible traps.  These traps include division by
     zero, overflow, underflow, inexact result and invalid operation.
     This option implies `-fno-signaling-nans'.  Setting this option
     may allow faster code if one relies on "non-stop" IEEE arithmetic,
     for example.

     This option should never be turned on by any `-O' option since it
     can result in incorrect output for programs which depend on an
     exact implementation of IEEE or ISO rules/specifications for math
     functions.

     The default is `-ftrapping-math'.

`-fsignaling-nans'
     Compile code assuming that IEEE signaling NaNs may generate
     user-visible traps during floating-point operations.  Setting this
     option disables optimizations that may change the number of
     exceptions visible with signaling NaNs.  This option implies
     `-ftrapping-math'.

     This option causes the preprocessor macro `__SUPPORT_SNAN__' to be
     defined.

     The default is `-fno-signaling-nans'.

     This option is experimental and does not currently guarantee to
     disable all GCC optimizations that affect signaling NaN behavior.

-m32
   These `-m' switches are supported in addition to the above on AMD
x86-64 processors in 64-bit environments.

`-m32'
`-m64'
     Generate code for a 32-bit or 64-bit environment.  The 32-bit
     environment sets int, long and pointer to 32 bits and generates
     code that runs on any i386 system.  The 64-bit environment sets
     int to 32 bits and long and pointer to 64 bits and generates code
     for AMD's x86-64 architecture.
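     A quick way to confirm which environment a binary was built for is to
     check the type sizes described above.  The function name is a
     hypothetical illustration.

```c
/* Returns 64 for the LP64 model (-m64 on x86-64: 32-bit int, 64-bit long
   and pointer), 32 for ILP32 (-m32: int, long and pointer all 32 bits),
   and 0 for any other model. */
int data_model_bits(void) {
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return 64;
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return 32;
    return 0;
}
```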


          -fdefer-pop
`-fno-defer-pop'
     Always pop the arguments to each function call as soon as that
     function returns.  For machines which must pop arguments after a
     function call, the compiler normally lets arguments accumulate on
     the stack for several function calls and pops them all at once.

     `-fno-defer-pop' is disabled (that is, `-fdefer-pop' is enabled) at
     levels `-O', `-O2', `-O3', `-Os'.

          -fmerge-constants
`-fmerge-constants'
     Attempt to merge identical constants (string constants and
     floating point constants) across compilation units.

     This option is the default for optimized compilation if the
     assembler and linker support it.  Use `-fno-merge-constants' to
     inhibit this behavior.

     Enabled at levels `-O', `-O2', `-O3', `-Os'.

          -fthread-jumps
`-fthread-jumps'
     Perform optimizations where we check to see if a jump branches to a
     location where another comparison subsumed by the first is found.
     If so, the first branch is redirected to either the destination of
     the second branch or a point immediately following it, depending
     on whether the condition is known to be true or false.

     Enabled at levels `-O', `-O2', `-O3', `-Os'.
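     The effect of jump threading can be pictured at the source level,
     although the real pass works on jumps in the compiler's intermediate
     code.  In the sketch below (hypothetical functions), when `x > 10' is
     true, `x > 5' is already known to be true, so that path can branch
     straight to the second action without re-testing.

```c
/* Before: after the first branch is taken, the second comparison is
   subsumed by the first (x > 10 implies x > 5). */
int classify_before(int x) {
    int r = 0;
    if (x > 10)
        r += 1;
    if (x > 5)
        r += 2;
    return r;
}

/* After threading: the x > 10 path goes directly past the redundant test.
   When x > 10 is false nothing is known, so x > 5 is still tested. */
int classify_after(int x) {
    if (x > 10)
        return 3;
    return (x > 5) ? 2 : 0;
}
```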

          -floop-optimize
`-floop-optimize'
     Perform loop optimizations: move constant expressions out of
     loops, simplify exit test conditions and optionally do
     strength-reduction and loop unrolling as well.

     Enabled at levels `-O', `-O2', `-O3', `-Os'.
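     Moving a constant (loop-invariant) expression out of a loop, the first
     transformation listed above, looks like this when written by hand.
     The functions are a hypothetical before/after pair; the compiler
     performs the equivalent rewrite internally.

```c
#include <stddef.h>

/* Before: scale * offset does not change between iterations but is
   evaluated on every one. */
long total_before(const int *a, size_t n, int scale, int offset) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i] + scale * offset;
    return s;
}

/* After: the invariant product is computed once, before the loop. */
long total_after(const int *a, size_t n, int scale, int offset) {
    const int k = scale * offset;
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i] + k;
    return s;
}
```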

          -fcrossjumping
`-fcrossjumping'
     Perform cross-jumping transformation. This transformation unifies
     equivalent code and saves code size. The resulting code may or may
     not perform better than without cross-jumping.

     Enabled at levels `-O', `-O2', `-O3', `-Os'.

          -fif-conversion
`-fif-conversion'
     Attempt to transform conditional jumps into branch-less
     equivalents.  This includes use of conditional moves, min, max, set
     flags and abs instructions, and some tricks doable by standard
     arithmetic.  The use of conditional execution on chips where it
     is available is controlled by `-fif-conversion2'.

     Enabled at levels `-O', `-O2', `-O3', `-Os'.
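     As an illustration of a branch-free equivalent built from standard
     arithmetic, the sketch below computes a maximum with a bit mask
     instead of a conditional jump.  This is a hand-written analogue, not
     the code GCC emits; on x86-64 the compiler is more likely to use a
     conditional move.  It assumes two's-complement integers, as on all
     targets discussed in this file.

```c
/* Branchy form: a conditional jump selects the result. */
int max_branchy(int a, int b) {
    if (a > b)
        return a;
    return b;
}

/* Branch-free form: mask is all ones when a > b and zero otherwise,
   so exactly one of a and b survives the AND operations. */
int max_branchless(int a, int b) {
    int mask = -(a > b);
    return (a & mask) | (b & ~mask);
}
```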


          -fif-conversion2
`-fif-conversion2'
     Use conditional execution (where available) to transform
     conditional jumps into branch-less equivalents.

     Enabled at levels `-O', `-O2', `-O3', `-Os'.

          -fdelayed-branch
`-fdelayed-branch'
     If supported for the target machine, attempt to reorder
     instructions to exploit instruction slots available after delayed
     branch instructions.

     Enabled at levels `-O', `-O2', `-O3', `-Os'.

          -fguess-branch-probability
`-fno-guess-branch-probability'
     Do not guess branch probabilities using a randomized model.

     Sometimes gcc will opt to use a randomized model to guess branch
     probabilities, when none are available from either profiling
     feedback (`-fprofile-arcs') or `__builtin_expect'.  This means that
     different runs of the compiler on the same program may produce
     different object code.

     In a hard real-time system, people don't want different runs of the
     compiler to produce code that has different behavior; minimizing
     non-determinism is of paramount import.  This switch allows users
     to reduce non-determinism, possibly at the expense of inferior
     optimization.

     The default is `-fguess-branch-probability' at levels `-O', `-O2',
     `-O3', `-Os'.
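     When profile feedback is not available, `__builtin_expect' lets the
     programmer supply the branch probability instead of leaving it to the
     compiler's guess.  The `likely'/`unlikely' macro names below are a
     common convention, not part of GCC; `count_negative' is a
     hypothetical example.

```c
#include <stddef.h>

/* __builtin_expect(exp, c) returns exp and tells GCC that exp is expected
   to equal c. The !! turns any nonzero value into exactly 1. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Counts negative entries; the hint says negative values are rare, so the
   compiler can lay out the increment off the hot path. */
size_t count_negative(const int *a, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (unlikely(a[i] < 0))
            count++;
    }
    return count;
}
```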

          -fcprop-registers
`-fno-cprop-registers'
     After register allocation and post-register allocation instruction
     splitting, we perform a copy-propagation pass to try to reduce
     scheduling dependencies and occasionally eliminate the copy.

     `-fno-cprop-registers' is disabled (that is, `-fcprop-registers' is
     enabled) at levels `-O', `-O2', `-O3', `-Os'.

          -fforce-mem
`-fforce-mem'
     Force memory operands to be copied into registers before doing
     arithmetic on them.  This produces better code by making all memory
     references potential common subexpressions.  When they are not
     common subexpressions, instruction combination should eliminate
     the separate register-load.

     Enabled at levels `-O2', `-O3', `-Os'.

          -foptimize-sibling-calls
`-foptimize-sibling-calls'
     Optimize sibling and tail recursive calls.

     Enabled at levels `-O2', `-O3', `-Os'.
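     A tail-recursive call is one whose result is returned directly, as in
     the sketch below (a hypothetical function).  With
     `-foptimize-sibling-calls', GCC can replace such a call with a jump,
     so the recursion may run in constant stack space; without it, a very
     large n could exhaust the stack.

```c
/* Sum of 1..n plus acc. The recursive call is the last action taken, so
   it is a candidate for tail-call optimization. */
unsigned long sum_to(unsigned long n, unsigned long acc) {
    if (n == 0)
        return acc;
    return sum_to(n - 1, acc + n);
}
```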

          -fstrength-reduce
`-fstrength-reduce'
     Perform the optimizations of loop strength reduction and
     elimination of iteration variables.

     Enabled at levels `-O2', `-O3', `-Os'.
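     Loop strength reduction replaces a multiplication that depends on the
     loop counter with an addition carried from one iteration to the next.
     The before/after pair below is a hand-written, hypothetical
     illustration of the idea.

```c
#include <stddef.h>

/* Before: computing a[i * stride] needs a multiply each iteration. */
long sum_strided_before(const int *a, size_t n, size_t stride) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i * stride];
    return s;
}

/* After: the multiply is replaced by an index that advances by stride,
   so i is now used only to count iterations. */
long sum_strided_after(const int *a, size_t n, size_t stride) {
    long s = 0;
    size_t idx = 0;
    for (size_t i = 0; i < n; i++, idx += stride)
        s += a[idx];
    return s;
}
```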

          -fcse-follow-jumps  
`-fcse-follow-jumps'
     In common subexpression elimination, scan through jump instructions
     when the target of the jump is not reached by any other path.  For
     example, when CSE encounters an `if' statement with an `else'
     clause, CSE will follow the jump when the condition tested is
     false.

     Enabled at levels `-O2', `-O3', `-Os'.

	  -fcse-skip-blocks
`-fcse-skip-blocks'
     This is similar to `-fcse-follow-jumps', but causes CSE to follow
     jumps which conditionally skip over blocks.  When CSE encounters a
     simple `if' statement with no else clause, `-fcse-skip-blocks'
     causes CSE to follow the jump around the body of the `if'.

     Enabled at levels `-O2', `-O3', `-Os'.

          -frerun-cse-after-loop  
`-frerun-cse-after-loop'
     Re-run common subexpression elimination after loop optimizations
     have been performed.

     Enabled at levels `-O2', `-O3', `-Os'.

	  -frerun-loop-opt
`-frerun-loop-opt'
     Run the loop optimizer twice.

     Enabled at levels `-O2', `-O3', `-Os'.

          -fgcse   
`-fgcse'
     Perform a global common subexpression elimination pass.  This pass
     also performs global constant and copy propagation.

     _Note:_ When compiling a program using computed gotos, a GCC
     extension, you may get better runtime performance if you disable
     the global common subexpression elimination pass by adding
     `-fno-gcse' to the command line.

     Enabled at levels `-O2', `-O3', `-Os'.

	  -fgcse-lm   
`-fgcse-lm'
     When `-fgcse-lm' is enabled, global common subexpression
     elimination will attempt to move loads which are only killed by
     stores into themselves.  This allows a loop containing a
     load/store sequence to be changed to a load outside the loop, and
     a copy/store within the loop.

     Enabled by default when gcse is enabled.


	  -fgcse-sm
`-fgcse-sm'
     When `-fgcse-sm' is enabled, a store motion pass is run after
     global common subexpression elimination.  This pass will attempt
     to move stores out of loops.  When used in conjunction with
     `-fgcse-lm', loops containing a load/store sequence can be changed
     to a load before the loop and a store after the loop.

     Enabled by default when gcse is enabled.
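     The combined effect of `-fgcse-lm' and `-fgcse-sm' on a loop with a
     load/store sequence can be sketched by hand as follows.  The
     functions are hypothetical, and the rewrite is only valid when
     `total' cannot alias any element of `a'; the compiler has to prove
     that before transforming the loop.

```c
#include <stddef.h>

/* Before: every iteration loads *total, adds, and stores it back. */
int accumulate_before(int *total, const int *a, size_t n) {
    for (size_t i = 0; i < n; i++)
        *total += a[i];
    return *total;
}

/* After load and store motion: one load before the loop, a register
   copy inside it, and one store after it. */
int accumulate_after(int *total, const int *a, size_t n) {
    int t = *total;
    for (size_t i = 0; i < n; i++)
        t += a[i];
    *total = t;
    return *total;
}
```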

          -fdelete-null-pointer-checks
`-fdelete-null-pointer-checks'
     Use global dataflow analysis to identify and eliminate useless
     checks for null pointers.  The compiler assumes that dereferencing
     a null pointer would have halted the program.  If a pointer is
     checked after it has already been dereferenced, it cannot be null.

     In some environments, this assumption is not true, and programs can
     safely dereference null pointers.  Use
     `-fno-delete-null-pointer-checks' to disable this optimization for
     programs which depend on that behavior.

     Enabled at levels `-O2', `-O3', `-Os'.

          -fexpensive-optimizations
`-fexpensive-optimizations'
     Perform a number of minor optimizations that are relatively
     expensive.

     Enabled at levels `-O2', `-O3', `-Os'.

          -fregmove
`-fregmove'
     Attempt to reassign register numbers in move instructions and as
     operands of other simple instructions in order to maximize the
     amount of register tying.  This is especially helpful on machines
     with two-operand instructions.

     Note `-fregmove' and `-foptimize-register-move' are the same
     optimization.

     Enabled at levels `-O2', `-O3', `-Os'.

          -fschedule-insns  
`-fschedule-insns'
     If supported for the target machine, attempt to reorder
     instructions to eliminate execution stalls due to required data
     being unavailable.  This helps machines that have slow floating
     point or memory load instructions by allowing other instructions
     to be issued until the result of the load or floating point
     instruction is required.

     Enabled at levels `-O2', `-O3', `-Os'.

	  -fschedule-insns2
`-fschedule-insns2'
     Similar to `-fschedule-insns', but requests an additional pass of
     instruction scheduling after register allocation has been done.
     This is especially useful on machines with a relatively small
     number of registers and where memory load instructions take more
     than one cycle.

     Enabled at levels `-O2', `-O3', `-Os'.

          -fsched-interblock 
`-fno-sched-interblock'
     Don't schedule instructions across basic blocks.  Interblock
     scheduling is normally enabled by default when scheduling before
     register allocation, i.e. with `-fschedule-insns' or at `-O2' or
     higher.

	  -fsched-spec
`-fno-sched-spec'
     Don't allow speculative motion of non-load instructions.
     Speculative motion is normally enabled by default when scheduling
     before register allocation, i.e. with `-fschedule-insns' or at
     `-O2' or higher.

          -fcaller-saves
`-fcaller-saves'
     Enable values to be allocated in registers that will be clobbered
     by function calls, by emitting extra instructions to save and
     restore the registers around such calls.  Such allocation is done
     only when it seems to result in better code than would otherwise
     be produced.

     This option is always enabled by default on certain machines,
     usually those which have no call-preserved registers to use
     instead.

     Enabled at levels `-O2', `-O3', `-Os'.

          -fpeephole2
`-fno-peephole'
`-fno-peephole2'
     Disable any machine-specific peephole optimizations.  The
     difference between `-fno-peephole' and `-fno-peephole2' is in how
     they are implemented in the compiler; some targets use one, some
     use the other, a few use both.

     `-fpeephole' is enabled by default.  `-fpeephole2' is enabled at
     levels `-O2', `-O3', `-Os'.

          -freorder-blocks  
`-freorder-blocks'
     Reorder basic blocks in the compiled function in order to reduce
     the number of taken branches and improve code locality.

     Enabled at levels `-O2', `-O3', `-Os'.

	  -freorder-functions
`-freorder-functions'
     Reorder functions in the object file in order to improve code
     locality.  This is implemented by using special subsections
     `.text.hot' for the most frequently executed functions and
     `.text.unlikely' for unlikely executed functions.  Reordering is
     done by the linker, so the object file format must support named
     sections and the linker must place them in a reasonable way.

     Profile feedback must also be available to make this option
     effective.  See `-fprofile-arcs' for details.

     Enabled at levels `-O2', `-O3', `-Os'.

          -fstrict-aliasing
`-fstrict-aliasing'
     Allows the compiler to assume the strictest aliasing rules
     applicable to the language being compiled.  For C (and C++), this
     activates optimizations based on the type of expressions.  In
     particular, an object of one type is assumed never to reside at
     the same address as an object of a different type, unless the
     types are almost the same.  For example, an `unsigned int' can
     alias an `int', but not a `void*' or a `double'.  A character type
     may alias any other type.

     Pay special attention to code like this:
          union a_union {
            int i;
            double d;
          };

          int f() {
            union a_union t;
            t.d = 3.0;
            return t.i;
          }
     The practice of reading from a different union member than the one
     most recently written to (called "type-punning") is common.  Even
     with `-fstrict-aliasing', type-punning is allowed, provided the
     memory is accessed through the union type.  So, the code above
     will work as expected.  However, this code might not:
          int f() {
            union a_union t;
            int* ip;
            t.d = 3.0;
            ip = &t.i;
            return *ip;
          }

     Every language that wishes to perform language-specific alias
     analysis should define a function that computes, given a `tree'
     node, an alias set for the node.  Nodes in different alias sets
     are not allowed to alias.  For an example, see the C front-end
     function `c_get_alias_set'.

     Enabled at levels `-O2', `-O3', `-Os'.
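     A compilable version of the safe pattern above, plus a `memcpy'
     variant, is sketched below.  The `memcpy' approach is a swapped-in
     technique not mentioned in the text: copying the bytes avoids any
     aliasing assumption, so it also works under `-fstrict-aliasing'.
     Both functions read the first `int' worth of bytes of the double, so
     they agree on any host; the function names are illustrative.

```c
#include <string.h>

union a_union {
    int i;
    double d;
};

/* Type-punning through the union type, which GCC allows even with
   -fstrict-aliasing, as described above. */
int first_int_via_union(double x) {
    union a_union t;
    t.d = x;
    return t.i;
}

/* Byte copy with memcpy: no object is accessed through an lvalue of the
   wrong type, so no aliasing rule is involved. */
int first_int_via_memcpy(double x) {
    int i;
    memcpy(&i, &x, sizeof i);
    return i;
}
```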

          -falign-functions  
`-falign-functions'
`-falign-functions=N'
     Align the start of functions to the next power of two not less than
     N, skipping up to N bytes.  For instance, `-falign-functions=32'
     aligns functions to the next 32-byte boundary, but
     `-falign-functions=24' would align to the next 32-byte boundary
     only if this can be done by skipping 23 bytes or less.

     `-fno-align-functions' and `-falign-functions=1' are equivalent
     and mean that functions will not be aligned.

     Some assemblers only support this flag when N is a power of two;
     in that case, it is rounded up.

     If N is not specified, use a machine-dependent default.

     Enabled at levels `-O2', `-O3'.
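     The alignment rule can be modelled with a little address arithmetic.
     The sketch below follows the example given above (N=24 aligns to a
     32-byte boundary only when that skips 23 bytes or less, while N=32
     always aligns).  It is a model of the rule as stated, not GCC's or
     the assembler's implementation; the function names are hypothetical.

```c
#include <stdint.h>

/* Smallest power of two that is >= n, for n >= 1. */
static uintptr_t pow2_at_least(uintptr_t n) {
    uintptr_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Padding inserted before a function starting at addr under
   -falign-functions=n: align to the power-of-two boundary only if that
   skips at most n - 1 bytes, otherwise insert nothing. n <= 1 means no
   alignment. */
uintptr_t align_padding(uintptr_t addr, uintptr_t n) {
    if (n <= 1)
        return 0;
    uintptr_t boundary = pow2_at_least(n);
    uintptr_t pad = (boundary - (addr & (boundary - 1))) & (boundary - 1);
    return (pad <= n - 1) ? pad : 0;
}
```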

	  -falign-jumps
`-falign-jumps'
`-falign-jumps=N'
     Align branch targets to a power-of-two boundary, for branch targets
     where the targets can only be reached by jumping, skipping up to N
     bytes like `-falign-functions'.  In this case, no dummy operations
     need be executed.

     If N is not specified, use a machine-dependent default.

     Enabled at levels `-O2', `-O3'.

          -falign-loops  
`-falign-loops'
`-falign-loops=N'
     Align loops to a power-of-two boundary, skipping up to N bytes
     like `-falign-functions'.  The hope is that the loop will be
     executed many times, which will make up for any execution of the
     dummy operations.

     If N is not specified, use a machine-dependent default.

     Enabled at levels `-O2', `-O3'.

	  -falign-labels
`-falign-labels'
`-falign-labels=N'
     Align all branch targets to a power-of-two boundary, skipping up to
     N bytes like `-falign-functions'.  This option can easily make
     code slower, because it must insert dummy operations for when the
     branch target is reached in the usual flow of the code.

     If `-falign-loops' or `-falign-jumps' are applicable and are
     greater than this value, then their values are used instead.

     If N is not specified, use a machine-dependent default which is
     very likely to be `1', meaning no alignment.

     Enabled at levels `-O2', `-O3'.

`-finline-limit=N'
     By default, gcc limits the size of functions that can be inlined.
     This flag allows the control of this limit for functions that are
     explicitly marked as inline (i.e., marked with the inline keyword
     or defined within the class definition in C++).  N is the size of
     functions that can be inlined in number of pseudo instructions
     (not counting parameter handling).  The default value of N is 600.
     Increasing this value can result in more inlined code at the cost
     of compilation time and memory consumption.  Decreasing usually
     makes the compilation faster and less code will be inlined (which
     presumably means slower programs).  This option is particularly
     useful for programs that use inlining heavily such as those based
     on recursive templates with C++.

     Inlining is actually controlled by a number of parameters, which
     may be specified individually by using `--param NAME=VALUE'.  The
     `-finline-limit=N' option sets some of these parameters as follows:

`max-inline-insns'
     is set to N.

`max-inline-insns-single'
     is set to N/2.

`max-inline-insns-single-auto'
     is set to N/2.

`min-inline-insns'
     is set to 130 or N/4, whichever is smaller.

`max-inline-insns-rtl'
     is set to N.

     Using `-finline-limit=600' thus results in the default settings
     for these parameters.  See the GCC manual for documentation of the
     individual parameters controlling inlining.

     _Note:_ In this context, a pseudo instruction is an abstract
     measure of a function's size.  It does not represent a count of
     assembly instructions, and its exact meaning may change from one
     release to another.
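     The parameter settings listed above can be expressed as a small
     function, which makes it easy to check that N=600 reproduces the
     defaults.  The struct and function are hypothetical, and integer
     division is assumed for N/2 and N/4.

```c
/* Hypothetical holder for the parameters that -finline-limit=N sets. */
struct inline_params {
    int max_inline_insns;
    int max_inline_insns_single;
    int max_inline_insns_single_auto;
    int min_inline_insns;
    int max_inline_insns_rtl;
};

/* Applies the mapping listed above for -finline-limit=N. */
struct inline_params inline_params_for_limit(int n) {
    struct inline_params p;
    p.max_inline_insns = n;
    p.max_inline_insns_single = n / 2;
    p.max_inline_insns_single_auto = n / 2;
    p.min_inline_insns = (n / 4 < 130) ? n / 4 : 130;  /* smaller of the two */
    p.max_inline_insns_rtl = n;
    return p;
}
```

     For N=600, N/4 is 150, so `min-inline-insns' is capped at 130; for
     N=400 it becomes 100.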

`-freduce-all-givs'
     Forces all general-induction variables in loops to be
     strength-reduced.

     _Note:_ When compiling programs written in Fortran,
     `-fmove-all-movables' and `-freduce-all-givs' are enabled by
     default when you use the optimizer.

     These options may generate better or worse code; results are highly
     dependent on the structure of loops within the source code.


`-fprefetch-loop-arrays'
     If supported by the target machine, generate instructions to
     prefetch memory to improve the performance of loops that access
     large arrays.

     Disabled at level `-Os'.
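     What `-fprefetch-loop-arrays' automates can be approximated by hand
     with GCC's `__builtin_prefetch', which requests that data a fixed
     distance ahead be brought into cache.  The distance of 16 elements is
     an arbitrary assumption, and the code GCC generates for this option
     may look quite different.

```c
#include <stddef.h>

/* Assumed prefetch distance in elements; the best value is machine
   dependent. */
#define PREFETCH_AHEAD 16

long sum_with_prefetch(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        /* Only form addresses that lie inside the array. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD]);  /* read access */
        s += a[i];
    }
    return s;
}
```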

rm -f *.da *.life analyz_prbrob.out
     Remove any profile feedback information from previous runs.


----------------------------------------------------------------------------
Portability flags used with gcc 3.3 compiler
----------------------------------------------------------------------------

-DFMAX_IS_DOUBLE
     Denotes the availability of "double fmax(double, double)" in system library.
     Used in 252.eon.
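     Portability macros like these are consumed with conditional
     compilation.  The sketch below is a hypothetical example of the
     pattern, not 252.eon's actual source: when `FMAX_IS_DOUBLE' is
     defined the library `fmax' is used, otherwise a simple fallback is
     compiled in.

```c
#ifdef FMAX_IS_DOUBLE
#include <math.h>
/* The system library provides double fmax(double, double). */
static double my_fmax(double a, double b) { return fmax(a, b); }
#else
/* Fallback when fmax is unavailable. NaN handling is simplified: unlike
   fmax, this returns NaN when only b is NaN. */
static double my_fmax(double a, double b) { return (a > b) ? a : b; }
#endif
```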

-DHAS_ERRLIST
     Indicates that the system provides the "sys_nerr" and "sys_errlist[]"
     variables. Used in 252.eon.

-DLINUX_i386
     Used to enable Linux-specific defines in 186.crafty.

-DSPEC_CPU2000_GLIBC22
     Compatibility with glibc 2.2 and later versions (253.perlbmk).

-DSPEC_CPU2000_LINUX_I386
     Specifies to compile for LINUX system (253.perlbmk).

-DSPEC_CPU2000_LP64 (Portability)
     Tells the source code that longs and pointers are 64 bits (LP64 data
     model). Used in all benchmarks, except the peak runs of 181.mcf,
     197.parser and 300.twolf.

-DSPEC_CPU2000_NEED_BOOL
     Use SPEC provided definition of the boolean type (253.perlbmk).

-DSYS_IS_USG
     Specifies that the operating system is USG compliant. Used in 254.gap.

-DSYS_HAS_TIME_PROTO
     Do not explicitly declare time(). Used in 254.gap.

-DSYS_HAS_SIGNAL_PROTO
     Do not explicitly include the contents of <signal.h>. Used in 254.gap.

-DSYS_HAS_IOCTL_PROTO
     Do not explicitly declare ioctl(). Used in 254.gap.

-DSYS_HAS_ANSI
     System is ANSI compliant. Used in 254.gap.

-DSYS_HAS_CALLOC_PROTO
     Do not explicitly declare calloc(). Used in 254.gap.


----------------------------------------------------------------------------
PGI (The Portland Group) compiler 5.1 flags
----------------------------------------------------------------------------

+ACML
     Link with the AMD Core Math Library (ACML) version 1.5, which is
     supplied with the PGI 5.1 compiler.

RM_SOURCES=lapak.f90
       Remove the source file 'lapak.f90' in 178.galgel.

-DSPEC_CPU2000_LP64 (Portability)
     Tells the source code that longs and pointers are 64 bits (LP64 data
     model).


The optimization levels and their meanings are as follows:	

-O0	A basic block is generated for each Fortran statement.  No scheduling 
	is done between statements.  No global optimizations are performed.

-O1	Scheduling within extended basic blocks is performed.  Some register 
	allocation is performed.  No global optimizations are performed.

-O2	All level 1 optimizations are performed.  In addition,  scalar
	optimizations such as induction recognition and loop invariant motion 
	are performed by the global optimizer. 
                
-O3	This level performs all level-one and level-two optimizations and 
	enables more aggressive hoisting and scalar replacement optimizations.



-fast	 Equivalent to "-O2 -Munroll -Mnoframe -Mlre" 

-fastsse Equivalent to "-fast -Mscalarsse -Mvect=sse -Mcache_align -Mflushz" 


-Mcache_align    
     Align unconstrained objects of length greater than or equal to 16 bytes on
     cache-line boundaries. An unconstrained object is a data object that is not
     a member of an aggregate structure or common block. This option does
     not affect the alignment of allocatable or automatic arrays.

     Note: To effect cache-line alignment of stack-based local variables, the
     main program or function must be compiled with -Mcache_align.

-Mfixed 
     Process source using Fortran fixed-form specifications.

-Mflushz 	 
     Set SSE MXCSR register to flush-to-zero mode.

-Mipa=[option]  Enables interprocedural analysis with the specified option. The valid options are:

-Mipa=align  
     Instructs the IPA to recognize when pointer targets are all cache-line 
     aligned, allowing better SSE code generation.

-Mipa=arg  
     Instructs the IPA to remove arguments replaced by -Mipa=ptr,const.

-Mipa=const  
     Enable propagation of constants across procedure calls.

-Mipa=fast  
     Equivalent to: -Mipa=const,globals,localarg,ptr,vestigial 
              	
-Mipa=globals  
     Instructs the IPA to optimize references to globals when not used in procedure calls.		

-Mipa=localarg  
      Externalizes local variables for use with -Mipa=arg

-Mipa=ptr  
     Instructs the IPA to perform pointer disambiguation across procedure calls.

-Mipa=vestigial  
     Instructs the IPA to eliminate functions that are not called.
	
-mp  Enable OpenMP
	
-Mnoframe  
     Eliminate operations that set up a true stack frame pointer for functions.

-Mnosmart   
     Don't run the Smart assembly rewrite tool, which performs
     post-compilation linear assembly scheduling and optimization.

-Mscalarsse   
     Use the SSE (Streaming SIMD Extensions, where SIMD stands for Single
     Instruction Multiple Data) and SSE2 instructions to perform the
     scalar floating-point operations coded in the source.  This assumes
     the user has an assembler capable of interpreting SSE/SSE2
     instructions, as in later versions of Linux.  This implies -Mflushz.

-Munroll
     Invokes the loop unroller.  This also sets the optimization level to 2
     if the level is set to less than 2.  The suboptions are:

     c:m  Instructs the compiler to completely unroll loops with a
          constant loop count less than or equal to m, a supplied
          constant.  If this value is not supplied, the m count is set
          to 4.

     n:u  Instructs the compiler to unroll u times a loop that is not
          completely unrolled or has a non-constant loop count.  If u is
          not supplied, the unroller computes the number of times a
          candidate loop is unrolled.

-Mvect=sse  
     Instructs the vectorizer to search for loops, and where possible,
     use the SSE or SSE2 and prefetch instructions
     (depending on which processor is targeted).
     
----------------------------------------------------------------------------
Other Notes
----------------------------------------------------------------------------

taskset [options] [mask] [pid | command [arg] ... ]

     taskset is used to set or retrieve the CPU affinity of a running process given its
     PID, or to launch a new COMMAND with a given CPU affinity. The CPU affinity is
     represented as a bitmask, with the lowest order bit corresponding to the first logical
     CPU and the highest order bit corresponding to the last logical CPU.
     When taskset returns, it is guaranteed that the given program has been scheduled to
     a legal CPU.
     The default behavior of taskset is to run a new command with a given affinity mask:
       taskset [mask] [command] [arguments]

     The taskset command is used in the following form in the config file:	
 
     submit= "MYNUM=$SPECUSERNUM" ; MYMASK=\$((1<<\$SPECUSERNUM)); /usr/bin/taskset \$MYMASK $command

     $MYMASK is the bitmask corresponding to a specific SPECUSERNUM. For example, the $MYMASK
     value for the first copy of a rate run is 0x00000001, for the second copy it is
     0x00000002, and so on. Thus the first copy of the rate run has an affinity of CPU0,
     the second copy has an affinity of CPU1, and so on.
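     The mask arithmetic in the submit line can be checked with a few lines
     of C.  The sketch below mirrors MYMASK=$((1<<$SPECUSERNUM)) and maps a
     mask back to the CPU it selects, using taskset's convention that bit 0
     is the first logical CPU.  The function names are hypothetical, and
     SPECUSERNUM is assumed to be smaller than the bit width of
     `unsigned long'.

```c
/* Mirrors MYMASK=$((1<<$SPECUSERNUM)): copy k is pinned to logical CPU k. */
unsigned long affinity_mask(unsigned spec_user_num) {
    return 1UL << spec_user_num;
}

/* Returns the lowest logical CPU whose bit is set in mask, or -1 if the
   mask is empty. Bit 0 corresponds to the first logical CPU. */
int first_cpu_in_mask(unsigned long mask) {
    for (int cpu = 0; cpu < (int)(8 * sizeof mask); cpu++)
        if (mask & (1UL << cpu))
            return cpu;
    return -1;
}
```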


BIOS Setting Definitions -

DRAM Interleave defines whether data will be interleaved among the four data
     banks within individual DRAMs.

Node Interleave defines whether or not data addresses are interleaved
     between the two processors in 4KB blocks.

ACPI SRAT defines whether the System Resource Affinity Table is exported by
     the BIOS to a location where the operating system can see it.  The SRAT may
     only be exported when Node Interleave is disabled.