IBM Linux Flag Disclosure
SPEC OMP2001
For use with Linux submissions with the IBM XL compilers.
Last Revised 04 October, 2012

Notes
=====

The IBM C/C++ and Fortran compilers produce 32-bit binaries by default.
The flags that cause the compilers to produce 64-bit binaries are
described below.

Source Level Portability Options
================================

Compiler Invocation
===================

xlc
    Invokes the compiler for C source files with a default language level
    of ansi and specifies that type-based aliasing is allowed.

xlc_r
    The same as "xlc", except that it generates a thread-safe executable
    compliant with the POSIX pthreads API.

xlf90
    Invokes the compiler for Fortran source files with a default language
    level of Fortran 90.

xlf90_r
    The same as "xlf90", except that it generates a thread-safe executable
    compliant with the POSIX pthreads API.

cleanpdf
    Erases the information in the PDF directory, if any exists, to ensure
    that no feedback information is reused between compilations.

Compiler Options
================

-O
    Performs the optimizations that the compiler developers considered the
    best combination of compilation speed and runtime performance.

-O3
    Performs some memory- and compile-time-intensive optimizations in
    addition to those executed with -O. The -O3-specific optimizations
    have the potential to slightly alter the semantics of a user's
    program. Optimizations may include, but are not limited to:
      - aggressive code motion, and scheduling of computations that have
        the potential to raise an exception (no valid exceptions are
        suppressed);
      - relaxed conformance to IEEE rules in cases where the difference in
        the results is not important to an application;
      - rewriting of floating-point expressions.

-O4
    Equivalent to -O3 -qipa -qhot with automatic generation of the
    architecture (-qarch=) and tuning (-qtune=) options ideal for that
    platform. The -qipa level defaults to level=1.

-O5
    Equivalent to -O3 -qipa=level=2 -qhot with automatic generation of the
    architecture (-qarch=) and tuning (-qtune=) options ideal for that
    platform.

-Q, -qinline
    Without a list, -Q inlines all appropriate procedures, subject to
    limits on the number of inlined calls and on the resulting increase
    in code size. -qinline is an alias for -Q.

-q32
    Selects 32-bit compiler mode.

-q64
    Selects 64-bit compiler mode.

-qalign=struct=natural
-qalign=natural
    The compiler maps structure members to their natural boundaries. The
    first form is used by the Fortran compiler; the second form is used
    by the C compiler and is a deprecated form for the Fortran compiler.

-qarch=pwr6
    Produces object code containing instructions that will run on POWER6
    processors.

-qarch=pwr6e
    Produces object code containing instructions that will run on POWER6
    processors executing in "Enhanced" mode, which includes instructions
    that are part of the optional instructions in the PowerPC standard.

-qarch=pwr7
    Produces object code containing instructions that will run on POWER7
    processors.

-qarch=auto
    Produces object code containing instructions that will run on the
    hardware platform on which the program is compiled.

-qessl
    Specifies that, if either -lessl or -lesslsmp is also specified,
    Engineering and Scientific Subroutine Library (ESSL) routines should
    be used in place of some Fortran 90 intrinsic procedures when there is
    a safe opportunity to do so.

-qfixed
    Indicates that the input source program is in fixed form. Allows
    fixed-format Fortran 77 programs to be compiled using the xlf90
    compiler invocation.

-qfixed=
    States that Fortran code is in fixed source form, with an optional
    argument specifying the maximum line length.

-qhot
    Performs high-order transformations on loops during optimization.

-qhot=arraypad
    Pads the sizes of arrays so that they align better in cache.

-qipa=level=1
    Turns on interprocedural analysis with inlining, limited alias
    analysis, and limited call-site tailoring.
    This is the default level of -qipa.

-qipa=level=2
    Turns on interprocedural analysis with inlining, cloning, full alias
    analysis, constant propagation, call-site tailoring, and dead code
    removal.

-qipa=noobject
    Does not generate object files during the first stage of
    interprocedural analysis.

-qinline
    Alias for -Q. See -Q.

-qipa=partition=large
    Specifies the size of the regions within the program to analyze.
    Larger partitions contain more procedures, which results in better
    interprocedural analysis but requires more storage to optimize.

-qmaxmem=-1
    Allows the compiler to use as much memory as it needs to execute.

-qpdf1
    The option used in the first pass of a profile-directed feedback (PDF)
    compile; it causes PDF information to be generated.

-qpdf2
    The option used in the second pass of a profile-directed feedback
    compile; it causes the PDF information to be used during optimization.

-qsmp=omp
    Enables OpenMP parallelization directives.

-qsuffix=f=f90
    Sets the suffix for source files to .f90. The .f90 suffix is required
    by xlf90 to compile Fortran 90 programs.

-qsuppress=cmpmsg
    Suppresses the output of the specified message(s). cmpmsg is the
    message issued at the completion of compilation of each Fortran
    routine.

-qtune=pwr6
    Specifies the processor architecture (POWER6) for which the executable
    program is optimized. This includes instruction scheduling and cache
    settings.

-qunroll=n
    Unrolls inner loops in the program by a factor of n.

-w
    Suppresses warning messages from the C, C++, and Fortran compilers.

Linker Options
==============

-B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=BDT
    Uses the libhugetlbfs "ld" script in place of the normal linker. BDT
    links the application so that text, initialized data, and BSS data
    are stored in hugepages.

-B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=B
    Uses the libhugetlbfs "ld" script in place of the normal linker. B
    links the application so that BSS data is stored in hugepages.

-B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-align
    Passes the --hugetlbfs-align flag to the linker so that the placement
    of program segments in hugepages can be controlled at run time through
    the HUGETLB_ELFMAP environment variable.

-lessl
    Links the Engineering and Scientific Subroutine Library (ESSL).

-lesslsmp
    Links the SMP (multithreaded) version of the Engineering and
    Scientific Subroutine Library (ESSL).

-lmass
    Links the Mathematical Acceleration Subsystem (MASS) libraries, which
    contain tuned versions of mathematical intrinsic functions.

Linux Environment Variables
===========================

HUGETLB_MORECORE=yes
    Enables the libhugetlbfs hugepage malloc() feature, which instructs
    libhugetlbfs to override libc's normal morecore() function with a
    hugepage version and use it for malloc(). This feature comes from
    version 1 of the libhugetlbfs project on SourceForge.

HUGETLB_ELFMAP=R
    Instructs libhugetlbfs to place the text segment in hugepages.

HUGETLB_ELFMAP=W
    Instructs libhugetlbfs to place the data and BSS segments in
    hugepages.

HUGETLB_ELFMAP=RW
    Instructs libhugetlbfs to place all segments in hugepages.

HUGETLB_ELFMAP=no
    Instructs libhugetlbfs not to place any segment in hugepages.

HUGETLB_VERBOSE=0
    Instructs libhugetlbfs to silence the library completely, even in the
    case of errors. The only exception is when the library has to call
    abort(), which can happen if something goes wrong in the middle of
    unmapping and remapping segments for the text/data/bss feature.

HUGETLB_SHM=yes
    Instructs libhugetlbfs to add the SHM_HUGETLB flag to the shmget()
    call and to align the size parameter so that the shared memory
    segment is backed by hugepages. If hugepages cannot be used, small
    pages are used instead and a warning explaining the failure is
    printed. The page size cannot be specified with this parameter. To
    change the kernel's default hugepage size, use the pagesize= kernel
    boot parameter (kernel 2.6.26 or later required).

LD_PRELOAD=libhugetlbfs.so
    Tells the dynamic linker to load the libhugetlbfs shared library even
    though the program was not originally linked against it. This enables
    HUGETLB_MORECORE processing.

OMP_DYNAMIC=FALSE
    Disables dynamic adjustment of the number of available threads.

OMP_NUM_THREADS=...
    The exact number of threads available to be used or, if OMP_DYNAMIC is
    TRUE, the upper limit on the number of available threads.

XLFRTEOPTS=intrinthds={num_threads}
    Specifies the number of threads used for parallel execution of the
    MATMUL and RANDOM_NUMBER intrinsic procedures. For the MATMUL
    intrinsic, num_threads defaults to the number of processors online;
    for the RANDOM_NUMBER intrinsic, it defaults to twice the number of
    processors online. Changing the number of threads available to the
    MATMUL and RANDOM_NUMBER intrinsic procedures can influence
    performance.

XLSMPOPTS
    A list of runtime settings affecting SMP execution. Some of the
    possibilities are:

    SCHEDULE=STATIC
        Work is scheduled to threads round-robin.
    SPINS=0
        Allows work requests to spin indefinitely without the thread
        having to yield the time slice.
    STACK=....
        Specifies the largest allowable size of a thread's stack, in bytes.
    YIELDS=0
        Allows the thread to yield an indefinite number of times without
        being driven into a sleep state.
    STARTPROC=n
        When assigning threads to processors, binds the first thread to
        processor n.
    STRIDE=X
        When assigning the next thread to a processor, adds X to the
        current processor index instead of using (processor+1).
    PROCS=
        Enables thread binding and specifies a list of CPU IDs to which the
        threads are bound. If fewer CPU IDs are specified than the number
        of threads used by the program, the remaining threads are not
        bound.

Stack Size Information
======================

Stack size set to unlimited using the command "ulimit -s unlimited".