
Table of Contents

...

All compilers except the system default GNU compiler (GCC 8.3.1) are available via the module command.

Code Block
languagebash
[2022-04-28 09:27.51] ~
[alp514.sol](1046): module av gcc intel nvhpc oneapi

------------------------------------------------------------------------------ /share/Apps/lusoft/share/spack/lmod/avx2/linux-centos8-x86_64/Core -------------------------------------------------------------------------------
   gcc/9.3.0               intel-mkl/2021.3.0 (D)    intel-tbb/2021.3.0 (D)    intel/20.0.3   (D)    nvhpc/20.9                 oneapi-inspector/2021.3.0    oneapi-mpi/2021.3.0      oneapi/2021.3.0
   intel-mkl/2020.3.279    intel-tbb/2020.3          intel/19.0.3              intel/2021.3.0        oneapi-advisor/2021.3.0    oneapi-itac/2021.3.0         oneapi-vtune/2021.5.0

  Where:
   D:  Default Module

Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
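For example, to switch from the system GCC to one of the module-provided versions: a minimal sketch, assuming an Lmod shell in which module is defined. The module name gcc/9.3.0 is taken from the listing above; show_gcc is a hypothetical helper, and outside an Lmod environment the load step is simply skipped.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first line of `gcc --version`, if gcc is on PATH.
show_gcc() {
    if command -v gcc >/dev/null 2>&1; then
        gcc --version | head -n 1
    else
        echo "gcc: not found on PATH"
    fi
}

echo "before: $(show_gcc)"      # on Sol this is the system default, GCC 8.3.1

# `module` is a shell function set up by Lmod; skip the load when it is missing.
if type module >/dev/null 2>&1; then
    module load gcc/9.3.0       # any name printed by `module av` can be used here
    echo "after:  $(show_gcc)"  # should now report version 9.3.0
fi
```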

...

The following are common optimization flags for the PGI compilers (now distributed as the NVIDIA HPC SDK compilers, e.g. nvfortran):

Flag                                 Description

-acc                                 Enable OpenACC directives.
-ta=tesla(:tesla_suboptions),host    Specify the target accelerator.
-tp <processor>                      Specify the type(s) of the target processor(s). processor can be sandybridge-64, haswell-64 or skylake-64.
-mtune=processor                     Tune everything applicable about the generated code to processor, except the ABI and the set of available instructions. processor can be sandybridge, ivybridge, haswell, broadwell or skylake-avx512. Some older programs/makefiles use the deprecated -mcpu instead.
-fast                                A generally optimal set of flags.
-fastsse                             A generally optimal set of flags for targets that include SSE/SSE2 capability.
-Mipa                                Invoke interprocedural analysis and optimization.
-Munroll                             Control loop unrolling.
-Minfo                               Print informational messages regarding optimization and code generation to standard output as compilation proceeds.
-shared                              Passed to the linker. Instructs the linker to generate a shared object file. Implies -fpic.
-Bstatic                             Statically link all libraries, including the PGI runtime.

See https://www.pgroup.com/resources/docs/19.5/x86/pgi-ref-guide/index.htm for a more detailed description of these and other options.
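To show how the flags above combine on a command line, here is a minimal sketch that compiles a small SAXPY program with nvfortran, the NVIDIA HPC SDK successor to pgfortran. It assumes the nvhpc module is loaded; the source file is generated on the fly purely for illustration, and when nvfortran is not on PATH the script only prints the command it would run.

```bash
#!/usr/bin/env bash
set -u

# -fast: generally optimal flag set; -tp=haswell-64: target Sol's Haswell/Broadwell
# nodes; -Minfo=vect: report which loops were vectorized while compiling.
FFLAGS=(-fast -tp=haswell-64 -Minfo=vect)

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Small illustrative SAXPY program (y = a*x + y).
cat > "$workdir/saxpy.f90" <<'EOF'
program saxpy
  implicit none
  integer, parameter :: n = 1000000
  real :: x(n), y(n), a
  integer :: i
  a = 2.0
  x = 1.0
  y = 2.0
  do i = 1, n
     y(i) = a*x(i) + y(i)
  end do
  print *, y(1), y(n)
end program saxpy
EOF

cmd=(nvfortran "${FFLAGS[@]}" -o "$workdir/saxpy" "$workdir/saxpy.f90")
if command -v nvfortran >/dev/null 2>&1; then
    "${cmd[@]}" && "$workdir/saxpy"
else
    echo "nvfortran not found; would run: ${cmd[*]}"
fi
```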

Multi-architecture CPU optimization

CPU architectures change from generation to generation, affecting how data and instructions are processed and adding or modifying CPU instructions. A recent trend has been improved vectorization capability. Research Computing currently supports three generations of Intel CPUs (Sandybridge, Haswell/Broadwell and Skylake), each of which offers incrementally better vector processing. Starting with AVX, Intel CPUs adjust their clock speed with increasingly complex logic that depends on how many cores and vector units are in use. Each core's frequency can be adjusted independently, allowing multiple users to run different workloads. In general, the CPU runs faster when fewer cores and vector units are in use than when all of them are busy. It is therefore important to optimize code for the architecture of the CPU it runs on.


The Intel and PGI compilers support building code optimized for several architectures into a single executable; the GNU compilers do not. All applications (except GROMACS) built with the Intel and PGI compilers are optimized for Sandybridge, Haswell and Skylake CPUs. Applications built with the GNU compilers are optimized to run on Haswell, the base CPU architecture of Sol. GROMACS build options do not permit building multi-architecture executables. By default, GROMACS is built for Haswell/Broadwell CPUs, with Skylake-optimized builds available (see the modules with the -avx512 suffix).
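Since the best build depends on which of these CPU generations a job lands on, it can help to check the node's vector extensions first. A minimal sketch, assuming an x86 Linux node where /proc/cpuinfo lists the CPU flags; the helper name widest_simd is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a CPU flags string to the widest SIMD extension it lists.
#   avx512f -> Skylake (AVX-512), avx2 -> Haswell/Broadwell, avx -> Sandy/Ivy Bridge
widest_simd() {
    local flags=" $1 "
    case "$flags" in
        *" avx512f "*) echo "avx512" ;;
        *" avx2 "*)    echo "avx2" ;;
        *" avx "*)     echo "avx" ;;
        *)             echo "pre-avx" ;;
    esac
}

# On x86 Linux, /proc/cpuinfo has a "flags" line per logical CPU; the first is enough.
node_flags=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2)
if [ -n "$node_flags" ]; then
    echo "${HOSTNAME:-this node}: $(widest_simd "$node_flags")"
fi
```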


Intel Compilers

The Intel compilers build executables containing code paths optimized for several architectures via the -ax flag, a feature known as automatic CPU dispatch. To build executables that vectorize optimally on Sol (Haswell/Broadwell and Skylake) and on the supported faculty clusters (Sandybridge/Ivybridge), add -axCORE-AVX512,CORE-AVX2,AVX to the compile options.

Code Block
languagebash
[2019-07-16 15:11.16] ~/Workshop/sum2017/saxpy/solution
[alp514.sol](1029): ifort -axCOMMON-AVX512,CORE-AVX512,CORE-AVX2,CORE-AVX-I,AVX -o saxpy saxpy.f90
saxpy.f90(1): (col. 9) remark: MAIN__ has been targeted for automatic cpu dispatch
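The Intel classic compilers can also report which loops were vectorized, similar to -Minfo=vect for the NVIDIA compilers. A minimal sketch, assuming ifort from one of the intel modules is on PATH and saxpy.f90 is in the current directory; the -qopt-report options are an addition not covered above, and the report is typically written to a file ending in .optrpt rather than to the terminal.

```bash
#!/usr/bin/env bash
set -u

# -ax...: add AVX-512, AVX2 and AVX code paths selected at run time
#         (alongside a generic baseline path);
# -qopt-report=2 -qopt-report-phase=vec: optimization report for vectorization only.
IFLAGS=(-axCORE-AVX512,CORE-AVX2,AVX -qopt-report=2 -qopt-report-phase=vec)

if command -v ifort >/dev/null 2>&1 && [ -f saxpy.f90 ]; then
    ifort "${IFLAGS[@]}" -o saxpy saxpy.f90
    ls ./*.optrpt 2>/dev/null     # list the report file(s), e.g. saxpy.optrpt
else
    echo "would run: ifort ${IFLAGS[*]} -o saxpy saxpy.f90"
fi
```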


NVIDIA HPC SDK Compilers

The NVIDIA compilers build an executable optimized for several architectures when multiple architecture names are passed to the -tp flag, a feature known as a unified binary. To build executables that vectorize optimally on Sol (Haswell/Broadwell and Skylake) and on the supported faculty clusters (Sandybridge/Ivybridge), add -tp=sandybridge-64,haswell-64,skylake-64 to the compile options. To check whether the code is being vectorized, add the -Minfo=vect flag.

Code Block
languagebash
[2021-07-22 15:47.47] ~/Workshop/2021HPC/parprog/solution/saxpy
[alp514.sol](1134): nvfortran -fastsse -tp=haswell-64 -Minfo -o saxpy saxpy.f90
saxpy:
     11, Memory set idiom, loop replaced by call to __c_mset4
     12, Memory set idiom, loop replaced by call to __c_mset4
     16, Generated vector simd code for the loop


GNU Compilers

The GNU compilers do not permit building a single optimized executable for multiple architectures. You need to build a separate executable for each CPU architecture using the -march=cpuarch flag, where cpuarch can be sandybridge, ivybridge, haswell, broadwell or skylake-avx512. For GCC, -march=skylake targets Skylake CPUs without AVX-512; use skylake-avx512 for the AVX-512 capable Skylake nodes.
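A minimal sketch of that workflow, assuming GCC on an x86_64 node: it builds one executable per CPU generation supported on Sol and the faculty clusters, then launches the one matching the node's CPU flags from /proc/cpuinfo. The program saxpy.c and the helper pick_arch are illustrative only, not part of any installed software.

```bash
#!/usr/bin/env bash
set -u

# GCC -march names: sandybridge (AVX), haswell (AVX2), skylake-avx512 (AVX-512).
ARCHS=(sandybridge haswell skylake-avx512)

# Hypothetical helper: map a CPU flags string to the matching entry of ARCHS.
pick_arch() {
    case " $1 " in
        *" avx512f "*) echo skylake-avx512 ;;
        *" avx2 "*)    echo haswell ;;
        *)             echo sandybridge ;;
    esac
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

cat > "$workdir/saxpy.c" <<'EOF'
#include <stdio.h>
#define N 1000000
static float x[N], y[N];
int main(void) {
    const float a = 2.0f;
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }
    for (int i = 0; i < N; i++) y[i] = a * x[i] + y[i];   /* vectorizable loop */
    printf("%f\n", y[N - 1]);
    return 0;
}
EOF

if command -v gcc >/dev/null 2>&1 && [ "$(uname -m)" = x86_64 ]; then
    # Build saxpy.sandybridge, saxpy.haswell and saxpy.skylake-avx512.
    built=1
    for arch in "${ARCHS[@]}"; do
        gcc -O3 -march="$arch" -o "$workdir/saxpy.$arch" "$workdir/saxpy.c" || built=0
    done
    # Launch the build that matches the node the job is running on.
    if [ "$built" -eq 1 ]; then
        arch=$(pick_arch "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)")
        echo "running saxpy.$arch"
        "$workdir/saxpy.$arch"
    fi
fi
```

Because each of these generations extends the instruction set of the previous one, the sandybridge build also runs on Ivy Bridge nodes and the haswell build on Broadwell nodes.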

OpenMP

OpenMP is an Application Program Interface (API) for thread-based parallelism. It supports Fortran, C and C++ and uses a fork-join execution model. OpenMP programs are built from compiler directives, runtime library routines and environment variables. OpenMP is implemented in all major compiler suites, so no separate module needs to be loaded. OpenMP permits incremental parallelization of serial code by adding compiler directives (comments in Fortran, #pragma lines in C/C++) that are only activated when the appropriate flag is added to the compile command.
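The effect of the compile flag can be seen with a minimal sketch, assuming GCC (whose OpenMP flag is -fopenmp): the same C file is built twice, and only the build with the flag runs its parallel region on multiple threads. The file name omp_threads.c is illustrative; OMP_NUM_THREADS is the standard OpenMP environment variable for the thread count.

```bash
#!/usr/bin/env bash
set -u

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

cat > "$workdir/omp_threads.c" <<'EOF'
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>   /* only needed when OpenMP is enabled */
#endif
int main(void) {
    int nthreads = 1;
    #pragma omp parallel
    {
#ifdef _OPENMP
        #pragma omp single
        nthreads = omp_get_num_threads();
#endif
    }
    printf("%d\n", nthreads);
    return 0;
}
EOF

serial_out=""
parallel_out=""
if command -v gcc >/dev/null 2>&1; then
    # No -fopenmp: the omp pragmas are ignored and _OPENMP is not defined.
    gcc -o "$workdir/serial" "$workdir/omp_threads.c" &&
        serial_out=$("$workdir/serial")
    # With -fopenmp: the parallel region runs on OMP_NUM_THREADS threads.
    gcc -fopenmp -o "$workdir/parallel" "$workdir/omp_threads.c" &&
        parallel_out=$(OMP_NUM_THREADS=4 "$workdir/parallel")
    echo "serial build:  ${serial_out:-build failed} thread(s)"
    echo "OpenMP build:  ${parallel_out:-build failed} thread(s)"
fi
```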

Compiling OpenMP Code

Different compilers have different OpenMP compile flags.

...