...
All compilers except GNU Compiler 8.3.1 (system default) are available via the module command
Code Block | ||
---|---|---|
| ||
[20212022-0704-2228 1509:4027.3651] ~ [alp514.sol](11151046): module av gcc intel nvhpc oneapi ------------------------------------------------------------------------------ /share/Apps/lusoft/share/spack/lmod/avx2/linux-centos8-x86_64/Core ------------------------------------------------------------------------------- gcc/9.3.0 intel-mkl/2020.3.279 intel/19.0.3 intel-mkl/202021.3.0.3 (D) nvhpc/20.9intel-tbb/2021.3.0 (D) intel/20.0.3 (D) nvhpc/20.9 oneapi-inspector/2021.3.0 oneapi-mpi/2021.3.0 oneapi/2021.3.0 intel-mkl/2020.3.279 intel-tbb/2020.3 intel/19.0.3 intel/2021.3.0 oneapi-advisor/2021.3.0 oneapi-itac/2021.3.0 oneapi-vtune/2021.5.0 Where: D: Default Module Use "module spider" to find all possible modules and extensions. Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys". |
...
The following are common optimization flags to use when compiling with the PGI Compiler
Flag | Description |
---|---|
-acc | Enable OpenACC directives. |
-ta=tesla(:tesla_suboptions),host | Specify the target accelerator. |
-tp <processor> | Specify the type(s) of the target processor(s). processor can be sandybridge-64, haswell-64, or skylake-64. |
-mtune=processor | Tune everything applicable about the generated code to processor, except for the ABI and the set of available instructions. processor can be sandybridge, ivybridge, haswell, broadwell or skylake-avx512. Some older programs or makefiles may use -mcpu, which is deprecated. |
-fast | Generally optimal set of flags. |
-fastsse | Generally optimal set of flags for targets that include SSE/SSE2 capability. |
-Mipa | Invokes interprocedural analysis and optimization. |
-Munroll | Controls loop unrolling. |
-Minfo | Prints informational messages regarding optimization and code generation to standard output as compilation proceeds. |
-shared | Passed to the linker. Instructs the linker to generate a shared object file. Implies -fpic. |
-Bstatic | Statically link all libraries, including the PGI runtime. |
See https://www.pgroup.com/resources/docs/19.5/x86/pgi-ref-guide/index.htm for more detailed descriptions and other options.
Multi-architecture CPU optimization
CPU architectures change from generation to generation, affecting how data and instructions are processed and adding or modifying CPU instructions. A common recent trend has been to improve the vectorization capabilities of CPUs. Currently, Research Computing supports three generations of Intel CPUs (Sandybridge, Haswell/Broadwell and Skylake), each of which shows an incremental improvement in vector processing power. Starting with AVX, Intel CPUs feature increasingly complex clock speed adjustment logic that depends on how many CPU cores and vector units are being used. Each core can have its frequency adjusted independently, allowing multiple users to run different workloads. In general, if fewer cores and vector units are in use, the CPU can run at a higher clock speed than when all cores and vector units are in use. It is therefore important to optimize the code for the architecture of the CPU being used.
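To see which of these generations a given node belongs to, one option is to inspect the CPU feature flags that Linux reports. The following is a minimal sketch, assuming a Linux node with a readable /proc/cpuinfo; `vector_level` is a hypothetical helper name, not a tool provided on the cluster.

```shell
# Map a space-separated list of CPU feature flags (as found on the "flags"
# line of /proc/cpuinfo) to the widest vector instruction set it contains.
# vector_level is a hypothetical helper, not a site-provided command.
vector_level() {
  local flags=" $1 "
  case "$flags" in
    *" avx512f "*) echo "avx512" ;;  # Skylake nodes with AVX-512
    *" avx2 "*)    echo "avx2"   ;;  # Haswell/Broadwell nodes
    *" avx "*)     echo "avx"    ;;  # SandyBridge/IvyBridge nodes
    *)             echo "none"   ;;  # no AVX support reported
  esac
}

# Report the level of the node this runs on (Linux only).
if [ -r /proc/cpuinfo ]; then
  vector_level "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
fi
```

Running this inside a job, rather than on a login node, reports the architecture of the compute node that will actually execute the code.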
Intel and PGI compilers support building code optimized for several architectures into a single executable. GNU compilers do not support this option. Any application (except GROMACS) built with the Intel or PGI compilers is optimized for Sandybridge, Haswell and Skylake CPUs. Applications built with GNU compilers are optimized to run on Haswell (the base CPU architecture of Sol). GROMACS compile options do not permit building multiple-architecture executables. By default, GROMACS is built for Haswell/Broadwell CPUs, with Skylake-optimized builds available (see modules with the -avx512 suffix).
Intel Compilers
Intel compilers build executables with code paths optimized for particular architectures when given the -ax flag, also known as automatic CPU dispatch. To build executables that vectorize optimally on Sol (Haswell/Broadwell and Skylake) and on supported faculty clusters (SandyBridge/IvyBridge), add -axCORE-AVX512,CORE-AVX2,AVX as a compiler option.
Code Block | ||
---|---|---|
| ||
[2019-07-16 15:11.16] ~/Workshop/sum2017/saxpy/solution [alp514.sol](1029): ifort -axCOMMON-AVX512,CORE-AVX512,CORE-AVX2,CORE-AVX-I,AVX -o saxpy saxpy.f90 saxpy.f90(1): (col. 9) remark: MAIN__ has been targeted for automatic cpu dispatch |
NVIDIA HPC SDK Compilers
NVIDIA compilers build executables optimized for particular architectures when a list of architecture names is passed to the -tp flag, producing what is known as a unified binary. To build executables that vectorize optimally on Sol (Haswell/Broadwell and Skylake) and on supported faculty clusters (SandyBridge/IvyBridge), add -tp=sandybridge-64,haswell-64,skylake-64 as a compiler option. To check whether the code is being vectorized, add the -Minfo=vect flag.
Code Block | ||
---|---|---|
| ||
[2021-07-22 15:47.47] ~/Workshop/2021HPC/parprog/solution/saxpy [alp514.sol](1134): nvfortran -fastsse -tp=haswell-64 -Minfo -o saxpy saxpy.f90 saxpy: 11, Memory set idiom, loop replaced by call to __c_mset4 12, Memory set idiom, loop replaced by call to __c_mset4 16, Generated vector simd code for the loop |
GNU Compilers
GNU compilers do not permit building a single optimized executable for multiple architectures. You need to build a separate executable for each CPU architecture using the -march=cpuarch flag where cpuarch can be sandybridge, ivybridge, haswell, broadwell or skylake.
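Building one executable per architecture can be scripted. The sketch below only prints the compile commands (a dry run), so it can be inspected before anything is built. `gnu_build_cmds` is a hypothetical helper, saxpy.f90 is the example source used elsewhere on this page, and -O3 and the `name.arch` output naming are assumptions rather than site conventions.

```shell
# Print one gfortran command per target architecture (dry run).
# gnu_build_cmds is a hypothetical helper; -O3 and the "<name>.<arch>"
# output naming are assumptions, not site conventions.
gnu_build_cmds() {
  local src="$1"; shift
  local base="${src%.*}"   # strip the file extension, e.g. saxpy.f90 -> saxpy
  local arch
  for arch in "$@"; do
    echo "gfortran -O3 -march=${arch} -o ${base}.${arch} ${src}"
  done
}

# Architectures taken from the list of -march values above.
gnu_build_cmds saxpy.f90 sandybridge haswell skylake
```

To run the builds instead of printing them, the output can be piped to a shell, for example `gnu_build_cmds saxpy.f90 haswell skylake | sh`. A job script then has to select the executable that matches the node it lands on.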
OpenMP
OpenMP is an Application Program Interface (API) for thread-based parallelism. It supports Fortran, C and C++ and uses a fork-join execution model. OpenMP programs are built from compiler directives, runtime library routines and environment variables. OpenMP is implemented in all major compiler suites, so no separate module needs to be loaded. OpenMP permits incremental parallelization of serial code by adding compiler directives that appear as comments and are only activated when the appropriate flags are added to the compile command.
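As an illustration of directives that look like comments, the sketch below writes a small Fortran SAXPY loop to a file and, if gfortran is available, builds it twice: once as plain serial code and once with OpenMP enabled. The program, the file name and the helper `write_saxpy_omp` are hypothetical; `-fopenmp` is the GNU flag, and other compilers use different flags.

```shell
# Write a small Fortran SAXPY to the file named in $1. The "!$omp" lines are
# ordinary comments unless the compiler is told to enable OpenMP.
# write_saxpy_omp and the program itself are hypothetical examples.
write_saxpy_omp() {
  cat > "$1" <<'EOF'
program saxpy_omp
  implicit none
  integer, parameter :: n = 1000000
  real, allocatable :: x(:), y(:)
  real :: a
  integer :: i
  allocate(x(n), y(n))
  a = 2.0
  x = 1.0
  y = 3.0
  !$omp parallel do
  do i = 1, n
     y(i) = a * x(i) + y(i)
  end do
  !$omp end parallel do
  print *, 'y(1) =', y(1)
end program saxpy_omp
EOF
}

src="${TMPDIR:-/tmp}/saxpy_omp.f90"
write_saxpy_omp "$src"

# Build and run only when gfortran is on the PATH.
if command -v gfortran >/dev/null 2>&1; then
  gfortran -o "${src%.f90}_serial" "$src"        # directives treated as comments
  gfortran -fopenmp -o "${src%.f90}_omp" "$src"  # directives activated
  OMP_NUM_THREADS=4 "${src%.f90}_omp"            # thread count set via environment variable
fi
```

The same source file serves both builds, which is what makes incremental parallelization practical: directives can be added one loop at a time while the serial build continues to work.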
Compiling OpenMP Code
Different compilers have different OpenMP compile flags.
...