Flags and Targets - Algorithmica

The first step of getting high performance from the compiler is to ask for it, which is done with over a hundred different compiler options, attributes, and pragmas.

#Optimization Levels

There are four and a half main levels of optimization for speed in GCC:

-O0 is the default one that does no optimizations (although, in a sense, it does optimize: for compilation time).
-O1 (also aliased as -O) does a few “low-hanging fruit” optimizations, with almost no effect on compilation time.
-O2 enables all optimizations that are known to have little to no negative side effects and take a reasonable time to complete (this is what most projects use for production builds).
-O3 does very aggressive optimization, enabling almost all correct optimizations implemented in GCC.
-Ofast does everything in -O3, plus a few more optimization flags that may break strict standard compliance, but not in a way that would be critical for most applications (e.g., floating-point operations may be rearranged so that the result is off by a few bits in the mantissa).
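To see why rearranging floating-point operations can change a result, here is a minimal sketch. The function names are hypothetical, and it assumes IEEE 754 double arithmetic compiled without fast-math options. The two functions compute the same sum in a different order, which is the kind of rewrite a compiler is only free to make when strict compliance is relaxed.

```cpp
// Hypothetical helpers: the same three-term sum, associated two ways.

// Evaluates (a + b) + c.
double sum_left(double a, double b, double c) {
    return (a + b) + c;
}

// Evaluates a + (b + c).
double sum_right(double a, double b, double c) {
    return a + (b + c);
}

// With a = 1e16, b = -1e16, c = 1.0:
//   sum_left  gives 1.0, because a + b is exactly 0;
//   sum_right gives 0.0, because b + c rounds back to -1e16
//   (adjacent doubles near 1e16 are 2 apart).
```

In typical code the discrepancy is much smaller than in this contrived case, but it is the reason such reordering is not enabled at -O3.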

There are also many other optimization flags that are not included even in -Ofast, because they are very situational, and enabling them by default is more likely to hurt performance than to improve it. We will talk about some of them in the next section.

#Specifying Targets

The next thing you may want to do is to tell the compiler more about the computer(s) this code is supposed to run on: the smaller the set of target platforms, the better. By default, it will generate binaries that can run on any relatively new (post-2000) x86 CPU. The simplest way to narrow it down is to pass the -march flag to specify the exact microarchitecture, for example -march=haswell. If you are compiling on the same computer that will run the binary, you can use -march=native for auto-detection.

The instruction sets are generally backward-compatible, so it is often enough to just use the name of the oldest microarchitecture you need to support. A more robust approach is to list specific features that the CPU is guaranteed to have: -mavx2, -mpopcnt. When you just want to tune the program for a particular machine without using any instructions that may crash it on incompatible CPUs, you can use the -mtune flag (by default -march=x also implies -mtune=x).
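One way to check what a given set of target flags actually enabled is to look at the feature macros that GCC and Clang predefine on x86, such as __SSE2__, __AVX2__, and __AVX512F__. The sketch below is only an illustration: the function name compiled_isa_level is hypothetical, and it reports what the translation unit was compiled for, not what the CPU running the binary supports.

```cpp
#include <string>

// Hypothetical helper: names the widest SIMD extension that was enabled
// at compile time for this translation unit (via -march, -mavx2, etc.).
std::string compiled_isa_level() {
#if defined(__AVX512F__)
    return "avx512f";
#elif defined(__AVX2__)
    return "avx2";
#elif defined(__SSE2__)
    return "sse2";
#else
    return "baseline";  // non-x86 target or no SIMD macros defined
#endif
}
```

The same macros can be listed without writing any code, for instance with g++ -march=native -dM -E - < /dev/null, which dumps every predefined macro for the chosen target.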

These options can also be specified for a compilation unit with pragmas instead of compilation flags:

#pragma GCC optimize("O3")
#pragma GCC target("avx2")

This is useful when you need to optimize a single high-performance procedure without increasing the build time for the entire project.
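A minimal sketch of that use case follows, assuming GCC. It relies on the push_options and pop_options pragmas to limit the effect to one function; the function names are hypothetical. Compilers that do not recognize these pragmas will typically ignore them with a warning, so the code still builds, just without the per-function setting.

```cpp
#include <cstddef>

#pragma GCC push_options        // save the options given on the command line
#pragma GCC optimize("O3")      // applies to functions defined from here on

// Hot loop: the only function built with O3.
long long sum_of_squares(const int *a, std::size_t n) {
    long long s = 0;
    for (std::size_t i = 0; i < n; i++)
        s += 1LL * a[i] * a[i];
    return s;
}

#pragma GCC pop_options         // restore the command-line options

// Built with whatever optimization level the rest of the project uses.
long long checked_sum_of_squares(const int *a, std::size_t n) {
    return (a == nullptr || n == 0) ? 0 : sum_of_squares(a, n);
}
```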

#Multiversioned Functions

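Sometimes you may also want to provide several architecture-specific implementations in a single library. You can use attribute-based syntax to define multiversioned functions: the compiler emits every version, and the appropriate one is selected automatically at run time, depending on the features of the CPU the program is executed on (GCC documents this feature for its C++ front end):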

__attribute__(( target("default") )) // fallback implementation
int popcnt(int x) {
    int s = 0;
    for (int i = 0; i < 32; i++)
        s += (x >> i) & 1;
    return s;
}

__attribute__(( target("popcnt") )) // used if the CPU running the program supports popcnt
int popcnt(int x) {
    return __builtin_popcount(x);
}
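To make the dispatch step more concrete, here is a hand-written sketch of roughly what has to happen for the hardware version to be used safely. It is not what the compiler generates. It assumes GCC or Clang on x86 and uses the builtins __builtin_cpu_init and __builtin_cpu_supports; the names popcnt_generic, popcnt_hw, select_popcnt, and popcnt_dispatch are hypothetical.

```cpp
#include <cstdint>

using popcnt_fn = int (*)(std::uint32_t);

// Portable fallback: counts set bits one at a time.
static int popcnt_generic(std::uint32_t x) {
    int s = 0;
    for (int i = 0; i < 32; i++)
        s += (x >> i) & 1;
    return s;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
// Only this function is compiled with the popcnt feature enabled.
__attribute__((target("popcnt")))
static int popcnt_hw(std::uint32_t x) {
    return __builtin_popcount(x);
}

// Picks an implementation based on the CPU the program is running on.
static popcnt_fn select_popcnt() {
    __builtin_cpu_init();
    return __builtin_cpu_supports("popcnt") ? popcnt_hw : popcnt_generic;
}
#else
// Other targets: always use the portable version.
static popcnt_fn select_popcnt() {
    return popcnt_generic;
}
#endif

// The selection runs once, on the first call; later calls reuse the pointer.
int popcnt_dispatch(std::uint32_t x) {
    static const popcnt_fn impl = select_popcnt();
    return impl(x);
}
```

Callers just use popcnt_dispatch(x). The cost of this approach is an indirect call on every invocation, which is one reason to keep such dispatch out of very small, very hot functions.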

Clang does not support these GCC pragmas for setting target and optimization flags from the source code, but you can use the target attribute the same way as in GCC.