Optimisation Level

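Flag     Description
-O0      Disables all optimisations (the default); useful for debugging.
-O1      Enables basic optimisations that reduce code size and execution time without greatly increasing compilation time.
-O2      Enables nearly all optimisations that do not trade size for speed; the safest level for speed.
-Os      Enables all -O2 optimisations except those that typically increase binary size.
-O3      Enables all -O2 optimisations plus more aggressive ones.
-Ofast   Enables all -O3 optimisations plus ones that disregard strict standards compliance, such as -ffast-math; available from GCC 4.6.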

The most common optimisation levels are -Os and -O2. Anything more aggressive than that is more likely to break code on a large scale, particularly code that relies on undefined behaviour or exact floating-point semantics. The optimisation levels are cumulative: specifying -O3 includes all the optimisations from -O2.
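GCC exposes part of the chosen optimisation level through predefined macros: __OPTIMIZE__ is defined at -O1 and above, __OPTIMIZE_SIZE__ at -Os, and __FAST_MATH__ whenever -ffast-math is active (which -Ofast turns on). The following is a minimal sketch, assuming GCC or a compatible compiler; the helper name optimisation_mode is hypothetical. Note that -O1, -O2 and -O3 all define the same macro and cannot be told apart this way.

```c
/* Hypothetical helper: reports the optimisation mode this translation
 * unit was compiled with, using GCC's predefined macros.
 * -O1, -O2 and -O3 all define __OPTIMIZE__, so they are grouped. */
const char *optimisation_mode(void)
{
#if !defined(__OPTIMIZE__)
    return "-O0";
#elif defined(__OPTIMIZE_SIZE__)
    return "-Os";
#elif defined(__FAST_MATH__)
    /* -ffast-math is enabled, e.g. via -Ofast */
    return "-O1 or higher with -ffast-math (e.g. -Ofast)";
#else
    return "-O1, -O2 or -O3";
#endif
}
```

Calling optimisation_mode() from your own main() and building the file once with -O0 and once with -Os should print different results.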

Note that you will often find flag combinations such as:

-O2 -fomit-frame-pointer

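However, -fomit-frame-pointer is already enabled at -O1 on most targets (on 32-bit x86 since GCC 4.6), and -O1 is included in -O2, so specifying the flag is redundant. When in doubt, consult the optimisation options section of the official GCC manual, or run gcc -O2 -Q --help=optimizers to list the optimisations enabled at a given level.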

Specifying the Architecture, CPU Tune and CPU Type

The easiest -march setting is -march=native, which makes the compiler detect the host processor's features and use them during compilation. It is supported from GCC 4.2 onwards and removes the need to specify -march manually. Note that binaries built this way may not run on other, older processors.

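However, if you need to specify -march, -mcpu or -mtune manually, a good procedure is to find out which GCC version you are running (gcc --version), look up the list of supported CPU types in the x86 options section of the GCC manual for that version, and pick the entry that best matches your processor's instruction sets.

As an example, suppose we have an Intel i7 and GCC version 4.2.4. Looking at the list of CPU types in the GCC 4.2.4 manual, we find: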

pentium4, pentium4m
    Intel Pentium4 CPU with MMX, SSE and SSE2 instruction set support. 
prescott
    Improved version of Intel Pentium4 CPU with MMX, SSE, SSE2 and SSE3 instruction set support. 
nocona
    Improved version of Intel Pentium4 CPU with 64-bit extensions, MMX, SSE, SSE2 and SSE3 instruction set support. 

Since we are running a 64-bit processor, and the Intel i7 supports the MMX, SSE, SSE2 and SSE3 instruction sets, the most suitable choice in GCC 4.2.4 is nocona. prescott, or even pentium4, would also work, but only for 32-bit (-m32) builds, because neither includes the 64-bit extensions.
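Besides reading the manual, you can check at run time whether the host actually supports the instruction sets listed for nocona. The following is a minimal sketch, assuming GCC 4.8 or later (or Clang), which provides __builtin_cpu_init and __builtin_cpu_supports on x86; the helper name host_matches_nocona is hypothetical.

```c
/* Hypothetical helper: returns 1 if the host CPU supports every
 * instruction set that the GCC 4.2.4 manual lists for nocona
 * (MMX, SSE, SSE2, SSE3 and the 64-bit extensions), otherwise 0.
 * Assumes GCC 4.8+ or Clang for __builtin_cpu_supports. */
int host_matches_nocona(void)
{
#if defined(__x86_64__)
    /* Executing x86-64 code already proves the 64-bit extensions exist. */
    __builtin_cpu_init();
    return __builtin_cpu_supports("mmx")
        && __builtin_cpu_supports("sse")
        && __builtin_cpu_supports("sse2")
        && __builtin_cpu_supports("sse3");
#else
    /* Not built as x86-64 code: the 64-bit check cannot be made here. */
    return 0;
#endif
}
```

This only reports what the CPU can do; the compiler still uses whatever -march you pass it.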

64-bit vs 32-bit

Use -m64 to generate 64-bit code and -m32 to generate 32-bit code.
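One quick way to confirm which mode a file was built for is to look at the pointer width. The following is a minimal sketch; the helper name target_pointer_bits is hypothetical, and it assumes a target where -m64 selects 64-bit pointers and -m32 selects 32-bit pointers, as on x86.

```c
#include <limits.h>

/* Hypothetical helper: returns the pointer width in bits of the code
 * being generated, typically 64 with -m64 and 32 with -m32. */
int target_pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}
```

Building with -m32 on a 64-bit system typically also requires the 32-bit C library (multilib) to be installed.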

Streaming SIMD Extensions (SSE)

Commonly used SSE flags in GCC are:

-msse2 -msse3 -mssse3 -msse4.1 -msse4.2

They do not all need to be listed in CFLAGS, because each flag implies the ones before it; for example, -msse4.2 also enables -msse4.1, -mssse3, -msse3 and -msse2.
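GCC defines a macro for each enabled SSE level (__SSE__, __SSE2__, __SSE3__, __SSSE3__, __SSE4_1__, __SSE4_2__), so the implication can be observed directly, for example with echo | gcc -msse4.2 -dM -E - | grep SSE. The following is a minimal sketch that reports the highest level enabled at compile time; the helper name highest_sse_level is hypothetical.

```c
/* Hypothetical helper: returns the highest SSE level enabled at compile
 * time. Because each -msse flag implies the lower levels, checking from
 * the highest level downwards finds the most advanced one enabled. */
const char *highest_sse_level(void)
{
#if defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE4_1__)
    return "SSE4.1";
#elif defined(__SSSE3__)
    return "SSSE3";
#elif defined(__SSE3__)
    return "SSE3";
#elif defined(__SSE2__)
    return "SSE2";
#elif defined(__SSE__)
    return "SSE";
#else
    return "none";
#endif
}
```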

3DNow!

For AMD-based systems:

-m3dnow

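can be added to enable the 3DNow! multimedia extensions. Note that only older AMD processors support 3DNow!; it was dropped starting with the Bulldozer family.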
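GCC defines the macro __3dNOW__ when -m3dnow is in effect, so code can check at compile time whether 3DNow! instructions may be used. The following is a minimal sketch; the helper name built_with_3dnow is hypothetical.

```c
/* Hypothetical helper: returns 1 if this file was compiled with
 * 3DNow! enabled (GCC defines __3dNOW__ under -m3dnow), otherwise 0.
 * A binary that actually uses 3DNow! needs a CPU that supports it. */
int built_with_3dnow(void)
{
#ifdef __3dNOW__
    return 1;
#else
    return 0;
#endif
}
```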

Intel i5 and i7

Both support at least SSE4.1:

-O2 -m64 -flto -msse4.1 -mfpmath=sse -ffast-math -funroll-loops

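For i7 processors based on Nehalem or later, which support SSE4.2, replace -msse4.1 with -msse4.2. Note that -flto requires GCC 4.5 or later, and -ffast-math relaxes IEEE floating-point semantics, so it can change the results of numerical code.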
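A loop like the dot product below shows why -ffast-math matters for SSE code: GCC only vectorises a floating-point sum reduction when it is allowed to reassociate the additions, which -ffast-math permits. This is a minimal sketch with a hypothetical function name dot; whether it is actually vectorised depends on the GCC version (at -O2, vectorisation is only enabled from GCC 12; older versions need -O3 or -ftree-vectorize). You can ask GCC to report vectorised loops with -fopt-info-vec.

```c
#include <stddef.h>

/* Hypothetical example: dot product of two float arrays.
 * With -ffast-math, GCC may reorder the additions and vectorise the
 * loop using SSE; without it, the additions must stay in source order.
 * -funroll-loops may additionally unroll the loop. */
float dot(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Because reassociation can change rounding, results for large arrays may differ slightly between builds with and without -ffast-math.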

Intel Atom N2xx and Z5xx

-O2 -flto -mssse3 -mfpmath=sse
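The -mfpmath=sse flag makes GCC use SSE registers for scalar floating-point arithmetic instead of the x87 unit. On x86-64 this is already GCC's default, so the flag mainly matters for 32-bit builds, where x87 is typically the default. GCC signals the choice with the macros __SSE_MATH__ (float) and __SSE2_MATH__ (double); the following minimal sketch reports it, and the helper name fp_math_unit is hypothetical.

```c
/* Hypothetical helper: reports which unit GCC uses for scalar
 * floating-point arithmetic in this translation unit.
 * __SSE2_MATH__ / __SSE_MATH__ are defined under -mfpmath=sse when
 * SSE2 / SSE are enabled (e.g. -mssse3 implies both). */
const char *fp_math_unit(void)
{
#if defined(__SSE2_MATH__)
    return "SSE (float and double)";
#elif defined(__SSE_MATH__)
    return "SSE (float only)";
#elif defined(__i386__) || defined(__x86_64__)
    return "x87";
#else
    return "not x86";
#endif
}
```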