Flag | Description |
---|---|
-O0 | Disables all optimisations and is useful for debugging. |
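-O1 | Enables basic optimisations that reduce code size and execution time without greatly increasing compile time. |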
-Os | Enables all the flags for -O2 but disables the flags that increase the binary size. |
-O2 | Safest optimisation level for speed. |
-O3 | Enables aggressive optimisations. |
-Ofast | Only in recent compilers, enables even more optimisations by violating standards compliance. |
The most common optimisation levels are `-Os` and `-O2`. Anything more aggressive than that carries a real risk of breaking code. The optimisation levels are cumulative, so specifying `-O3` includes the optimisations from `-O2`.
Note that you will often find flag combinations along the following lines:
-O2 -fomit-frame-pointer
However, `-fomit-frame-pointer` is already included in `-O1`, which is in turn included in `-O2`, so specifying the flag is redundant. When in doubt, consult the official optimisation options manual.
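One way to check whether a flag is already implied by an optimisation level is to ask the compiler itself. The sketch below assumes a GCC recent enough to support `-Q --help=optimizers`, and assumes that its output lists one flag per line followed by `[enabled]` or `[disabled]`. The helper name `flag_state` is made up for illustration.

```shell
# flag_state FLAG
# Reads the output of `gcc -Q -O<level> --help=optimizers` on standard input
# and prints the state column (for example [enabled] or [disabled]) for FLAG.
# Assumption: each line has the layout "<flag> <whitespace> <state>".
flag_state() {
    awk -v flag="$1" '$1 == flag { print $2 }'
}
```

A typical call would be `gcc -Q -O2 --help=optimizers | flag_state -fomit-frame-pointer`. If it prints `[enabled]`, adding the flag next to `-O2` changes nothing.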
The safest `-march` that can be set is `-march=native`, which lets the compiler detect the host processor's features and use them during compilation. This is supported from GCC 4.2 onwards and eliminates the need to supply an `-march` manually.
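To see which architecture `-march=native` actually resolves to on a given machine, the target options can be queried. This is a minimal sketch: it assumes a GCC that supports `-Q --help=target` and that prints the selected value on a line starting with `-march=`. The helper name `resolved_march` is hypothetical.

```shell
# resolved_march
# Reads the output of `gcc -march=native -Q --help=target` on standard input
# and prints the architecture name that GCC selected for the host.
# Assumption: the relevant line has the layout "-march= <whitespace> <name>".
resolved_march() {
    awk '$1 == "-march=" { print $2; exit }'
}
```

It would be used as `gcc -march=native -Q --help=target | resolved_march`. The printed name can then be set explicitly when building on a different machine for the same CPU.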
However, if you need to specify an `-march`, `-mcpu` or `-mtune`, then a good procedure is as follows:
1. Determine which CPU you have. On Linux you can issue `cat /proc/cpuinfo` to see the flags supported by your CPU. A description of those features can be found on the cpuflags page.
2. Issue `gcc -v` in a terminal to determine which compiler version you have.
3. Find the manual for `gcc` corresponding to your version and locate the architecture you are on. For example, for `i386` and `x86_64` with `gcc` version `4.2.4` we find the manual page listing the modifiers.
4. Select an `-march`, `-mcpu`, or `-mtune` that closely matches the CPU features you have. It is important to specify a processor whose features are a subset of, or the same as, those of your CPU.
As an example, suppose we have an Intel i7 and `gcc` version `4.2.4`. Looking at the list on the manual page above, we find:

CPU type | Description |
---|---|
pentium4, pentium4m | Intel Pentium4 CPU with MMX, SSE and SSE2 instruction set support. |
prescott | Improved version of Intel Pentium4 CPU with MMX, SSE, SSE2 and SSE3 instruction set support. |
nocona | Improved version of Intel Pentium4 CPU with 64-bit extensions, MMX, SSE, SSE2 and SSE3 instruction set support. |

Since we are running a 64-bit processor, and because the Intel i7 supports the MMX, SSE, SSE2 and SSE3 instruction sets, the most suitable choice for us in `gcc` version `4.2.4` is `nocona`. However, `prescott` and even `pentium4` will be fine for 32-bit builds, since neither includes the 64-bit extensions.
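The subset check from this example can be sketched as a small shell helper. It assumes Linux-style `/proc/cpuinfo` text, where SSE3 is reported as `pni` and 64-bit support as `lm`. The function name `has_features` is made up for illustration, and the feature list for `nocona` is taken from the description quoted above.

```shell
# has_features "NAME NAME ..."
# Reads /proc/cpuinfo-style text on standard input and succeeds only if every
# listed feature name appears in the first "flags" line.
# Prints the first missing feature and returns 1 otherwise.
has_features() {
    cpu_flags=$(grep -m1 '^flags' | cut -d: -f2)
    for feature in $1; do
        case " $cpu_flags " in
            *" $feature "*) ;;                       # present, keep checking
            *) printf 'missing: %s\n' "$feature"; return 1 ;;
        esac
    done
    return 0
}
```

For the `nocona` case this would be called as `has_features "mmx sse sse2 pni lm" < /proc/cpuinfo`. A zero exit status means the CPU offers at least the features that `nocona` assumes.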
To switch between 64-bit and 32-bit code generation, use the flags `-m64` and `-m32` respectively.
The following is the list of SSE flags for the `gcc` compiler.
-msse2 -msse3 -mssse3 -msse4.1 -msse4.2
They do not all need to be listed in the `CFLAGS`: each flag also enables the earlier SSE extensions, so specifying the most advanced one your CPU supports is enough.
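Picking the most advanced SSE flag can be automated from the CPU's feature list. The following is a sketch only: it assumes Linux-style `/proc/cpuinfo` text, in which SSE3 appears as `pni` and the SSE4 levels as `sse4_1` and `sse4_2`. The function name `best_sse_flag` is hypothetical.

```shell
# best_sse_flag
# Reads /proc/cpuinfo-style text on standard input and prints the most
# advanced SSE flag from the list above that the CPU supports.
# Returns 1 and prints nothing if none of them is available.
best_sse_flag() {
    cpu_flags=$(grep -m1 '^flags' | cut -d: -f2)
    # Pairs of "cpuinfo-name:gcc-flag", ordered from most to least advanced.
    for pair in sse4_2:-msse4.2 sse4_1:-msse4.1 ssse3:-mssse3 pni:-msse3 sse2:-msse2; do
        name=${pair%%:*}
        flag=${pair#*:}
        case " $cpu_flags " in
            *" $name "*) printf '%s\n' "$flag"; return 0 ;;
        esac
    done
    return 1
}
```

Running `best_sse_flag < /proc/cpuinfo` prints a single flag that can be added to the `CFLAGS` in place of the whole list.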
For AMD-based systems, `-m3dnow` can be added in order to enable the 3DNow! multimedia extensions.
Both support SSE4.1:
-O2 -m64 -flto -msse4.1 -mfpmath=sse -ffast-math -funroll-loops
For i7 with Nehalem processors, which support SSE4.2, replace `-msse4.1` with `-msse4.2`.
-O2 -flto -mssse3 -mfpmath=sse