## Thursday, November 24, 2016

### Tentative variable definitions, and -fno-common

A comment in a Reddit thread on my previous blog post claims that the code is optimized if the variables are declared in the same translation unit as the usage. That is, the claim is that
int a;
int b;

int foo(void)
{
return &a != &b;
}

compiles to code returning a constant value. This is true for C++
_Z3foov:
movl    $1, %eax
ret

but compiling as C still generates
foo:
movl    $a, %eax
cmpq    $b, %rax
setne   %al
movzbl  %al, %eax
ret

The reason is that an external declaration without an initializer is what the C standard calls a tentative definition. A tentative definition works in essentially the same way as if the variable were declared extern (with the difference that the linker creates the variable if it is not defined in any translation unit), so GCC must assume that the linker may assign the same address to a and b, just as it must for normal extern-declared variables. It is possible to make the variables "real" definitions by initializing them with a value, so changing the declarations to
int a = 0;
int b = 0;

enables the compiler to optimize the function in C too.
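
As a minimal sketch of the distinction, the following translation unit mixes tentative and real definitions. The variable and function names are made up for this illustration. It relies only on two C rules: a tentative definition may be repeated, and an object that never gets an initializer ends up zero-initialized.

```c
#include <assert.h>

/* Hypothetical names, chosen only for this illustration. */
int counter;        /* tentative definition: no initializer */
int counter;        /* repeating a tentative definition is valid C */

int first = 0;      /* real definition: has an initializer */
int second = 0;     /* real definition: a distinct object from first */

/* Two real definitions are distinct objects, so this returns 1. */
int real_definitions_differ(void)
{
    return &first != &second;
}

/* With no other definition of counter in the program, the tentative
   definition behaves as if it had been written "int counter = 0;". */
void check_tentative_default(void)
{
    assert(counter == 0);
}
```

A C compiler should accept this with or without -fno-common, while a C++ compiler rejects the repeated `int counter;` because C++ has no tentative definitions.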

Another way to make GCC optimize this is to specify the command-line option -fno-common, which disables the generation of "common symbols" that are used for emitting tentative definitions. This also has the benefit of improving code generation for some CPU architectures, such as ARM. As an example, consider the function
int a;
int b;

int foo(void)
{
return a + b;
}

The ARM architecture needs the variables' addresses placed in registers in order to access the values, so the compiler will generate code similar to
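foo:
ldr     r2, .L2         /* Load address of a */
ldr     r3, .L2+4       /* Load address of b */
ldr     r0, [r2]        /* Load a */
ldr     r3, [r3]        /* Load b */
add     r0, r0, r3      /* a + b */
bx      lr
.L2:
.word   a
.word   b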

The compiler could do a better job if it knew how the variables are laid out in memory, as it then only needs to load one address and use relative addressing for the other. But that is not possible when a and b are tentative definitions, as the linker may place them wherever it chooses. Using -fno-common will, however, place a and b in the data (or BSS) segment exactly as they are laid out by the compiler, and the code can be optimized based on the knowledge that b is placed right after a in memory¹
foo:
ldr     r2, .L2         /* Load address of a */
ldr     r0, [r2]        /* Load a */
ldr     r3, [r2, #4]    /* Load b */
add     r0, r0, r3      /* a + b */
bx      lr
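
The same principle can be sketched in plain C without relying on a compiler flag: the members of a struct have a layout the compiler knows, so both values can be reached from a single base address. This is only an illustration. The names `struct globals`, `g`, `set_globals`, `foo_grouped` and `check_layout` are hypothetical, the exact offset checked below is what mainstream ABIs use rather than something the C standard promises, and whether the compiler really emits a single address load depends on the target and optimization level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical grouping of the two variables from the example. */
struct globals {
    int a;
    int b;
};

static struct globals g;    /* one object, so one base address */

/* Helper for the illustration: store values into both members. */
void set_globals(int a, int b)
{
    g.a = a;
    g.b = b;
}

/* Same computation as foo(), but b sits at a known offset from a. */
int foo_grouped(void)
{
    return g.a + g.b;
}

/* On mainstream ABIs there is no padding between two int members,
   so b is expected to start sizeof(int) bytes after a. */
void check_layout(void)
{
    assert(offsetof(struct globals, a) == 0);
    assert(offsetof(struct globals, b) == sizeof(int));
}
```

The trade-off of this approach is that every user of the variables has to go through the struct, whereas -fno-common leaves the source unchanged.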

1. A real ARM compiler would use an ldmia instruction in this case, which is even better. But I chose to use this naive code generation in the example in order to show the general principle without discussing too many ARM-specific details.