A comment in a Reddit thread on my previous blog post claims that the code is optimized if the variables are declared in the same translation unit as the usage. That is, the claim is that

    int a;
    int b;
    int foo(void)
    {
        return &a != &b;
    }

compiles to code returning a constant value. This is true for C++

    _Z3foov:
        movl $1, %eax
        ret

but compiling as C still generates

    foo:
        movl $a, %eax
        cmpq $b, %rax
        setne %al
        movzbl %al, %eax
        ret
The reason is that an external declaration without an initializer is what the C standard calls a tentative definition. The tentative definition works in essentially the same way as if it were declared extern (but with the difference that the linker creates the variable if it is not defined in any translation unit), so GCC must assume that the linker may assign the same address for a and b in the same way as for normal extern-declared variables. It is possible to make the variables "real" definitions by initializing them with a value, so changing the declarations to

    int a = 0;
    int b = 0;

enables the compiler to optimize the function in C too.
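To make the distinction concrete, here is a minimal sketch of tentative versus "real" definitions in one translation unit. The names counter, x, y, read_counter and addresses_differ are made up for this illustration and do not come from the Reddit thread or the earlier post.

```c
/* Hypothetical example: tentative vs. "real" definitions in C. */

int counter;        /* tentative definition: file scope, no initializer */
int counter;        /* repeating it is valid C; it names the same object */
int counter = 42;   /* real definition; the tentative ones resolve to it */

int x = 0;          /* real definition because of the initializer */
int y = 0;          /* real definition because of the initializer */

/* Returns the value given by the one real definition of counter. */
int read_counter(void)
{
    return counter;
}

/* x and y are distinct objects defined here, so the compiler is free
   to fold this comparison to a constant. */
int addresses_differ(void)
{
    return &x != &y;
}
```

Compiling this with something like gcc -O2 -S and reading the assembly for addresses_differ is a quick way to check what a particular compiler version does. Note that the repeated declarations of counter are accepted only in C; C++ treats them as a redefinition.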
One other way to make GCC optimize this is to specify the command-line option -fno-common that disables generation of "common symbols" which are used for emitting tentative definitions. This has the benefit of improving code generation for some CPU architectures, such as ARM. As an example, consider the function

    int a;
    int b;
    int foo(void)
    {
        return a + b;
    }

The ARM architecture needs the variables' addresses placed in registers in order to access the values, so the compiler will generate code similar to

    foo:
        ldr r2, .L2       /* Load address of a */
        ldr r3, .L2+4     /* Load address of b */
        ldr r0, [r2]      /* Load a */
        ldr r3, [r3]      /* Load b */
        add r0, r0, r3
        bx lr
    .L2:
        .word a
        .word b

The compiler could do a better job if it knew how the variables are laid out in memory, as it then only needs to load one address and use relative addressing for the other. But that is not possible when a and b are tentative definitions, as the linker may place them wherever it chooses. Using -fno-common will, however, place a and b in the data (or BSS) segment exactly as they are laid out by the compiler, and the code can be optimized based on knowledge that b is placed right after a in memory¹

    foo:
        ldr r2, .L2       /* Load address of a */
        ldr r0, [r2]      /* Load a */
        ldr r3, [r2, #4]  /* Load b */
        add r0, r0, r3
        bx lr
    .L2:
        .word a
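If changing the compiler flags is not an option, a source-level alternative is to group the variables in a struct. This is my own illustration of the same principle, not something GCC requires: the relative layout of struct members is decided by the compiler rather than the linker, so one base address is enough to reach both values. The names struct pair, vars and sum_pair are hypothetical.

```c
/* Hypothetical example: give the compiler a known layout by grouping
   the variables, independent of how common symbols are handled. */

struct pair {
    int a;
    int b;   /* placed at a fixed offset from a within the struct */
};

/* Still a tentative definition, but the linker places the struct as
   one unit, so a and b keep their relative positions. */
struct pair vars;

/* The compiler only needs the address of vars and can reach both
   members with relative addressing from that single base. */
int sum_pair(void)
{
    return vars.a + vars.b;
}
```

The cost is that every use has to be written as vars.a and vars.b, so this is mostly of interest for small groups of variables that are accessed together.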
1. A real ARM compiler would use an ldmia instruction in this case, which is even better. But I choose to use this naive code generation in the example in order to show the general principle, without discussing too many ARM-specific details.