The way shared libraries work affects how code can be optimized, so GCC must be more conservative with inlining when building shared libraries (i.e. when compiling with -fpic or -fPIC).
Consider the functions:
int foo(void)
{
    return 23;
}

int bar(void)
{
    return 19 + foo();
}
Compiling this with "gcc -O3" inlines foo into bar:
foo:
    movl $23, %eax
    ret
bar:
    movl $42, %eax
    ret
but that is not the case when compiling using "gcc -O3 -fPIC":
foo:
    movl $23, %eax
    ret
bar:
    subq $8, %rsp
    call foo@PLT
    addq $8, %rsp
    addl $19, %eax
    ret
The reason is that ELF permits symbols in shared libraries to be overridden by the dynamic linker. A typical use case is to use LD_PRELOAD to load a debug library that contains logging versions of some functions. This has the effect that GCC cannot know that it is the foo above that really is called by bar, and thus cannot inline it. It is only exported symbols that can be overridden, so functions in anonymous namespaces and static functions are optimized as usual, as are functions defined as "extern inline" (the compiler is told to inline, so it may assume the function will not be overridden).

The missed optimizations from this are especially noticeable when doing link-time optimization. The benefit of LTO is that the compiler can see the whole library and inline between files, but this is not possible if those functions may be replaced. This problem makes all interprocedural optimizations (such as devirtualization) ineffective, not only inlining.
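
To make the LD_PRELOAD scenario concrete, here is a minimal sketch of what such a logging replacement could look like. The file name foolog.c, the library name libfoolog.so and the log message are hypothetical, chosen only for illustration, and the replacement simply returns the same value as the original instead of forwarding to it.

```c
/* foolog.c: hypothetical logging replacement for foo, built as its own
   shared library and loaded ahead of the real one with LD_PRELOAD. */
#include <stdio.h>

int foo(void)
{
    /* Log the call, then behave like the original foo. */
    fprintf(stderr, "foo() called\n");
    return 23;
}
```

Assuming it is built with something like "gcc -O3 -fPIC -shared foolog.c -o libfoolog.so" and a program using the library is started with LD_PRELOAD=./libfoolog.so, the call through foo@PLT in bar resolves to this definition, so every call is logged. Had foo been inlined into bar, the replacement would have been silently bypassed.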
There are two ways to get GCC to optimize shared libraries in the same way as normal code:
- Use the command line option -fno-semantic-interposition
- Avoid exporting unnecessary symbols by passing -fvisibility=hidden to the compiler, and manually export the needed symbols using a linker export map or by decorating the functions with __attribute__((__visibility__("default")))
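
The second approach can be sketched as follows, assuming the library is compiled with -fvisibility=hidden. The macro name MYLIB_EXPORT is hypothetical, and treating bar as the only public function is an assumption made for the example.

```c
/* Hypothetical export macro; any name works. */
#define MYLIB_EXPORT __attribute__((__visibility__("default")))

/* Not exported when compiled with -fvisibility=hidden, so it cannot be
   overridden from outside the library and GCC is free to inline it. */
int foo(void)
{
    return 23;
}

/* Explicitly exported: remains visible to users of the library. */
MYLIB_EXPORT int bar(void)
{
    return 19 + foo();
}
```

Built with "gcc -O3 -fPIC -fvisibility=hidden", the call to foo no longer has to go through the PLT, so bar should be optimized as in the non-PIC case. bar itself is still exported and can still be overridden.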
There is this other interposition-type issue I ran into (in both gcc and clang) that you might find interesting: http://www.playingwithpointers.com/ipo-and-derefinement.html