The way shared libraries work affects how the code can be optimized, so GCC must be more conservative with inlining when building shared libraries (i.e. when compiling with -fpic or -fPIC).
Consider the functions

    int foo(void) { return 23; }
    int bar(void) { return 19 + foo(); }

Compiling this with "gcc -O3" inlines foo into bar

    foo:
        movl $23, %eax
        ret
    bar:
        movl $42, %eax
        ret

but that is not the case when compiling using "gcc -O3 -fPIC"

    foo:
        movl $23, %eax
        ret
    bar:
        subq $8, %rsp
        call foo@PLT
        addq $8, %rsp
        addl $19, %eax
        ret

The reason is that ELF permits symbols in shared libraries to be overridden by the dynamic linker; a typical use case is to use
LD_PRELOAD to load a debug library that contains logging versions of some functions. This has the effect that GCC cannot know that it is the foo above that really is called by bar, and thus cannot inline it. It is only exported symbols that can be overridden, so anonymous namespaces and static functions are optimized as usual, as are functions defined as "extern inline" (the compiler is told to inline, so it may assume the function will not be overridden).

The missed optimizations from this are especially noticeable when doing link-time optimization: the benefit of LTO is that the compiler can see the whole library and inline between files, but this is not possible if those functions may be replaced. This problem makes all interprocedural optimizations (such as devirtualization) ineffective, not only inlining.
There are two ways to get GCC to optimize shared libraries in the same way as normal code:
- Use the command line option -fno-semantic-interposition
- Avoid exporting unnecessary symbols by passing -fvisibility=hidden to the compiler, and manually export the needed symbols using a linker export map or by decorating the functions with __attribute__((__visibility__("default")))
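As a rough sketch of the second approach, the earlier example could be annotated as follows. The macro name MYLIB_EXPORT is made up for illustration, and the code assumes the file is compiled with -fvisibility=hidden so that unannotated functions are not exported.

```c
/* Hypothetical convenience macro; the attribute itself is the GCC one
   mentioned above. */
#define MYLIB_EXPORT __attribute__((__visibility__("default")))

/* Internal helper: not annotated, so with -fvisibility=hidden it is not
   exported, cannot be overridden, and may be inlined into its callers. */
int foo(void)
{
    return 23;
}

/* Public entry point: explicitly exported from the shared library. */
MYLIB_EXPORT int bar(void)
{
    return 19 + foo();
}
```

Compiled with something like "gcc -O3 -fPIC -fvisibility=hidden", bar no longer needs to go through the PLT to reach foo, so the call is expected to be inlined as in the non-PIC build.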
There is this other interposition-type issue I ran into (in both gcc and clang) that you might find interesting: http://www.playingwithpointers.com/ipo-and-derefinement.html