John Regehr tweeted the example
int foo(void) __attribute__((pure)); int bar(void) { int acc = 0; for (int i = 0; i < 100; ++i) acc += foo(); return acc; }that is optimized to
_Z3barv: pushq %rax callq _Z3foov imull $100, %eax, %eax popq %rcx retqby clang, but is not optimized by GCC. The obvious question is "why is it not optimized?"
GCC does actually optimize this if it is compiled as C, but not if compiled as C++. The reason is that GCC allows
pure
functions to throw exceptions in C++ which prevents some optimizations — bar
is optimized as expected in C++ too if foo
is marked as noexcept
, or if exceptions are disabled by passing the command-line option -fno-exceptions
to the compiler.There are additional restrictions on how
pure
functions may be optimized. The description of the pure
attribute in the GCC manual says
Many functions have no effects except the return value and their return value depends only on the parameters and/or global variables. Such a function can be subject to common subexpression elimination and loop optimization just as an arithmetic operator would be. These functions should be declared with the attribute pure
.
The possible use of global variables limits how calls to the
There is another attribute,
This means that the optimization opportunities of
So GCC is overly conservative when handling the loop in
pure
functions may be moved. For example,int foo(void) __attribute__((pure)); int a; int baz(void) { int acc = 0; acc += foo(); a = 0; acc += foo(); return acc; }cannot be optimized to the equivalent of
int baz(void) { a = 0; return 2 * foo(); }as it is possible that the result of
foo()
depends on the global variable a
.There is another attribute,
const
, for functions without side effects that do not depend on global variables
Many functions do not examine any values except their arguments, and have no effects except the return value. Basically this is just slightly more strict class than the pure
attribute below, since function is not allowed to read global memory.
Decorating foo
with the const
attribute would optimize the function baz
above, but the compiler is still restricted in how it can move functions, even if they are known not to throw exceptions and are marked as const
— they may take arbitrarily long time to execute, so the compiler should not move a function call so that it is executed on a path where it was not executed in the original code.This means that the optimization opportunities of
pure
and const
functions are rather limited, and the possibility of throwing exceptions does not affect it that much — a pure
or const
function that throws an exception will always do it for the same input, so they can still be optimized by subexpression elimination, memoization, etc. The only extra restriction is that the first call of a function must not be moved over some other construct that can throw an exception.So GCC is overly conservative when handling the loop in
bar
— the optimization is valid even if foo
may throw — but most more complex cases cannot be handled. For example,int foo() __attribute__((pure)); int foo2(int) __attribute__((pure)); int bar2() { int acc = 0; int acc2 = 0; for (int i = 0; i < 100; i++) { acc2 += foo2(i); acc += foo(); } return acc + acc2; }cannot be optimized to the equivalent of
int bar2() { int acc2 = 0; for (int i = 0; i < 100; i++) { acc2 += foo2(i); } return 100 * foo() + acc2; }as this will give a different result if both
foo()
and foo2(1)
throws (while foo2(0)
does not throw). There are ways to handle some of these more complex cases (e.g. peeling the first iteration of the loop to get the first calls in the correct order, and then optimizing the rest of the loop), but this is not common enough that it is worth complicating the optimizers by special-casing the few cases that can be optimized.
In many cases, a guarantee of semantically-precise error trapping would seem simultaneously expensive and unnecessary. Optimization could be greatly enhanced by allowing programmers to specify regions of loose exception semantics: if one operations within such a region could not affect the other in any way other than by throwing an exception, a compiler may overlap the operations in any fashion it sees fit. In the event that two or more operations would throw an exception, the compiler could choose among them in Unspecified fashion.
ReplyDeleteSuch optimizations should generally be limited to programmer-designated regions, though it may be useful to have a (non-conforming) command-line option to automatically behave as though every function in a particular source file was thus marked. If a programmer wouldn't care which exception gets throws, or whether the throw prevents the execution of some other code which would have no side-effects, reducing code efficiency for the purpose of upholding such semantics would be wasteful. Allowing the programmer to waive such semantics would eliminate such waste.