Friday, December 9, 2016

GCC __attribute__((pure)) and C++ exceptions

John Regehr tweeted the example
int foo(void) __attribute__((pure));

int bar(void) {
  int acc = 0;
  for (int i = 0; i < 100; ++i)
    acc += foo();
  return acc;
}
that is optimized to
_Z3barv:
    pushq   %rax
    callq   _Z3foov
    imull   $100, %eax, %eax
    popq    %rcx
    retq
by clang, but is not optimized by GCC. The obvious question is "why is it not optimized?"

GCC does actually optimize this if it is compiled as C, but not if compiled as C++. The reason is that GCC allows pure functions to throw exceptions in C++ which prevents some optimizations — bar is optimized as expected in C++ too if foo is marked as noexcept, or if exceptions are disabled by passing the command-line option -fno-exceptions to the compiler.

There are additional restrictions on how pure functions may be optimized. The description of the pure attribute in the GCC manual says
Many functions have no effects except the return value and their return value depends only on the parameters and/or global variables. Such a function can be subject to common subexpression elimination and loop optimization just as an arithmetic operator would be. These functions should be declared with the attribute pure.
The possible use of global variables limits how calls to the pure functions may be moved. For example,
int foo(void) __attribute__((pure));
int a;

int baz(void)
{
    int acc = 0;
    acc += foo();
    a = 0;
    acc += foo();
   return acc;
}
cannot be optimized to the equivalent of
int baz(void)
{
    a = 0;
    return 2 * foo();
}
as it is possible that the result of foo() depends on the global variable a.

There is another attribute, const, for functions without side effects that do not depend on global variables
Many functions do not examine any values except their arguments, and have no effects except the return value. Basically this is just slightly more strict class than the pure attribute below, since function is not allowed to read global memory.
Decorating foo with the const attribute would optimize the function baz above, but the compiler is still restricted in how it can move functions, even if they are known not to throw exceptions and are marked as const — they may take arbitrarily long time to execute, so the compiler should not move a function call so that it is executed on a path where it was not executed in the original code.

This means that the optimization opportunities of pure and const functions are rather limited, and the possibility of throwing exceptions does not affect it that much — a pure or const function that throws an exception will always do it for the same input, so they can still be optimized by subexpression elimination, memoization, etc. The only extra restriction is that the first call of a function must not be moved over some other construct that can throw an exception.

So GCC is overly conservative when handling the loop in bar — the optimization is valid even if foo may throw — but most more complex cases cannot be handled. For example,
int foo() __attribute__((pure));
int foo2(int) __attribute__((pure));

int bar2() 
{
    int acc = 0;
    int acc2 = 0;
    for (int i = 0; i < 100; i++) {
        acc2 += foo2(i);
        acc += foo();
    }
    return acc + acc2;
}
cannot be optimized to the equivalent of
int bar2() 
{
    int acc2 = 0;
    for (int i = 0; i < 100; i++) {
        acc2 += foo2(i);
    }
    return 100 * foo() + acc2;
}
as this will give a different result if both foo() and foo2(1) throws (while foo2(0) does not throw). There are ways to handle some of these more complex cases (e.g. peeling the first iteration of the loop to get the first calls in the correct order, and then optimizing the rest of the loop), but this is not common enough that it is worth complicating the optimizers by special-casing the few cases that can be optimized.

1 comment:

  1. In many cases, a guarantee of semantically-precise error trapping would seem simultaneously expensive and unnecessary. Optimization could be greatly enhanced by allowing programmers to specify regions of loose exception semantics: if one operations within such a region could not affect the other in any way other than by throwing an exception, a compiler may overlap the operations in any fashion it sees fit. In the event that two or more operations would throw an exception, the compiler could choose among them in Unspecified fashion.

    Such optimizations should generally be limited to programmer-designated regions, though it may be useful to have a (non-conforming) command-line option to automatically behave as though every function in a particular source file was thus marked. If a programmer wouldn't care which exception gets throws, or whether the throw prevents the execution of some other code which would have no side-effects, reducing code efficiency for the purpose of upholding such semantics would be wasteful. Allowing the programmer to waive such semantics would eliminate such waste.

    ReplyDelete