Sunday, June 4, 2017

-fipa-pta

My previous blog post had a minimal description of -fipa-pta and I have received several questions about what it actually do. This blog post will try to give some more details...

Points-to analysis

Many optimizations need to know if two operations may access the same memory address. For example, the if-statement in
i = 5;
*p = -1;
if (i < 0)
do_something();
can be optimized away if *p cannot modify i.

GCC tracks what the pointers may point to using the general ideas from the paper “Efficient Field-sensitive pointer analysis for C”. I will not describe the details – the first few pages of the paper do it better than I can do here – but the principle is that each pointer is represented by a set of locations it may point to, the compiler is generating set constraints representing each statement in the program, and then solving those constraints to get the actual set of locations the pointer may point to.

But this process is expensive, so GCC is normally doing this one function at a time and assumes that called functions may access any memory visible to them.

-fipa-pta

The -fipa-pta optimization takes the bodies of the called functions into account when doing the analysis, so compiling
void __attribute__((noinline))
bar(int *x, int *y)
{
*x = *y;
}

int foo(void)
{
int a, b = 5;
bar(&a, &b);
return b + 10;
}
with -fipa-pta makes the compiler see that bar does not modify b, and the compiler optimizes foo by changing b+10 to 15
int foo(void)
{
int a, b = 5;
bar(&a, &b);
return 15;
}

A more relevant example is the “slow” code from the “Integer division is slow” blog post
std::random_device entropySource;
std::mt19937 randGenerator(entropySource());
std::uniform_int_distribution<int> theIntDist(0, 99);

for (int i = 0; i < 1000000000; i++) {
volatile auto r = theIntDist(randGenerator);
}
Compiling this with -fipa-pta makes the compiler see that theIntDist is not modified within the loop, and the inlined code can thus be constant-folded in the same way as the “fast” version – with the result that it runs four times faster.