Sunday, February 21, 2016

How undefined signed overflow enables optimizations in GCC

Signed integers are not allowed to overflow in C and C++, and this helps compilers generate better code. I was interested in how GCC is taking advantage of this, and here are my findings.1

Signed integer expression simplification

The nice property of overflow being undefined is that signed integer operations works as in normal mathematics — you can cancel out values so that (x*10)/5 simplifies to x*2, or (x+1)<(y+3) simplifies to x<(y+2). Increasing a value always makes it larger, so x<(x+1) is always true.

GCC iterates over the IR (the compiler's Internal Representation of the program), and does the following transformations (x, and y are signed integers, c, c1, and c2 are positive constants, and cmp is a comparison operator. I have only listed the transformations for positive constants, but GCC handles negative constants too in the obvious way)
• Eliminate multiplication in comparison with 0
(x * c) cmp 0   ->   x cmp 0

• Eliminate division after multiplication
(x * c1) / c2   ->   x * (c1 / c2) if c1 is divisible by c2

• Eliminate negation
(-x) / (-y)     ->   x / y

• Simplify comparisons that are always true or false
x + c < x       ->   false
x + c <= x      ->   false
x + c > x       ->   true
x + c >= x      ->   true

• Eliminate negation in comparisons
(-x) cmp (-y)   ->   y cmp x

• Reduce magnitude of constants
x + c > y       ->   x + (c - 1) >= y
x + c <= y      ->   x + (c - 1) < y

• Eliminate constants in comparisons
(x + c1) cmp c2         ->   x cmp (c2 - c1)
(x + c1) cmp (y + c2)   ->   x cmp (y + (c2 - c1)) if c1 <= c2

The second transformation is only valid if c1 <= c2, as it would otherwise introduce an overflow when y has the value INT_MIN.

Pointer arithmetic and type promotion

If an operation does not overflow, then we will get the same result if we do the operation in a wider type. This is often useful when doing things like array indexing on 64-bit architectures — the index calculations are typically done using 32-bit int, but the pointers are 64-bit, and the compiler may generate more efficient code when signed overflow is undefined by promoting the 32-bit integers to 64-bit operations instead of generating type extensions.

One other aspect of this is that undefined overflow ensures that a[i] and a[i+1] are adjacent. This improves analysis of memory accesses for vectorization etc.

Value range calculations

The compiler keeps track of the variables' range of possible values at each point in the program, i.e. for code such as
int x = foo();
if (x > 0) {
int y = x + 5;
int z = y / 4;

it determines that x has the range [1, INT_MAX] after the if-statement, and can thus determine that y has the range [6, INT_MAX] as overflow is not allowed. And the next line can be optimized to
  int z = y >> 2;

as the compiler knows that y is non-negative.

The undefined overflow helps optimizations that need to compare two values (as the wrapping case would give possible values of the form [INT_MIN, (INT_MIN+4)] or [6, INT_MAX] that prevents all useful comparisons with < or >), such as
• Changing comparisons x<y to true or false if the ranges for x and y does not overlap
• Changing min(x,y) or max(x,y) to x or y if the ranges do not overlap
• Changing abs(x) to x or -x if the range does not cross 0
• Changing x/c to x>>log2(c) if x>0 and the constant c is a power of 2
• Changing x%c to x&(c-1) if x>0 and the constant c is a power of 2

Loop analysis and optimization

The canonical example of why undefined signed overflow helps loop optimizations is that loops like
for (int i = 0; i <= m; i++)

are guaranteed to terminate for undefined overflow. This helps architectures that have specific loop instructions, as they do in general not handle infinite loops.

But undefined signed overflow helps many more loop optimizations. All analysis such as determining number of iteration, transforming induction variables, and keeping track of memory accesses are using everything in the previous sections in order to do its work. In particular, the set of loops that can be vectorized are severely reduced when signed overflow is allowed.

1. Using GCC trunk r233186,  "gcc version 6.0.0 20160205 (experimental)"

This blog post was updated 2016-02-23:
• Corrected first sentence to say "overflow"instead of "wrap".
• Removed one incorrect transformation from "Eliminate Negation" where I had misread the GCC source code.
• Corrected the ranges in the "Value range calculations" example.