
Friday, September 8, 2017

Follow-up on “Why undefined behavior may call a never-called function”

I have received several questions about the previous blog post asking what happens in more complex cases, such as
#include <cstdlib>

typedef int (*Function)();

static Function Do;

static int EraseAll() {
  return system("rm -rf /");
}

static int LsAll() {
  return system("ls /");
}

void NeverCalled() {
  Do = EraseAll;
}

void NeverCalled2() {
  Do = LsAll;
}

int main() {
  return Do();
}
where the compiler will find three possible values for Do: EraseAll, LsAll, and 0.

The value 0 is eliminated from the set of possible values for the call in main in the same way as in the simpler case, but the compiler cannot change the indirect call to a direct call, as there are still two possible values for the function pointer, so clang generates the expected
main:
        jmpq    *Do(%rip)
But a compiler could transform the line
return Do();
to
if (Do == LsAll)
  return LsAll();
else
  return EraseAll();
which has the same surprising effect of calling a never-called function. This transformation would be silly here, as the cost of the extra comparison is similar to the cost of the eliminated indirect call, but it may be a good optimization when the compiler can determine that the result will be faster (for example, if the functions can be simplified after inlining). I don’t know if this is implemented in clang/LLVM — I could not get it to happen with some small test programs. But GCC’s implementation of devirtualization, for example, can do it if -fdevirtualize-speculatively is enabled, so this is not a hypothetical optimization (GCC does, however, not take advantage of undefined behavior in this case, so it will not insert calls to never-called functions).

Monday, September 4, 2017

Why undefined behavior may call a never-called function

My twitter feed has recently been filled with discussions about the following program
#include <cstdlib>

typedef int (*Function)();

static Function Do;

static int EraseAll() {
  return system("rm -rf /");
}

void NeverCalled() {
  Do = EraseAll;  
}

int main() {
  return Do();
}
that clang compiles to
main:
        movl    $.L.str, %edi
        jmp     system

.L.str:
        .asciz  "rm -rf /"
That is, the compiled program executes “rm -rf /” even though the original program never calls EraseAll!

Clang is allowed to do this – the function pointer Do is initialized to 0 as it is a static variable, and calling 0 invokes undefined behavior – but it may seem strange that the compiler chooses to generate this code. It does, however, follow naturally from how compilers analyze programs...

Eliminating function pointers can give big performance improvements – especially for C++, as virtual functions are implemented as function pointers, and changing these to direct calls enables optimizations such as inlining. It is in general hard to track the possible pointer values through the code, but it is easy in this program – Do is static and its address is not taken, so the compiler can trivially see all writes to it and determine that Do must have either the value 0 or the value EraseAll (as NeverCalled may have been called from, for example, a global constructor in another file before main is run). The compiler can remove 0 from the set of possible values when processing the call to Do, as calling 0 would invoke undefined behavior, so the only possible value is EraseAll and the compiler changes
return Do();
to
return EraseAll();
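The direct call is then inlined, so main effectively becomes
return system("rm -rf /");
which matches the generated assembly shown above.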

I’m not too happy with taking advantage of undefined behavior in order to eliminate possible pointer values, as this has a tendency to affect unrelated code, but there may be good reasons for clang/LLVM doing this (for example, it may be common that devirtualization is prevented because the set of possible pointer values contains a 0, as the compiler finds a spurious pure virtual function).

Update: I wrote a follow-up post discussing a slightly more complex case.

Tuesday, July 4, 2017

Strict aliasing in C90 vs. C99 – and how to read the C standard

I often see claims that the strict aliasing rules were introduced in C99, but that is not true – the relevant part of the standard is essentially the same for C90 and C99. Some compilers used the strict aliasing rules for optimization well before 1999, as was noted in this 1998 post to the GCC mailing list (which argues that enabling strict aliasing will not cause many problems, as most software had already fixed its strict aliasing bugs to work with those other compilers...)

C99 – 6.5 Expressions

The C standard does not talk about “strict aliasing rules”, but they follow from the text in “6.5 Expressions”:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:73
  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,
  • a type that is the signed or unsigned type corresponding to the effective type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.

73 The intent of this list is to specify those circumstances in which an object may or may not be aliased.
Note the footnote that says that the intention of these rules is to let the compiler determine that objects are not aliased (and thus be able to optimize more aggressively).

C90 – 6.3 Expressions

The corresponding text in C90 is located in “6.3 Expressions”:
An object shall have its stored value accessed only by an lvalue that has one of the following types:36
  • the declared type of the object,
  • a qualified version of the declared type of the object,
  • a type that is the signed or unsigned type corresponding to the declared type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the declared type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.

36 The intent of this list is to specify those circumstances in which an object may or may not be aliased.
It is similar to the text in C99, and it even has the footnote that says it is meant to be used to determine if an object may be aliased or not, so C90 allows optimizations using the strict aliasing rules.

But standards have bugs, and those can be patched by publishing technical corrigenda, so it is not enough to read the published standard to see what is and is not allowed. Two technical corrigenda have been published for C90 (ISO/IEC 9899 TCOR1 and ISO/IEC 9899 TCOR2), and TCOR1 updates the first two bullet points. The corrected version of the standard says
An object shall have its stored value accessed only by an lvalue that has one of the following types:36
  • a type compatible with the declared type of the object,
  • a qualified version of a type compatible with the declared type of the object,
  • a type that is the signed or unsigned type corresponding to the declared type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the declared type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.

36 The intent of this list is to specify those circumstances in which an object may or may not be aliased.
The only difference compared to C99 is that it does not talk about effective type, which makes it unclear how malloc:ed memory is handled, as it does not have a declared type. This is discussed in defect report DR 28, which asks if it is allowed to optimize
void f(int *x, double *y) {
  *x = 0;
  *y = 3.14;
  *x = *x + 2;
} 
to
void f(int *x, double *y) {
  *x = 0;
  *y = 3.14;
  *x = 2; /* *x known to be zero */
}
if x and y point to malloc:ed memory, and the committee answered (citing the bullet point list from 6.3)
We must take recourse to intent. The intent is clear from the above two citations and from Footnote 36 on page 38: The intent of this list is to specify those circumstances in which an object may or may not be aliased.
Therefore, this alias is not permitted and the optimization is allowed.
In summary, yes, the rules do apply to dynamically allocated objects.
That is, the allocated memory gets its declared type when written and the subsequent reads must be done following the rules in the bullet-point list, which is essentially the same as what C99 says.
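Spelled out as code, a minimal sketch of this reading (the function is my own illustration):
#include <stdlib.h>

void example(void)
{
    double *p = malloc(sizeof(double));
    if (p == NULL)
        abort();
    *p = 3.14;  /* the allocated bytes now have type double */
    /* reading them through an int lvalue here would break the rules */
    free(p);
}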

One difference between C90 and C99

There is one difference between the C90 and C99 strict aliasing rules in how unions are handled – C99 allows type-punning using code such as
union a_union {
  int i;
  float f;
};

int f() {
  union a_union t;
  t.f = 3.0;
  return t.i;
}
while this is implementation-defined in C90 per 6.3.2.3
[...] if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined.

Reading the standard

Language lawyering is a popular sport on the internet, but it is a strange game where often the only winning move is not to play. Take for example DR 258 where the committee is asked about a special case in macro-expansion that is unclear. The committee answers
The standard does not clearly specify what happens in this case, so portable programs should not use these sorts of constructs.
That is, unclear parts of the standard should be avoided – not language-lawyered into saying what you want.

And the committee is pragmatic; DR 464 asks for an example to be added for a construct involving the #line directive that some compilers get wrong, but the committee thought it better to make it unspecified behavior
Investigation during the meeting revealed that several (in fact all that were tested) compilers did not seem to follow the interpretation of the standard as given in N1842, and that it would be best to acknowledge this as unspecified behavior.
So just because the standard says something does not mean that it is the specified behavior. One other fun example of this is DR 476 where the standard does not make sense with respect to the behavior of volatile:
All implementors represented on the committee were polled and all confirmed that indeed, the intent, not the standard, is implemented. In addition to the linux experience documented in the paper, at least two committee members described discussions with systems engineers where this difference between the standard vs the implementation was discussed because the systems engineers clearly depended on the implementation of actual intent. The sense was that this was simply a well known discrepency.

Thursday, July 7, 2016

Checked C

Microsoft Research recently announced a work-in-progress project, Checked C, that adds bounds checking to pointers and arrays:
Checked C adds new pointer types and array types that are bounds-checked, yet layout-compatible with existing pointer and array types. In keeping with the low-level nature of C, programmers control the placement of bounds information in data structures and the flow of bounds information through programs. Static checking enforces the integrity of the bounds information and allows the eliding of some dynamic checking. Dynamic checking enforces the integrity of memory accesses at runtime when static checking cannot. Checked C is backwards-compatible: existing C programs work “as is”. Programmers incrementally opt-in to bounds checking, while maintaining binary compatibility.
Below is my summary of, and comments on, the Checked C version 0.5 specification.

Overview

Checked C extends C with checked arrays and pointers where memory accesses are checked at runtime, and a runtime error is produced if the memory access is out of range. The compiler is allowed to (and expected to) eliminate checks that it knows are always true.

Checked arrays are created using the checked keyword, so
int x checked[5];
creates a checked array x having 5 elements. There are three new checked pointer types that are declared using syntax borrowed from C++:
  • array_ptr<T> — A pointer to an element of an array of type T values. This pointer type works as a normal C pointer (but with bounds checking).
  • ptr<T> — A pointer to a value of type T. Pointer arithmetic is not allowed on this pointer type.
  • span<T> — The span pointer works in the same way as the array_ptr, but it is represented differently in the generated code (see below).
As an example
ptr<int> p;
declares a checked pointer to an int.

The checked pointer types can have const and volatile modifiers, so a pointer to a constant integer is written as
ptr<const int> p;
while
int x;
const ptr<int> p = &x;
defines a pointer that cannot be modified.

The checked arrays and pointers are used in the same way as normal C arrays and pointers, so a checked pointer p can be dereferenced as *p, and an array_ptr<T> or span<T> pointer p can be dereferenced using expressions such as *(p+4) or p[4].

The array_ptr<T> and span<T> pointers need bounds to be defined before they may be dereferenced. Defining the bounds for a pointer p is done using the count and bounds keywords
  • p : count(len) — the number of elements that are accessible beginning at p
  • p : bounds(low, high) — the range of memory that can be accessed through p
The bounds are placed on the declaration, such as
array_ptr<int> p : count(n) = malloc(sizeof(int) * n);
Using this pointer as p[i] will conceptually add a check
dynamic_check(0 <= i && i < n);
right before the access.1 Pointer assignment transfers the bounds in the natural way, so q will get the bounds from p in
array_ptr<int> q = p;
The array_ptr<T> and ptr<T> pointers have the same size as normal C pointers (such that sizeof(array_ptr<int>) is equal to sizeof(int*)), so checked pointers can be used without changing the layout of structures. This means that the bounds are maintained locally by the compiler. The span<T> pointers do, however, keep the bounds within the type, so sizeof(span<int>) is larger than sizeof(int*).

It is possible to add additional constraints to handle things like this aligned memcpy
int aligned_memcpy(array_ptr<char> dest : count(len) where aligned(dest, 4),
                   array_ptr<char> src : count(len) where aligned(src, 4),
                   int len where len % 4 == 0);
although the constraint specifications seem to be a bit under-specified in the document, and I am not completely sure how they work in detail...

Undefined behavior

Doing all of this is a bit meaningless if code such as a[i+j] can make anything happen because of undefined behavior from overflow of i+j, so Checked C defines the behavior for some things that invoke undefined behavior in standard C.

Checked C requires that signed integer overflow produces a value or a runtime error:
  • To be able to maintain pointer bounds safety, it is important that signed integer overflow produce a defined value. When a signed integer expression produces an out-of-range value, either (1) the operation must convert that value to an in-range integer value or (2) the expression shall produce a runtime error. The conversion must be a function of only the input values of the expression.
  • Integer division by 0 shall also produce a runtime error or produce a defined value.
Checked C also defines pointers to work more like hardware pointers than is the case in standard C. The checked pointers are treated in the same way as unsigned integers (all values are valid, even if they do not point at an object), but they produce runtime errors for pointer wrap and pointer arithmetic involving NULL. The rules for undefined behavior for unchecked pointers are modified in a similar way:
Unchecked pointers shall be treated as addresses of locations in memory, just as checked pointers are treated as addresses. The addresses shall be unsigned integers with a defined range of 0 to UINTPTR_MAX:
  • Comparison of pointers for all different kinds of pointers shall be defined as the corresponding integer comparison.
  • Subtraction p - r of two pointers p and r of type T where one pointer is a checked pointer and the other is an unchecked pointer shall be done following the rules for subtraction of checked pointers, treating the unchecked pointer as a checked pointer in those rules.

Bounds evaluation

The bounds are evaluated each time a pointer is checked, so the program needs to be careful when updating variables used in a bounds declaration. The compiler must report an error when the bounds are extended
int sum(array_ptr<int> start : bounds(start, end), array_ptr<int> end)
{
    end = end + 1; // bounds(start, end) does not hold after this,
                   // so program is rejected
    start[5] = 0;
    ...
}
but Checked C does allow modifications that narrow the bounds, so for example
array_ptr<int> x : bounds(x, high) = ...
int sum = 0;
while (x < high) {
    sum += *x;
    x++;
}
is fine as the bound is reduced when x is incremented.

And there are more problems... For example, let e be an array_ptr<T> with bounds(x+1, x+5). This will not work when assigning
x = e;
as the range depends on x. Or consider this example
w = ...
where w : bounds(x, x + y);
int t = *w + (y = tmp);
The bounds for w depend on y, but y is modified in the same expression that dereferences w, and it is unclear if y is updated before or after w is checked. The compiler must reject the code in both of these examples.

A big part of the specification deals with this, and there are rules for which expressions are valid in bounds declarations, and how to do data flow analysis to verify that variables are allowed to be changed. But data flow analysis is expensive, so there are restrictions that limit how much the compiler needs to check, with the result that small changes to the code may push the program over the limit and thus fail to compile.

This would be so much simpler if the bounds were evaluated where declared. The compiler could place the bounds in hidden temporary variables, but this is rejected in the rationale:
We considered eager evaluation, but rejected it because it would turn array_ptr types into span types. When bounds expressions are always eagerly evaluated, the results need to be stored somewhere so that they can be used when v is used. For local variables, hidden temporary variables could be introduced. This breaks the design principle of not introducing hidden costs, though.
I do not understand what they mean by this... I would say that the current specification adds hidden costs as the bounds may be evaluated each time the pointer is used, while keeping the bounds in hidden variables will only evaluate them once. Hidden variables may increase the register pressure, but the current specification will most likely increase the live ranges for the variables used in the bounds, which also increases register pressure, so I do not expect a difference in reality.

Doing eager evaluation would however cause problems for array_ptr<T> pointers in structures. They are currently handled as
struct S {
    array_ptr<int> arr : count(len);
    int len;
}
where the variables used in the bounds calculations live in the structure. I have not thought this through in detail, but I think it would make sense to forbid dereferencing such pointers, and require the program to copy them to a local variable in order to use them. I do not think this is a big problem, as I would guess that most pointers in structures are of the ptr<T> type, which can be used directly as they do not have bounds.
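A hedged sketch of that copy-to-local pattern, using the syntax from the specification (first_element is my own hypothetical helper, and I have not verified this against a Checked C compiler):
int first_element(ptr<struct S> s)
{
    int n = s->len;
    array_ptr<int> p : count(n) = s->arr;  /* bounds now depend only on the local n */
    return n > 0 ? p[0] : 0;
}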


1. The real checking is somewhat more complex, and it also checks that the count(n) is valid (i.e. that n is less than (INTPTR_MAX/sizeof(int)), etc.)

Updated 2016-07-07: Added clarification in note 1.

Thursday, May 19, 2016

Type-based alias analysis in C

There was some concern that type-based alias analysis could cause problems when it was implemented in GCC, but the conclusion (as expressed in the GCC development list) was that all other compilers already did this optimization,1 so most code had already been fixed:
There's less and less of this code IMHO — modern compilers have been doing this kind of alias analysis for some time and as a result folks have been forced to fix their code. Of course this doesn't apply to Linux and some other free software projects since they only use gcc.
I guess that mail was a bit too optimistic — it was written in 1998, and I still see lots of broken code and complaints about -fstrict-aliasing...

What type-based alias analysis means

An informal description of the type-based alias analysis rules is that every memory location has a type, and you are only allowed to access the memory using the "correct" type (the compiler can, therefore, assume that two accesses with incompatible types do not alias). The C11 standard describes this in 6.5 "Expressions":
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,
  • a type that is the signed or unsigned type corresponding to the effective type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.
As a concrete example, a variable of type int can only be accessed through an int* (including unsigned int*, const int*, etc.) or through a char* (including unsigned char*, const char*, etc.).
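Spelled out in code (a minimal sketch):
int i;
int *pi = &i;                           /* OK: compatible type */
const int *ci = &i;                     /* OK: qualified version */
unsigned int *ui = (unsigned int *)&i;  /* OK: corresponding unsigned type */
char *pc = (char *)&i;                  /* OK: character type */
float *pf = (float *)&i;                /* the cast is allowed, but
                                           reading *pf breaks the rules */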

These restrictions allow the compiler to reorder operations when the types ensure that they access different objects. Consider the function
int i;

void foo(float *f)
{
    i = 23;
    *f = 0.0;
    i = i + 19;
}
*f cannot modify i as it has a different type, so the compiler is allowed to move the store to i over the store to *f, and the function is optimized to
void foo(float *f)
{
    *f = 0.0;
    i = 42;
}

Note that the type-based aliasing rules only talk about how to access objects, and not about pointer casting — it is allowed to cast pointers between incompatible types (as long as you follow the rules for pointer casts) such as
int i;
float *f = (float *)&i;
But accessing this as *f will now invoke undefined behavior, as the object pointed to is an int, and it is not allowed to access it by an expression having a float type.

type punning — union

There are cases where you must access data using an incorrect type, and there are a few ways to do this. The usual cases are where you need to get the bitwise representation of some value, and we will consider the example from the Wikipedia article on type punning that checks whether a floating-point number is negative by examining the most significant bit. The examples will assume 32-bit int and float.

The naive version that just casts the pointer to unsigned int* does not work
bool is_negative(float x)
{
    unsigned int *ui = (unsigned int *)&x;
    return (*ui & 0x80000000u) != 0; // Undef behavior reading float as int
}
as it breaks the type-based aliasing rules. The best way to solve this is in most cases to use a union2
bool is_negative(float x)
{
    union
    {
        unsigned int ui;
        float f;
    } u;
    u.f = x;  
    return (u.ui & 0x80000000u) != 0;
}
Both GCC and LLVM are smart enough to generate as efficient code as you would have expected from the invalid version above.

The union trick requires that all accesses are done through the union — the result is not defined when accessing through a pointer, even if the pointer has the "correct" type
bool is_negative(float x)
{
    union
    {
        unsigned int ui;
        float f;
    } u;
    u.f = x;
    unsigned int *ui = &u.ui;
    return (*ui & 0x80000000u) != 0;  // Undefined behavior
}

type punning — character pointer

Character pointers can be used to access any type, so the is_negative function can be implemented as
bool is_negative(float x)
{
    unsigned char *p = (unsigned char *)&x;
    return (p[3] & 0x80) != 0;
}
assuming a little-endian architecture.

Note that int8_t is not guaranteed to be of character type. That is, the following function may be invalid
bool is_negative(float x)
{
    uint8_t *p = (uint8_t *)&x;
    return (p[3] & 0x80) != 0;  // Possible undefined behavior
}
Treating int8_t as a character type is the reasonable thing to do, and I would assume all compilers do this. But there are developers that think this is a bug — see the discussion in GCC bug 66110...

type punning — memcpy

A third way to do the type punning is using memcpy
bool is_negative(float x)
{
    unsigned int ui;
    memcpy(&ui, &x, 4);
    return (ui & 0x80000000u) != 0;
}
Both GCC and LLVM are smart enough to optimize away the memcpy, and generate similar code as the version using a union.

This type punning only works if the destination is a variable — you cannot use malloc:ed memory for this. The reason is that memcpy copies the effective type from its source when writing to allocated memory
bool is_negative(float x)
{
    unsigned int *p = malloc(4);
    if (p == NULL)
        abort();
    memcpy(p, &x, 4);                // Effective type of *p is now float
    return (*p & 0x80000000u) != 0;  // Undef behavior reading float as int
}

allocated memory

Memory returned from malloc does not have a type, so each memory location gets an effective type when it is written. Subsequent reads must then be done according to the type-based aliasing rules as usual.

The type of the allocated memory can be updated by writing with a new type
void *p = malloc(4);
if (p == NULL)
    abort();
*(float *)p = 1.0;      // Effective type is float
do_something(p);
*(int *)p = 0;          // Effective type is now int
which allows the buffer to be used for different things over its lifetime.

This may have some surprising effects, such as the examples in GCC bugs 69776 and 70484. Consider the function
int f(int *pi, long *pl)
{
    *pi = 1;
    *pl = 0;
    return *(char *)pi;
}
The type-based aliasing rules say that int and long cannot alias, and this function can be optimized to
int f(int *pi, long *pl)
{
    *pi = 1;
    *pl = 0;
    return 1;
}
But it is possible for the pointers to alias if both point to the same malloc:ed memory, so the following will print a different value depending on whether the optimization is done
int main(void)
{
    void *p = malloc(sizeof(long));
    if (p == NULL)
        abort();
    printf("%d\n", f(p, p));
    return 0;
}
It is a bit unclear exactly what the C standard requires in these cases, and you can argue that the optimization is invalid. Recent versions of GCC are conservative and do not optimize this, but older versions are more aggressive, so it is prudent to avoid playing tricks with type changes in allocated memory regardless of what the standard says...


1 Many developers seem to think that type-based aliasing was introduced by C99, but that is not true; C90 has essentially the same aliasing rules (although C99 contains some minor clarifications/improvements). I guess the reason for the belief is that GCC added -fstrict-aliasing at roughly the same time as it implemented C99.
2 Many discussions of type punning (such as the Wikipedia article) says that type punning through a union is a GCC extension. The background of this is that C90 said
[...] if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined.
I believe the committee intended it to work, but made it "implementation-defined" as the concrete result depends on the implementation (byte order, trap representations, etc.). But "implementation-defined" lets the compiler do whatever it wants, as long as the behavior is documented (and the original implementation in GCC did, in fact, have additional restrictions, although that was fixed before the release). GCC documents that this works for C90 too, so it is in some sense a GCC extension...

Monday, April 25, 2016

Dangling pointers and undefined behavior

My previous blog post "C pointers are not hardware pointers" says that dangling pointers may invoke undefined behavior even if they are not dereferenced, as in the example
void foo(int *p, int *q)
{
    free(p);
    if (p == q)  // Undefined behavior!
        bar();
}
I have got lots of questions and comments of the form "This cannot be right — the comparison must be valid as p is not modified by free(p)", so this blog post provides more details.

That the behavior is undefined follows from C11 6.2.4 "Storage durations of objects"
The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address, and retains its last-stored value throughout its lifetime. If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.
and 7.22.3 "Memory management functions" that says that free ends the lifetime of objects
The lifetime of an allocated object extends from the allocation until the deallocation.

The reason the standard makes this undefined behavior is that even simple operations on dangling pointers, such as assignment or comparison, may misbehave and result in exceptions or arbitrary values on some architectures.1 The C99 rationale has an example of how this can happen
Consider a hypothetical segmented architecture on which pointers comprise a segment descriptor and an offset. Suppose that segments are relatively small so that large arrays are allocated in multiple segments. While the segments are valid (allocated, mapped to real memory), the hardware, operating system, or C implementation can make these multiple segments behave like a single object: pointer arithmetic and relational operators use the defined mapping to impose the proper order on the elements of the array. Once the memory is deallocated, the mapping is no longer guaranteed to exist. Use of the segment descriptor might now cause an exception, or the hardware addressing logic might return meaningless data.
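If the comparison is really needed, a sketch of a safe rewrite is to do it while p is still valid:
#include <stdbool.h>
#include <stdlib.h>

void bar(void);

void foo(int *p, int *q)
{
    bool same = (p == q);  /* compare before the lifetime of *p ends */
    free(p);
    if (same)
        bar();
}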


1. At least on architectures that existed when the first version of the standard was written.

Sunday, March 20, 2016

Pointer casts in C

The C standard has more restrictions on pointers than what was discussed in the previous blog post. This post covers pointer casts.

Pointer casts and alignment

Casting a pointer invokes undefined behavior if the resulting pointer is not correctly aligned. The standard is written in this way in order to support architectures that use different formats for different kinds of pointers, and such architectures do exist — see for example this mail to the GCC development list about a mainframe architecture that was recently commercially supported with a GCC 4.3 port.

The compiler may use this to optimize memory accesses for processors with strict alignment requirements (such as old ARM processors). Consider for example
void foo(void *p)
{
    memset(p, 0, 4);
}
that clears 32 bits of data. This could be generated as a 32-bit store, but the alignment of p is unknown, so the compiler must generate four separate byte stores if it wants to inline the memset
mov     r3, #0
strb    r3, [r0, #0]
strb    r3, [r0, #1]
strb    r3, [r0, #2]
strb    r3, [r0, #3]
Consider now
void bar(int *);

void foo(void *p)
{
    memset(p, 0, 4);
    bar(p);
}
The call to bar will convert p to an int* pointer, and this invokes undefined behavior if p is not 32-bit aligned. So the compiler may assume p is aligned, and the memset can now be generated as a 32-bit store
mov     r3, #0
str     r3, [r0, #0]
This example illustrates two things that often cause confusion with regard to undefined behavior
  • The effect of undefined behavior may go back in time — the undefined behavior is invoked when calling bar, but the compiler may use this to optimize code executed before bar in a way that would make it misbehave if p was not correctly aligned.
  • The compiler just uses the undefined behavior of the misaligned conversion to determine that p must be aligned, which then affects each use of p in the same way as if the alignment had been known by some other means — i.e. the compiler developers do not go out of their way to implement evil algorithms that do obscure transformations back in time based on undefined behavior.

Function pointers

It is not allowed to cast between a function pointer and a pointer to object type (i.e. a "normal" pointer). The reason is that they may be very different on a hardware level, and it may be impossible to represent data pointers as function pointers, or vice versa. One trivial example is when function pointers and data pointers have different width, such as the MS-DOS "medium" memory model that uses 32-bit pointers for code but only 16-bit pointers for data.

Casting between integers and pointers

Casting between integers and pointers is implementation-defined, so the compiler may handle it in any way it wants (although the standard has a footnote saying that the intention is that the mapping between pointers and integers should "be consistent with the addressing structure of the execution environment").

The only thing that is guaranteed to work on any implementation is casting between the integer value 0 and pointers:
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
I have read far too many confused discussions about whether the value of NULL is guaranteed to be 0. In some sense it is, as (void*)0 == NULL evaluates to true, but the value of (void*)NULL does not need to be 0 when stored in memory — the implementation may for example choose to implement the cast operator as flipping the most significant bit,1 which means that the code
union {
    uintptr_t u;
    void *p;
} u;
u.p = 0;
printf("0x%" PRIxPTR "\n", u.u);
prints 0x80000000 on a 32-bit machine.

One other thing to note is that NULL is defined as being a "null pointer constant", which does not need to be of a pointer type per the quoted text above! This may cause problems when passing NULL to a variadic function.
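A classic example is the POSIX function execl, which takes a NULL-terminated argument list; if NULL expands to a plain 0 of type int, the callee may read garbage (a minimal sketch):
#include <unistd.h>

void run_date(void)
{
    /* The cast is required for portability: without it, the last
       argument may be passed as an int, but execl reads a char *,
       which breaks on ABIs where int and pointers differ in size. */
    execl("/bin/sh", "sh", "-c", "date", (char *)NULL);
}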


1. This is not a completely stupid example. Consider a small embedded processor that has some hardware mapped at address 0. The platform does not have much memory, so 0x80000000 is guaranteed not to be a valid address, and the implementation uses this for NULL. There are in general better ways of handling this, but I have seen this done for real hardware...

Sunday, March 6, 2016

C pointers are not hardware pointers

Pointers in the C language are more abstract than pointers in the hardware, and the compiler may surprise developers that think that pointers in C work in the same way as pointers in the CPU.

A C pointer points into what the standard calls an object, points to the byte following the object, or has the value NULL. The concept of "object" here is very different from the C++ object — an object in the C standard is a range of bytes allocated as a unit,1 so x and y are objects in
int x;
struct foo y[10];
and the memory returned by a call to malloc is an object.

Dangling pointers are dangerous, and may invoke undefined behavior even when they are not dereferenced. The reason is that the value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime. That is, a dangling pointer is treated the same way as an uninitialized pointer, and all use of uninitialized values invokes undefined behavior.
void foo(int *p, int *q)
{
    free(p);
    if (p == q)  // Undefined behavior!
        bar();
}
The pointer p has indeterminate value after free(p), so the comparison invokes undefined behavior.

Comparing pointers using the relational operators (<, >, <=, and >=) requires them to point into the same object (or the byte following the object). That is
int *p = malloc(64 * sizeof(int));
int *q = p + i;
if (p < q)
   foo();
is fine (provided that 0 ≤ i ≤ 64), but
int *p = malloc(64 * sizeof(int));
int *q = malloc(64 * sizeof(int));
if (p < q)  // Undefined behavior!
   foo();
invokes undefined behavior. Similarly, subtraction of pointers is only allowed for array objects, and both pointers must point into the same array object (or to the byte following the object).
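A minimal sketch:
#include <stddef.h>

void example(void)
{
    int a[64], b[64];
    ptrdiff_t d1 = &a[10] - &a[2];  /* fine: both point into a */
    ptrdiff_t d2 = &b[0] - &a[0];   /* undefined behavior: different objects */
    (void)d1;
    (void)d2;
}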

Arithmetic on a pointer cannot make it point outside the object (other than to the byte following the object). In particular, arithmetic on a pointer cannot make it point into another object. This is useful for compilers, as they can use this to track memory accesses and trivially determine that reads and writes through pointers derived from different objects do not conflict. Actually, it would be very hard for the compiler to place variables in registers if writes through pointers could modify arbitrary objects!

There is however one special case where a pointer can point to the address of another object — two objects may be placed next to each other in memory, which is typical for cases such as
int x, y;
and p and q can now have the same value after
int *p = &x + 1;
int *q = &y;
But the pointer p does not really point to y — it points to the address following x. Dereferencing p invokes undefined behavior, so you cannot really use the fact that p and q have the same value.

GCC has a somewhat aggressive interpretation of the standard, so it compiles p == q to false if it determines that the pointers are derived from different objects (see GCC bug 61502 for details). This has the fun effect that it is possible to get pointers p and q that point at the same memory address, but where p == q evaluates to false, as in
#include <stdio.h>

int main(void)
{
    int x, y;
    int *p = &x + 1;
    int *q = &y;
    printf("%p %p %d\n", (void*)p, (void*)q, p == q);
    return 0;
}
that prints
0x7f7fffffdafc 0x7f7fffffdafc 0
on my development machine when compiled with a recent GCC.


1. The C standard's definition of object is a bit more involved, as it also involves the type of the data within the range of bytes, but this does not affect the discussion in this blog post. I'll come back to the type related part in a future blog post on aliasing.

This blog post was updated 2016-05-19:
  • Added casts in last example
  • Changed 256 to 64*sizeof(int) in examples

Sunday, February 21, 2016

How undefined signed overflow enables optimizations in GCC

Signed integers are not allowed to overflow in C and C++, and this helps compilers generate better code. I was interested in how GCC is taking advantage of this, and here are my findings.1

Signed integer expression simplification

The nice property of overflow being undefined is that signed integer operations work as in normal mathematics — you can cancel out values so that (x*10)/5 simplifies to x*2, or (x+1)<(y+3) simplifies to x<(y+2). Increasing a value always makes it larger, so x<(x+1) is always true.

GCC iterates over the IR (the compiler's Internal Representation of the program), and does the following transformations (x and y are signed integers; c, c1, and c2 are positive constants; and cmp is a comparison operator. I have only listed the transformations for positive constants, but GCC handles negative constants too in the obvious way)
  • Eliminate multiplication in comparison with 0
    (x * c) cmp 0   ->   x cmp 0 
    
  • Eliminate division after multiplication
    (x * c1) / c2   ->   x * (c1 / c2) if c1 is divisible by c2
    
  • Eliminate negation
    (-x) / (-y)     ->   x / y
    
  • Simplify comparisons that are always true or false
    x + c < x       ->   false
    x + c <= x      ->   false
    x + c > x       ->   true
    x + c >= x      ->   true
    
  • Eliminate negation in comparisons
    (-x) cmp (-y)   ->   y cmp x
    
  • Reduce magnitude of constants
    x + c > y       ->   x + (c - 1) >= y
    x + c <= y      ->   x + (c - 1) < y
    
  • Eliminate constants in comparisons
    (x + c1) cmp c2         ->   x cmp (c2 - c1)
    (x + c1) cmp (y + c2)   ->   x cmp (y + (c2 - c1)) if c1 <= c2
    
    The second transformation is only valid if c1 <= c2, as it would otherwise introduce an overflow when y has the value INT_MIN.
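As a concrete illustration, the "always true or false" simplifications above mean that GCC folds this function to return the constant 1 (with -fwrapv it must instead keep the addition and the comparison):
int always_true(int x)
{
    return x + 1 > x;
}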

Pointer arithmetic and type promotion

If an operation does not overflow, then we will get the same result if we do the operation in a wider type. This is often useful when doing things like array indexing on 64-bit architectures — the index calculations are typically done using 32-bit int, but the pointers are 64-bit, and the compiler may generate more efficient code when signed overflow is undefined by promoting the 32-bit integers to 64-bit operations instead of generating type extensions.

One other aspect of this is that undefined overflow ensures that a[i] and a[i+1] are adjacent. This improves analysis of memory accesses for vectorization etc.
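Both effects show up in a loop like this minimal sketch: the 32-bit index i can be turned into a 64-bit induction variable, and a[i] is known to step through adjacent elements, which helps vectorization:
void add(float *restrict a, const float *restrict b, int n)
{
    for (int i = 0; i < n; i++)
        a[i] += b[i];
}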

Value range calculations

The compiler keeps track of the variables' range of possible values at each point in the program, i.e. for code such as
int x = foo();
if (x > 0) {
  int y = x + 5;
  int z = y / 4;
it determines that x has the range [1, INT_MAX] inside the if-statement, and can thus determine that y has the range [6, INT_MAX] as overflow is not allowed. And the next line can be optimized to
  int z = y >> 2;
as the compiler knows that y is non-negative.

The undefined overflow helps optimizations that need to compare two values (as the wrapping case would give possible values of the form [INT_MIN, (INT_MIN+4)] or [6, INT_MAX], which prevents all useful comparisons with < or >), such as
  • Changing comparisons x<y to true or false if the ranges for x and y do not overlap
  • Changing min(x,y) or max(x,y) to x or y if the ranges do not overlap
  • Changing abs(x) to x or -x if the range does not cross 0
  • Changing x/c to x>>log2(c) if x>0 and the constant c is a power of 2
  • Changing x%c to x&(c-1) if x>0 and the constant c is a power of 2
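As a sketch of the last two items in the list: once the range analysis has established that x is positive, the division and modulo below can be compiled as a shift and a mask:
int f(int x)
{
    if (x <= 0)
        return 0;
    return x / 8 + x % 8;  /* may be compiled as (x >> 3) + (x & 7) */
}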

Loop analysis and optimization

The canonical example of why undefined signed overflow helps loop optimizations is that loops like
for (int i = 0; i <= m; i++)
are guaranteed to terminate when signed overflow is undefined (if i were allowed to wrap, the loop would never terminate when m == INT_MAX). This helps architectures that have special loop instructions, as these in general do not handle infinite loops.

But undefined signed overflow helps many more loop optimizations. All analyses, such as determining the number of iterations, transforming induction variables, and keeping track of memory accesses, use everything in the previous sections to do their work. In particular, the set of loops that can be vectorized is severely reduced when signed overflow is allowed to wrap.


1. Using GCC trunk r233186,  "gcc version 6.0.0 20160205 (experimental)"

This blog post was updated 2016-02-23:
  • Corrected first sentence to say "overflow" instead of "wrap".
  • Removed one incorrect transformation from "Eliminate Negation" where I had misread the GCC source code.
  • Corrected the ranges in the "Value range calculations" example.