Thursday, December 15, 2016

Pointer comparison — an invalid optimization in GCC

I wrote in an earlier blog post that GCC has a somewhat aggressive interpretation of the standard so that it compiles p == q to false if p and q are pointers derived from different objects. I'm now convinced GCC is wrong.

The problem

C does not permit pointers to point at any memory — a pointer must point to an address within an object or on the address immediately following the object, and pointer arithmetic cannot make it point outside that range. When comparing pointers, the C standard says (emphasis added)
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
That is, the comparison in the printf call
#include <stdio.h>

int main(void)
{
    int x[4], y[4];
    int *p = &x[4];
    int *q = &y[0];
    printf("%p %p %d\n", (void*)p, (void*)q, p == q);
    return 0;
}
evaluates to true if the compiler happens to place y directly after x in memory. GCC does, however, substitute address comparison to false if the pointers are derived from different objects, and the program above may give output such as
0x7fff66339290 0x7fff66339290 0
where the comparison evaluates to false even though both pointers are the same. The argument in bug 61502 is essentially that the standard is unclear what "immediately follow" means, and applications cannot rely on memory layout anyway (so there is no real use case). I believe both are wrong.

Use case

One use case for this kind of comparison in firmware for small embedded devices — you have a framework that supports different hardware configurations, but you want to avoid adding an #ifdef for every optional functionality. Instead, each implemented functionality exports a structure containing configuration data, and the linker collects them into an address range and generates a symbol __start for the start address, and __end for the address immediately following the array. The good thing about this is that the array contains exactly what is present at link time — elements are added to the array by just adding files to the link command, and functionality can be included/excluded by just relinking, without any need for additional configuration.

C does not have a convenient way of representing the __end label, so it is typically expressed as
extern struct my_struct __start[];
extern struct my_struct __end[];
where the linker guarantees that __end is immediately following __start. The array is processed as
void foo(void)
{
    for (struct my_struct *x = __start; x != __end; x++)
        do_something(x);
}
which is compiled to an infinite loop if GCC optimizes comparisons of pointers derived from two objects as always being different. The workaround suggested in the GCC mailing lists is to insert some inline assembly to confuse the compiler
void foo(void)
{
    struct my_struct *x = __start;
    struct my_struct *y = __end;
    asm("":"+r"(x));
    asm("":"+r"(y));
    for (; x != y; x++)
        do_something(x);
}
but this should not be needed for the natural reading of the standard excerpt quoted above.

This particular use case seems to work now as a side-effect by the "missing" optimization of global constant addresses, so the nasty workaround is not needed. But some GCC developers still claim the code is invalid...

The meaning of "following"

Comment #3 of the bug report argues that GCC is correct:
Except within a larger object, I'm not aware of any reason the cases of two objects following or not following each other in memory must be mutually exclusive. (If the implementation can track the origins of bit-patterns and where copies of those bit-patterns have got to, it might have a compacting garbage collector that relocates objects and changes what's adjacent to what, for example — I think such implementations are within the scope of what the C standard is intended to support. Or if you're concerned about how this changes bit-patterns of pointers, imagine that a C pointer is a (object key, offset) pair, and that comparison first converts the C pointer into a hardware address, where it's the mapping from object keys to hardware addresses that changes as a result of garbage collection rather than anything about the representation of the pointer.)
I do not think this is a correct interpretation. For example, DR #077 asks the question "Is the address of an object constant throughout its lifetime?", and the committee answers (emphasis added)
The C Standard does not explicitly state that the address of an object remains constant throughout the life of the object. That this is the intent of the Committee can be inferred from the fact that an address constant is considered to be a constant expression. The framers of the C Standard considered that it was so obvious that addresses should remain constant that they neglected to craft words to that effect.
That is, compacting garbage collection is not within the scope of what the C standard supports. I'm even less convinced by the key/offset example — this is the same situation as "normal" hardware, as you can view the virtual address as a pair (page, offset), and it is still the case that x[16] is immediately following x[15] for an array
uint32_t x[32];
even if x crosses a page boundary so that x[15] and x[16] are placed on different pages.

8 comments:

  1. I don't believe this is common knowledge, so it is worth noting that &x[4] is not undefined behavior due to §6.5.3.2 which has the following note(not normative):

    "Thus, &*E is equivalent to E (even if E is a null pointer), and &(E1[E2]) to ((E1)+(E2))."

    but the normative text agrees: http://stackoverflow.com/a/21247407/1708801

    ReplyDelete
  2. But how do you know that it is placed directly adjacent in the memory? The only way you could tell within the C++ memory model is by comparing the pointers - which is undefined behavior in and of itself. Casting them to intptr_t won't do the trick, either - the number that you get is just some number that can be converted back into a pointer at a later point, and there are no other guarantees about its meaning. Same thing with printing them - what you see printed is just some abstract representation, it doesn't carry any guarantee wrt physical layout (that it does in practice is immaterial - you cannot rely on that to make any judgment over said layout).

    Thus, per the as-if rule, gcc is correct, because they behave as-if those objects were not adjacent - and there's no way for you to tell otherwise. The address may be constant, but you don't have any means of determining if one address is adjacent to another that does not trigger U.B. (and so the result of the test is meaningless, and the compiler is not bound to respect it in any way).

    So far as I can tell, the only way you could really guarantee adjacency is by placing two arrays next to each other in a struct, like so:

    struct {
    int x[4], y[4];
    } foo;
    int* p = &foo.x[4];
    int* q = &foo.y[0];

    Does GCC still give you the "wrong" result in this case?

    ReplyDelete
    Replies
    1. GCC give the "correct" result for your code.

      When it comes to pointer comparison, it is defined behaviour in C. And I would claim that the text in the standard that says that they compare equal "if and only if they are immediately following in the address space" means the real layout should be taken into account when applying the as-if rule (i.e. I read the text as a (misguided) mechanism to let the application inspect the data layout).

      But I see no real use case for this on stack variables, so it does probably not matter... 😀

      Delete
    2. > When it comes to pointer comparison, it is defined behaviour in C.

      Two arbitrary pointers can only be compared via == and !=. But to determine which object is higher or lower in memory than the other one, you need to be comparing them using < and > - and those are undefined if you compare two pointers derived from different objects. This is from ISO C99 spec:

      "When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object or incomplete types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined."

      So C doesn't provide you the means to determine the relative layout of two random objects, with the exception of that "one past the last" rule. So whatever answer == gives to you, that is the definitive answer from the perspective of the C memory model for a given implementation.

      > means the real layout should be taken into account

      The question here becomes what "real layout" is. Remember that this phrase has to be interpreted in the context of the C memory model, not the actual OS and architecture. So you can only use the facilities provided to you by the language to establish said layout. And then the point above applies - the answer that == gives you is the only thing you can rely on to test. If it returns false, then the objects are not adjacent as far as your program is concerned, even if they're actually implemented as adjacent in the physical or virtual address space.

      Otherwise, the only way to test for adjacency is to set the code up in such a way that other language rules guarantee that it is (or isn't) adjacent in advance, like struct fields - then you can predict what == returns.

      Delete
    3. That is a strained interpretation of the standard that has the effect of limiting C utility. char *x = malloc(100); char *y = x+50; while(x < y) *x++ = 0;

      Delete
  3. Related topic: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78420

    ReplyDelete
  4. The quoted response to DR#077 hints at an important principle some compiler writers fail to recognize when they focus too much upon the wording of the Standard--that the Committee didn't always feel a need to say things it thought were obvious. That point needs to be applied, however, not just to the narrow issue of whether objects' addresses are constant throughout their lifetime, but rather to something much broader that should be obvious but isn't: the Standard makes no attempt to forbid all unreasonable behaviors, but instead relies upon people to use common sense and sound judgment in deciding what behaviors are reasonable and unreasonable for compilers targeting various platforms and application fields.

    From the point of view of the Standard, the question here is whether pointer comparisons are guaranteed to behave as an equivalence relation for the lifetime of the objects involved, or whether there may be certain combinations of pointer values where equality comparisons would yield Unspecified results without invoking Undefined Behavior. I don't think the Standard would forbid such a thing, but I also don't think compiler writers should care.

    Some application fields like systems programming need certain guarantees about pointer behavior that aren't needed in other fields like high-end number crunching. If making certain assumptions about pointers would make a compiler unsuitable for some purposes but improve its performance for some others, a compiler which is intended to serve both purposes should provide an option to enable or disable such assumptions, regardless of whether the mode enabled thereby is non-conforming, or is conforming but is nonetheless unsuitable for system programming.

    C could evolve from being a good language to being a really great one if compiler writers could re-recognize the principle that quality compiler suitable for particular target platforms and application fields should behave in a fashion which is reasonable for those platforms and fields, whether the Standard requires it or not, and programmers should be entitled to expect such behavior.

    ReplyDelete
    Replies
    1. The compiler writers appear to consider themselves adversaries of application programmers.

      Delete

Note: Only a member of this blog may post a comment.