Tuesday, July 4, 2017

Strict aliasing in C90 vs. C99 – and how to read the C standard

I often see claims that the strict aliasing rules were introduced in C99, but that is not true – the relevant part of the standard is essentially the same for C90 and C99. Some compilers used the strict aliasing rules for optimization well before 1999, as was noted in this 1998 post to the GCC mailing list (which argues that enabling strict aliasing will not cause many problems, as most software has already fixed its strict aliasing bugs to work with those other compilers...).

C99 – 6.5 Expressions

The C standard does not talk about “strict aliasing rules”, but they follow from the text in “6.5 Expressions”:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:73
  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,
  • a type that is the signed or unsigned type corresponding to the effective type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.

73 The intent of this list is to specify those circumstances in which an object may or may not be aliased.
Note the footnote that says that the intention of these rules is to let the compiler determine that objects are not aliased (and thus be able to optimize more aggressively).
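As a rough illustration of how the list reads in practice, here is a minimal sketch. The function names are made up for this example, and the last function is left in a comment because the list does not permit it.

```c
#include <assert.h>
#include <stddef.h>

/* Allowed: unsigned int is the unsigned type corresponding to int. */
unsigned int read_as_unsigned(int *p)
{
  assert(p != NULL);
  return *(unsigned int *)p;
}

/* Allowed: a character type may be used to access any object. */
size_t count_nonzero_bytes(int *p)
{
  const unsigned char *bytes = (const unsigned char *)p;
  size_t i, n = 0;

  assert(p != NULL);
  for (i = 0; i < sizeof *p; i++)
    if (bytes[i] != 0)
      n++;
  return n;
}

/* Not allowed: float is none of the listed types for an int object.
   float read_as_float(int *p) { return *(float *)p; }  */
```

Since an int and a float can never legitimately be accessed through each other's lvalues, the compiler may assume that a store through a float pointer does not change an int object.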

C90 – 6.3 Expressions

The corresponding text in C90 is located in “6.3 Expressions”:
An object shall have its stored value accessed only by an lvalue that has one of the following types:36
  • the declared type of the object,
  • a qualified version of the declared type of the object,
  • a type that is the signed or unsigned type corresponding to the declared type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the declared type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.

36 The intent of this list is to specify those circumstances in which an object may or may not be aliased.
It is similar to the text in C99, and it even has the footnote that says it is meant to be used to determine if an object may be aliased or not, so C90 allows optimizations using the strict aliasing rules.

But standards have bugs, and those can be patched by publishing technical corrigenda, so it is not enough to read the published standard to see what is and is not allowed. There are two technical corrigenda published for C90 (ISO/IEC 9899 TCOR1 and ISO/IEC 9899 TCOR2), and TCOR1 updates the first two bullet points. The corrected version of the standard says
An object shall have its stored value accessed only by an lvalue that has one of the following types:36
  • a type compatible with the declared type of the object,
  • a qualified version of a type compatible with the declared type of the object,
  • a type that is the signed or unsigned type corresponding to the declared type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the declared type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.

36 The intent of this list is to specify those circumstances in which an object may or may not be aliased.
The only difference compared to C99 is that it does not talk about effective type, which makes it unclear how malloc'ed memory is handled, as such memory does not have a declared type. This is discussed in the defect report DR 28 that asks if it is allowed to optimize
void f(int *x, double *y) {
  *x = 0;
  *y = 3.14;
  *x = *x + 2;
}
to
void f(int *x, double *y) {
  *x = 0;
  *y = 3.14;
  *x = 2; /* *x known to be zero */
}
if x and y point to malloc'ed memory, and the committee answered (citing the bullet point list from 6.3)
We must take recourse to intent. The intent is clear from the above two citations and from Footnote 36 on page 38: The intent of this list is to specify those circumstances in which an object may or may not be aliased.
Therefore, this alias is not permitted and the optimization is allowed.
In summary, yes, the rules do apply to dynamically allocated objects.
That is, the allocated memory is treated as having the type it was written with, and subsequent reads must follow the rules in the bullet-point list, which is essentially the same as what C99 says.
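A minimal sketch of what this means for allocated memory, as I read the DR 28 answer and the C99 effective type rules: the storage may be reused for a different type, but each read must match the type of the most recent write. The function name reuse_allocation is made up for this example.

```c
#include <assert.h>
#include <stdlib.h>

double reuse_allocation(void)
{
  size_t size = sizeof(double) > sizeof(int) ? sizeof(double) : sizeof(int);
  void *p = malloc(size);
  int *x;
  double *y;
  double result;

  assert(p != NULL);  /* sketch only: real code should handle failure */

  x = p;
  *x = 40;            /* the storage now holds an int */
  result = *x + 2;    /* read through an int lvalue: allowed */

  y = p;
  *y = 0.5;           /* the storage now holds a double */
  result += *y;       /* read through a double lvalue: allowed */
  /* Reading *x at this point would access a double through an int
     lvalue, which the bullet-point list does not allow. */

  free(p);
  return result;
}
```

The function in DR 28 differs in that it reads *x after the store through y, which is exactly the case the committee says the compiler does not need to handle.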

One difference between C90 and C99

There is one difference between the C90 and C99 strict aliasing rules in how unions are handled – C99 allows type-punning using code such as
union a_union {
  int i;
  float f;
};

int f() {
  union a_union t;
  t.f = 3.0;
  return t.i;
}
while this is implementation-defined in C90 per 6.3.2.3
[...] if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined.
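If the code needs to behave the same under both standards, one common approach (not something the text above prescribes) is to copy the bytes with memcpy instead of going through a union. The sketch below uses a made-up function name and assumes that float and unsigned int have the same size.

```c
#include <assert.h>
#include <string.h>

/* Returns the object representation of f as an unsigned int. */
unsigned int float_to_bits(float f)
{
  unsigned int u;

  /* Assumption of this sketch: both types have the same size. */
  assert(sizeof u == sizeof f);
  memcpy(&u, &f, sizeof u);  /* byte-wise copy, no int/float aliasing */
  return u;
}
```

On a platform where float is an IEEE 754 single-precision value, float_to_bits(3.0f) gives 0x40400000.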

Reading the standard

Language lawyering is a popular sport on the internet, but it is a strange game where often the only winning move is not to play. Take for example DR 258 where the committee is asked about a special case in macro-expansion that is unclear. The committee answers
The standard does not clearly specify what happens in this case, so portable programs should not use these sorts of constructs.
That is, unclear parts of the standard should be avoided, not language-lawyered into saying what you want.

And the committee is pragmatic; DR 464 is a case where the defect report asks to add an example for a construct involving the #line directive that some compilers get wrong, but the committee thought it was better to make it unspecified behavior:
Investigation during the meeting revealed that several (in fact all that were tested) compilers did not seem to follow the interpretation of the standard as given in N1842, and that it would be best to acknowledge this as unspecified behavior.
So just because the standard says something does not mean that it is the specified behavior. One other fun example of this is DR 476 where the standard does not make sense with respect to the behavior of volatile:
All implementors represented on the committee were polled and all confirmed that indeed, the intent, not the standard, is implemented. In addition to the linux experience documented in the paper, at least two committee members described discussions with systems engineers where this difference between the standard vs the implementation was discussed because the systems engineers clearly depended on the implementation of actual intent. The sense was that this was simply a well known discrepency.

1 comment:

  1. The description of address-of and indirection operators in C90 strongly implies that using the address-of operator on an lvalue and then later using the indirection operator on the resulting pointer will yield an lvalue semantically equivalent to the original (within its lifetime, of course). Thus, given that:

    union u { int i; float f; };
    int test(union u *up1, union u *up2)
    {
      if (up1->i) up2->f = 1.0f;
      return up1->i;
    }

    would be Implementation-Defined in the case where up1==up2, the same should be true of:

    int test1(int *ip, float *fp)
    {
      if (*ip) *fp = 1.0f;
      return *ip;
    }
    union u { int i; float f; };
    int test(union u *up1, union u *up2)
    {
      return test1(&up1->i, &up2->f);
    }

    An implementation could define the behavior of the union accesses by saying that they will bitwise-translate when done directly via union types, or when code executes an __fp_wait() directive between them, but yield an Unspecified Value in other cases. A compiler that specified that could thus use aliasing optimization to omit the second read of *ip in the second example, and I have no particular beef with such behavior if there is nothing between the write and the read which would suggest aliasing.

    On the other hand, gcc and clang employ the "optimization" even when given:

    int read_int(int *ip) { return *ip; }
    void write_float(float *fp, float f) { *fp = f; }
    union u { int i; float f; };
    int test(union u *up1, union u *up2)
    {
      if (read_int(&up1->i)) write_float(&up2->f, 1.0f);
      return read_int(&up1->i);
    }

    suggesting that any equivalence between an lvalue and the result of using address-of and indirection operators has vanished, even in cases where code uses the resulting pointer as the first operation after taking the object's address.

    BTW, the notion that Effective Type is set by writing an object effectively applies the C90 rules as though anything might be a pointer to a union member, except that Implementation-Defined behavior is replaced with Undefined Behavior. From a practical perspective, however, it would be much better to instead apply a simple principle:

    1. If two or more accesses are made to the same lvalue and there is no evidence of any other accesses, those accesses may be treated as unsequenced relative to anything that happens between them. Note that in the second example, there's no evidence to suggest that the access to *fp might affect *ip, but in the third example the address of up2->f is taken *between* the two accesses to up1->i, so a compiler that isn't being willfully blind should have no trouble recognizing that it shouldn't move operations on "int*" across the function call (and could thus not consolidate the second read with the first).

    Note that applying that principle would permit some optimizations that the Standard doesn't presently allow, while reducing the number of cases where programs would need to disable aliasing optimizations altogether. Most notably:

    1. The Standard presently requires that writes via pointers of distinct types be performed in order in most cases, since each write will re-set the effective type of the storage referenced thereby.

    2. The Standard presently requires that compilers make extremely pessimistic aliasing assumptions when using character types, even in cases where there's no evidence that a character pointer will be used to access an object of some other particular type.
