Sunday, June 10, 2018

On an example from “What Else Has My Compiler Done For Me Lately?”

One of the examples in Matt Godbolt’s C++Now 2018 talk “What Else Has My Compiler Done For Me Lately?” is the function
void maxArray(double * __restrict x, double * __restrict y)
{
  for (int i = 0; i < 65536; i++) {
    if (y[i] > x[i])
      x[i] = y[i];
  }
}
The compiler generates vectorized code that processes four elements at a time – it reads the elements from x and y, compares the elements, and uses the result of the comparison as a mask in a masked move to write the elements from y that are larger than the corresponding element from x
vmovupd ymm0, ymmword ptr [rsi + rax]
vmovupd ymm4, ymmword ptr [rdi + rax]
vcmpltpd ymm4, ymm4, ymm0
vmaskmovpd ymmword ptr [rdi + rax], ymm4, ymm0
Modifying maxArray to use a more max-like construct (or std::max) as in
void maxArray2(double * __restrict x, double * __restrict y)
{
  for (int i = 0; i < 65536; i++) {
    x[i] = (y[i] > x[i]) ? y[i] : x[i];
  }
}
makes the compiler generate this using a “max” instruction instead of the compare and masked move
vmovupd ymm0, ymmword ptr [rsi + rax]
vmaxpd ymm0, ymm0, ymmword ptr [rdi + rax]
vmovupd ymmword ptr [rdi + rax], ymm0

Matt says he is a bit surprised that the compiler cannot see that the first version too can be generated in this way, but the compiler is doing the right thing – it is not allowed to change maxArray in this way! The reason is that maxArray only writes to x when the value changes while maxArray2 always writes to x, and the compiler would introduce problems if the generated code contain stores that are not in the original source code. Consider for example the program
const double a1[65536] = {0.0};
double a2[65536] = {0.0};

int main(void)
{
  maxArray((double*)a1, a2);
  return 0;
}
that is passing a constant array to maxArray. It is valid to cast away const as long as the object is not written to through the pointer, so this program is correct – y[i] is never bigger than x[i] for any i, so maxArray will never write to (the mask in the vectorized code is never set, so the vmaskmovpd instruction is essentially a nop). The code from maxArray2 does, however, always write to x so it would crash on this input as the compiler places a1 in read-only memory.

1 comment:

  1. It would seem like it might be helpful to have directives which would tell a compiler to feel free to behave as though it read and/or write a range of storage as though it were an array of a particular type, but without asking it to actually perform such reads or writes. This might be especially handy if there were a few variations which would behave differently in cases where the directive could tighten or loosen semantics, e.g.

    1. read everything and write it exactly as it was

    2. read everything and write all values that are not Indeterminate exactly as they were; replace Indeterminate values with Unspecified bit patterns.

    3. fell free to fill the storage with Indeterminate values.

    4. fill the storage with Unspecified bit patterns.

    5. read storage as though it contained bit patterns of unknown type, and interpret those bits as objects of the described type.

    6. read storage as though it contained objects of the given type, and write the bit patterns so as to be readable by any type.

    While a compiler might be able to recognize that it could process a loop like:

    // Read and write 1000 bytes at myArray, while converting any
    // Indeterminate values to Unspecified bit patterns.

    unsigned char *p = myArray;

    for (int i=0; i<1000; i++)
    {
    unsigned char temp = 0; readval = *p;
    if (readval & 1) temp |= 1;
    if (readval & 2) temp |= 2;
    if (readval & 4) temp |= 4;
    if (readval & 8) temp |= 8;
    if (readval & 16) temp |= 16;
    if (readval & 32) temp |= 32;
    if (readval & 64) temp |= 64;
    if (readval & 128) temp |= 128;
    *p++ = temp;
    }

    without having to generate any code, having a directive for that purpose would seem a lot cleaner.

    ReplyDelete