Friday, September 8, 2017

Follow-up on “Why undefined behavior may call a never-called function”

I have received several questions on the previous blog post about what happens in more complex cases, such as
#include <cstdlib>

typedef int (*Function)();

static Function Do;

static int EraseAll() {
  return system("rm -rf /");
}

static int LsAll() {
  return system("ls /");
}

void NeverCalled() {
  Do = EraseAll;
}

void NeverCalled2() {
  Do = LsAll;
}

int main() {
  return Do();
}
where the compiler will find three possible values for Do: EraseAll, LsAll, and 0.

The value 0 is eliminated from the set of possible values for the call in main, in the same way as for the simpler case, but the compiler cannot change the indirect call to a direct call as there are still two possible values for the function pointer, and clang generates the expected
main:
        jmpq    *Do(%rip)
But a compiler could transform the line
return Do();
to
if (Do == LsAll)
  return LsAll();
else
  return EraseAll();
which has the same surprising effect of calling a never-called function: a null Do compares unequal to LsAll, so execution ends up in EraseAll. This transformation would be silly in this case, as the cost of the extra comparison is similar to the cost of the eliminated indirect call, but it may be a good optimization when the compiler can determine that the result will be faster (for example, if the functions can be simplified after inlining). I don’t know if this is implemented in clang/LLVM; I could not get it to happen when writing some small test programs. But GCC’s implementation of devirtualization, for example, can do it if -fdevirtualize-speculatively is enabled, so this is not a hypothetical optimization. (GCC does not, however, take advantage of undefined behavior in this case, so it will not insert calls to never-called functions.)
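
To make the effect concrete, here is a minimal hand-written sketch of the guarded form. The names LsAllStub, EraseAllStub, CallIndirect, and CallGuarded are hypothetical stand-ins introduced only for illustration, and the stubs return a marker value instead of running a command. This is not what any particular compiler emits, just a source-level equivalent of the if/else above.

```cpp
typedef int (*Function)();

// Hypothetical stand-ins for LsAll and EraseAll: they only return a marker.
static int LsAllStub() { return 1; }
static int EraseAllStub() { return 2; }

// The original form: an indirect call through the pointer.
// Calling this with a null pointer is undefined behavior.
int CallIndirect(Function f) {
  return f();
}

// The guarded form: one comparison selects between two direct calls.
// Every value other than LsAllStub, including null, reaches EraseAllStub.
int CallGuarded(Function f) {
  if (f == LsAllStub)
    return LsAllStub();
  else
    return EraseAllStub();
}
```

Calling CallGuarded with a null pointer returns the value of EraseAllStub without ever calling through the null pointer, which is well-defined in this hand-written form. A compiler may only introduce the same guard on its own because the original indirect call through a null pointer is undefined behavior.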

4 comments:

  1. (please excuse the "Unknown" - apparently something very strange has happened)

    Something about this struck me as very incorrect behavior, so I've gone reading the C specification (latest version I can find seems to be a draft from 2007 and the older C99 spec, of course).

    So I am asking for a clarification as to why the way this compiler is acting is not in violation of the C specification. (I am not attacking, in any form, merely seeking to understand this on a deeper level)

    An Explanation of why I am really confused:

    In that draft it is made clear that the "null pointer" (discussed in section 6.3.2.3 as part of the definition of pointers, if I'm remembering correctly - and later, as well, when the specification discusses unary operators applied to pointers) is always a pointer to data.

    In section 6.5.2.2 there is a specific definition given to "function pointer", which it states is what all function identifiers and anything the "function call operator - ()" is applied to actually is. As part of the definition of the semantics of function calls it makes a lot of reference to "fitting the parameters" and even refers to trying the same to the result type - at no time is mention made of altering the value of the "function pointer".

    CLANG, apparently, is what you tested with and seems to violate the specification by changing the value to something never specified, which would seem to be outside the specified bounds of how a function call is supposed to be "translated".

  2. As I'm unable to edit... I finally noticed I commented on the wrong post.

    In reference to your first post, it is not a "danger of undefined behavior" - the compiler is erroneously generating the code. As the C specification most specifically does not cover what might or might not be valid on any platform (and a platform might, indeed, in some manner have valid code living at address 0), stating that "calling a null pointer" is "undefined behavior" is completely incorrect.

    The specification only makes note of the "null pointer" being invalid as regards the "unary *" operator. In fact, what lives at any address in the memory space of a system - and of a running program - is wholly implementation defined, hence the compiler in question is generating wholly invalid code by violating the C specification.

    Yes, I will admit that, in general, in a hosted environment such as *Nix or Windows, address zero lives in a reserved space, but having written code that runs on the bare metal of an MCU, yes, it is possible for a function-call of the "null pointer" to actually be completely valid.

    Replies
    1. C11 6.3.2.3 says "If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function". That is, null pointers are invalid for function calls, as a null pointer is guaranteed to compare unequal to a pointer to any function.

      But there is a difference between a "null pointer constant" and memory address 0, so it is possible to use memory address 0 on small embedded devices! I have written about this in "Pointer casts in C" and "Surprising properties of C null pointer constants".

  3. Oh, Yes - I'd glossed over that as I'd thought it referred to something that actually met the definition in the specification:

    1) Constant Integer Value 0
    or
    2) Such an expression cast to a void *

    Unless I've... ffff... right, typedef int (*Function)();

    I'm feeling quite stupid right now - I'd forgotten to take into account the placement of the unary-* there and that changes things entirely. The fact that (*Function)() is actually a renaming of "int" and it's not being initialized - hence is an "integer constant 0" - makes it a null pointer, specifically, and not just a "pointer to address zero"

    Mea Culpa
