Krister Walfridsson’s old blog: May 2016

I started with NetBSD in the mid-nineties, on a Sun SPARC ELC with 32 MB of memory, where I used GCC and Emacs on X11 with FVWM as the window manager. I'm still using GCC, Emacs, and FVWM with the same configuration files (updated for pointless changes in Emacs and FVWM), but I now need much more memory and CPU performance... I thought it would be interesting to investigate why.

Much of this is likely due to GCC: the GCC 2.7.2 cc1 from NetBSD 1.2 has a text segment size of 1073152 bytes, while the GCC 4.8.5 cc1 used for today's NetBSD has a size of 11012557 bytes, and I believe the dynamic memory and instruction count used during compilation have increased much more than that. But the operating system has also increased in size (for example, the NetBSD 1.2 libc.so is 331776 bytes, compared to 1390980 bytes today), and I plan to start my investigation by looking at the OS.

Building NetBSD

One thing I like about NetBSD is that it avoids pointless changes, so the build process has been the same for a long time. In addition, it is easy to cross-compile it for different architectures. In fact, NetBSD is always cross-compiled, even when building e.g. an x86 distribution on an x86 architecture.

There is a script build.sh that sets up everything, so building a release for e.g. Sun SPARC can be done as

./build.sh -u -U -m sparc release

This first compiles the tools needed for the build, then builds the full release of NetBSD, and packages the result. This works on "any" operating system — you can build NetBSD on e.g. Linux and Mac OS X.

Cross-compiling individual components is also trivial. The tools are by default placed in a directory named after the host platform, in my case obj/tooldir.NetBSD-6.1.5-i386/bin. The most useful tool here is a wrapper for make that sets up the environment for cross-compiling. This can be used to compile any component, e.g.

cd lib/libc
../../obj/tooldir.NetBSD-6.1.5-i386/bin/nbmake-sparc

will build libc.so.

Some results

As a first step, I have built a NetBSD release from trunk as of 1 January each year 2002–2016, mostly just in order to see that it was as easy to build as I thought, and to try to get a feel for what can be interesting to investigate more closely. I stopped at 2002, as build.sh was not available earlier...

I built i386 (which is the NetBSD name for x86, even though it does not support i386 CPUs any longer) and sun3. The reason I chose them is that they represent two extremes of architectures: sun3 is a hardware platform that was obsolete already in the mid-nineties, so software size "should not" increase for it unless new functionality is added, while the i386 hardware is still used, and it has got more features, bigger caches etc., during this time, which may increase code size.

And the code size has increased during this time. For example, this is how libc.so, libm.so, and the kernel have grown:

These plots do not say much, and code increase is not necessarily "bad", even on constrained platforms, if unused functionality never gets paged in. My plan is to look into the details of the reasons for these increases (such as new functionality, support for more hardware, compiler changes, careless developers, etc.), to get a feel for how much each reason contributes. Please let me know if there are some specific questions you want me to investigate!

There was some concern that type-based alias analysis could cause problems when it was implemented in GCC, but the conclusion (as expressed in the GCC development list) was that all other compilers already did this optimization,¹ so most code had already been fixed:

There's less and less of this code IMHO — modern compilers have been doing this kind of alias analysis for some time and as a result folks have been forced to fix their code. Of course this doesn't apply to Linux and some other free software projects since they only use gcc.

I guess that mail was a bit too optimistic: it was written in 1998, and I still see lots of broken code and complaints about -fstrict-aliasing...

What type-based alias analysis means

An informal description of the type-based alias analysis rules is that every memory location has a type, and you are only allowed to access the memory using the "correct" type (the compiler can, therefore, assume that two accesses with incompatible types do not alias). The C11 standard describes this in 6.5 "Expressions":

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

a type compatible with the effective type of the object,

a qualified version of a type compatible with the effective type of the object,

a type that is the signed or unsigned type corresponding to the effective type of the object,

a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,

an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

a character type.

As a concrete example, a variable of type int can only be accessed through an int* (including unsigned int*, const int*, etc.) or through a char* (including unsigned char*, const char*, etc.).
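The permitted accesses for an int can be sketched as three small helpers. The function names are made up for this illustration; each one reads an int object through a different lvalue type that the list above allows.

```c
#include <stddef.h>

/* Hypothetical helper names, used only for this illustration. */

/* Allowed: unsigned int is the unsigned type corresponding to int. */
unsigned int read_as_unsigned(int *p)
{
    return *(unsigned int *)p;
}

/* Allowed: const int is a qualified version of a compatible type. */
int read_as_const(int *p)
{
    return *(const int *)p;
}

/* Allowed: any object may be inspected byte by byte through a
   character type. */
unsigned int sum_bytes(int *p)
{
    const unsigned char *bytes = (const unsigned char *)p;
    unsigned int sum = 0;
    for (size_t k = 0; k < sizeof *p; k++)
        sum += bytes[k];
    return sum;
}
```

Reading the same int through a float* or a short* would not be covered by any item in the list.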

These restrictions allow the compiler to reorder operations when the types ensure that they access different objects. Consider the function

int i;

void foo(float *f)
{
    i = 23;
    *f = 0.0;
    i = i + 19;
}

*f cannot modify i as it has a different type, so the compiler is allowed to move the store to i over the store to *f, and the function is optimized to

void foo(float *f)
{
    *f = 0.0;
    i = 42;
}

Note that the type-based aliasing rules only talk about how to access objects, and not about pointer casting — it is allowed to cast pointers between incompatible types (as long as you follow the rules for pointer casts) such as

int i;
float *f = (float *)&i;

But accessing this as *f will now invoke undefined behavior, as the object pointed to is an int, and it is not allowed to access it by an expression having a float type.
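A minimal sketch of the distinction between casting and accessing: the pointer below takes a detour through float*, but the object is only ever accessed as an int. The function name is invented for the example, and it assumes that int is suitably aligned for float, which is the case on common platforms.

```c
/* Hypothetical example: the cast is harmless as long as the
   float* is never dereferenced. */
int round_trip(void)
{
    int i = 5;
    float *f = (float *)&i;  /* cast only; *f is never evaluated */
    int *p = (int *)f;       /* converting back gives a pointer equal to &i */
    *p = *p + 1;             /* access through int*, the correct type */
    return i;                /* i is now 6 */
}
```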

type punning — union

There are cases where you must access data using an incorrect type, and there are a few ways to do this. The usual cases are where you need to get the bitwise representation of some value, and we will consider the example from the Wikipedia article on type punning that checks whether a floating point number is negative by testing the most significant bit. The examples will assume 32-bit int and float.

The naive version of just casting the pointer to unsigned int* does not work

bool is_negative(float x)
{
    unsigned int *ui = (unsigned int *)&x;
    return (*ui & 0x80000000u) != 0; // Undef behavior reading float as int
}

as it breaks the type-based aliasing rules. The best way to solve this is in most cases to use a union²

bool is_negative(float x)
{
    union
    {
        unsigned int ui;
        float f;
    } u;
    u.f = x;  
    return (u.ui & 0x80000000u) != 0;
}

Both GCC and LLVM are smart enough to generate as efficient code as you would have expected from the invalid version above.
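The union also works in the other direction, for writing bits back. As a sketch under the same assumptions (32-bit unsigned int and float, with the sign in the most significant bit), a hypothetical flip_sign helper could negate a value like this:

```c
/* The example only makes sense when both members have the same size. */
_Static_assert(sizeof(unsigned int) == sizeof(float),
               "example assumes equally sized unsigned int and float");

/* Hypothetical helper: negate x by toggling its sign bit. */
float flip_sign(float x)
{
    union
    {
        unsigned int ui;
        float f;
    } u;
    u.f = x;              /* store as float */
    u.ui ^= 0x80000000u;  /* modify the bits through the other member */
    return u.f;           /* read back as float, still through the union */
}
```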

The union trick requires that all accesses are done through the union — the result is not defined when accessing through a pointer, even if the pointer has the "correct" type

bool is_negative(float x)
{
    union
    {
        unsigned int ui;
        float f;
    } u;
    u.f = x;
    unsigned int *ui = &u.ui;
    return (*ui & 0x80000000u) != 0;  // Undefined behavior
}

type punning — character pointer

Character pointers can be used to access any type, so the is_negative function can be implemented as

bool is_negative(float x)
{
    unsigned char *p = (unsigned char *)&x;
    return (p[3] & 0x80) != 0;
}

assuming a little-endian architecture.
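If hard-coding the byte index is a concern, one possible sketch is to locate the sign bit at run time by comparing the bytes of 0.0f and -0.0f. This assumes a floating point format such as IEEE 754 in which the two zeros differ only in the sign bit; the function name is made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical variant that does not assume a byte order. */
bool is_negative_bytes(float x)
{
    const float pos_zero = 0.0f;
    const float neg_zero = -0.0f;
    const unsigned char *pz = (const unsigned char *)&pos_zero;
    const unsigned char *nz = (const unsigned char *)&neg_zero;
    const unsigned char *px = (const unsigned char *)&x;

    for (size_t k = 0; k < sizeof x; k++) {
        /* The bits that differ between +0 and -0 form the sign mask. */
        unsigned char sign_mask = pz[k] ^ nz[k];
        if (sign_mask != 0)
            return (px[k] & sign_mask) != 0;
    }
    return false;  /* no distinct negative zero in this format */
}
```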

Note that int8_t and uint8_t are not guaranteed to be of character type. That is, the following function may be invalid

bool is_negative(float x)
{
    uint8_t *p = (uint8_t *)&x;
    return (p[3] & 0x80) != 0;  // Possible undefined behavior
}

Treating int8_t and uint8_t as character types is the reasonable thing to do, and I would assume all compilers do this. But there are developers who think this is a bug; see the discussion in GCC bug 66110...

type punning — memcpy

A third way to do the type punning is using memcpy

bool is_negative(float x)
{
    unsigned int ui;
    memcpy(&ui, &x, 4);
    return (ui & 0x80000000u) != 0;
}

Both GCC and LLVM are smart enough to optimize away the memcpy, and generate code similar to the version using a union.
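The memcpy approach can be wrapped in a small reusable helper. The sketch below uses uint32_t and sizeof instead of a literal 4, and refuses to compile if float is not 32 bits wide; both function names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "example assumes a 32-bit float");

/* Hypothetical helper: return the object representation of x. */
uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* byte copy into a declared object */
    return bits;
}

bool is_negative_memcpy(float x)
{
    return (float_bits(x) & 0x80000000u) != 0;
}
```

On an IEEE 754 platform, float_bits(1.0f) would be 0x3F800000.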

This type punning only works if the destination is a variable; you cannot use malloc'ed memory for this. The reason is that memcpy copies the effective type from its source when writing to allocated memory

bool is_negative(float x)
{
    unsigned int *p = malloc(4);
    if (p == NULL)
        abort();
    memcpy(p, &x, 4);                // Effective type of *p is now float
    return (*p & 0x80000000u) != 0;  // Undef behavior reading float as int
}
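If the bytes really have to pass through allocated memory, one way to stay within the rules is to copy them back out into a declared variable before inspecting them. This is a sketch with an invented function name, again assuming 32-bit unsigned int and float.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

_Static_assert(sizeof(unsigned int) == sizeof(float),
               "example assumes equally sized unsigned int and float");

/* Hypothetical variant of the function above without the invalid read. */
bool is_negative_heap(float x)
{
    void *buf = malloc(sizeof x);
    if (buf == NULL)
        abort();
    memcpy(buf, &x, sizeof x);    /* effective type of *buf is now float */

    unsigned int ui;              /* declared object, its type is fixed */
    memcpy(&ui, buf, sizeof ui);  /* byte copy, no float read as unsigned int */
    free(buf);
    return (ui & 0x80000000u) != 0;
}
```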

allocated memory

Memory returned from malloc does not have a type, so each memory location gets an effective type when it is written. Subsequent reads must then be done according to the type-based aliasing rules as usual.

The type of the allocated memory can be updated by writing with a new type

void *p = malloc(4);
if (p == NULL)
    abort();
*(float *)p = 1.0;      // Effective type is float
do_something(p);
*(int *)p = 0;          // Effective type is now int

which allows the buffer to be used for different things over its lifetime.
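A self-contained sketch of such reuse, where every read uses the type of the most recent write. The function name and the values are chosen only for illustration.

```c
#include <stdlib.h>

/* Hypothetical example: one allocation used first as float, then as int. */
int reuse_buffer(void)
{
    size_t size = sizeof(float) > sizeof(int) ? sizeof(float) : sizeof(int);
    void *p = malloc(size);
    if (p == NULL)
        abort();

    *(float *)p = 1.5f;            /* effective type is float */
    float as_float = *(float *)p;  /* read as float: fine */

    *(int *)p = 40;                /* effective type is now int */
    int as_int = *(int *)p;        /* read as int: fine */

    free(p);
    return as_int + (int)(as_float * 2.0f);  /* 40 + 3 */
}
```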

This may have some surprising effects, such as the examples in GCC bug 69776 and 70484. Consider the function

int f(int *pi, long *pl)
{
    *pi = 1;
    *pl = 0;
    return *(char *)pi;
}

The type-based aliasing rules say that int and long cannot alias, and this function can be optimized to

int f(int *pi, long *pl)
{
    *pi = 1;
    *pl = 0;
    return 1;
}

But it is possible for the pointers to alias if both point to the same malloc'ed memory, so the following will print a different value depending on whether the optimization is done or not

int main(void)
{
    void *p = malloc(sizeof(long));
    if (p == NULL)
        abort();
    printf("%d\n", f(p, p));
    return 0;
}

It is a bit unclear exactly what the C standard requires in these cases, and you can argue that the optimization is invalid in this case. Recent versions of GCC are conservative and do not optimize this, but older versions are more aggressive, so it is prudent to try to avoid playing tricks with type changes in allocated memory regardless of what the standard says...

1 Many developers seem to think that type-based aliasing was introduced by C99, but that is not true; C90 has essentially the same aliasing rules (although C99 contains some minor clarifications/improvements). I guess the reason for the belief is that GCC added -fstrict-aliasing at roughly the same time as it implemented C99.
2 Many discussions of type punning (such as the Wikipedia article) say that type punning through a union is a GCC extension. The background of this is that C90 said

[...] if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined.

I believe the committee intended it to work, but made it "implementation-defined" as the concrete result depends on the implementation (byte order, trap representations, etc.). But "implementation-defined" lets the compiler do whatever it wants, as long as the behavior is documented (and the original implementation in GCC did, in fact, have additional restrictions, although that was fixed before the release). GCC documents that this works for C90 too, so it is in some sense a GCC extension...


Sunday, May 29, 2016

20 years of NetBSD code bloat