I saw on Twitter that compiling the Ogre graphics engine takes more than twice as long with GCC as with Clang. In my experience GCC and Clang usually compile at similar speeds, so I decided to look into why Ogre is different.
It turned out that a big part of the difference comes from which C++ version the compilers use by default: `clang-5.0` defaults to C++98, while `gcc-7` defaults to a newer version. Forcing the compilers to use C++98 by passing `-std=c++98` makes GCC compile Ogre in a little over half the time (668s vs. 1135s), while passing `-std=c++11` nearly doubles the time Clang needs to compile it!

One reason for this difference is that some of the standard include files are more expensive in C++11 mode as they pull in more dependencies. For example, compiling a file containing just the line

    #include <memory>

takes 0.16 seconds on my computer when using C++11

    > time -p g++ -O2 -c test.cpp -std=c++11
    real 0.16
    user 0.14
    sys 0.01

while compiling it as C++98 is faster

    > time -p g++ -O2 -c test.cpp -std=c++98
    real 0.02
    user 0.01
    sys 0.01

The 0.14-second difference may not seem that big, but it adds up when, as for Ogre, you are compiling more than 500 files, each taking about one second. The increased cost of including standard header files for C++11 compared to C++98 adds about 20% to the Ogre build time.
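One way to see the extra dependencies directly is to count how many lines the preprocessor emits for the same include under each standard. The sketch below is not part of the original measurements: the function name `preprocessed_lines` and the `CXX` variable are assumptions made for illustration, and the actual counts depend on the compiler and standard library version.

```shell
#!/usr/bin/env bash
# Sketch: count the lines the preprocessor emits for one standard header
# under a given -std setting. CXX and preprocessed_lines are illustrative names.
CXX="${CXX:-g++}"

preprocessed_lines() {
    local std="$1" header="$2"
    # -E stops after preprocessing; "-x c++ -" reads C++ source from stdin.
    printf '#include <%s>\n' "$header" \
        | "$CXX" -std="$std" -E -x c++ - \
        | wc -l | tr -d ' '
}

# Only run the comparison when the compiler is actually available.
if command -v "$CXX" >/dev/null 2>&1; then
    for std in c++98 c++11; do
        echo "$std: $(preprocessed_lines "$std" memory) lines"
    done
fi
```

A larger line count for `c++11` would be consistent with the timing difference above, although line count is only a rough proxy for compile cost.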
It is a bit unclear to me exactly where the rest of the slowdown comes from, but it seems to be spread all over the code: I tried removing various classes from the Ogre code base, and removing 10% of the source code seems to reduce both the fast and the slow build time by about 10%. So I assume this is just because the templates in the C++11 STL are more complex and the compiler needs to work a bit harder each time they are used...
Anyway, the difference in compilation time between `-std=c++98` and `-std=c++11` was much bigger than I had guessed, and I’ll now make sure to use `-std=c++98` when building C++98 code.

Updated: The original blog post said that `gcc-7` uses C++11 by default. That was wrong; it defaults to C++14.
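Since the defaults differ between compilers and releases, it can help to ask the compiler which standard it uses when no `-std` flag is given. This is a minimal sketch, assuming a GCC- or Clang-compatible driver: it dumps the predefined macros and maps the value of `__cplusplus` to a standard name. The helper names `default_cplusplus` and `std_name` are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: report the C++ standard a compiler uses when no -std flag is given.
# default_cplusplus and std_name are illustrative helper names.

default_cplusplus() {
    # -dM -E dumps the predefined macros; "-x c++" forces C++ mode for /dev/null.
    "$1" -dM -E -x c++ /dev/null | awk '$2 == "__cplusplus" { print $3 }'
}

std_name() {
    # Map the value of __cplusplus (with any trailing L removed) to a standard.
    case "${1%L}" in
        199711) echo "C++98" ;;   # also used for C++03
        201103) echo "C++11" ;;
        201402) echo "C++14" ;;
        201703) echo "C++17" ;;
        202002) echo "C++20" ;;
        *)      echo "unknown ($1)" ;;
    esac
}

# Query whichever of the two compilers is installed.
for cxx in g++ clang++; do
    if command -v "$cxx" >/dev/null 2>&1; then
        echo "$cxx defaults to $(std_name "$(default_cplusplus "$cxx")")"
    fi
done
```

With the compilers discussed above, one would expect this to report C++14 for `gcc-7` and C++98 for `clang-5.0`; newer releases default to later standards.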
I think you've just tested the effects of the disk cache. Can you try this instead:
```
#!/usr/bin/env bash
for i in `seq 1 100`;
do
cat test.cpp > /dev/null
done
for i in `seq 1 10`;
do
/usr/bin/time -f"%E %U %S" g++ -O2 -c test.cpp -std=c++98
done
echo
for i in `seq 1 10`;
do
/usr/bin/time -f"%E %U %S" g++ -O2 -c test.cpp -std=c++11
done
```
This requires the external `time` utility (`/usr/bin/time`), as the shell built-in doesn't accept a `-f` format string. Here %E is real, %U is user, and %S is system time.
On my system it prints out:
```
0:00.03 0.00 0.02
0:00.01 0.01 0.00
0:00.01 0.00 0.00
0:00.01 0.00 0.00
0:00.01 0.00 0.00
0:00.01 0.00 0.00
0:00.01 0.00 0.00
0:00.01 0.00 0.00
0:00.01 0.00 0.00
0:00.01 0.00 0.00
0:00.01 0.00 0.00
0:00.01 0.00 0.00
0:00.01 0.00 0.00
0:00.01 0.00 0.00
0:00.01 0.01 0.00
0:00.01 0.00 0.00
0:00.01 0.00 0.00
0:00.01 0.00 0.00
0:00.01 0.00 0.00
0:00.01 0.00 0.00
```
- Yury
(I'm not sure why Blogger doesn't allow me to sign in :( )
Hmm. It makes sense that the disk cache should handle this, but I get the same result when using the script:
```
0:00.02 0.02 0.00
0:00.02 0.02 0.00
0:00.02 0.01 0.00
0:00.02 0.01 0.00
0:00.02 0.02 0.00
0:00.02 0.02 0.00
0:00.02 0.02 0.00
0:00.02 0.02 0.00
0:00.02 0.01 0.00
0:00.02 0.01 0.00
0:00.15 0.12 0.02
0:00.15 0.14 0.00
0:00.15 0.14 0.00
0:00.15 0.14 0.01
0:00.15 0.14 0.00
0:00.15 0.13 0.02
0:00.15 0.13 0.01
0:00.15 0.13 0.01
0:00.15 0.13 0.01
0:00.15 0.14 0.00
```
Did you run any tests to see whether there is much difference in the performance of the generated code?