This blog post describes what I needed to do in order to build a GCC 7.2 compiler with support for offloading to NVIDIA GPUs on Ubuntu 16.10.

GCC can offload C, C++, and Fortran code to an accelerator when using OpenACC or OpenMP, where the code to offload is controlled by adding #pragma statements (or magic comments for Fortran), such as

#pragma acc kernels
for (int j = 1; j < n-1; j++) {
    for (int i = 1; i < m-1; i++) {
        Anew[j][i] = 0.25f * (A[j][i+1] + A[j][i-1]
                              + A[j-1][i] + A[j+1][i]);
        error = fmaxf(error, fabsf(Anew[j][i] - A[j][i]));
    }
}
The first step is to install the NVIDIA CUDA toolkit. Googling shows lots of strange suggestions about what you need to do in order to get this to work (blacklisting drivers, adding the PCI address of your video card to config files, etc.), but it worked fine for me to just download the “deb (local)” file, and install it as
sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda

The toolkit is installed in /usr/local/cuda, and /usr/local/cuda/bin must be added to PATH so that GCC may find the ptxas tool.

The script below fetches the source code and builds the compiler and tools
Add $install_dir/lib64 to LD_LIBRARY_PATH, and the compiler can now be used to offload OpenACC code by compiling as

$install_dir/bin/gcc -O3 -fopenacc test.c

or OpenMP as

$install_dir/bin/gcc -O3 -fopenmp test.c

You may need to pass -foffload=-lm to the compiler if the code you offload contains math functions that cannot be directly generated as PTX instructions.

Updated 2017-12-23: Changed the script to build GCC 7.2 instead of trunk, as there are some problems with the trunk compiler at the moment.
Updated 2021-05-01: Updated the script to build GCC 11 and a newer version of newlib in order to solve build issues on newer versions of Ubuntu.
Does gcc try to examine code and then decide whether to generate accelerator or normal CPU code based upon what it thinks will be useful in any given situation? What does it ensure about semantics? I would expect that optimal performance would often be achieved by having an accelerator perform some operations in parallel with the main CPU; does gcc use "restrict" to determine when that is and is not safe?
GCC does not try to schedule things on the GPU by itself — you need to decorate the code using #pragma to tell the compiler that you intend it to run on the GPU (and that it is safe to do it).
Will the #pragma be taken as an "order" to use the GPU, or merely as an invitation to see if using the GPU would appear advantageous? If a piece of code would take 20us to run on the main CPU, or 5us on the main CPU plus 30us on the GPU, it would seem that using the GPU would be a win if the main CPU could overlap enough computation with the GPU that it would end up being idle for 15us or less waiting for the GPU to finish. Does gcc try to handle overlap, and if so, how?
It will be taken as an "order" to use the GPU, and GCC does not try to handle overlap.
I'm having problems making it work. Can you help me? CUDA works on my machine. It is a CentOS, and I compiled with gcc 7.2.
It compiles the code, but when I run it, I get:
libgomp: target function wasn't mapped
Any ideas?
Additional info:
When I compile without the -flto flag, it doesn't compile and I get these errors.
offload/wrk/install/bin/gcc -O3 -fopenacc -foffload=nvptx-none -foffload=-lm vecadd.c -lm
gcc: warning: ‘-x lto’ after last input file has no effect
gcc: fatal error: no input files
compilation terminated.
lto-wrapper: fatal error: offload/wrk/install/bin/gcc returned 1 exit status
compilation terminated.
collect2: fatal error: lto-wrapper returned 1 exit status
compilation terminated.
I don't have any good idea what may be wrong...
But something seems strange with your installation – it should not need -flto. I'll try to figure out how it differs from my installation if you mail me (krister.walfridsson at gmail dot com) the output of
offload/wrk/install/bin/gcc -O3 -fopenacc -foffload=nvptx-none -foffload=-lm vecadd.c -lm -v
Try updating binutils.
Hi, I'm trying to compile a simple example but I obtain this error:
x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: libgomp.spec: No such file or directory.
I inserted the library path in the .profile file, and when I try to use the compiler I obtain the error I wrote above. Thanks for any help.
I do not have any good idea what may be wrong... libgomp.spec is supposed to be present in the same directory as libgomp.so (i.e. $install_dir/lib64).
You can see where GCC tries to find it if you compile using -v – the search path is the one shown as LIBRARY_PATH right before the error message.
I'm facing the same issue. Any help would be great.
I can reproduce this now, even though I do not understand why one of my build trees works fine and one fails... I'll investigate this, but I will be busy with Christmas-related things the coming days, so I do not expect to have any solution until the end of next week... :(
The problem seems to only occur on recent trunk versions, so I have now updated the script to build GCC 7.2 instead.
Thank you so much!!! I'll be eagerly waiting for your fix.
Now it's working with this updated script.
Hi, after the changes to the script I obtain this error when I try to launch the created executable.
libgomp: Library too old for offload (version 0 < 1)
I compile with this command

g++ -std=c++11 -O3 -fopenmp -DOPENMP -foffload=nvptx-none main.cpp -o main

and it compiles with no error. Thank you for any help.
This means that it is using your system's libgomp instead of the newly built library. Add the path to the newly built library (typically lib64 in your $install_path) to LD_LIBRARY_PATH to make it use the correct version.
Now it works perfectly, thank you.
DeleteHi,
Your script is very useful. Thank you.
I managed to compile my code with: $install_dir/bin/gcc -O3 -fopenmp -foffload=nvptx-none -foffload=-lm main.c.
But, when I run the executable, I receive the next error:
libgomp: cuCtxSynchronize error: the launch timed out and was terminated
libgomp: cuMemFreeHost error: the launch timed out and was terminated
libgomp: device finalization failed
Do you know what could be wrong?
Thank you.
No, I don't have any good idea... :(
DeleteHi,
I managed to follow the steps you indicated and install GCC with offloading support. Now I made a simple program to check if everything is working; the code looks like this:
#pragma acc parallel loop
for (int j = 0; j < 10; j++) {
x[j] = j;
y[j] = -j;
}
I can compile with /offload/install/bin/g++ -O3 -fopenacc test.cpp and run the executable. But when I run the code with a profiler from PGI to check if the GPU is being used, it is not. How can I confirm that OpenACC is parallelizing the code?
Additional information:
The result of "acc_get_num_devices(acc_device_nvidia)" is 0. Why do you think this is happening?
Result of "offload/install/bin/gcc -v":
Using built-in specs.
COLLECT_GCC=/offload/install/bin/gcc
COLLECT_LTO_WRAPPER=/offload/install/libexec/gcc/x86_64-pc-linux-gnu/7.2.0/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-pc-linux-gnu
Configured with: ../gcc/configure --enable-offload-targets=nvptx-none --with-cuda-driver-include=/usr/local/cuda/include --with-cuda-driver-lib=/usr/local/cuda/lib64 --disable-bootstrap --disable-multilib --enable-languages=c,c++,fortran,lto --prefix=/offload/install
Thread model: posix
gcc version 7.2.0 (GCC)
Sorry for not answering this before -- my comment notification was broken :(
Answering now as this is a common problem.
The most common reason for the code not being run on the GPU is that the path to $install_dir/lib64 has not been added to LD_LIBRARY_PATH.
It is possible to verify that the code is executed on the GPU by setting the GOMP_DEBUG environment variable to 1 -- this should print lots of information about the kernel and how it is executed on the GPU.
Hey, when compiling on current Manjaro I get:

fatal error: sys/ustat.h: No such file or directory

during host compiler compilation. I read that it's a library that got removed. Do you know how to get around that?
In Ubuntu 18.04, using the exact script, I was getting errors while building the nvptx GCC and the host GCC. I tried to tweak it a little, for example by changing the GCC version according to the CUDA version, but still got errors like:
Makefile:380: recipe for target 'lib_a-locale.o' failed
or
nvptx-as: ptxas returned 255 exit status
or
ptxas lib_a-locale.o, line 37; fatal : Invalid initial value expression
or
configure: error: cannot compute suffix of object files: cannot compile
or
gcc-7.3.0: ptxas lib_a-hash_func.o, line 11; fatal : Invalid initial value expression
or
x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: libgomp.spec: No such file or directory
So... To anyone in that situation, this might be useful.
I managed to make it work by:
1) Changing the repository of nvptx-newlib, as the one in the script is obsolete (according to the description of the repository itself). So I used the latest newlib, which now contains the nvptx in it:
git clone git://sourceware.org/git/newlib-cygwin.git
2) Changing the repository of GCC to the trunk version (apparently in 12/2017 the trunk version was the problematic one... Maybe this is a cyclic thing? Depending on when you're reading this comment, try changing the GCC version, might help). In my case:
git clone https://github.com/gcc-mirror/gcc
I hope it helps someone :)