Krister Walfridsson’s old blog: Building GCC with support for NVIDIA PTX offloading

Monday, April 10, 2017

Building GCC with support for NVIDIA PTX offloading

GCC can offload C, C++, and Fortran code to an accelerator when using OpenACC or OpenMP where the code to offload is controlled by adding #pragma statements (or magic comments for Fortran), such as

#pragma acc kernels
for (int j = 1; j < n-1; j++) {
  for (int i = 1; i < m-1; i++) {
    Anew[j][i] = 0.25f * (A[j][i+1] + A[j][i-1] + A[j-1][i] + A[j+1][i]);
    error = fmaxf(error, fabsf(Anew[j][i] - A[j][i]));
  }
}

This blog post describes what I needed to do in order to build a GCC 7.2 compiler with support for offloading to NVIDIA GPUs on Ubuntu 16.10.

The first step is to install the NVIDIA CUDA toolkit. Googling shows lots of strange suggestions about what you need to do in order to get this to work (blacklisting drivers, adding the PCI address of your video card to config files, etc.), but it worked fine for me to just download the “deb (local)” file, and install it as

sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda

The toolkit is installed in /usr/local/cuda, and /usr/local/cuda/bin must be added to PATH so that GCC may find the ptxas tool.

The script below fetches the source code and builds the compiler and tools

	#!/bin/sh

	#
	# Build GCC with support for offloading to NVIDIA GPUs.
	#

	work_dir=$HOME/offload/wrk
	install_dir=$HOME/offload/install

	# Location of the installed CUDA toolkit
	cuda=/usr/local/cuda

	# Build assembler and linking tools
	mkdir -p $work_dir
	cd $work_dir
	git clone https://github.com/MentorEmbedded/nvptx-tools
	cd nvptx-tools
	./configure \
	--with-cuda-driver-include=$cuda/include \
	--with-cuda-driver-lib=$cuda/lib64 \
	--prefix=$install_dir
	make \|\| exit 1
	make install \|\| exit 1
	cd ..

	# Set up the GCC source tree
	git clone git://sourceware.org/git/newlib-cygwin.git nvptx-newlib
	git clone --branch releases/gcc-11 git://gcc.gnu.org/git/gcc.git gcc
	cd gcc
	contrib/download_prerequisites
	ln -s ../nvptx-newlib/newlib newlib
	cd ..
	target=$(gcc/config.guess)

	# Build nvptx GCC
	mkdir build-nvptx-gcc
	cd build-nvptx-gcc
	../gcc/configure \
	--target=nvptx-none --with-build-time-tools=$install_dir/nvptx-none/bin \
	--enable-as-accelerator-for=$target \
	--disable-sjlj-exceptions \
	--enable-newlib-io-long-long \
	--enable-languages="c,c++,fortran,lto" \
	--prefix=$install_dir
	make -j`nproc` \|\| exit 1
	make install \|\| exit 1
	cd ..

	# Build host GCC
	mkdir build-host-gcc
	cd build-host-gcc
	../gcc/configure \
	--enable-offload-targets=nvptx-none \
	--with-cuda-driver-include=$cuda/include \
	--with-cuda-driver-lib=$cuda/lib64 \
	--disable-bootstrap \
	--disable-multilib \
	--enable-languages="c,c++,fortran,lto" \
	--prefix=$install_dir
	make -j`nproc` \|\| exit 1
	make install \|\| exit 1
	cd ..

view raw build-gcc-offload-nvptx.sh hosted with ❤ by GitHub

Add $install_dir/lib64 to LD_LIBRARY_PATH, and the compiler can now be used to offload OpenACC code by compiling as

$install_dir/bin/gcc -O3 -fopenacc test.c

or OpenMP as

$install_dir/bin/gcc -O3 -fopenmp test.c

You may need to pass -foffload=-lm to the compiler if the code you offload contains math functions that cannot be directly generated as PTX instructions.

Updated 2017-12-23: Changed the script to build GCC 7.2 instead of trunk as there are some problems with the trunk compiler at the moment...

Updated 2021-05-01: Update the script to build GCC 11 and a newer version of newlib in order to solve build issues on newer versions of Ubuntu.

25 comments:

supercatApril 26, 2017 at 5:53 PM
Does gcc try examine code and then decide to generate accelerator or normal CPU code based upon what it thinks will be useful in any given situation? What does it ensure about semantics? I would expect that optimal performance would often be achieved by having an accelerator perform some operations in parallel with the main CPU; does gcc use "restrict" to determine when that is and is not safe?
ReplyDelete
Replies
UnknownSeptember 8, 2017 at 9:10 PM
I'm having problems to make it work. Can you help me? CUDA works on my machine. It is a CentOS, and I compiled with gcc 7.2.

It compiles the code, but when I ran it, I get:

libgomp: target function wasn't mapped

Any ideas?

ReplyDelete
Replies
ZanathosNovember 8, 2017 at 8:24 PM
hi, i'm trying to compile a simple example but i obtain this error:
x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: libgomp.spec: No such file or directory.

i insert the library path in the .profile file and when i try to use the compile i obtain the error i wrote above. thanks for any help.
ReplyDelete
Replies
ZanathosDecember 26, 2017 at 9:13 PM
hi, after the changes to the script i obtain this error when i try to launch the executable created.

libgomp: Library too old for offload (version 0 < 1)

i compile with this command

g++ -std=c++11 -O3 -fopenmp -DOPENMP -foffload=nvptx-none main.cpp
-o main

and i compile with no error. thank you for any help.
ReplyDelete
Replies
AngelicaMarch 28, 2018 at 12:53 AM
Hi,

It is very useful your script. Thank you.

I managed to compile my code with: $install_dir/bin/gcc -O3 -fopenmp -foffload=nvptx-none -foffload=-lm main.c.

But, when I run the executable, I receive the next error:

libgomp: cuCtxSynchronize error: the launch timed out and was terminated
libgomp: cuMemFreeHost error: the launch timed out and was terminated
libgomp: device finalization failed

Do you know what could be wrong?

Thank you.

ReplyDelete
Replies
UnknownApril 17, 2018 at 4:16 PM
Hi,

i managed to follow the steps you indicated and install gcc with offloading support. Now i made a simple script to check if everything is working, the script looks like this:

#pragma acc parallel loop
for (int j = 0; j < 10; j++) {
x[j] = j;
y[j] = -j;
}

I can compile with /offload/install/bin/g++ -O3 -fopenacc test.cpp and run the executable. But then i run the code with a profiler from pgi to check if GPU is being used, but it not. How can i confirm that openacc is parallelizing the code?
ReplyDelete
Replies
CoderOctober 18, 2018 at 10:52 PM
Hey, when compiling on current manjaro I get:
fatal error: sys/ustat.h: No such file or directory
During host compiler compilation. I read that it's a lib that got removed. Do you know how to get around that?
ReplyDelete
Replies
Vinícius PachecoApril 26, 2020 at 12:37 AM
In Ubuntu 18.04, using the exact script I was getting errors while building Nvptx GCC and Host GCC. I tryed to tweak it a little by changing the GCC version according to the CUDA version for example, but still got errors like:

Makefile:380: recipe for target 'lib_a-locale.o' failed
or
nvptx-as: ptxas returned 255 exit status
or
ptxas lib_a-locale.o, line 37; fatal : Invalid initial value expression
or
configure: error: cannot compute suffix of object files: cannot compile
or
gcc-7.3.0: ptxas lib_a-hash_func.o, line 11; fatal : Invalid initial value expression
or
x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: libgomp.spec: No such file or directory

So.. To anyone in that situation, this might be useful.
I managed to make it work by:
1) Changing the repository of nvptx-newlib, as the one in the script is obsolete (according to the description of the repository itself). So I used the latest newlib, which now contains the nvptx in it:
git clone git://sourceware.org/git/newlib-cygwin.git

2) Changing the repository of GCC to the trunk version (apparently in 12/2017 the trunk version was the problematic one... Maybe this is a cyclic thing? Depending on when you're reading this comment, try changing the GCC version, might help). In my case:
git clone https://github.com/gcc-mirror/gcc

I hope it helps someone :)
ReplyDelete
Replies

Note: Only a member of this blog may post a comment.