## Monday, September 12, 2016

### Mobile GPUs — power, performance, area

I read an IWOCL 2016 conference paper "OpenCL-Based Mobile GPGPU Benchmarking: Methods and Challenges" that gives some good advice on how to measure GPU performance on mobile devices. The paper focuses on micro benchmarks, and it is hard to use these to draw relevant conclusions — especially as mobile GPUs are rather different compared to the desktop CPUs that most developers are used to. Below are some of the things that were unclear to me when I started working on the shader compiler for a mobile GPU.

#### Power consumption and heat

The major design constraint for mobile devices is power consumption as it is hard to get rid of heat without large heatsinks and fans. The amount of heat that can be handled varies a lot between devices depending on the size and how they are built, but it corresponds to somewhere between 1W for a low-end phone and 7W for a tablet — and that includes power for CPU, GPU, memory, radio, etc.

It takes a while for the temperature to rise, so the hardware may temporarily run hotter (and faster), but the device will eventually throttle the performance if it is running too hot. This means that benchmarking results are not that interesting unless you know how long that performance can be sustained, and measurements such as peak performance tend to say just how over-dimensioned the hardware is.

I find it amusing that the discussion in the paper suggest that you want high performance numbers
[...] benchmarks with long running time may trigger thermal gating in certain cases, resulting in lower performance result.
I usually want realistic results, but wanting high numbers makes sense if you (as the authors) work at a hardware vendor and want a high number in the marketing material. They also suggest
Running the benchmark in a temperature-controlled environment is one option; if such an option is not available, adding idle periods between workloads may reduce chances of high system temperature.
which is especially important if your SoC has a heat problem. 😏

#### Dynamic Voltage and Frequency Scaling

Power consumption increases roughly quadratically with clock frequency1 — for example, raising the frequency from 600MHz to 700MHz on Exynos 5433 increases power consumption by 42%. This means it is better to lower the clock frequency and keep the GPU running 100% of the time instead of, for example, running it on full speed 75% of the time and being idle the remaining 25%.

This performance tuning is done by Dynamic Voltage and Frequency Scaling (DVFS). It is hard to make a good implementation as different applications have different requirements, and there is no "correct" tradeoff. For example, should a high end game run at full speed, and be throttled (i.e. reduced to unplayable frame rate) when the device overheats, or should it run on a lower sustainable speed from the beginning? Different device vendors implement DVFS in different ways, so two phones with the same GPU may behave differently.

Different operations need a different amount of power, and a good DVFS implementation uses this when adjusting the voltage and frequency. For example, memory operations consumes much more power than arithmetic operations, and this is used in Exynos to use lower voltage/frequency for shaders using more memory operations. This is "fun" when optimizing shaders, as a faster shader (as measured in number of clock cycles) does not need to run faster in reality if it uses more power hungry instructions and thus get a lower clock frequency.

#### Power- and area-efficiency

GPU workloads are embarrassingly parallel, so it is easy to double the performance of the hardware if you are allowed to increase the power and chip area — just place two identical GPUs in the package! In the same way, you can get much improved power efficiency by using two GPUs and running them with halved frequency. This means that you need to look at metrics such as "performance per area power" when comparing GPU architectures.

This is annoying when developing GPUs, as most ideas for improving the performance means that some hardware block becomes more complicated, with the result that the size increases and it consumes more power. And it does not make much sense making the GPU 10% faster if it also need 10% more area and power...

1. The dynamic power consumption is actually $$P_{dyn}=CV^{2}f$$ where $$C$$ is capacitance, $$V$$ is voltage, and $$f$$ is frequency. This varies linearly with the frequency, but increased frequency need a higher voltage, and the power consumption thus varies superlinearly with the frequency.