Tuesday, April 7, 2015

Comments on the SPIR-V provisional specification

Below are some random comments/thoughts/questions from my initial reading of the SPIR-V provisional specification (revision 30).

Many of my comments are that the specification is unclear. I may agree that it is obvious what the specification means, but my experience from specification work is that it is often the case that everybody agrees that it is obvious, but they do not agree on what the obvious thing is. So I think the specification needs to be more detailed. This matters especially because one of the goals of SPIR-V is to "be targeted by new front ends for novel high-level languages", and those may generate constructs that are not possible in GLSL or OpenCL C, so it is important that all constraints are documented.

Some other comments are related to tradeoffs. I think the specification is OK, so my comments are mostly highlighting some limitations (and I might have chosen a different tradeoff for some of them...). It would be great to have the rationale described for these kinds of decisions.

Const and Pure functions

Functions can be marked as Const or Pure. Const is described as
Compiler can assume this function has no side effects, and will not access global memory or dereference function parameters. Always computes the same result for the same argument values.
while Pure is described as
Compiler can assume this function has no side effect, but might read global memory or read through dereferenced function parameters. Always computes the same result for the same argument values.
I assume the intention is that the compiler is allowed to optimize calls to Const functions, such as moving function calls out of loops, CSE:ing function calls, etc. And similar for the Pure functions, as long as there are no writes to global memory that may affect the result.
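To make the kind of optimization I have in mind concrete, here is a minimal sketch in C. It uses the GCC/Clang function attributes `const` and `pure`, which appear to be close analogues of the SPIR-V Const and Pure function controls; that correspondence is my assumption, not something the specification states. All function and variable names are made up for the illustration.

```c
#include <stddef.h>

/* Global data that a Pure-like function is allowed to read. */
int scale_table[4] = {1, 2, 3, 4};

/* Const-like: the result depends only on the argument value. */
__attribute__((const)) int square(int x)
{
    return x * x;
}

/* Pure-like: no side effects, but it reads global memory. */
__attribute__((pure)) int lookup(int i)
{
    return scale_table[i & 3];
}

/* What the programmer writes: square(k) is called in every iteration. */
void fill_naive(int *out, size_t n, int k)
{
    for (size_t i = 0; i < n; i++)
        out[i] = square(k) + (int)i;
}

/* What the compiler may produce: the call is moved out of the loop.
   For lookup() the same transformation would only be valid if nothing
   in the loop can write to scale_table. */
void fill_hoisted(int *out, size_t n, int k)
{
    int s = square(k);
    for (size_t i = 0; i < n; i++)
        out[i] = s + (int)i;
}
```

Both versions of the loop must produce the same result, and that is exactly what breaks down if "global memory" is not pinned down precisely for the Pure case.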

But the specification talks about "global memory" without defining what it is. For example, are UniformConstant global variables included in this? Those cannot change, so we can do all the Const optimizations even if the function is reading from them. And what about WorkgroupLocal? That name does not sound like global memory, but it does of course still prevent optimizations.

I would suggest that the specification be changed to explicitly list the storage classes permitted in Const and Pure functions...

Storage Classes

I'm a bit confused by the Uniform and Function storage classes...

The Uniform storage class is a required capability for Shader. But the GLSL uniform is handled by the UniformConstant storage class, so what is the usage/semantics of Uniform?

Function is described as "A variable local to a function" and is also a required capability for Shader. But OpenCL does also have function-local variables... How are those handled? Why are they not handled in the same way for Shader and Kernel?

Restrict

The Restrict decoration is described as
Apply to a variable, to indicate the compiler may compile as if there is no aliasing.
This does not give you the full picture, as you can express that pointers do not alias as described in the Memory Model section. But pointers have different semantics compared to variables, and that introduces some complications.

OpenCL C defines restrict to work in the same way as for C99, and that is different from the SPIR-V specification. What C99 says is, much simplified, that a value pointed to by a restrict-qualified pointer cannot be modified through a pointer not based on that restrict-qualified pointer. So two pointers can alias if they have the correct "based-on" relationship and follow some rules on how they are accessed. The frontend may of course decide not to decorate the pointers when it cannot express the semantics in the IR, but it is unclear to me that it is easy to detect the problematic cases.
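A small C99 sketch of the "based-on" case (the function name is made up for the illustration): the two pointers below refer to the same object, yet the code is valid because `q` is derived from the restrict-qualified `p`. A compiler that treated `p` as "no aliasing at all" could wrongly assume the two updates are independent.

```c
/* p is restrict-qualified, so the object it points to may only be
   modified through p or through pointers based on p. */
int bump_twice(int *restrict p)
{
    int *q = p;   /* q is "based on" p, so it is allowed to alias *p */

    *p += 1;      /* modify the object through p */
    *q += 1;      /* modify the same object through q: still valid C99 */

    return *p;    /* must observe both updates */
}
```

Called with a pointer to a variable holding 5, the function has to return 7 and leave 7 in the variable.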

I think this needs to be clarified along the lines of what the LLVM Language Reference Manual does for noalias.

Volatile

There is a Memory Access value Volatile that is described as
This access cannot be optimized away; it has to be executed.
This does not really make sense... The memory model is still mostly TBD in the document, but the principle in GPU programming is that you need atomics or barriers in order to make memory accesses consistent. So there is no way you can observe the difference between the compiler respecting Volatile or not.

My understanding is that the rationale for Volatile in SPIR-V is to be able to work around compiler bugs by decorating memory operations with Volatile and in that way disable some compiler transformations. If so, then I think it would be useful to document this in order to make it more likely that compilers do the right thing. After all, I would expect the project manager to tell the team to do more useful work than fixing a bug for which you cannot see the difference between correct and incorrect behavior.

It has historically been rather common that C compilers miscompile volatile. A typical example is an optimization such as store forwarding, which substitutes a previously stored value for a loaded value, where the developer forgets to check for volatility when writing the optimization. So a sequence such as
 7:             TypeInt 32 1
15:      7(int) Constant 0
                Store 14(tmp) 15 
16:      7(int) Load 11(b) 
17:      7(int) Load 14(tmp) 
18:      7(int) IMul 16 17
                Store 10(a) 18
corresponding to
volatile int tmp = 0;
a = b * tmp;
gets the ID 17 substituted by the constant 0, and is then optimized to
 7:             TypeInt 32 1
15:      7(int) Constant 0
                Store 14(tmp) 15 
17:      7(int) Load 14(tmp) 
                Store 10(a) 15
which is not what is expected. But you can argue that this actually follows the SPIR-V specification: we have not optimized away the memory accesses!
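To show how small the forgotten check is, here is a minimal sketch of store forwarding over a toy IR, written in C. The `Inst` layout, the opcode names and `forward_stores` are all hypothetical; this is not how any real SPIR-V consumer is implemented. The sketch assumes straight-line code and that different variable IDs never alias.

```c
#include <stdbool.h>
#include <stddef.h>

enum { NO_ID = -1 };

typedef enum { OP_STORE, OP_LOAD } Opcode;

typedef struct {
    Opcode op;
    int    var;          /* ID of the variable being accessed */
    int    value;        /* Store: ID of the stored value; Load: result ID */
    bool   is_volatile;  /* access carries the Volatile flag */
} Inst;

/* For every Load, replacement[i] receives the ID that may be used in
   place of the Load's result, or NO_ID. Returns the number of Loads
   that can be forwarded. */
size_t forward_stores(const Inst *code, size_t n, int *replacement)
{
    size_t forwarded = 0;

    for (size_t i = 0; i < n; i++) {
        replacement[i] = NO_ID;
        if (code[i].op != OP_LOAD)
            continue;
        if (code[i].is_volatile)   /* the check that is easy to forget */
            continue;

        /* Walk backwards to the most recent Store to the same variable. */
        for (size_t j = i; j-- > 0; ) {
            if (code[j].op == OP_STORE && code[j].var == code[i].var) {
                if (!code[j].is_volatile) {
                    replacement[i] = code[j].value;
                    forwarded++;
                }
                break;
            }
        }
    }
    return forwarded;
}
```

Running this on the sequence above with the accesses to `tmp` flagged as volatile forwards nothing; clearing the flags lets ID 17 be replaced by ID 15, which is the transformation shown.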

Volatile and OpenCL

The OpenCL C specification says that
The type qualifiers const, restrict and volatile as defined by the C99 specification are supported.
which I interpret as saying that volatile works in exactly the same way as for C99. And C99 says
An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously. What constitutes an access to an object that has volatile-qualified type is implementation-defined.
That is, the compiler is not allowed to reorder volatile memory accesses, even if it knows that they do not alias. So the definition of the SPIR-V Volatile needs to be strengthened if that is meant to be used for implementing the OpenCL volatile. Although I guess you may get around this by a suitable implementation-defined definition of what constitutes an access to an object...

Differences between graphical shaders and OpenCL

The Validation Rules say that for graphical shaders
  • Scalar integer types can be parameterized only as:
    – 32-bit signed
    – 32-bit unsigned
while OpenCL cannot use signed/unsigned
  • OpTypeInt validation rules
    – The bit width operand can only be parameterized as 8, 16, 32 and 64 bit.
    – The sign operand must always be 0
I guess this lack of signed/unsigned information is the reason why there are Function Parameter Attributes called Zext and Sext described as
Value should be zero/sign extended if needed.
Both choices regarding the signed/unsigned information are fine for an IR, but why is SPIR-V treating graphics and OpenCL differently?
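The reason the signedness has to come from somewhere is that the same bit pattern widens differently depending on it. A minimal C sketch (the helper names `zext8` and `sext8` are made up, and 8-to-32-bit is just one example width):

```c
#include <stdint.h>

/* Zero extension: the upper 24 bits of the result are always 0. */
uint32_t zext8(uint8_t bits)
{
    return (uint32_t)bits;
}

/* Sign extension: the upper 24 bits are copies of bit 7.
   Written with explicit masks to avoid implementation-defined
   narrowing conversions. */
uint32_t sext8(uint8_t bits)
{
    return (bits & 0x80u) ? (0xFFFFFF00u | bits) : (uint32_t)bits;
}
```

For the bit pattern 0xFF, zero extension gives 0x000000FF while sign extension gives 0xFFFFFFFF, so an IR with sign-less integer types has to carry this information on the parameter instead.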

Endianness

Khronos thinks that SPIR-V is an in-memory format, not a file format, which means that the words are stored in the host's native byte order. But one of the goals of SPIR-V is "enabling shared tools to generate or operate on it", so it will be passed in files between tools. The specification has a helpful hint that you can use the magic number to detect endianness, but that means that all tools need to do the (admittedly simple) extra work to handle both big and little endian.
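The extra work looks roughly like the following C sketch. The magic number value 0x07230203 is the one I know from the SPIR-V specification and is assumed to apply to this revision as well; the type and function names are made up for the illustration.

```c
#include <stdint.h>
#include <string.h>

#define SPIRV_MAGIC 0x07230203u   /* assumed magic number, see above */

typedef enum {
    SPV_ORDER_NATIVE,    /* words can be used as they are */
    SPV_ORDER_SWAPPED,   /* every word must be byte-swapped first */
    SPV_ORDER_INVALID    /* not a SPIR-V module */
} SpvWordOrder;

/* Reverse the four bytes of a 32-bit word. */
static uint32_t bswap32(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000FF00u) |
           ((w << 8) & 0x00FF0000u) | (w << 24);
}

/* Inspect the first four bytes of a module read from a file. */
SpvWordOrder spv_detect_order(const unsigned char *bytes)
{
    uint32_t first;
    memcpy(&first, bytes, sizeof first);   /* avoids unaligned access */

    if (first == SPIRV_MAGIC)
        return SPV_ORDER_NATIVE;
    if (bswap32(first) == SPIRV_MAGIC)
        return SPV_ORDER_SWAPPED;
    return SPV_ORDER_INVALID;
}
```

Every tool that reads modules from files needs this check plus a swap of every word in the swapped case, which is the duplicated effort a single defined encoding would avoid.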

I think that the specification should define one file format encoding (preferably with a standardized file name extension), and say that all tools should use this encoding.

By the way, are there really any big endian platforms in the target market?
