Tuesday, December 12, 2017

More about GCC instruction patterns

This is part five of a series “Writing a GCC back end”.

The previous post about GCC’s low-level IR did only contain the minimum to get started – this post continues with a bit more of the functionality used in machine descriptions.

define_expand

The define_insn definition describes an insn implementing the named functionality (used when converting the higher level IR to RTL), but there are cases where the named functionality must expand to more than one insn, or when the RTL should be generated differently depending on the operands – this is handled by define_expand.

The general format of define_expand is essentially the same as for define_insn – it consists of a name, an RTL template, a condition string containing C++ code to enable/disable the instruction pattern, and a section of C++ code (called preparation statements) that can be used to generate RTL.

Our first example of how to use define_expand is from the Arm handling of rotlsi3 (“rotate left”) – Arm does only have a “rotate right” instruction, but rotating left by x is the same as rotating right by 32-x, and we can generate this as:
(define_expand "rotlsi3"
  [(set (match_operand:SI              0 "s_register_operand" "")
        (rotatert:SI (match_operand:SI 1 "s_register_operand" "")
                     (match_operand:SI 2 "reg_or_int_operand" "")))]
  "TARGET_32BIT"
  {
    if (CONST_INT_P (operands[2]))
      operands[2] = GEN_INT ((32 - INTVAL (operands[2])) % 32);
    else
      {
        rtx reg = gen_reg_rtx (SImode);
        emit_insn (gen_subsi3 (reg, GEN_INT (32), operands[2]));
        operands[2] = reg;
      }
  })
The preparation statements are used to modify the second operand given to rotlsi3 to calculate the correct value for the code generated by the RTL template.

define_expand may choose to skip the generation of the RTL template by executing DONE in the preparation statements. An example of how this is used is the Moxie implementation of mulsidi3 that creates a 64-bit result from multiplying two 32-bit values. The Moxie back end generates this pattern as two instructions (one that calculates the upper 32 bits, and one that calculates the lower 32 bits) where all work is done by the preparation statements
(define_expand "mulsidi3"
  [(set (match_operand:DI 0 "register_operand" "")
        (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "0"))
                 (sign_extend:DI (match_operand:SI 2 "register_operand" "r"))))]
  ""
  {
    rtx hi = gen_reg_rtx (SImode);
    rtx lo = gen_reg_rtx (SImode);

    emit_insn (gen_mulsi3_highpart (hi, operands[1], operands[2]));
    emit_insn (gen_mulsi3 (lo, operands[1], operands[2]));
    emit_move_insn (gen_lowpart (SImode, operands[0]), lo);
    emit_move_insn (gen_highpart (SImode, operands[0]), hi);
    DONE;
  })
Note: This define_expand contains an RTL template, but it is not needed as RTL will not be generated from it. It would have been enough to specify the operands, as in
(define_expand "mulsidi3"
  [(match_operand:DI 0 "register_operand")
   (match_operand:SI 1 "register_operand")
   (match_operand:SI 2 "register_operand")]
  ""
  {
    rtx hi = gen_reg_rtx (SImode);
    rtx lo = gen_reg_rtx (SImode);

    emit_insn (gen_mulsi3_highpart (hi, operands[1], operands[2]));
    emit_insn (gen_mulsi3 (lo, operands[1], operands[2]));
    emit_move_insn (gen_lowpart (SImode, operands[0]), lo);
    emit_move_insn (gen_highpart (SImode, operands[0]), hi);
    DONE;
  })

DONE works as a return statement, and it can be used in conditional code to let the preparation statements override the RTL template for special cases while using the RTL template for the general case. The Arm smaxsi3 instruction pattern (calculating the maximum of two signed integers) uses this to handle two special cases more efficiently:
(define_expand "smaxsi3"
  [(parallel [
    (set (match_operand:SI 0 "s_register_operand" "")
         (smax:SI (match_operand:SI 1 "s_register_operand" "")
                  (match_operand:SI 2 "arm_rhs_operand" "")))
    (clobber (reg:CC CC_REGNUM))])]
  "TARGET_32BIT"
  {
    if (operands[2] == const0_rtx || operands[2] == constm1_rtx)
      {
        /* No need for a clobber of the condition code register here.  */
        emit_insn (gen_rtx_SET (operands[0],
                                gen_rtx_SMAX (SImode, operands[1],
                                              operands[2])));
        DONE;
      }
  })
The problem it solves is that the general case translates to the instructions
cmp   r1, r2
movge r0, r2
movlt r0, r1
which clobbers the condition code, but the special cases max(x,0) and max(x,-1) can be generated as
bic   r0, r1, r1, asr #31
and
orr   r0, r1, r1, asr #31
which do not clobber CC. This optimization could be handled by later optimization passes, but doing it as early as possible (i.e. when generating the RTL) gives more freedom to all later passes to optimize cases that otherwise would have been prevented by the clobbering.

Note: The RTL always have the constant as the second operand for commutative binary operations, so the code does not need to check the first operand.

One other way to return from the preparation statements is to execute FAIL which has the effect of ignoring the instruction pattern in the same way as if the pattern had been disabled by the condition string. An example of this is the AArch64 movmemdi instruction pattern implementing a memory block move:
(define_expand "movmemdi"
  [(match_operand:BLK 0 "memory_operand")
   (match_operand:BLK 1 "memory_operand")
   (match_operand:DI 2 "immediate_operand")
   (match_operand:DI 3 "immediate_operand")]
  "!STRICT_ALIGNMENT"
  {
    if (aarch64_expand_movmem (operands))
      DONE;
    FAIL;
  })
This calls the target-specific aarch64_expand_movmem function that checks if it makes sense to expand the block move inline (that is, if the result will be relatively small) and generates a sequence of move insns if that is the case. If not, it just returns false, which makes this pattern call FAIL, and GCC will ignore this instruction pattern and generate a call to memcpy instead.

The unspec and unspec_volatile expression codes

The RTL template in define_insn contains expressions describing the functionality of the instruction pattern, which enables the optimizers to reason about the insn. Many architectures have instructions that cannot be described by this (or where it does not make sense to describe the functionality, such as instructions for AES encryption – the optimizers cannot take advantage of this description anyway). These cases are handled by describing the instructions using an unspec or unspec_volatile expression which the compiler treats as a black box – the only knowledge the compiler has is what is described by the predicates and register constraints.

One example of how this is used the AArch64 set_fpsr insn that writes to the floating-point status register
(define_insn "set_fpsr"
  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")] UNSPECV_SET_FPSR)]
  ""
  "msr\\tfpsr, %0")
This describes a volatile instruction (i.e. an instruction with side effects) that takes a register operand. The last operand to the unspec and unspec_volatile expressions is an integer that identifies the instruction (the backend may have several different unspec instructions, and each gets a different number) – these are by convention defined as enumerations called unspec and unspecv
(define_c_enum "unspecv" [
    UNSPECV_EH_RETURN           ; Represent EH_RETURN
    UNSPECV_GET_FPCR            ; Represent fetch of FPCR content.
    UNSPECV_SET_FPCR            ; Represent assign of FPCR content.
    UNSPECV_GET_FPSR            ; Represent fetch of FPSR content.
    UNSPECV_SET_FPSR            ; Represent assign of FPSR content.
    UNSPECV_BLOCKAGE            ; Represent a blockage
    UNSPECV_PROBE_STACK_RANGE   ; Represent stack range probing.
  ])

Attributes

It is possible to add extra information to an insn (such as the length or scheduling constraints) that the back end may take advantage of when generating the code. This is done by defining attributes
(define_attr name list-of-values default)
where
  • name is a string containing the name of the attribute.
  • list-of-values is either a string that specifies a comma-separated list of values that can be assigned to the attribute, or an empty string to specify that the attribute takes numeric values.
  • default is an expression that gives the value of this attribute for insns whose definition does not include an explicit value for the attribute.
For example,
(define_attr "length" "" (const_int 2))
defines a numeric attribute “length” having the default value 2. The back end may define attributes with any name, but a few names have a specific usage in GCC. For example, the length attribute contains the instruction’s length measuerd in bytes, and is used when calculating branch distance for architectures where different instruction are used for “short” and “long” branches.

Attribute values are assigned to insns by attaching a set_attr to the define_insn as in
(define_insn "*call"
  [(call (mem:QI (match_operand:SI 0 "nonmemory_operand" "i,r"))
         (match_operand 1 "" ""))]
  ""
  "@
   jsra\\t%0
   jsr\\t%0"
  [(set_attr "length" "6,2")])
This gives the length attribute the value 6 if the first alternative was matched, and 2 if the second alternative was matched.

The attributes can be accessed from C++ code by calling the auto-generated function get_attr_name, as in
int len = get_attr_length (insn);
The return type of get_attr_name for attributes defined with a list-of-values is an enum of the possible values.

Further reading

All functionality is described in “GNU Compiler Collection Internals”: