My twitter feed was full of “
printf
is Turing-complete” tweets after the 32c3 talk "New memory corruption attacks: why can't we have nice things?" (see printbf for the paper and the implementation discussed in the video), and that inspired me to read some more articles about Turing-complete1 mechanisms hiding in surprising places...C++ exception stack unwinding is Turing-complete
The DWARF debugging information format is complex, which is needed in order to describe optimized code. For example, an expression such as
Exception handling need to unwind the stack and restore registers, which is functionality that can be represented by the debug information, so DWARF was adopted (with some minor modifications) for the exception handling. The stack unwinding code can thus execute programs "hidden" in the DWARF information while unwinding the stack.
The Turing-completeness of this is not strange in the same way as the others I have written about previously; DWARF is designed to be general! But it is probably surprising to most developers that this functionality exists... The paper "Exploiting the hard-working DWARF: Trojan and Exploit Techniques With No Native Executable Code" looks at what can be done with this. The stack machine is rather easy to work with — here is an example from the paper that calculates the length of a string that is located just below the stack frame:
i+1
may be placed in a register reg
, and i
is not available anywhere in the registers or memory in the optimized program. But the debugger should be able to display the value of i
, so the debug information must be able to express that i
has the value reg-1
. And optimized code can be transformed in much more complex ways than this, so DWARF does (surprisingly) define a stack machine that can do arbitrary computation in order to be able to represent the information.Exception handling need to unwind the stack and restore registers, which is functionality that can be represented by the debug information, so DWARF was adopted (with some minor modifications) for the exception handling. The stack unwinding code can thus execute programs "hidden" in the DWARF information while unwinding the stack.
The Turing-completeness of this is not strange in the same way as the others I have written about previously; DWARF is designed to be general! But it is probably surprising to most developers that this functionality exists... The paper "Exploiting the hard-working DWARF: Trojan and Exploit Techniques With No Native Executable Code" looks at what can be done with this. The stack machine is rather easy to work with — here is an example from the paper that calculates the length of a string that is located just below the stack frame:
# Value at −0x8(%rbp) on stack DW_OP_breg6 −8 DW_OP_lit0 # initial strlen DW_OP_swap DW_OP_dup LOOP: DW_OP_deref_size 1 # Branch if top of stack nonzero DW_OP_bra NOT_DONE DW_OP_skip FINISH NOT_DONE: # Increment the counted length DW_OP_swap DW_OP_lit1 DW_OP_plus DW_OP_swap # Add length to char pointer DW_OP_plus DW_OP_skip LOOP FINISH: # Finally put the character count on the top of the stack # as return value DW_OP_swap
ELF metadata is Turing-complete
The Runtime Linker/Loader (RTLD) is responsible for loading shared libraries into a process' address space and updating the symbol references in the object code using the relocation and symbol tables. The RTLD can be viewed as a machine that is executing the relocation tables, and the paper "“Weird Machines” in ELF: A Spotlight on the Underappreciated Metadata" shows that this machine is Turing-complete by constructing an assembly-like language composed of three instructions
There are two kinds of structures involved in the relocation:
Branch instructions are implemented by modifying the data RTLD is using while processing the relocations. The RTLD is maintaining a doubly linked list of
The only thing remaining is a conditional branch instruction. The paper constructs the
mov
— moveadd
— additionjnz
— jump if not zero
There are two kinds of structures involved in the relocation:
- The symbol tables contain information about the symbols, where each symbol is described by the
ELF64_Sym
structuretypedef struct { Elf64_Word st_name; /* Symbol name (string table index) */ unsigned char st_info; /* Symbol type and binding */ unsigned char st_other; /* Unused */ Elf64_Section st_shndx; /* Section index */ Elf64_Addr st_value; /* Symbol value */ Elf64_Xword st_size; /* Symbol size */ } Elf64_Sym;
- The relocation tables specify how to modify the binary, where each relocation is described by the
Elf64_Rela
structure
typedef struct { Elf64_Addr r_offset; /* Address to patch */ Elf64_Xword r_info; /* Relocation type and symbol index */ Elf64_Sxword r_addend; /* Addend */ } Elf64_Rela;
R_X86_64_COPY
, R_X86_64_64
, and R_X86_64_RELATIVE
, which have the functionality
R_X86_64_COPY memcpy(base + r.r_offset, s.st_value, s.st_size) R_X86_64_64 *(base + r.r_offset) = s.st_value + r.addend + base R_X86_64_RELATIVE *(base + r.r_offset) = r.r_addend + basewhere
r
is the relocation, s
is the symbol referenced by the relocation, and base
is the ELF object’s base address (that is typically 0
for an executable). These relocations directly correspond to move and addition instructions of the form
mov *0xbeef0000, $0x02 # *(uint64_t*)0xbeef0000 = 2; mov *0xbeef0000, [%foo] # *(uint64_t*)0xbeef0000 = *(uint64_t*)foo; add *0xbeef0000, %foo, $0x02 # *(uint64_t*)0xbeef0000 = foo + 2;For example, an addition instruction
add *0xbeef0000, %foo, $0x02is encoded as a relocation entry with
r_offset=0xbeef0000
, r_addend=2
, and r_info
specifying relocation type R_X86_64_64
and the symbol foo
.Branch instructions are implemented by modifying the data RTLD is using while processing the relocations. The RTLD is maintaining a doubly linked list of
link_map
structures containing information for each ELF object, and it iterates over this list in reverse order when relocating the binary
while (lm != NULL) { r = lm->dyn[DT_RELA]; end = r + lm->dyn[DT_RELASZ]; for (r ; r < end; r++) relocate(lm, r, &dyn[DT_SYM]); lm = lm->prev; }A branch instruction can be implemented by relocations modifying
lm->prev
, lm->dyn[DT_RELASZ]
, or lm->dyn[DT_RELA]
so that an lm
structure is skipped, handled partially, or handled multiple times. There are some complications such as finding the address to modify, and to save/restore the original value to be able to exit the program. I refer to the paper for the details. The only thing remaining is a conditional branch instruction. The paper constructs the
jnz
instruction by noting that symbols of the type STT_GNU_IFUNC
are treated differently depending on if the section index is 0
or not, and the symbol can be initialized so that an add
instruction of the form
add *0xbeef0000, %foo, $0x00has the effect
*(uint64_t*)0xbeef0000 = (s.st_shindx != 0) ? CONSTANT : 0;where
s
is the Elf64_Sym
structure describing foo
. This can now be used when rewriting the end marker lm->dyn[DT_RELASZ]
so that RTLD does not enter the for-loop for the link map when the condition is 0
(and the machine is thus not jumping to the part of the program defined by those relocations), by relocations first writing the condition to test to s.st_shindx
of a temporary symbol s
, and then writing the symbol
(by adding 0
to it) to lm->dyn[DT_RELASZ]
.
1. I'm playing fast and loose with the concept “Turing-complete”, as does most papers I have read. I had expected computer scientists to be more careful, and use terms like "linear bounded automaton" or similar...
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.