The paper "The Page-Fault Weird Machine: Lessons in Instruction-less Computation" show that you can run programs on the X86 fault handling mechanism, i.e. without executing any instructions on the CPU!
The construction of this weird machine is using two properties of the IA-32 architecture:
A simplified description of the construction is that each register is represented by a task where the stack pointer contains the register's value multiplied by four. Each instruction is represented by an interrupt vector that switches to new tasks (including a new interrupt vector), where #DF switches to the task corresponding to
The stack pointer needs to point to valid memory (or we will get other interrupts), so the registers can only have values that are represented as valid stack addresses, and we cannot represent 32-bit values even if we map all memory for the stack (the decrement is done by subtracting four from the stack pointer, so we lose the two least significant bits...). It is natural to just map page 0 for the stack to give us registers that are 10 bits wide.
It is possible to make the registers wider by adding more pages to the stack, but it does not help much; the machine can only do subtraction, so e.g. addition
The blog post is continued in part 2 that describes the technical details.
1.
The construction of this weird machine is using two properties of the IA-32 architecture:
- Task switching
The IA-32 has hardware support for task switching — storing the CPU state (registers and status flags), and resuming execution with a new memory map and previously stored CPU state. The interrupt controller may be configured so that a task switch happens when an exception is raised. - Interrupts writes an error code to the stack
The Page-Fault Exception (#PF) writes a 32-bit error code to the stack and decrements the stack pointer before handling the exception, which lets us modify the stack pointer without executing any CPU instructions. The stack pointer cannot wrap, so a Double Fault Exception (#DF) is raised instead of #PF if the stack pointer is less than four.
r0
, r1
, ..., and a move-branch-if-zero-or-decrement instructionmovdbz rdest, rsrc, label_nonzero, label_zerothat is doing the work corresponding to
if (rsrc == 0) { rdest = rsrc; goto label_zero; } else { rdest = rsrc - 1; goto label_nonzero; }The
movdbz
instruction is Turing-complete (see e.g. Wikipedia "One instruction set computer"1, or this blog post that describes a BF compiler that only generates movdbz
instructions). The paper comes with an implementation in trapcc, and I have made a somewhat simpler and more user-friendly implementation in instless_comp.2A simplified description of the construction is that each register is represented by a task where the stack pointer contains the register's value multiplied by four. Each instruction is represented by an interrupt vector that switches to new tasks (including a new interrupt vector), where #DF switches to the task corresponding to
label_zero
and #PF switches to label_nonzero
. All tasks have invalid program counters, so the new task will raise a new #PF exception (or #DF if the stack pointer is zero), and the machine steps into the next instruction. The implementation is however much more involved, and I describe the construction in detail in a separate blog post.The stack pointer needs to point to valid memory (or we will get other interrupts), so the registers can only have values that are represented as valid stack addresses, and we cannot represent 32-bit values even if we map all memory for the stack (the decrement is done by subtracting four from the stack pointer, so we lose the two least significant bits...). It is natural to just map page 0 for the stack to give us registers that are 10 bits wide.
It is possible to make the registers wider by adding more pages to the stack, but it does not help much; the machine can only do subtraction, so e.g. addition
a = b + c;need to be done by subtracting from a maximum value
a = MAX_VAL - (MAX_VAL - b - c);And the machine can only decrement by one, so the subtractions need to be done by looping
tmp = MAX_VAL; while (b--) tmp--; while (c--) tmp--; a = MAX_VAL; while (tmp--) a--;That is, the running time of addition is exponential in the bit width of the registers, so wide registers are not that useful.
The blog post is continued in part 2 that describes the technical details.
1.
movdbz
reduces trivially to the subtract-and-branch-if-nonzero instruction.
2. The trapcc need to do global analysis (graph coloring etc.) when generating the instructions in order to satisfy some annoying hardware constraints. My implementation uses a less efficient instruction encoding that makes it possible to handle each instruction independently (and makes it easier to write assembly manually).
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.