Friday, August 4, 2017

Writing a GCC back end

It is surprisingly easy to design a CPU (see for example Colin Riley’s blog series) and I was recently asked how hard it is to write a GCC back end for your new architecture. That too is easy — provided you have done it once before. But the first time is quite painful...

I plan to write some blog posts in the coming weeks that will try to ease the pain by showing what is involved in creating a “working” back end that is capable of compiling simple functions, give some pointers on how to proceed to make it production-ready, and in general provide the overview I would have liked before I started developing my own back end. (GCC has a good reference manual, “GNU Compiler Collection Internals”, describing everything you need to know, but it is a bit overwhelming when you start...)

The series will cover the following (I’ll update the list with links to the posts as they become available):
  1. The structure of a GCC back end
    • Which files you need to write/modify
  2. Getting started with a GCC back end
    • Pointers to resources describing how to set up the initial back end
  3. Low-level IR and basic code generation
    • How the low-level IR works
    • How the IR is lowered to instructions
    • How to write simple instruction patterns
  4. Target description macros and functions
    • Working with the RTL
    • Describing the registers (names, register classes, allocation order, ...)
    • Addressing modes
  5. More about instruction patterns
    • define_expand, define_split, and define_peephole2
    • The unspec expression code
    • Attributes
  6. Pipeline description
  7. Cost model

10 comments:

  1. Replies
    1. Please write an article about how to do this with LLVM. Then we can compare both and decide which is the better technology.

    2. Consider reading https://llvm.org/docs/WritingAnLLVMBackend.html. It looks like a fairly well-defined task.

    3. GPL vs BSD could be a reason. I know most people don't agree, but that's a long debate. Citing Stallman: 'For GCC to be replaced by another technically superior compiler that defended freedom equally well would cause me some personal regret, but I would rejoice for the community's advance. The existence of LLVM is a terrible setback for our community precisely because it is not copylefted and can be used as the basis for nonfree compilers — so that all contribution to LLVM directly helps proprietary software as much as it helps us.'

    4. I write about GCC because I think there is too much focus on LLVM... Both GCC and LLVM are very capable compilers with different strengths and weaknesses: LLVM is the best choice for some use cases and GCC for others.

      For example, the GCC backend support is very flexible, and it is much easier to add “strange” architectures to GCC than to LLVM. I know that Embecosm has tried to improve the situation for LLVM (see e.g. their work with the AAP architecture), but my understanding is that there is still much work left to do in LLVM.

    5. LLVM is well documented, but GCC is hard to introduce to new people. Their performance is mostly the same; the only difference is the license.

    6. I think GCC in many ways has better documentation than LLVM, such as the user manual “Using the GNU Compiler Collection” and “GNU Compiler Collection Internals”, which describes the optimization passes and IR.

      But I agree that some documentation is missing between those two documents (and this blog series is trying to improve the situation slightly...). What documentation do you think is missing?

  2. Awesome! I've been toying with the idea of writing a GCC backend, but was always intimidated by the sheer complexity of the task. Kudos! Looking forward to following your blog!

  3. I am really looking forward to this series of articles. I like making instruction sets and implementing them. I tried to make a GCC backend for one of those instruction sets earlier this year, but I never made it past assignment statements (without conditionals).

  4. I have a dream of being able to compile C for the Soviet mainframe BESM-6 (48-bit word-oriented architecture with 6 chars/bytes per word), but several people who have attempted to write a GCC back end for it got cold feet. I wonder if it can be done at all if the number of chars per word is not a power of two, integers have reserved bit ranges, etc.
