I am now writing c compiler to learn how a compiler works.

C Compiler translates source code(strings) into assembly language.

First of all, this Compiler breaks strings into tokens, and then parses strings into nodes using tokens, and finally generates assembly.

  • Tokenizer
  • Parser
  • Code Generator

Tokenizer

Tokenizer traverses strings and creates a linkedlist of tokens.

Parser

Parser parses strings into nodes.

To implement parser, we need compiler design. Normally we use BNF(Backus-Naur Form) notation to design compiler.

BNF notation is like

expr = num ("+" num | "-" num)*

This is a plain example for addtion and subtraction. This expression takes a number, and add or subtract another number if it is provided(* means 0 or more).

Code Generator

Code generator generates assembly.

The preivous BNF example is translated to assembly like

.intel_syntax noprefix
.globl main
main:
    push 2
    push 3
    pop rdi
    pop rax
    add rax, rdi
    push rax
    pop rax
    ret

It means 2 + 3, which is implemented as stack machine.

It pushes 2 numbers, 2, 3, assigns 3 to rdi(a assembly register), 2 to rax(a assembly register). At last it calculates 2 + 3 and assigns result to rax. In this example the next 2 statements can be ignored, since it just does rax = rax. ret returns result, which is 5.

Progress

I implemented math calculation so far.

My source code is uploaded on github.