I am now writing c compiler to learn how a compiler works.
C Compiler translates source code(strings) into assembly language.
First of all, this Compiler breaks strings into tokens, and then parses strings into nodes using tokens, and finally generates assembly.
- Tokenizer
- Parser
- Code Generator
Tokenizer
Tokenizer traverses strings and creates a linkedlist of tokens.
Parser
Parser parses strings into nodes.
To implement parser, we need compiler design. Normally we use BNF(Backus-Naur Form) notation to design compiler.
BNF notation is like
expr = num ("+" num | "-" num)*
This is a plain example for addtion and subtraction. This expression takes a number, and add or subtract another number if it is provided(* means 0 or more).
Code Generator
Code generator generates assembly.
The preivous BNF example is translated to assembly like
.intel_syntax noprefix
.globl main
main:
push 2
push 3
pop rdi
pop rax
add rax, rdi
push rax
pop rax
ret
It means 2 + 3
, which is implemented as stack machine.
It pushes 2 numbers, 2
, 3
, assigns 3
to rdi
(a assembly register), 2
to rax
(a assembly register).
At last it calculates 2 + 3
and assigns result to rax
.
In this example the next 2 statements can be ignored, since it just does rax = rax
.
ret
returns result, which is 5
.
Progress
I implemented math calculation so far.
My source code is uploaded on github.