Modern Compilers
Modern compilers consist of a front-end and a back-end. The front-end maps a high-level source language to an intermediate representation (IR). The back-end optimizes the IR and emits output in a low-level language such as assembly.
1. Front-End
- Lexical analysis - a lexer takes a stream of characters as input and outputs a stream of tokens. The lexer also classifies each token by category, e.g. identifier, reserved word, literal, or operator.
- Syntactic analysis - a parser takes the stream of tokens and outputs a syntax tree.
- Semantic analysis - checks that the syntax tree is meaningful, e.g. type checking.
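The first two front-end stages can be sketched in a few lines of Python. This is a minimal illustration for a toy expression language, not how any particular compiler does it: the token categories, the keyword set, and the function names (tokenize, parse) are all assumptions made up for the example, and the parser handles only arithmetic expressions.

```python
import re

# Hypothetical token categories for a toy language (assumed for this sketch).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
KEYWORDS = {"if", "else", "while", "return"}
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Lexical analysis: characters in, (category, text) tokens out."""
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # whitespace carries no meaning in this toy language
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"  # reserved words are identifiers with a special category
        tokens.append((kind, text))
    return tokens


def parse_atom(tokens, i):
    """atom := NUMBER | IDENT | '(' expr ')'"""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, text = tokens[i]
    if kind == "NUMBER":
        return ("num", int(text)), i + 1
    if kind == "IDENT":
        return ("var", text), i + 1
    if (kind, text) == ("OP", "("):
        node, i = parse_expr(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ("OP", ")"):
            raise SyntaxError("expected ')'")
        return node, i + 1
    raise SyntaxError(f"unexpected token {tokens[i]!r}")


def parse_term(tokens, i):
    """term := atom (('*' | '/') atom)*"""
    node, i = parse_atom(tokens, i)
    while i < len(tokens) and tokens[i] in (("OP", "*"), ("OP", "/")):
        op = tokens[i][1]
        right, i = parse_atom(tokens, i + 1)
        node = (op, node, right)
    return node, i


def parse_expr(tokens, i):
    """expr := term (('+' | '-') term)*"""
    node, i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i] in (("OP", "+"), ("OP", "-")):
        op = tokens[i][1]
        right, i = parse_term(tokens, i + 1)
        node = (op, node, right)
    return node, i


def parse(source):
    """Syntactic analysis: tokens in, nested-tuple syntax tree out."""
    tokens = tokenize(source)
    tree, i = parse_expr(tokens, 0)
    if i != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[i]!r}")
    return tree
```

For example, parse("1 + 2 * x") yields a tree with "+" at the root and the "*" subtree as its right child, so operator precedence is encoded in the tree's shape rather than in the token order.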
2. Intermediate Representation
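The syntax tree can itself serve as an IR. A more useful structure for optimization is a control flow graph (CFG). Nodes in this graph are basic blocks (straight-line sequences of code that branch only at the end) and edges represent possible transfers of control between blocks, such as conditional branches and jumps.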
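One possible way to represent such a graph in Python is shown below. The BasicBlock class, its field names, and the textual instruction format are hypothetical choices for this sketch. The example graph models the snippet "if (x > 0) y = 1; else y = 2; return y".

```python
from dataclasses import dataclass, field


@dataclass
class BasicBlock:
    label: str
    instructions: list = field(default_factory=list)
    # Labels of the blocks that control may transfer to after this block.
    successors: list = field(default_factory=list)


def build_example_cfg():
    """CFG for: if (x > 0) y = 1; else y = 2; return y"""
    blocks = [
        BasicBlock("entry", ["t0 = x > 0", "branch t0, then, else"], ["then", "else"]),
        BasicBlock("then", ["y = 1", "jump exit"], ["exit"]),
        BasicBlock("else", ["y = 2", "jump exit"], ["exit"]),
        BasicBlock("exit", ["return y"], []),
    ]
    return {block.label: block for block in blocks}


def predecessors(cfg, label):
    """Labels of all blocks with an edge into the given block."""
    return sorted(b.label for b in cfg.values() if label in b.successors)
```

Here the "entry" block has two outgoing edges because it ends in a conditional branch, while "then" and "else" each have a single edge to "exit", where the two paths rejoin.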
3. Back-End
- Optimization - unreachable paths through the CFG can be removed; dead code (code whose results are never used) can be removed; variables whose values are known at compile time can be replaced by those constants ahead of time (constant propagation).
- Mapping to assembly - each block in the CFG can be translated into a labeled block of assembly, and variables can be assigned to registers.
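The back-end steps above can be sketched on a toy IR. Everything here is an assumption for illustration: the tuple instruction format ("const", "add", "jmp", "ret"), the dictionary layout of the CFG, and the output syntax, which is pseudo-assembly rather than any real instruction set. The constant pass works only within a single block, and register assignment naively gives each variable its own register, assuming an unlimited supply.

```python
def reachable(cfg, entry):
    """Labels of all blocks reachable from the entry block."""
    seen, stack = set(), [entry]
    while stack:
        label = stack.pop()
        if label in seen:
            continue
        seen.add(label)
        stack.extend(cfg[label]["succ"])
    return seen


def fold_constants(code):
    """Replace additions of known constants with the computed constant."""
    known, out = {}, []
    for instr in code:
        if instr[0] == "const":  # ("const", dest, value)
            known[instr[1]] = instr[2]
            out.append(instr)
        elif instr[0] == "add":  # ("add", dest, a, b)
            _, dest, a, b = instr
            a, b = known.get(a, a), known.get(b, b)
            if isinstance(a, int) and isinstance(b, int):
                known[dest] = a + b
                out.append(("const", dest, a + b))
            else:
                known.pop(dest, None)  # dest is no longer a known constant
                out.append(("add", dest, a, b))
        else:
            out.append(instr)
    return out


def optimize(cfg, entry):
    """Drop unreachable blocks, then fold constants in the remaining ones."""
    keep = reachable(cfg, entry)
    return {
        label: {"code": fold_constants(block["code"]), "succ": block["succ"]}
        for label, block in cfg.items()
        if label in keep
    }


def emit(cfg):
    """Translate each block into a labeled block of pseudo-assembly."""
    registers = {}  # variable name -> register, assigned on first use

    def operand(x):
        if isinstance(x, int):
            return f"#{x}"  # immediate value
        return registers.setdefault(x, f"r{len(registers)}")

    lines = []
    for label, block in cfg.items():
        lines.append(f"{label}:")
        for op, *args in block["code"]:
            if op == "jmp":
                lines.append(f"    jmp {args[0]}")  # operand is a label, not a variable
            else:
                lines.append(f"    {op} " + ", ".join(operand(a) for a in args))
    return "\n".join(lines)


# Toy input: the "dead" block has no incoming edges, and c = a + b is constant.
EXAMPLE_CFG = {
    "entry": {
        "code": [("const", "a", 2), ("const", "b", 3), ("add", "c", "a", "b"), ("jmp", "exit")],
        "succ": ["exit"],
    },
    "dead": {"code": [("const", "c", 0), ("jmp", "exit")], "succ": ["exit"]},
    "exit": {"code": [("ret", "c")], "succ": []},
}
```

Calling emit(optimize(EXAMPLE_CFG, "entry")) produces labeled blocks for "entry" and "exit" only, with the addition replaced by a constant load. A real back-end would go further, for instance by removing the now-unused loads of a and b and by mapping variables onto a limited set of registers.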