Compiler Example: C to Assembly
The gist: Use recursive descent to compile statements and expressions in C into assembly. Assume we have a stream of tokens as input. Two key things are needed:
- A way to parse:
- statements:
- Unconditional statements
- Conditional statements, e.g. if … else
- Compound statements, e.g. statement1; statement2;
- Iteration, e.g. while …
- expressions:
- constants
- variable, e.g.
- arithmetic operations
- assignment
- statements:
- A way to break C code into simple statements that can be parsed. This is recursive descent.
0.1. Examples of parsing expression
0.1.1. To parse the assignment a = <expr>
compile_expression(expr)
and store result in register x withCMOVE
ST(Rx, a)
Here, note that a
is an address into the stack. In assembly, the address would have earlier been set aside, as in a: LONG
0.1.2. To parse the operation <expr_1> + <expr_2>
compile_expression(<expr_1>)
and store result in register xcompile_expression(<expr_1>)
and store result in register yADD(rx, ry, rx)
addrx
andry
and store result inrx
0.2. Examples of parsing a statement
0.2.1. To parse a compound unconditional statement
Given {<statement_1>; <statement_2>}
The compiler does the following:
compile_statement(statement_1)
compile_statement(statement_2)
0.2.2. To parse a conditional
Given
if (<expr>) <statement>
The compiler produce the following:
compile_expression(expr)
and store the result of the computation in register xBF(rx, Lendif)
if rx is false, then branch to the label Lendifcompile_statement(statement)
- Add the label
Lendif:
These examples help me understand how to get from a high-level programming language to assembly, but modern compilers work slightly differently.
1. How to parse:
Recursive descent is a top-down strategy for parsing. Beginning with the start symbol \(S\), the parser attempts to fit all the rules that have \(S\) on the LHS to the input.
Other methods of building the parse tree include bottom up methods, such as the CYK Algorithm.