program semantics
From 6.820, see 2021-09-15
1. lambda calculus
For the lambda calculus, how do we give meanings to terms?
- the semantics should:
- distinguish between terms that are not equal
- e.g. church numeral for 0 and church numeral for 1
- equate terms that should be equal
- e.g. the terms that correspond to
(plus 0 1)
and1
- e.g. the terms that correspond to
- distinguish between terms that are not equal
- A semantics is said to be fully abstract if, for any two terms that the semantics deem un-equal, there is a term that can tell them apart
1.1. information content of a term
- instantaneous information – a term obtained by replacing each redex (reducible expression) with a \(\perp\)
- the meaning of a term is the maximum instaneous information you can get using \(\beta\) reductions
- NOTE: I haven't seen the term instantaneous information used outside of 6.820
1.1.1. example
The term \(\lambda p. \; p \;((\lambda z. \;z)\; a)\) has the instantaneous information: \(\lambda p. \; p \; \perp\)
1.2. Normal form
- if the term cannot be \(\beta\) reduced, then it is in normal form
- example: \(\Omega = (\lambda x. x x) (\lambda x. x x)\)
1.3. Church-Rosser property
- What if a term has more than one normal form?
- So far, our semantics have been defined non-deterministically: "just do \(\beta\) reduction wherever possible however you like"
- It turns out that this is not possible in \(\lambda\) -calculus because of confluence
- If \(E \twoheadrightarrow E_2\) and \(E \twoheadrightarrow E_1\), then there exists \(E_3\) such that \(E_2 \twoheadrightarrow E_3\) and \(E_1 \twoheadrightarrow E_3\)
- So if \(E\) reduces to two normal forms \(E_1\) and \(E_2\) which are not equal, then they cannot be reduced, which violates Church-Rosser
- NOTE: if you apply \(\beta\) reductions and reach a normal form, then any sequence of \(\beta\) reductions that reaches a normal form will reach the same normal form. However, this does not imply that any sequence of \(\beta\) reductions will reach a normal form. Sometimes they might diverge.
- example: \((\lambda x.y) \Omega\)
- could apply \(\beta\) reduction forever on \(\Omega\)
- Or could evaluate \((\lambda x. y)\) on \(\Omega\) to get \(y\) immediately
- example: \((\lambda x.y) \Omega\)
1.4. denotational semantics
Associate each term wtih an abstract mathematical object (see church numerals).
1.5. operational semantics
- see Big Step Semantics
- we want an interpreter – a program to reduce \(\lambda\) -expressions to "answers"
- so now, we should find
- the definition of an answer
- a reduction strategy
1.5.1. Defining "answers"
- Normal form – as discussed above: no redexes
- Head normal form (HNF)
- \(x\) is HNF
- \((\lambda x. E)\) is in HNF if \(E\) is in HNF
- \((x E_1 \dots E_n)\) is in HNF
- Informally: "if I have a function \(x\) applied to a series of arguments, but I don't actually know \(x\), then this is as applied as I can get. I don't actually need to reduce the arguments"
- everything in normal form is in head normal form. The converse doesn't hold, since \(E_1...E_n\) may contain terms that can be applied to each other.
- Weak head normal form
- \(x\) is WHNF
- \((\lambda x. E)\) is in WHNF
- \((x E_1 \dots E_n)\) is in WHNF
- Informally: "If I don't even know what the argument to a function is going to be, I'm not going to bother reducing it further"
1.5.2. Defining "reduction strategies"
- applicative order
- evaluate the right-most innermost redex first
- evaluate all params before evaluating function
- e.g. call by value
- this is what most programming languages (but not haskell) do
- normal order
- evaluate the left-most (outermost) redex first
- substitute in the params directly into the calling function
- e.g. call by name
1.5.3. Computing a normal form
- Not every \(\lambda\) -expression will have an answer (i.e. a NF, HNF, WHNF)
- Even if an expression does have an answer, not all reduction strategies will produce it
- This makes us ask: "If call by value can in some cases does not normalize, how do modern languages work?"
- Answer: even though they might treat most functions as call-by-value, the language also includes constructs that are not call-by-value, e.g.
if (e) then e1 else e2
. Here, we do not evaluatee
,e1
, ande2
before evaluating theif
.
- Answer: even though they might treat most functions as call-by-value, the language also includes constructs that are not call-by-value, e.g.
1.5.3.1. theorem
- Normalizing strategy
- if a reduction strategy always finds an answer where one exists, then it is normalizing
- It turns out that the leftmost redex strategy is normalizing for WHNF and the \(\lambda\) calculus