CYK Algorithm
A bottom-up parsing algorithm to turn a series of token into a parse tree for a grammar. Or, to be exact, it's an algorithm for telling whether a string may be derived from a grammar. The parse-tree can be recovered by keeping back-pointers.
The grammar needs to be in Chomsky Normal Form, i.e. all rules must be of the form \(A \rightarrow BC\) or \(A \rightarrow a\), where \(A\) is a non-terminal and \(a\) is a terminal.
This is a dynamic programming algorithm. We keep a table \(P[n,n,r]\) where \(n\) is the length of the string and \(r\) is the number of production rules. The entry \(P[i,j,k]\) is 1 if we are able to derive the sub-string of length \(i\) that starts at position \(j\) if the first line of the derivation begins with the \(k\) th non-terminal
We build the table starting with \(P[1,:,:]\). At layer \(P[l,:,:]\), we can solve each entry using the solutions to the sub-problems stored in \(P[l-1,:,:]\).
At the end of the algorithm, we return true if \(P[n,1,S]\) is 1
The algorithm can be written in pseudo-python as:
#initialize P #given input string x for s in range(0,n): for every production rule R_a -> c: if x[s] == c: P[1,s,a] = 1 for l in range(1,n): #l is the length of the sub-string for s in range(0,n-l+1): #s is the start of the sub-string for p in range(1, l-1)# p is the partition of the sub-string for every production rule R_a -> R_b R_c: if P[p,s,b] and P[l-p,s+p,c]: P[l,s,a] = 1