CYK Algorithm

A bottom-up parsing algorithm to turn a series of token into a parse tree for a grammar. Or, to be exact, it's an algorithm for telling whether a string may be derived from a grammar. The parse-tree can be recovered by keeping back-pointers.

The grammar needs to be in Chomsky Normal Form, i.e. all rules must be of the form \(A \rightarrow BC\) or \(A \rightarrow a\), where \(A\) is a non-terminal and \(a\) is a terminal.

This is a dynamic programming algorithm. We keep a table \(P[n,n,r]\) where \(n\) is the length of the string and \(r\) is the number of production rules. The entry \(P[i,j,k]\) is 1 if we are able to derive the sub-string of length \(i\) that starts at position \(j\) if the first line of the derivation begins with the \(k\) th non-terminal

We build the table starting with \(P[1,:,:]\). At layer \(P[l,:,:]\), we can solve each entry using the solutions to the sub-problems stored in \(P[l-1,:,:]\).

At the end of the algorithm, we return true if \(P[n,1,S]\) is 1

The algorithm can be written in pseudo-python as:

#initialize P
#given input string x
for s in range(0,n):
    for every production rule R_a -> c:
        if x[s] == c:
            P[1,s,a] = 1
for l in range(1,n): #l is the length of the sub-string
    for s in range(0,n-l+1): #s is the start of the sub-string
        for p in range(1, l-1)# p is the partition of the sub-string
            for every production rule R_a -> R_b R_c:
                if P[p,s,b] and P[l-p,s+p,c]:
                    P[l,s,a] = 1