Top-Down Parsing

Possible to fail if you replace the start symbol with the wrong production. We need to predict which production to choose, preferably by looking only at one symbol.

An LL(k) grammar can be parsed by a top-down parser with a max k-token lookahead.

An LL(*) grammar can be parsed by a top-down parser with unbounded lookahead.

LL(1) is best.

FIRST sets

  • FIRST($\alpha$) is the set of tokens (or $\epsilon$) that appear as the first symbol in some string generated from $\alpha$

  • FIRST($\alpha$) $\equiv$ {c : $\alpha$ $\rightarrow$* c $\beta$ }, where $\alpha$ and $\beta$ are arbitrary strings of symbols while c is a terminal

FIRST sets

FIRST sets and LL(1)

To predict the production to use, we need to satisfy: The FIRST set of all productions with the same left hand sides are disjoint

FOLLOW sets

FOLLOW(A) is the set of all terminal symbols that can immediately follow a subsequence derived from A in a sequence derived from the start symbol, S.

FOLLOW(A) $\equiv$ {c : S $\rightarrow^{+}$ $\alpha$ A c $\beta$ }

FIRST and FOLLOW sets and LL(1)

Additional condition: for all non-terminals A: if A $\rightarrow$* $\epsilon$, then FIRST(A) $\cap$ FOLLOW(A) = {}

Predict sets

Predict sets

LL(1) rule

The predict sets of all producions with the same left side are disjoint

Necessary and sufficient for a grammar to be LL(1).

We use predict sets to determine whether a grammar is LL(1), so that we can use a top-down parser on that grammar.