TODO: Read 2.2 to 2.4 in Scott

Context Free Grammars

REs don’t support recursion. For more complex expressions, we specify the language with a grammar using BNF (Backus-Naur Form). BNF allows us to specify context-free-grammars.

A grammar is a tuple ($\Sigma$, N, P, S) where:

  • $\Sigma$ is a set specifying the alphabet of tokens (aka terminal symbols)
  • N is the set of non-terminal symbols; variables that denote sets of sentences
  • P is a set of productions (substitution rules)
  • S is the start symbol where $S \in N$

In general, BNF spec is a set of derivation rules, written as:

<symbol> ::= __expression__

EBNF adds |, *, + from RE to BNF.

Non-context Free Grammars

Ex. The rule $\alpha$A$\beta$ ::= $\alpha\sigma\beta$ only allows A to be replaced by $\sigma$ when it appears between $\alpha$ and $\beta$.

Programming languages are NOT context free, but we use context-free grammars anyway bc it is easier. We handle the context separately.

Parsing

While grammars give rules for generating sentences in a language, parsing is the process of recognizing the language (and determining its structure). Parse trees are a way of representing the structure of a sentence.

Ambiguous Grammars

Admits more than one parse tree for a sentence. Sad! Add rules to disambiguate. Or change the language! Ex. Explicitly terminate if statements.

Parsing Complexity

There are $O(n^3)$ algorithms for any context-free grammar. For useful subsets of CFG, there are parsers that are linear. We are interested in LL (left-to-right, leftmost derivation) top-down grammars.

Def: A derivation of a string for a grammar is a sequence of grammar rule applications that transforms the start symbol into the string. A derivation proves that the string belongs to the grammar’s language.

In a leftmost derivation, the next nonterminal to rewrite is always the leftmost.

An LR (left-to-right, rightmost derivation) does bottom-up parsing.