68_Computer_science/68Q05-LLk.tex at master · planetmath/68_Computer_science · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
\documentclass[12pt]{article}
\usepackage{pmmeta}
\pmcanonicalname{LLk}
\pmcreated{2013-03-22 19:00:54}
\pmmodified{2013-03-22 19:00:54}
\pmowner{CWoo}{3771}
\pmmodifier{CWoo}{3771}
\pmtitle{LL(k)}
\pmrecord{8}{41885}
\pmprivacy{1}
\pmauthor{CWoo}{3771}
\pmtype{Definition}
\pmcomment{trigger rebuild}
\pmclassification{msc}{68Q05}
\pmclassification{msc}{68Q42}
\pmclassification{msc}{03D10}
\pmrelated{LRk}

\endmetadata

\usepackage{amssymb,amscd}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{mathrsfs}

% used for TeXing text within eps files
%\usepackage{psfrag}
% need this for including graphics (\includegraphics)
%\usepackage{graphicx}
% for neatly defining theorems and propositions
\usepackage{amsthm}
% making logically defined graphics
%%\usepackage{xypic}
\usepackage{pst-plot}

% define commands here
\newcommand*{\abs}[1]{\left\lvert #1\right\rvert}
\newtheorem{prop}{Proposition}
\newtheorem{thm}{Theorem}
\newtheorem{ex}{Example}
\newcommand{\real}{\mathbb{R}}
\newcommand{\pdiff}[2]{\frac{\partial #1}{\partial #2}}
\newcommand{\mpdiff}[3]{\frac{\partial^#1 #2}{\partial #3^#1}}
\begin{document}
Given a word $u$ and a context-free grammar $G$, how do we determine if $u\in L(G)$?

There are in general two ways to proceed.  Start from $u$, and proceed backward to find $v$ such that $v\Rightarrow u$.  Keep going until a derivation $\sigma\Rightarrow^* u$ is found.  This procedure is known as the bottom-up parsing of $u$.  The other method the top-down approach: begin with the starting symbol $\sigma$, and work its way down to $u$, so $\sigma\Rightarrow^* u$.

As with the bottom-up approach, finding a derivation of $u$ from the top-down may be time consuming, if one is lucky enough to find a derivation at all.

There is a class of grammars, known as the $LL(k)$ grammars, which make the top-down parsing of a word natural and direct.  The first $L$ in $LL(k)$ means scanning the symbols of $u$ from left to right, the second $L$ stands for finding a leftmost derivation ($\Rightarrow_L$) for $u$, and $k$ means having the allowance to look at up to $k$ symbols ahead while scanning.

\textbf{Definition}.  Let $G=(\Sigma,N,P,\sigma)$ be a context-free grammar such that $\sigma\to \sigma$ is not a production of $G$, and $k\ge 0$ an integer.  Suppose $u\in L(G)$ with a $X\to U_1$ a production in a leftmost derivation of $u$:
$$\sigma \Rightarrow_L^* UXU_2 \Rightarrow_L UU_1U_2 \Rightarrow_L^* u.$$
Let $n=|U|+k$ and $v$ be the prefix of $u$ of length $n$ (if $|u|<n$, then set $v=u$).

Then $G$ is said to be $LL(k)$ if for any $w\in L(G)$, with $v$ as a prefix, such that there is a production $X\to W_1$ in a leftmost derivation of $w$: $$\sigma \Rightarrow_L^* UXW_2 \Rightarrow_L UW_1W_2 \Rightarrow_L^* w,$$
implies that $W_1=U_1$.

In a leftmost derivation $D_u$ of a word $u$, call a prefix $v$ of $u$ is a \emph{leftmost descendant} of a production $P\to U$ if $\sigma \Rightarrow^* vPU' \Rightarrow vUU' \Rightarrow^* u$ is $D_u$.  Then the definition above can be restated in words as follows:

Given a leftmost derivation $D_u$ of a word $u$, a production used in $D_u$ is uniquely determined up to $k$ symbols beyond the prefix of $u$ which is a leftmost descendant of the production.  In other words, if $D_u$ and $D_w$ are leftmost derivations of $u$ and $w$ which agree on $k$ symbols beyond the common prefix $v$, where $v$ is both a leftmost descendant of $X\to U$ used in $D_u$, and a leftmost descendant of $X\to W$ used in $D_w$, then $X\to U$ and $X\to W$ are the same production, i.e. $U=W$.

Every $LL(k)$ is unambiguous.  Furthermore, every $LL(k)$ grammar is \PMlinkname{$LR(k)$}{LRk}.

Given a context-free grammar $G$ and $k\ge 0$, there is an algorithm deciding whether $G$ is $LL(k)$.

\textbf{Examples}
\begin{itemize}
\item
The grammar $G$ over $\Sigma=\lbrace a,b\rbrace$, with productions $\sigma \to a^2 \sigma b^2$, $\sigma \to a$ and $\sigma \to \lambda$ is $LL(2)$ but not $LL(1)$.  It is not hard to see that $L(G)$ is the set $\lbrace a^m b^n \mid n\mbox{ is even, and }n\le m\le n+1\rbrace$.  On the other hand, the grammar $G'$ over $\Sigma$, with productions $$\sigma \to aX, \quad \sigma\to \lambda,\quad X\to aYb,\quad X\to \lambda,\quad Y\to aXb, \quad Y\to b$$ also generates $L(G)$, but is $LL(1)$ instead.
\item
The grammar $G$ over $\lbrace a,b,c\rbrace$, with productions $$\sigma\to X,\quad \sigma\to Y,\quad X\to aXb,\quad X\to ab, \quad Y\to aYc, \quad Y\to ac$$ is not $LL(k)$ for any $k\ge 0$.
\end{itemize}

\textbf{Definition} A language is said to be $LL(k)$ if it is generated by an $LL(k)$ grammar.  The family of $LL(k)$ languages is denoted by $\mathscr{LL}(k)$.

It is easy to see that an $LL(0)$ contains no more than one word.  Furthermore, it can be shown that
$$\mathscr{LL}(0)\subset \mathscr{LL}(1)\subset \cdots \subset \mathscr{LL}(k) \subset \cdots ,$$ and the inclusion is strict.  If $\mathscr{LL}(k)'$ denotes the family of $\lambda$-free $LL(k)$ languages, then
$$\mathscr{LL}(0)'= \mathscr{LL}(1)'= \cdots = \mathscr{LL}(k)' = \cdots .$$

Given two $LL(k)$ grammars $G_1$ and $G_2$, there is an algorithm that decides if $L(G_1)=L(G_2)$.

\begin{thebibliography}{9}
\bibitem{AS} A. Salomaa, {\em Formal Languages}, Academic Press, New York (1973).
\end{thebibliography}

%%%%%
%%%%%
\end{document}