www.pudn.com > LexYaccProgs.zip > TPLY.TEX


 
\documentstyle[twocolumn]{article} 
 
\title{TP Lex and Yacc -- The Compiler Writer's Tools for Turbo Pascal\\ 
       Version 3.0 User Manual} 
 
\author{Albert Graef\\ 
        Schillerstr. 18\\ 
        6509 Schornsheim\\ 
        \\ 
        ag@muwinfa.geschichte.uni-mainz.de} 
 
\date{June 17, 1991} 
 
\setlength{\topmargin}{0cm} 
\setlength{\oddsidemargin}{0cm} 
\setlength{\evensidemargin}{0cm} 
\setlength{\textwidth}{16cm} 
\setlength{\textheight}{21cm} 
%\setlength{\parindent}{0pt} 
\parskip=4pt plus 1pt minus 1pt 
\itemsep=0pt 
\renewcommand{\baselinestretch}{1.1} 
\unitlength=1mm 
%\tolerance=500 
%\parskip=0.1cm 
\leftmargini 1.5em 
\leftmarginii 1.5em \leftmarginiii 1.5em \leftmarginiv 1.5em \leftmarginv 1.5em 
\leftmarginvi 1.5em 
 
\begin{document} 
 
\maketitle 
 
\section{Introduction} 
 
This document describes the TP Lex and Yacc compiler generator toolset. 
These tools are designed especially to help you prepare compilers and 
similar programs like text processing utilities and command language 
interpreters with the Turbo Pascal (TM) programming language. 
 
TP Lex and Yacc are Turbo Pascal adaptions of the well-known UNIX (TM) 
utilities Lex and Yacc, which were written by M.E. Lesk and S.C. Johnson 
at Bell Laboratories, and are used with the C programming language. TP Lex 
and Yacc are intended to be approximately ``compatible'' with these programs. 
However, they are an independent development of the author, based on the 
techniques described in the famous ``dragon book'' of Aho, Sethi and Ullman 
(Aho, Sethi, Ullman: {\em Compilers : principles, techniques and tools,\/} 
Reading (Mass.), Addison-Wesley, 1986). 
 
TP Lex and Yacc, like any other tools of this kind, are not intended for 
novices or casual programmers; they require extensive programming experience 
as well as a thorough understanding of the principles of parser design and 
implementation to be put to work successfully. But if you are a seasoned 
Turbo Pascal programmer with some background in compiler design and formal 
language theory, you will almost certainly find TP Lex and Yacc to be a 
powerful extension of your Turbo Pascal toolset. 
 
This manual tells you how to get started with the TP Lex and Yacc programs 
and provides a short description of these programs. Some knowledge about 
the C versions of Lex and Yacc will be useful, although not strictly 
necessary. For further reading, you may also refer to: 
 
\begin{itemize} 
   \item 
      Aho, Sethi and Ullman: {\em Compilers : principles, techniques and 
      tools.\/} Reading (Mass.), Addison-Wesley, 1986. 
   \item 
      Johnson, S.C.: {\em Yacc -- yet another compiler-compiler.\/} CSTR-32, 
      Bell Telephone Laboratories, 1974. 
   \item 
      Lesk, M.E.: {\em Lex -- a lexical analyser generator.\/} CSTR-39, Bell 
      Telephone Laboratories, 1975. 
   \item 
      Schreiner, Friedman: {\em Introduction to compiler construction with 
      UNIX.\/} Prentice-Hall, 1985. 
   \item 
      The Unix Programmer's Manual, Sections `Lex' and `Yacc'. 
\end{itemize} 
 
 
\subsection*{Getting Started} 
 
The TP Lex and Yacc programs run on IBM PC compatible computers with 
MS-DOS 3.0 (or later) and Turbo Pascal compiler (Version 4.0 or later). Your 
system should have at least 512 KB RAM; a hard disk is recommended, while 
not strictly necessary. 
 
To install TP Lex and Yacc on your system, simply copy the files on the 
distribution disk to an appropriate directory on your hard disk. Then 
put this directory on your DOS \verb"PATH" and Turbo Pascal's unit search path 
(such that the Turbo Pascal compiler finds the TP Lex and Yacc library 
units). 
 
The library units (\verb".TPU" files) in the distribution are compiled with 
Turbo Pascal 6.0. If you are using a different Turbo Pascal version, 
you will have to recompile these units (sources are provided in the 
corresponding \verb"LEXLIB.PAS" and \verb"YACCLIB.PAS" files). 
 
Having installed TP Lex and Yacc on your system, you can now compile 
your first TP Lex and Yacc program \verb"EXPR". \verb"EXPR" is a simple 
desktop calculator program which consists of a lexical analyzer in the TP Lex 
source file \verb"EXPRLEX.L" and the parser and main program in the TP Yacc 
source file \verb"EXPR.Y". To compile these programs, issue the commands 
\begin{quote}\begin{verbatim} 
   lex exprlex 
   yacc expr 
\end{verbatim}\end{quote} 
 
That's it! You now have the Turbo Pascal sources (\verb"EXPRLEX.PAS" and 
\verb"EXPR.PAS") for the \verb"EXPR" program. Use the Turbo Pascal compiler 
to compile these programs as follows: 
\begin{quote}\begin{verbatim} 
   tpc expr 
\end{verbatim}\end{quote} 
 
You can now execute the \verb"EXPR" program and type some expressions to see 
it work (terminate the program with an empty line). There is a number of other 
sample TP Lex and Yacc programs (\verb".L" and \verb".Y" files) on the 
distribution disk, including a TP Yacc cross reference utility and a complete 
parser for Standard Pascal. 
 
The TP Lex and Yacc programs recognize some options which may be specified 
anywhere on the command line. E.g., 
\begin{quote}\begin{verbatim} 
   lex /o exprlex 
\end{verbatim}\end{quote} 
runs TP Lex with ``DFA optimization'' and 
\begin{quote}\begin{verbatim} 
   yacc /v expr 
\end{verbatim}\end{quote} 
runs TP Yacc in ``verbose'' mode (TP Yacc generates a readable description 
of the generated parser). 
 
The TP Lex and Yacc programs use the following default filename extensions: 
\begin{itemize} 
   \item \verb".L": TP Lex input files 
   \item \verb".Y": TP Yacc input files 
   \item \verb".PAS": TP Lex and Yacc output files 
\end{itemize} 
 
As usual, you may overwrite default filename extensions by explicitly 
specifying suffixes. 
 
If you ever forget how to run TP Lex and Yacc, you can issue the command 
\begin{quote}\begin{verbatim} 
   lex 
\end{verbatim}\end{quote} 
or 
\begin{quote}\begin{verbatim} 
   yacc 
\end{verbatim}\end{quote} 
without arguments to get a short summary of the command line syntax. 
 
\section{TP Lex} 
 
This section describes the TP Lex lexical analyzer generator. 
 
\subsection*{Usage} 
 
\begin{quote}\begin{verbatim} 
LEX [options] lex-file[.L] 
  [output-file[.PAS]] 
\end{verbatim}\end{quote} 
 
\subsection*{Options} 
 
\begin{description} 
   \item[\verb"/v"] 
      ``Verbose:'' Lex generates a readable description of the generated 
      lexical analyzer, written to lex-file with new extension \verb".LST". 
   \item[\verb"/o"] 
      ``Optimize:'' Lex optimizes DFA tables to produce a minimal DFA. 
\end{description} 
 
\subsection*{Description} 
 
TP Lex is a program generator that is used to generate the Turbo Pascal 
source code for a lexical analyzer subroutine from the specification 
of an input language by a regular expression grammar. 
 
TP Lex parses the source grammar contained in \verb"lex-file" (with default 
suffix \verb".L") and writes the constructed lexical analyzer subroutine 
to the specified \verb"output-file" (with default suffix \verb".PAS"); if no 
output file is specified, output goes to \verb"lex-file" with new suffix 
\verb".PAS." If any errors are found during compilation, error messages are 
written to the list file (\verb"lex-file" with new suffix \verb".LST"). 
 
The generated output file contains a lexical analyzer routine, \verb"yylex", 
implemented as: 
\begin{quote}\begin{verbatim} 
   function yylex : Integer; 
\end{verbatim}\end{quote} 
 
This routine has to be called by your main program to execute the lexical 
analyzer. The return value of the \verb"yylex" routine usually denotes the 
number of a token recognized by the lexical analyzer (see the \verb"return" 
routine in the \verb"LexLib" unit). At end-of-file the \verb"yylex" routine 
normally returns \verb"0". 
 
The code template for the \verb"yylex" routine may be found in the 
\verb"YYLEX.COD" file. This file is needed by TP Lex when it constructs the 
output file. It must be present either in the current directory or in the 
directory from which TP Lex was executed (TP Lex searches these directories in 
the indicated order). 
 
The TP Lex library (\verb"LexLib") unit is required by programs using 
Lex-generated lexical analyzers; you will therefore have to put an appropriate 
\verb"uses" clause into your program or unit that contains the lexical 
analyzer routine. The \verb"LexLib" unit also provides various useful utility 
routines; see the file \verb"LEXLIB.PAS" for further information. 
 
\subsection*{Lex Source} 
 
A TP Lex program consists of three sections separated with the \verb"%%" 
delimiter: 
 
\begin{quote}\begin{verbatim} 
definitions 
%% 
rules 
%% 
auxiliary procedures 
\end{verbatim}\end{quote} 
 
All sections may be empty. The TP Lex language is line-oriented; definitions 
and rules are separated by line breaks. There is no special notation for 
comments, but (Turbo Pascal style) comments may be included as Turbo Pascal 
fragments (see below). 
 
The definitions section may contain the following elements: 
\begin{itemize} 
   \item 
      regular definitions in the format: 
      \begin{quote}\begin{verbatim} 
   name   substitution 
      \end{verbatim}\end{quote} 
      which serve to abbreviate common subexpressions. The \verb"{name}" 
      notation causes the corresponding substitution from the definitions 
      section to be inserted into a regular expression. The name must be 
      a legal identifier (letter followed by a sequence of letters and digits; 
      the underscore counts as a letter; upper- and lowercase are distinct). 
      Regular definitions must be non-recursive. 
   \item 
      start state definitions in the format: 
      \begin{quote}\begin{verbatim} 
   %start name ... 
      \end{verbatim}\end{quote} 
      which are used in specifying start conditions on rules (described 
      below). The \verb"%start" keyword may also be abbreviated as \verb"%s" 
      or \verb"%S". 
   \item 
      Turbo Pascal declarations enclosed between \verb"%{" and \verb"%}". 
      These will be inserted into the output file (at global scope). Also, 
      any line that does not look like a Lex definition (e.g., starts with 
      blank or tab) will be treated as Turbo Pascal code. (In particular, 
      this also allows you to include Turbo Pascal comments in your Lex 
      program.) 
\end{itemize} 
 
The rules section of a TP Lex program contains the actual specification of 
the lexical analyzer routine. It may be thought of as a big \verb"CASE" 
statement discriminating over the different patterns to be matched and listing the 
corresponding statements (actions) to be executed. Each rule consists of a 
regular expression describing the strings to be matched in the input, and a 
corresponding action, a Turbo Pascal statement to be executed when the 
expression matches. Expression and statement are delimited with whitespace 
(blanks and/or tabs). Thus the format of a Lex grammar rule is: 
 
\begin{quote}\begin{verbatim} 
   expression      statement; 
\end{verbatim}\end{quote} 
 
Note that the action must be a single Turbo Pascal statement terminated 
with a semicolon (use \verb"begin ... end" for compound statements). The 
statement may span multiple lines if the successor lines are indented with 
at least one blank or tab. The action may also be replaced by the \verb"|" 
character, indicating that the action for this rule is the same as that for 
the next one. 
 
The TP Lex library unit provides various variables and routines which are 
useful in the programming of actions. In particular, the \verb"yytext" string 
variable holds the text of the matched string, and the \verb"yyleng" Byte 
variable its length. 
 
Regular expressions are used to describe the strings to be matched in a 
grammar rule. They are built from the usual constructs describing character 
classes and sequences, and operators specifying repetitions and alternatives. 
The precise format of regular expressions is described in the next section. 
 
The rules section may also start with some Turbo Pascal declarations 
(enclosed in \verb"%{ %}") which are treated as local declarations of the 
actions routine. 
 
Finally, the auxiliary procedures section may contain arbitrary Turbo 
Pascal code (such as supporting routines or a main program) which is 
simply tacked on to the end of the output file. The auxiliary procedures 
section is optional. 
 
\subsection*{Regular Expressions} 
 
Table \ref{tab1} summarizes the format of the regular expressions 
recognized by TP Lex (also compare Aho, Sethi, Ullman 1986, fig.\ 3.48). 
$c$ stands for a single character, $s$ for a string, $r$ for a regular 
expression, and $n,m$ for nonnegative integers. 
 
\begin{table*}\centering 
   \begin{tabular}{c|c|c} 
      \hline\hline 
      {\sc Expression}& {\sc Matches}& {\sc Example}\\ 
      \hline 
      $c$& any non-operator character $c$& \verb"a"\\ 
      \verb"\"$c$& character $c$ literally& \verb"\*"\\ 
      \verb'"'$s$\verb'"'& string $s$ literally& \verb'"**"'\\ 
      \verb"."& any character but newline& \verb"a.*b"\\ 
      \verb"^"& beginning of line& \verb"^abc"\\ 
      \verb"$"& end of line& \verb"abc$"\\ 
      \verb"["$s$\verb"]"& any character in $s$& \verb"[abc]"\\ 
      \verb"[^"$s$\verb"]"& any character not in $s$& \verb"[^abc]"\\ 
      $r$\verb"*"& zero or more $r$'s& \verb"a*"\\ 
      $r$\verb"+"& one or more $r$'s& \verb"a+"\\ 
      $r$\verb"?"& zero or one $r$& \verb"a?"\\ 
      $r$\verb"{"$m$\verb","$n$\verb"}"& $m$ to $n$ occurrences of $r$& \verb"a{1,5}"\\ 
      $r$\verb"{"$m$\verb"}"& $m$ occurrences of $r$& \verb"a{5}"\\ 
      $r_1r_2$& $r_1$ then $r_2$& \verb"ab"\\ 
      $r_1$\verb"|"$r_2$& $r_1$ or $r_2$& \verb"a|b"\\ 
      \verb"("$r$\verb")"& $r$& \verb"(a|b)"\\ 
      $r_1$\verb"/"$r_2$& $r_1$ when followed by $r_2$& \verb"a/b"\\ 
      \verb"<"$x$\verb">"$r$& $r$ when in start condition $x$& \verb"abc"\\ 
      \hline 
   \end{tabular} 
   \caption{Regular expressions.} 
   \label{tab1} 
\end{table*} 
 
The operators \verb"*", \verb"+", \verb"?" and \verb"{}" have highest 
precedence, followed by concatenation. The \verb"|" operator has lowest 
precedence. Parentheses \verb"()" may be used to group expressions and 
overwrite default precedences. The \verb"<>" and \verb"/" operators may only 
occur once in an expression. 
 
The usual C-like escapes are recognized: 
\begin{itemize} 
   \item \verb"\n"     denotes newline 
   \item \verb"\r"     denotes carriage return 
   \item \verb"\t"     denotes tab 
   \item \verb"\b"     denotes backspace 
   \item \verb"\f"     denotes form feed 
   \item \verb"\"$nnn$ denotes character no.\ $nnn$ in octal base 
\end{itemize} 
 
You can also use the \verb"\" character to quote characters which would 
otherwise be interpreted as operator symbols. In character classes, you may 
use the \verb"-" character to denote ranges of characters. For instance, 
\verb"[a-z]" denotes the class of all lowercase letters. 
 
The expressions in a TP Lex program may be ambigious, i.e. there may be inputs 
which match more than one rule. In such a case, the lexical analyzer prefers 
the longest match and, if it still has the choice between different rules, 
it picks the first of these. If no rule matches, the lexical analyzer 
executes a default action which consists of copying the input character 
to the output unchanged. Thus, if the purpose of a lexical analyzer is 
to translate some parts of the input, and leave the rest unchanged, you 
only have to specify the patterns which have to be treated specially. If, 
however, the lexical analyzer has to absorb its whole input, you will have 
to provide rules that match everything. E.g., you might use the rules 
\begin{quote}\begin{verbatim} 
   .   | 
   \n  ; 
\end{verbatim}\end{quote} 
which match ``any other character'' (and ignore it). 
 
Sometimes certain patterns have to be analyzed differently depending on some 
amount of context in which the pattern appears. In such a case the \verb"/" 
operator is useful. For instance, the expression \verb"a/b" matches \verb"a", 
but only if followed by \verb"b". Note that the \verb"b" does not belong to 
the match; rather, the lexical analyzer, when matching an \verb"a", will look 
ahead in the input to see whether it is followed by a \verb"b", before it 
declares that it has matched an \verb"a". Such lookahead may be arbitrarily 
complex (up to the size of the \verb"LexLib" input buffer). E.g., the pattern 
\verb"a/.*b" matches an \verb"a" which is followed by a \verb"b" somewhere on 
the same input line. TP Lex also has a means to specify left context which is 
described in the next section. 
 
\subsection*{Start Conditions} 
 
TP Lex provides some features which make it possible to handle left context. 
The \verb"^" character at the beginning of a regular expression may be used 
to denote the beginning of the line. More distant left context can be described 
conveniently by using start conditions on rules. 
 
Any rule which is prefixed with the \verb"<>" construct is only valid if the 
lexical analyzer is in the denoted start state. For instance, the expression 
\verb"a" can only be matched if the lexical analyzer is in start state 
\verb"x". You can have multiple start states in a rule; e.g., \verb"a" 
can be matched in start states \verb"x" or \verb"y". 
 
Start states have to be declared in the definitions section by means of 
one or more start state definitions (see above). The lexical analyzer enters 
a start state through a call to the \verb"LexLib" routine \verb"start". E.g., 
you may write: 
 
\begin{quote}\begin{verbatim} 
%start x y 
%% 
a    start(y); 
b    start(x); 
%% 
begin 
  start(x); if yylex=0 then ; 
end. 
\end{verbatim}\end{quote} 
 
Upon initialization, the lexical analyzer is put into state \verb"x". It then 
proceeds in state \verb"x" until it matches an \verb"a" which puts it into 
state \verb"y". In state \verb"y" it may match a \verb"b" which puts it into 
state \verb"x" again, etc. 
 
Start conditions are useful when certain constructs have to be analyzed 
differently depending on some left context (such as a special character 
at the beginning of the line), and if multiple lexical analyzers have to 
work in concert. If a rule is not prefixed with a start condition, it is 
valid in all user-defined start states, as well as in the lexical analyzer's 
default start state. 
 
\subsection*{Lex Library} 
 
The TP Lex library (\verb"LexLib") unit provides various variables and 
routines which are used by Lex-generated lexical analyzers and application 
programs. It provides the input and output streams and other internal data 
structures used by the lexical analyzer routine, and supplies some variables 
and utility routines which may be used by actions and application programs. 
Refer to the file \verb"LEXLIB.PAS" for a closer description. 
 
You can also modify the Lex library unit (and/or the code template in the 
\verb"YYLEX.COD" file) to customize TP Lex to your target applications. E.g., 
you might wish to optimize the code of the lexical analyzer for some 
special application, make the analyzer read from/write to memory instead 
of files, etc. 
 
\subsection*{Implementation Restrictions And Bugs} 
 
Internal table sizes and the main memory available limit the complexity 
of source grammars that TP Lex can handle. There is currently no possibility 
to change internal table sizes (apart from modifying the sources of TP Lex 
itself), but the maximum table sizes provided by TP Lex seem to be large 
enough to handle most realistic applications. The current limits are 
600 p (positions), 300 s (states) and 600 t (transitions). 
 
As implemented, the generated DFA table is stored as a typed array constant 
which is inserted into the \verb"YYLEX.COD" code template. The transitions in 
each state are stored in order. Of course it would have been more efficient to 
generate a big \verb"CASE" statement instead, but I found that this may cause 
problems with the encoding of large DFA tables because Turbo Pascal has 
a quite rigid limit on the code size of individual procedures. I decided to 
use a scheme in which transitions on different symbols to the same state are 
merged into one single transition (specifying a character set and the 
corresponding next state). This keeps the number of transitions in each state 
quite small and still allows a fairly efficient access to the transition 
table. 
 
The TP Lex program has an option (\verb"/o") to optimize DFA tables. This 
causes a minimal DFA to be generated, using the algorithm described in Aho, 
Sethi, Ullman (1986). Although the absolute limit on the number of DFA states 
that TP Lex can handle is 300, TP Lex poses an additional restriction (100) on 
the number of states in the initial partition of the DFA optimization 
algorithm. Thus, you may get a fatal \verb"integer set overflow" message when 
using the \verb"/o" option even when TP Lex is able to generate an unoptimized 
DFA. In such cases you will just have to be content with the unoptimized DFA. 
(Anyhow, using the merged transitions scheme described above, TP Lex usually 
constructs unoptimized DFA's which are not far from being optimal, and thus 
in most cases DFA optimization won't have a great impact on DFA table sizes.) 
 
\subsection*{Differences from UNIX Lex} 
 
Major differences between TP Lex and UNIX Lex are listed below. 
 
\begin{itemize} 
   \item 
      TP Lex produces output code for Turbo Pascal, rather than for C. 
   \item 
      Character tables (\verb"%T") are not supported; neither are any 
      directives to determine internal table sizes (\verb"%p", \verb"%n", 
      etc.). 
   \item 
      Library routines are named differently from the UNIX version (e.g., 
      the \verb"start" routine takes the place of the \verb"BEGIN" macro of 
      UNIX Lex), and, of course, all macros of UNIX Lex (\verb"ECHO", 
      \verb"REJECT", etc.) had to be implemented as procedures. 
\end{itemize} 
 
\section{TP Yacc} 
 
This section describes the TP Yacc compiler compiler. 
 
\subsection*{Usage} 
 
\begin{quote}\begin{verbatim} 
YACC [options] yacc-file[.Y] 
  [output-file[.PAS]] 
\end{verbatim}\end{quote} 
 
\subsection*{Options} 
 
\begin{description} 
   \item[\verb"/v"] 
      ``Verbose:'' TP Yacc generates a readable description of the generated 
      parser, written to \verb"yacc-file" with new extension \verb".LST". 
   \item[\verb"/d"] 
      ``Debug:'' TP Yacc generates parser with debugging output. 
\end{description} 
 
\subsection*{Description} 
 
TP Yacc is a program that lets you prepare parsers from the description 
of input languages by BNF-like grammars. You simply specify the grammar 
for your target language, augmented with the Turbo Pascal code necessary 
to process the syntactic constructs, and TP Yacc translates your grammar 
into the Turbo Pascal code for a corresponding parser subroutine named 
\verb"yyparse". 
 
TP Yacc parses the source grammar contained in \verb"yacc-file" (with default 
suffix \verb".Y") and writes the constructed parser subroutine to the 
specified \verb"output-file" (with default suffix \verb".PAS"); if no output 
file is specified, output goes to \verb"yacc-file" with new suffix 
\verb".PAS". If any errors are found during compilation, error messages are 
written to the list file (\verb"yacc-file" with new suffix \verb".LST"). 
 
The generated parser routine, \verb"yyparse", is declared as: 
 
\begin{quote}\begin{verbatim} 
   function yyparse : Integer; 
\end{verbatim}\end{quote} 
 
This routine may be called by your main program to execute the parser. 
The return value of the \verb"yyparse" routine denotes success or failure of 
the parser (possible return values: 0 = success, 1 = unrecoverable syntax 
error or parse stack overflow). 
 
The code template for the \verb"yyparse" routine may be found in the 
\verb"YYPARSE.COD" file. This file is needed by TP Yacc when it constructs 
the output file. It must be present either in the current directory or in the 
directory from which TP Yacc was executed (TP Yacc searches these directories 
in the indicated order). 
 
The TP Yacc library (\verb"YaccLib") unit is required by programs using Yacc- 
generated parsers; you will therefore have to put an appropriate \verb"uses" 
clause into your program or unit that contains the parser routine. The 
\verb"YaccLib" unit also provides some routines which may be used to control 
the actions of the parser. See the file \verb"YACCLIB.PAS" for further 
information. 
 
\subsection*{Yacc Source} 
 
A TP Yacc program consists of three sections separated with the \verb"%%" 
delimiter: 
 
\begin{quote}\begin{verbatim} 
definitions 
%% 
rules 
%% 
auxiliary procedures 
\end{verbatim}\end{quote} 
 
The TP Yacc language is free-format: whitespace (blanks, tabs and newlines) 
is ignored, except if it serves as a delimiter. Comments have the C-like 
format \verb"/* ... */". They are treated as whitespace. Grammar symbols are 
denoted by identifiers which have the usual form (letter, including 
underscore, followed by a sequence of letters and digits; upper- and 
lowercase is distinct). The TP Yacc language also has some keywords which 
always start with the \verb"%" character. Literals are denoted by characters 
enclosed in single quotes. The usual C-like escapes are recognized: 
 
\begin{itemize} 
   \item \verb"\n"     denotes newline 
   \item \verb"\r"     denotes carriage return 
   \item \verb"\t"     denotes tab 
   \item \verb"\b"     denotes backspace 
   \item \verb"\f"     denotes form feed 
   \item \verb"\"$nnn$ denotes character no.\ $nnn$ in octal base 
\end{itemize} 
 
\subsection*{Definitions} 
 
The first section of a TP Yacc grammar serves to define the symbols used in 
the grammar. It may contain the following types of definitions: 
 
\begin{itemize} 
   \item 
      start symbol definition: A definition of the form 
      \begin{quote}\begin{verbatim} 
   %start symbol 
      \end{verbatim}\end{quote} 
      declares the start nonterminal of the grammar (if this definition is 
      omitted, TP Yacc assumes the left-hand side nonterminal of the first 
      grammar rule as the start symbol of the grammar). 
   \item 
      terminal definitions: Definitions of the form 
      \begin{quote}\begin{verbatim} 
   %token symbol ... 
      \end{verbatim}\end{quote} 
      are used to declare the terminal symbols (``tokens'') of the target 
      language. Any identifier not introduced in a \verb"%token" definition 
      will be treated as a nonterminal symbol. 
     
      As far as TP Yacc is concerned, tokens are atomic symbols which do not 
      have an innert structure. A lexical analyzer must be provided which 
      takes on the task of tokenizing the input stream and return the 
      individual tokens and literals to the parser (see Section {\em Lexical 
      Analysis\/}). 
   \item 
      precedence definitions: Operator symbols (terminals) may be associated 
      with a precedence by means of a precedence definition which may have 
      one of the following forms 
      \begin{quote}\begin{verbatim} 
   %left symbol ... 
   %right symbol ... 
   %nonassoc symbol ... 
      \end{verbatim}\end{quote} 
      which are used to declare left-, right- and nonassociative operators, 
      respectively. Each precedence definition introduces a new precedence 
      level, lowest precedence first. E.g., you may write: 
      \begin{quote}\begin{verbatim} 
   %nonassoc '<' '>' '=' GEQ LEQ NEQ 
      /* relational operators */ 
   %left     '+' '-'  OR 
      /* addition operators */ 
   %left     '*' '/' AND 
     /* multiplication operators */ 
   %right    NOT UMINUS 
     /* unary operators */ 
      \end{verbatim}\end{quote} 
 
      A terminal identifier introduced in a precedence definition may, but 
      need not, appear in a \verb"%token" definition as well. 
   \item 
      type definitions: Any (terminal or nonterminal) grammar symbol may be 
      associated with a type identifier which is used in the processing of 
      semantic values. Type tags of the form \verb"" may be used in 
      token and precedence definitions to declare the type of a terminal 
      symbol, e.g.: 
      \begin{quote}\begin{verbatim} 
   %token   NUM 
   %left   '+' '-' 
      \end{verbatim}\end{quote} 
 
      To declare the type of a nonterminal symbol, use a type definition of 
      the form: 
      \begin{quote}\begin{verbatim} 
   %type  symbol ... 
      \end{verbatim}\end{quote} 
      e.g.: 
      \begin{quote}\begin{verbatim} 
   %type  expr 
      \end{verbatim}\end{quote} 
 
      In a \verb"%type" definition, you may also omit the nonterminals, i.e. 
      you may write: 
      \begin{quote}\begin{verbatim} 
   %type  
      \end{verbatim}\end{quote} 
 
      This is useful when a given type is only used with type casts (see 
      Section {\em Grammar Rules and Actions\/}), and is not associated with 
      a specific nonterminal. 
   \item 
      Turbo Pascal declarations: You may also include arbitrary Turbo Pascal 
      code in the definitions section, enclosed in \verb"%{ %}". This code 
      will be inserted as global declarations into the output file, unchanged. 
\end{itemize} 
 
\subsection*{Grammar Rules and Actions} 
 
The second part of a TP Yacc grammar contains the grammar rules for the 
target language. Grammar rules have the format 
 
\begin{quote}\begin{verbatim} 
   name : symbol ... ; 
\end{verbatim}\end{quote} 
 
The left-hand side of a rule must be an identifier (which denotes a 
nonterminal symbol). The right-hand side may be an arbitrary (possibly 
empty) sequence of nonterminal and terminal symbols (including literals 
enclosed in single quotes). The terminating semicolon may also be omitted. 
Different rules for the same left-hand side symbols may be written using 
the \verb"|" character to separate the different alternatives: 
 
\begin{quote}\begin{verbatim} 
   name : symbol ... 
        | symbol ... 
        ... 
        ; 
\end{verbatim}\end{quote} 
 
For instance, to specify a simple grammar for arithmetic expressions, you 
may write: 
 
\begin{quote}\begin{verbatim} 
%left '+' '-' 
%left '*' '/' 
%token NUM 
%% 
expr : expr '+' expr 
     | expr '-' expr 
     | expr '*' expr 
     | expr '/' expr 
     | '(' expr ')' 
     | NUM 
     ; 
\end{verbatim}\end{quote} 
 
(The \verb"%left" definitions at the beginning of the grammar are needed to 
specify the precedence and associativity of the operator symbols. This will be 
discussed in more detail in Section {\em Ambigious Grammars\/}.) 
 
Grammar rules may contain actions -- Turbo Pascal statements enclosed in 
\verb"{ }" -- to be executed as the corresponding rules are recognized. 
Furthermore, rules may return values, and access values returned by other 
rules. These ``semantic'' values are written as \verb"$$" (value of the 
left-hand side nonterminal) and \verb"$i" (value of the $i$th right-hand 
side symbol). They are kept on a special value stack which is maintained 
automatically by the parser. 
 
Values associated with terminal symbols must be set by the lexical analyzer 
(more about this in Section {\em Lexical Analysis\/}). Actions of the form 
\verb"$$ := $1" can frequently be omitted, since it is the default action 
assumed by TP Yacc for any rule that does not have an explicit action. 
 
By default, the semantic value type provided by Yacc is \verb"Integer". You 
can also put a declaration like 
\begin{quote}\begin{verbatim} 
   %{ 
   type YYSType = Real; 
   %} 
\end{verbatim}\end{quote} 
into the definitions section of your Yacc grammar to change the default value 
type. However, if you have different value types, the preferred method is to 
use type definitions as discussed in Section {\em Definitions\/}. When such 
type definitions are given, TP Yacc handles all the necessary details of the 
\verb"YYSType" definition and also provides a fair amount of type checking 
which makes it easier to find type errors in the grammar. 
 
For instance, we may declare the symbols \verb"NUM" and \verb"expr" in the 
example above to be of type \verb"Real", and then use these values to 
evaluate an expression as it is parsed. 
 
\begin{quote}\begin{verbatim} 
%left '+' '-' 
%left '*' '/' 
%token  NUM 
%type   expr 
%% 
expr : expr '+' expr   { $$ := $1+$3; } 
     | expr '-' expr   { $$ := $1-$3; } 
     | expr '*' expr   { $$ := $1*$3; } 
     | expr '/' expr   { $$ := $1/$3; } 
     | '(' expr ')'    { $$ := $2;    } 
     | NUM 
     ; 
\end{verbatim}\end{quote} 
 
(Note that we omitted the action of the last rule. The ``copy action'' 
\verb"$$ := $1" required by this rule is automatically added by TP Yacc.) 
 
Actions may not only appear at the end, but also in the middle of a rule 
which is useful to perform some processing before a rule is fully parsed. 
Such actions inside a rule are treated as special nonterminals which are 
associated with an empty right-hand side. Thus, a rule like 
\begin{quote}\begin{verbatim} 
   x : y { action; } z 
\end{verbatim}\end{quote} 
will be treated as: 
\begin{quote}\begin{verbatim} 
  x : y $act z 
  $act : { action; } 
\end{verbatim}\end{quote} 
 
Actions inside a rule may also access values to the left of the action, 
and may return values by assigning to the \verb"$$" value. The value returned 
by such an action can then be accessed by other actions using the usual 
\verb"$i" notation. E.g., we may write: 
\begin{quote}\begin{verbatim} 
   x : y { $$ := 2*$1; } z { $$ := $2+$3; } 
\end{verbatim}\end{quote} 
which has the effect of setting the value of \verb"x" to 
\begin{quote}\begin{verbatim} 
   2*(the value of y)+(the value of z). 
\end{verbatim}\end{quote} 
 
Sometimes it is desirable to access values in enclosing rules. This can be 
done using the notation \verb"$i" with $i\leq 0$. \verb"$0" refers to the 
first value ``to the left'' of the current rule, \verb"$-1" to the second, 
and so on. Note that in this case the referenced value depends on the actual 
contents of the parse stack, so you have to make sure that the requested 
values are always where you expect them. 
 
There are some situations in which TP Yacc cannot easily determine the 
type of values (when a typed parser is used). This is true, in particular, 
for values in enclosing rules and for the \verb"$$" value in an action inside 
a rule. In such cases you may use a type cast to explicitly specify the type 
of a value. The format for such type casts is \verb"$$" (for left-hand 
side values) and \verb"$i" (for right-hand side values) where 
\verb"name" is a type identifier (which must occur in a \verb"%token", 
precedence or \verb"%type" definition). 
 
\subsection*{Auxiliary Procedures} 
 
The third section of a TP Yacc program is optional. If it is present, it 
may contain any Turbo Pascal code (such as supporting routines or a main 
program) which is tacked on to the end of the output file. 
 
\subsection*{Lexical Analysis} 
 
For any TP Yacc-generated parser, the programmer must supply a lexical 
analyzer routine named \verb"yylex" which performs the lexical analysis for 
the parser. This routine must be declared as 
 
\begin{quote}\begin{verbatim} 
   function yylex : Integer; 
\end{verbatim}\end{quote} 
 
The \verb"yylex" routine may either be prepared by hand, or by using the 
lexical analyzer generator TP Lex (see Section {\em TP Lex\/}). 
 
The lexical analyzer must be included in your main program behind the 
parser subroutine (the \verb"yyparse" code template includes a forward 
definition of the \verb"yylex" routine such that the parser can access the 
lexical analyzer). For instance, you may put the lexical analyzer 
routine into the auxiliary procedures section of your TP Yacc grammar, 
either directly, or by using the the Turbo Pascal include directive 
(\verb"$I"). 
 
The parser repeatedly calls the \verb"yylex" routine to tokenize the input 
stream and obtain the individual lexical items in the input. For any 
literal character, the \verb"yylex" routine has to return the corresponding 
character code. For the other, symbolic, terminals of the input language, 
the lexical analyzer must return corresponding integer codes. These are 
assigned automatically by TP Yacc in the order in which token definitions 
appear in the definitions section of the source grammar. The lexical 
analyzer can access these values through corresponding integer constants 
which are declared by TP Yacc in the output file. 
 
For instance, if 
\begin{quote}\begin{verbatim} 
   %token NUM 
\end{verbatim}\end{quote} 
is the first definition in the Yacc grammar, then TP Yacc will create 
a corresponding constant declaration 
\begin{quote}\begin{verbatim} 
   const NUM = 257; 
\end{verbatim}\end{quote} 
in the output file (TP Yacc automatically assigns symbolic token numbers 
starting at 257; 1 thru 255 are reserved for character literals, 0 denotes 
end-of-file, and 256 is reserved for the special error token which will be 
discussed in Section {\em Error Handling\/}). This definition may then be 
used, e.g., in a corresponding TP Lex program as follows: 
\begin{quote}\begin{verbatim} 
   [0-9]+   return(NUM); 
\end{verbatim}\end{quote} 
 
You can also explicitly assign token numbers in the grammar. For this 
purpose, the first occurrence of a token identifier in the definitions 
section may be followed by an unsigned integer. E.g. you may write: 
\begin{quote}\begin{verbatim} 
   %token NUM 299 
\end{verbatim}\end{quote} 
 
Besides the return value of \verb"yylex", the lexical analyzer routine may 
also return an additional semantic value for the recognized token. This value 
is assigned to a variable named \verb"yylval" and may then be accessed in 
actions through the \verb"$i" notation (see above, Section {\em Grammar 
Rules and Actions\/}). The \verb"yylval" variable is of type \verb"YYSType" 
(the semantic value type, \verb"Integer" by default); its declaration may be 
found in the \verb"YYPARSE.COD" file. 
 
For instance, to assign an \verb"Integer" value to a \verb"NUM" token in the 
above example, we may write: 
 
\begin{quote}\begin{verbatim} 
   [0-9]+   begin 
              val(yytext, yylval, code); 
              return(NUM); 
            end; 
\end{verbatim}\end{quote} 
 
This assigns \verb"yylval" the value of the \verb"NUM" token (using the Turbo 
Pascal standard procedure \verb"val"). 
 
If a parser uses tokens of different types (via a \verb"%token " 
definition), then the \verb"yylval" variable will not be of type 
\verb"Integer", but instead of a corresponding variant record type which is 
capable of holding all the different value types declared in the TP Yacc 
grammar. In this case, the lexical analyzer must assign a semantic value to 
the corresponding record component which is named \verb"yy"{\em name\/} 
(where {\em name\/} stands for the corresponding type identifier). 
 
E.g., if token \verb"NUM" is declared \verb"Real": 
\begin{quote}\begin{verbatim} 
   %token  NUM 
\end{verbatim}\end{quote} 
then the value for token \verb"NUM" must be assigned to \verb"yylval.yyReal". 
 
\subsection*{How The Parser Works} 
 
TP Yacc uses the LALR(1) technique developed by Donald E.\ Knuth and F.\ 
DeRemer to construct a simple, efficient, non-backtracking bottom-up 
parser for the source grammar. The LALR parsing technique is described 
in detail in Aho/Sethi/Ullman (1986). It is quite instructive to take a 
look at the parser description TP Yacc generates from a small sample 
grammar, to get an idea of how the LALR parsing algorithm works. We 
consider the following simplified version of the arithmetic expression 
grammar: 
 
\begin{quote}\begin{verbatim} 
%token NUM 
%left '+' 
%left '*' 
%% 
expr : expr '+' expr 
     | expr '*' expr 
     | '(' expr ')' 
     | NUM 
     ; 
\end{verbatim}\end{quote} 
 
When run with the \verb"/v" option on the above grammar, TP Yacc generates 
the parser description listed below. 
 
\begin{quote}\begin{verbatim} 
state 0: 
 
        $accept : _ expr $end 
 
        '('     shift 2 
        NUM     shift 3 
        .       error 
 
        expr    goto 1 
 
state 1: 
 
        $accept : expr _ $end 
        expr : expr _ '+' expr 
        expr : expr _ '*' expr 
 
        $end    accept 
        '*'     shift 4 
        '+'     shift 5 
        .       error 
 
state 2: 
 
        expr : '(' _ expr ')' 
 
        '('     shift 2 
        NUM     shift 3 
        .       error 
 
        expr    goto 6 
 
state 3: 
 
        expr : NUM _    (4) 
 
        .       reduce 4 
 
state 4: 
 
        expr : expr '*' _ expr 
 
        '('     shift 2 
        NUM     shift 3 
        .       error 
 
        expr    goto 7 
 
state 5: 
 
        expr : expr '+' _ expr 
 
        '('     shift 2 
        NUM     shift 3 
        .       error 
 
        expr    goto 8 
 
state 6: 
 
        expr : '(' expr _ ')' 
        expr : expr _ '+' expr 
        expr : expr _ '*' expr 
 
        ')'     shift 9 
        '*'     shift 4 
        '+'     shift 5 
        .       error 
 
state 7: 
 
        expr : expr '*' expr _  (2) 
        expr : expr _ '+' expr 
        expr : expr _ '*' expr 
 
        .       reduce 2 
 
state 8: 
 
        expr : expr '+' expr _  (1) 
        expr : expr _ '+' expr 
        expr : expr _ '*' expr 
 
        '*'     shift 4 
        $end    reduce 1 
        ')'     reduce 1 
        '+'     reduce 1 
        .       error 
 
state 9: 
 
        expr : '(' expr ')' _   (3) 
 
        .       reduce 3 
\end{verbatim}\end{quote} 
 
Each state of the parser corresponds to a certain prefix of the input 
which has already been seen. The parser description lists the grammar 
rules wich are parsed in each state, and indicates the portion of each 
rule which has already been parsed by an underscore. In state 0, the 
start state of the parser, the parsed rule is 
\begin{quote}\begin{verbatim} 
        $accept : expr $end 
\end{verbatim}\end{quote} 
 
This is not an actual grammar rule, but a starting rule automatically 
added by TP Yacc. In general, it has the format 
\begin{quote}\begin{verbatim} 
        $accept : X $end 
\end{verbatim}\end{quote} 
where \verb"X" is the start nonterminal of the grammar, and \verb"$end" is 
a pseudo token denoting end-of-input (the \verb"$end" symbol is used by the 
parser to determine when it has successfully parsed the input). 
 
The description of the start rule in state 0, 
\begin{quote}\begin{verbatim} 
        $accept : _ expr $end 
\end{verbatim}\end{quote} 
with the underscore positioned before the \verb"expr" symbol, indicates that 
we are at the beginning of the parse and are ready to parse an expression 
(nonterminal \verb"expr"). 
 
The parser maintains a stack to keep track of states visited during the 
parse. There are two basic kinds of actions in each state: {\em shift\/}, 
which reads an input symbol and pushes the corresponding next state on top of 
the stack, and {\em reduce\/} which pops a number of states from the stack 
(corresponding to the number of right-hand side symbols of the rule used 
in the reduction) and consults the {\em goto\/} entries of the uncovered 
state to find the transition corresponding to the left-hand side symbol of the 
reduced rule. 
 
In each step of the parse, the parser is in a given state (the state on 
top of its stack) and may consult the current {\em lookahead symbol\/}, the 
next symbol in the input, to determine the parse action -- shift or reduce -- 
to perform. The parser terminates as soon as it reaches state 1 and reads 
in the endmarker, indicated by the {\em accept\/} action on \verb"$end" in 
state 1. 
 
Sometimes the parser may also carry out an action without inspecting the 
current lookahead token. This is the case, e.g., in state 3 where the 
only action is reduction by rule 4: 
\begin{quote}\begin{verbatim} 
        .       reduce 4 
\end{verbatim}\end{quote} 
 
The default action in a state can also be {\em error\/} indicating that any 
other input represents a syntax error. (In case of such an error the 
parser will start syntactic error recovery, as described in Section 
{\em Error Handling\/}.) 
 
Now let us see how the parser responds to a given input. We consider the 
input string \verb"2+5*3" which is presented to the parser as the token 
sequence: 
\begin{quote}\begin{verbatim} 
   NUM + NUM * NUM 
\end{verbatim}\end{quote} 
 
Table \ref{tab2} traces the corresponding actions of the parser. We also 
show the current state in each move, and the remaining states on the stack. 
 
\begin{table*}\centering 
   \begin{tabular}{l|l|l|p{8cm}} 
      \hline\hline 
      {\sc State}& {\sc Stack}& {\sc Lookahead}& {\sc Action}\\ 
      \hline 
0 &               &    \verb"NUM"    &    shift state 3\\ 
3 &     0         &                  &    reduce rule 4 (pop 1 state, uncovering state 
                                          0, then goto state 1 on symbol \verb"expr")\\ 
1 &     0         &    \verb"+"      &    shift state 5\\ 
5 &     1 0       &    \verb"NUM"    &    shift state 3\\ 
3 &     5 1 0     &                  &    reduce rule 4 (pop 1 state, uncovering state 
                                          5, then goto state 8 on symbol \verb"expr")\\ 
8 &     5 1 0     &    \verb"*"      &    shift 4\\ 
4 &     8 5 1 0   &    \verb"NUM"    &    shift 3\\ 
3 &     4 8 5 1 0 &                  &    reduce rule 4 (pop 1 state, uncovering state 
                                          4, then goto state 7 on symbol \verb"expr")\\ 
7 &     4 8 5 1 0 &                  &    reduce rule 2 (pop 3 states, uncovering state 
                                          5, then goto state 8 on symbol \verb"expr")\\ 
8 &     5 1 0     &    \verb"$end"   &    reduce rule 1 (pop 3 states, uncovering state 
                                          0, then goto state 1 on symbol \verb"expr")\\ 
1 &     0         &    \verb"$end"   &    accept\\ 
      \hline 
   \end{tabular} 
   \caption{Parse of \protect\verb"NUM + NUM * NUM".} 
   \label{tab2} 
\end{table*} 
 
It is also instructive to see how the parser responds to illegal inputs. 
E.g., you may try to figure out what the parser does when confronted with: 
\begin{quote}\begin{verbatim} 
   NUM + ) 
\end{verbatim}\end{quote} 
or: 
\begin{quote}\begin{verbatim} 
   ( NUM * NUM 
\end{verbatim}\end{quote} 
 
You will find that the parser, sooner or later, will always run into an 
error action when confronted with errorneous inputs. An LALR parser will 
never shift an invalid symbol and thus will always find syntax errors as 
soon as it is possible during a left-to-right scan of the input. 
 
TP Yacc provides a debugging option (\verb"/d") that may be used to trace 
the actions performed by the parser. When a grammar is compiled with the 
\verb"/d" option, the generated parser will print out the actions as it 
parses its input. 
 
\subsection*{Ambigious Grammars} 
 
There are situations in which TP Yacc will not produce a valid parser for 
a given input language. LALR(1) parsers are restricted to one-symbol 
lookahead on which they have to base their parsing decisions. If a 
grammar is ambigious, or cannot be parsed unambigiously using one-symbol 
lookahead, TP Yacc will generate parsing conflicts when constructing the 
parse table. There are two types of such conflicts: {\em shift/reduce 
conflicts\/} (when there is both a shift and a reduce action for a given 
input symbol in a given state), and {\em reduce/reduce\/} conflicts (if 
there is more than one reduce action for a given input symbol in a given 
state). Note that there never will be a shift/shift conflict. 
 
When a grammar generates parsing conflicts, TP Yacc prints out the number 
of shift/reduce and reduce/reduce conflicts it encountered when constructing 
the parse table. However, TP Yacc will still generate the output code for the 
parser. To resolve parsing conflicts, TP Yacc uses the following built-in 
disambiguating rules: 
 
\begin{itemize} 
   \item 
      in a shift/reduce conflict, TP Yacc chooses the shift action. 
   \item 
      in a reduce/reduce conflict, TP Yacc chooses reduction of the first 
      grammar rule. 
\end{itemize} 
 
The shift/reduce disambiguating rule correctly resolves a type of 
ambiguity known as the ``dangling-else ambiguity'' which arises in the 
syntax of conditional statements of many programming languages (as in 
Pascal): 
 
\begin{quote}\begin{verbatim} 
%token IF THEN ELSE 
%% 
stmt : IF expr THEN stmt 
     | IF expr THEN stmt ELSE stmt 
     ; 
\end{verbatim}\end{quote} 
 
This grammar is ambigious, because a nested construct like 
\begin{quote}\begin{verbatim} 
   IF expr-1 THEN IF expr-2 THEN stmt-1 
     ELSE stmt-2 
\end{verbatim}\end{quote} 
can be parsed two ways, either as: 
\begin{quote}\begin{verbatim} 
   IF expr-1 THEN ( IF expr-2 THEN stmt-1 
     ELSE stmt-2 ) 
\end{verbatim}\end{quote} 
or as: 
\begin{quote}\begin{verbatim} 
   IF expr-1 THEN ( IF expr-2 THEN stmt-1 ) 
     ELSE stmt-2 
\end{verbatim}\end{quote} 
 
The first interpretation makes an \verb"ELSE" belong to the last unmatched 
\verb"IF" which also is the interpretation chosen in most programming 
languages. This is also the way that a TP Yacc-generated parser will parse 
the construct since the shift/reduce disambiguating rule has the effect of 
neglecting the reduction of \verb"IF expr-2 THEN stmt-1"; instead, the parser 
will shift the \verb"ELSE" symbol which eventually leads to the reduction of 
\verb"IF expr-2 THEN stmt-1 ELSE stmt-2". 
 
The reduce/reduce disambiguating rule is used to resolve conflicts that 
arise when there is more than one grammar rule matching a given construct. 
Such ambiguities are often caused by ``special case constructs'' which may be 
given priority by simply listing the more specific rules ahead of the more 
general ones. 
 
For instance, the following is an excerpt from the grammar describing the 
input language of the UNIX equation formatter EQN: 
 
\begin{quote}\begin{verbatim} 
%right SUB SUP 
%% 
expr : expr SUB expr SUP expr 
     | expr SUB expr 
     | expr SUP expr 
     ; 
\end{verbatim}\end{quote} 
 
Here, the \verb"SUB" and \verb"SUP" operator symbols denote sub- and 
superscript, respectively. The rationale behind this example is that 
an expression involving both sub- and superscript is often set differently 
from a superscripted subscripted expression (compare $x_i^n$ to ${x_i}^n$). 
This special case is therefore caught by the first rule in the above example 
which causes a reduce/reduce conflict with rule 3 in expressions like 
\verb"expr-1 SUB expr-2 SUP expr-3". The conflict is resolved in favour of 
the first rule. 
 
In both cases discussed above, the ambiguities could also be eliminated 
by rewriting the grammar accordingly (although this yields more complicated 
and less readable grammars). This may not always be the case. Often 
ambiguities are also caused by design errors in the grammar. Hence, if 
TP Yacc reports any parsing conflicts when constructing the parser, you 
should use the \verb"/v" option to generate the parser description 
(\verb".LST" file) and check whether TP Yacc resolved the conflicts correctly. 
 
There is one type of syntactic constructs for which one often deliberately 
uses an ambigious grammar as a more concise representation for a language 
that could also be specified unambigiously: the syntax of expressions. 
For instance, the following is an unambigious grammar for simple arithmetic 
expressions: 
 
\begin{quote}\begin{verbatim} 
%token NUM 
 
%% 
 
expr    : term 
        | expr '+' term 
        ; 
 
term    : factor 
        | term '*' factor 
        ; 
 
factor  : '(' expr ')' 
        | NUM 
        ; 
\end{verbatim}\end{quote} 
 
You may check yourself that this grammar gives \verb"*" a higher precedence 
than \verb"+" and makes both operators left-associative. The same effect can 
be achieved with the following ambigious grammar using precedence definitions: 
 
\begin{quote}\begin{verbatim} 
%token NUM 
%left '+' 
%left '*' 
%% 
expr : expr '+' expr 
     | expr '*' expr 
     | '(' expr ')' 
     | NUM 
     ; 
\end{verbatim}\end{quote} 
 
Without the precedence definitions, this is an ambigious grammar causing 
a number of shift/reduce conflicts. The precedence definitions are used 
to correctly resolve these conflicts (conflicts resolved using precedence 
will not be reported by TP Yacc). 
 
Each precedence definition introduces a new precedence level (lowest 
precedence first) and specifies whether the corresponding operators 
should be left-, right- or nonassociative (nonassociative operators 
cannot be combined at all; example: relational operators in Pascal). 
 
TP Yacc uses precedence information to resolve shift/reduce conflicts as 
follows. Precedences are associated with each terminal occuring in a 
precedence definition. Furthermore, each grammar rule is given the 
precedence of its rightmost terminal (this default choice can be 
overwritten using a \verb"%prec" tag; see below). To resolve a shift/reduce 
conflict using precedence, both the symbol and the rule involved must 
have been assigned precedences. TP Yacc then chooses the parse action 
as follows: 
 
\begin{itemize} 
   \item 
      If the symbol has higher precedence than the rule: shift. 
   \item 
      If the rule has higher precedence than the symbol: reduce. 
   \item 
      If symbol and rule have the same precedence, the associativity of the 
      symbol determines the parse action: if the symbol is left-associative: 
      reduce; if the symbol is right-associative: shift; if the symbol is 
      non-associative: error. 
\end{itemize} 
 
To give you an idea of how this works, let us consider our ambigious 
arithmetic expression grammar (without precedences): 
 
\begin{quote}\begin{verbatim} 
%token NUM 
%% 
expr : expr '+' expr 
     | expr '*' expr 
     | '(' expr ')' 
     | NUM 
     ; 
\end{verbatim}\end{quote} 
 
This grammar generates four shift/reduce conflicts. The description 
of state 8 reads as follows: 
 
\begin{quote}\begin{verbatim} 
state 8: 
 
        *** conflicts: 
 
        shift 4, reduce 1 on '*' 
        shift 5, reduce 1 on '+' 
 
        expr : expr '+' expr _  (1) 
        expr : expr _ '+' expr 
        expr : expr _ '*' expr 
 
        '*'     shift 4 
        '+'     shift 5 
        $end    reduce 1 
        ')'     reduce 1 
        .       error 
\end{verbatim}\end{quote} 
 
In this state, we have successfully parsed a \verb"+" expression (rule 1). 
When the next symbol is \verb"+" or \verb"*", we have the choice between the 
reduction and shifting the symbol. Using the default shift/reduce 
disambiguating rule, TP Yacc has resolved these conflicts in favour of shift. 
 
Now let us assume the above precedence definition: 
\begin{quote}\begin{verbatim} 
   %left '+' 
   %left '*' 
\end{verbatim}\end{quote} 
which gives \verb"*" higher precedence than \verb"+" and makes both operators 
left-associative. The rightmost terminal in rule 1 is \verb"+". Hence, given 
these precedence definitions, the first conflict will be resolved in favour 
of shift (\verb"*" has higher precedence than \verb"+"), while the second one 
is resolved in favour of reduce (\verb"+" is left-associative). 
 
Similar conflicts arise in state 7: 
 
\begin{quote}\begin{verbatim} 
state 7: 
 
        *** conflicts: 
 
        shift 4, reduce 2 on '*' 
        shift 5, reduce 2 on '+' 
 
        expr : expr '*' expr _  (2) 
        expr : expr _ '+' expr 
        expr : expr _ '*' expr 
 
        '*'     shift 4 
        '+'     shift 5 
        $end    reduce 2 
        ')'     reduce 2 
        .       error 
\end{verbatim}\end{quote} 
 
Here, we have successfully parsed a \verb"*" expression which may be followed 
by another \verb"+" or \verb"*" operator. Since \verb"*" is left-associative 
and has higher precedence than \verb"+", both conflicts will be resolved in 
favour of reduce. 
 
Of course, you can also have different operators on the same precedence 
level. For instance, consider the following extended version of the 
arithmetic expression grammar: 
 
\begin{quote}\begin{verbatim} 
%token NUM 
%left '+' '-' 
%left '*' '/' 
%% 
expr    : expr '+' expr 
        | expr '-' expr 
        | expr '*' expr 
        | expr '/' expr 
        | '(' expr ')' 
        | NUM 
        ; 
\end{verbatim}\end{quote} 
 
This puts all ``addition'' operators on the first and all ``multiplication'' 
operators on the second precedence level. All operators are left-associative; 
for instance, \verb"5+3-2" will be parsed as \verb"(5+3)-2". 
 
By default, TP Yacc assigns each rule the precedence of its rightmost 
terminal. This is a sensible decision in most cases. Occasionally, it 
may be necessary to overwrite this default choice and explicitly assign 
a precedence to a rule. This can be done by putting a precedence tag 
of the form 
\begin{quote}\begin{verbatim} 
   %prec symbol 
\end{verbatim}\end{quote} 
at the end of the corresponding rule which gives the rule the precedence 
of the specified symbol. For instance, to extend the expression grammar 
with a unary minus operator, giving it highest precedence, you may write: 
 
\begin{quote}\begin{verbatim} 
%token NUM 
%left '+' '-' 
%left '*' '/' 
%right UMINUS 
%% 
expr    : expr '+' expr 
        | expr '-' expr 
        | expr '*' expr 
        | expr '/' expr 
        | '-' expr      %prec UMINUS 
        | '(' expr ')' 
        | NUM 
        ; 
\end{verbatim}\end{quote} 
 
Note the use of the \verb"UMINUS" token which is not an actual input symbol 
but whose sole purpose it is to give unary minus its proper precedence. If 
we omitted the precedence tag, both unary and binary minus would have the 
same precedence because they are represented by the same input symbol. 
 
\subsection*{Error Handling} 
 
Syntactic error handling is a difficult area in the design of user-friendly 
parsers. Usually, you will not like to have the parser give up upon the 
first occurrence of an errorneous input symbol. Instead, the parser should 
recover from a syntax error, that is, it should try to find a place in the 
input where it can resume the parse. 
 
TP Yacc provides a general mechanism to implement parsers with error 
recovery. A special predefined \verb"error" token may be used in grammar rules 
to indicate positions where syntax errors might occur. When the parser runs 
into an error action (i.e., reads an errorneous input symbol) it prints out 
an error message and starts error recovery by popping its stack until it 
uncovers a state in which there is a shift action on the \verb"error" token. 
If there is no such state, the parser terminates with return value 1, 
indicating an unrecoverable syntax error. If there is such a state, the 
parser takes the shift on the \verb"error" token (pretending it has seen 
an imaginary \verb"error" token in the input), and resumes parsing in a 
special ``error mode.'' 
 
While in error mode, the parser quietly skips symbols until it can again 
perform a legal shift action. To prevent a cascade of error messages, the 
parser returns to its normal mode of operation only after it has seen 
and shifted three legal input symbols. Any additional error found after 
the first shifted symbol restarts error recovery, but no error message 
is printed. The TP Yacc library routine \verb"yyerrok" may be used to reset 
the parser to its normal mode of operation explicitly. 
 
For a simple example, consider the rule 
\begin{quote}\begin{verbatim} 
   stmt : error ';' { yyerrok; } 
\end{verbatim}\end{quote} 
and assume a syntax error occurs while a statement (nonterminal \verb"stmt") 
is parsed. The parser prints an error message, then pops its stack until it 
can shift the token \verb"error" of the error rule. Proceeding in error mode, 
it will skip symbols until it finds a semicolon, then reduces by the error 
rule. The call to \verb"yyerrok" tells the parser that we have recovered from 
the error and that it should proceed with the normal parse. This kind of 
``panic mode'' error recovery scheme works well when statements are always 
terminated with a semicolon. The parser simply skips the ``bad'' statement 
and then resumes the parse. 
 
Implementing a good error recovery scheme can be a difficult task; see 
Aho/Sethi/Ullman (1986) for a more comprehensive treatment of this topic. 
Schreiner and Friedman have developed a systematic technique to implement 
error recovery with Yacc which I found quite useful (I used it myself 
to implement error recovery in the TP Yacc parser); see Schreiner/Friedman 
(1985). 
 
\subsection*{Yacc Library} 
 
The TP Yacc library (\verb"YaccLib") unit provides some global declarations 
used by the parser routine \verb"yyparse", and some variables and utility 
routines which may be used to control the actions of the parser and to 
implement error recovery. See the file \verb"YACCLIB.PAS" for a description 
of these variables and routines. 
 
You can also modify the Yacc library unit (and/or the code template in the 
\verb"YYPARSE.COD" file) to customize TP Yacc to your target applications. 
 
\subsection*{Other Features} 
 
TP Yacc supports all additional language elements entitled as ``Old Features 
Supported But not Encouraged'' in the UNIX manual, which are provided for 
backward compatibility with older versions of (UNIX) Yacc: 
 
\begin{itemize} 
   \item 
      literals delimited by double quotes. 
   \item 
      multiple-character literals. Note that these are not treated as 
      character sequences but represent single tokens which are given a 
      symbolic integer code just like any other token identifier. However, 
      they will not be declared in the output file, so you have to make sure 
      yourself that the lexical analyzer returns the correct codes for these 
      symbols. E.g., you might explicitly assign token numbers by using a 
      definition like 
      \begin{quote}\begin{verbatim} 
   %token ':=' 257 
      \end{verbatim}\end{quote} 
      at the beginning of the Yacc grammar. 
   \item 
      \verb"\" may be used instead of \verb"%", i.e. \verb"\\" means 
      \verb"%%", \verb"\left" is the same as \verb"%left", etc. 
   \item 
      other synonyms: 
      \begin{itemize} 
         \item \verb"%<"                    for \verb"%left" 
         \item \verb"%>"                    for \verb"%right" 
         \item \verb"%binary" or \verb"%2"  for \verb"%nonassoc" 
         \item \verb"%term" or \verb"%0"    for \verb"%token" 
         \item \verb"%="                    for \verb"%prec" 
      \end{itemize} 
   \item 
      actions may also be written as \verb"= { ... }" or 
      \verb"= single-statement;" 
   \item 
      Turbo Pascal declarations (\verb"%{ ... %}") may be put at the 
      beginning of the rules section. They will be treated as local 
      declarations of the actions routine. 
\end{itemize} 
 
\subsection*{Implementation Restrictions} 
 
As with TP Lex, internal table sizes and the main memory available limit the 
complexity of source grammars that TP Yacc can handle. There is currently no 
possibility to change internal table sizes (apart from modifying the sources 
of TP Yacc itself) or to make use of extended memory. However, the maximum 
table sizes provided by TP Yacc are large enough to handle quite complex 
grammars (such as the Pascal grammar in the TP Yacc distribution). The 
current limits are 600 s (states), 2400 i (LR0 kernel items), 2400 t (shift 
and goto transitions) and 1200 r (reductions). 
 
The default stack size of the generated parsers is \verb"yymaxdepth = 1024", 
as declared in the TP Yacc library unit. This should be sufficient for any 
average application, but you can change the stack size by including a 
corresponding declaration in the definitions part of the Yacc grammar 
(or change the value in the \verb"YaccLib" unit). Note that right-recursive 
grammar rules may increase stack space requirements, so it is a good 
idea to use left-recursive rules wherever possible. 
 
\subsection*{Differences from UNIX Yacc} 
 
Major differences between TP Yacc and UNIX Yacc are listed below. 
 
\begin{itemize} 
   \item 
      TP Yacc produces output code for Turbo Pascal, rather than for C. 
   \item 
      TP Yacc does not support \verb"%union" definitions. Instead, a value 
      type is declared by specifying the type identifier itself as the tag of 
      a \verb"%token" or \verb"%type" definition. TP Yacc will automatically 
      generate an appropriate variant record type (\verb"YYSType") which is 
      capable of holding values of any of the types used in \verb"%token" and 
      \verb"%type". 
     
      Type checking is very strict. If you use type definitions, then 
      any symbol referred to in an action must have a type introduced 
      in a type definition. Either the symbol must have been assigned a 
      type in the definitions section, or the \verb"$" 
      notation must be used. The syntax of the \verb"%type" definition has 
      been changed slightly to allow definitions of the form 
      \begin{quote}\begin{verbatim} 
   %type  
      \end{verbatim}\end{quote} 
      (omitting the nonterminals) which may be used to declare types which 
      are not assigned to any grammar symbol, but are used with the 
      \verb"$<...>" construct. 
   \item 
      The parse tables constructed by this Yacc version are slightly greater 
      than those constructed by UNIX Yacc, since a reduce action will only be 
      chosen as the default action if it is the only action in the state. 
      In difference, UNIX Yacc chooses a reduce action as the default action 
      whenever it is the only reduce action of the state (even if there are 
      other shift actions). 
     
      This solves a bug in UNIX Yacc that makes the generated parser start 
      error recovery too late with certain types of error productions (see 
      also Schreiner/Friedman, {\em Introduction to compiler construction with 
      UNIX,\/} 1985). Also, errors will be caught sooner in most cases where 
      UNIX Yacc would carry out an additional (default) reduction before 
      detecting the error. 
   \item 
      Library routines are named differently from the UNIX version (e.g., 
      the \verb"yyerrlab" routine takes the place of the \verb"YYERROR" 
      macro of UNIX Yacc), and, of course, all macros of UNIX Yacc 
      (\verb"YYERROR", \verb"YYACCEPT", etc.) had to be implemented as 
      procedures. 
\end{itemize} 
 
\end{document}