Quick start notes on the use of Coco/R (C/C++ version)
======================================================
These notes apply directly to the MS-DOS versions of Coco/R (C/C++).
We know that you can't wait to begin!
Installation
============
Please read the file README.1ST for details of how to install the system.
Getting going
=============
Examples of input for Coco/R can be found in the case study source files in
this kit. It is suggested that you experiment with these before developing
your own applications.
For each application, the user has to prepare a text file to contain the
attributed grammar. Points to be aware of are that
- it is sensible to work within a "project directory" (say C:\WORK) and not
within the "system directory" (C:\COCO);
- text file preparation must be done with an ASCII editor, and not with a
word processor;
- by convention the file is named with a primary name that is based on the
grammar's goal symbol, and with an "ATG" extension, for example CALC.ATG.
Running Coco/R
==============
To start Coco/R, type COCOR, adding the name of the file that contains your
attribute grammar:
COCOR TEST.ATG
A second parameter can be supplied to set compiler options, for example:
COCOR TEST.ATG /CS
or, if you prefer the Unix form
COCOR -CS TEST.ATG
For those who need reminding, the command
COCOR
with no parameters will print a help screen something like the following, and
then exit:
Coco/R Compiler-Compiler V1.xx (C version)
Released by Frankie Arzu
Usage: COCOR [(/|-)Options]
Example: COCOR -C -S Test.atg
Options:
A - Trace automaton                C - Generate compiler module
D - Include source debugging information (#line)
F - Give Start and Follower sets   G - Print top-down graph
L - Force listing                  O - Terminal conditions use OR only
P - Generate parser only           S - Print symbol table
T - Grammar tests only - no code generated
X - Generate C++ with classes
Z - Force extensions to .hpp and .cpp files
Environment variables:
CRFRAMES: Search directory for frames file. If not specified,
frames must be in the working directory.
CRHEXT: Extension for the '.h' generated files. If not specified,
'.h' for C, '.hpp' for C++ (Dos and Unix).
CRCEXT: Extension for the '.c' generated files. If not specified,
'.c' for C, '.cpp' for C++ (Dos and Unix).
You can also set up these options by using -Dvarname=value
Input to Coco/R
===============
Coco/R takes five (or six) files as input, and produces six (or seven) files
as output. These output files can then be combined with a main program and
any other auxiliary files needed, so as to produce a complete compiler.
The input files needed are
grammar.ATG - an attributed grammar (grammar used here for illustration)
PARSER_H.FRM
PARSER_C.FRM - the frame file for parser generation
SCAN_H.FRM
SCAN_C.FRM - the frame file for scanner generation
optionally
grammar.FRM - an application specific frame file for complete compiler
generation
A "generic" version of this last frame file is given as
COMPILER.FRM - the generic frame file for complete compiler generation
and this is intended to act as a model for your own applications, a process
that will be helped by studying various application specific frame files
supplied in the kit. (The other frame files are effectively standardized and
should require little if any alteration; they are fairly resilient, and any
particular configuring for specific applications will require some experience
of the internal workings of Coco/R itself.)
When using Coco/R, the frame files are assumed to exist in directories
specified by the environment variable CRFRAMES. To set this variable under
MS-DOS, use the SET command, for example
SET CRFRAMES=C:\COCO\FRAMES
You may like to add this line to your AUTOEXEC.BAT file, so that it takes
effect every time you start your computer.
Unix users with a Bourne-style shell would set this variable something like
CRFRAMES=/usr/lib/coco/frames; export CRFRAMES
and possibly add this to the .profile file or equivalent.
As from version 1.08 you can also set these variables using a command line
option, for example
-DCRFRAMES=/usr/lib/coco/frames
The frame file for the compiler itself is named as grammar.FRM, where grammar
is the grammar name. This is searched for in the directory of the input file.
If it is not found, a search is made for the generic COMPILER.FRM in the
directories specified in the environment variable CRFRAMES. The basic
compiler frame file (COMPILER.FRM) that comes with the kit will allow simple
applications to be generated immediately, but it is sensible to copy this
basic file to the project directory, and then to rename and edit it to suit
the application.
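The search order just described can be sketched in C as follows. This is only
an illustration and not Coco/R's own code: the function names file_exists,
frame_candidate and find_compiler_frame are hypothetical, CRFRAMES is treated
as a single directory for simplicity, and "/" is assumed to be acceptable as a
path separator.

```c
#include <stdio.h>

/* Hypothetical helper: reports whether a file can be opened for reading. */
int file_exists(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL) return 0;
    fclose(f);
    return 1;
}

/* Writes search candidate number `step` into out.
   step 0: <grammar_dir>/<grammar>.FRM   (application specific frame file)
   step 1: <frames_dir>/COMPILER.FRM     (generic frame file)
   Returns 1 on success, 0 if there is no such candidate or out is too small. */
int frame_candidate(int step, const char *grammar_dir, const char *grammar,
                    const char *frames_dir, char *out, size_t size)
{
    int n;
    if (step == 0)
        n = snprintf(out, size, "%s/%s.FRM", grammar_dir, grammar);
    else if (step == 1 && frames_dir != NULL)
        n = snprintf(out, size, "%s/COMPILER.FRM", frames_dir);
    else
        return 0;
    return n > 0 && (size_t)n < size;
}

/* Tries the candidates in order; leaves the first existing one in out.
   Returns 1 if a frame file was found, 0 otherwise. */
int find_compiler_frame(const char *grammar_dir, const char *grammar,
                        const char *frames_dir, char *out, size_t size)
{
    int step;
    for (step = 0; step < 2; step++)
        if (frame_candidate(step, grammar_dir, grammar, frames_dir, out, size)
            && file_exists(out))
            return 1;
    return 0;
}
```

A caller would typically pass the result of getenv("CRFRAMES") as frames_dir;
a NULL value simply means that only the directory of the grammar is searched.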
Output from Coco/R
==================
The generated files are placed in the same directory as the grammar file.
Coco/R for C generates the files
grammarS.C and .H generated FSA scanner
grammarP.C and .H generated recursive descent parser
grammarC.H token numbers used in scanner and parser
grammarE.H error numbers and corresponding message texts
grammar.LST compilation history (if the /L option is used)
and, optionally, a file
grammar.C generated main module for the complete compiler
where grammar is the name of the attributed grammar (this grammar is sensibly
stored in the file grammar.ATG).
The C++ version of Coco/R generates similar files with extensions .CPP and
.HPP. If .C/.H and/or .CPP/.HPP extensions are not acceptable to your compiler,
the extensions may alternatively be specified by defining the further
environment variables CRHEXT and CRCEXT, for example
SET CRHEXT=HHH
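The naming scheme above can be illustrated with a small C sketch. It is not
taken from Coco/R itself: the functions pick_ext and output_name are
hypothetical, and it is assumed (from the SET CRHEXT=HHH example) that the
environment variables hold the extension without a leading dot.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns the value of the environment variable env_name if it is set and
   not empty, otherwise the fallback extension. */
const char *pick_ext(const char *env_name, const char *fallback)
{
    const char *v = getenv(env_name);
    return (v != NULL && v[0] != '\0') ? v : fallback;
}

/* Builds "<grammar><part>.<ext>", for example "CALCS.C" for the scanner.
   part is 'S' (scanner), 'P' (parser), 'C' (token numbers) or 'E' (errors).
   Returns 1 on success, 0 if out is too small. */
int output_name(const char *grammar, char part, const char *ext,
                char *out, size_t size)
{
    int n = snprintf(out, size, "%s%c.%s", grammar, part, ext);
    return n > 0 && (size_t)n < size;
}
```

For instance, output_name("CALC", 'E', pick_ext("CRHEXT", "H"), name, sizeof
name) would yield CALCE.HHH after the SET command shown above, and CALCE.H
when CRHEXT is not defined.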
Hopefully, the system should produce code acceptable to most C/C++ compilers.
A list of those with which it is known to work appears in the file DOCS\COCO.
Compiling the generated compiler
================================
Once the components of the application have been generated, they are ready to
be compiled by your C or C++ compiler. It is assumed that you are familiar
with the process of compiling such programs.
For a very simple MS-DOS application using the Borland C++ system, one might
be able to use commands like
BCC -ml -IC:\COCO\CPLUS2 -c CALC.CPP CALCS.CPP CALCP.CPP
BCC -ml -LC:\COCO\CPLUS2 -eCALC.EXE CALC.OBJ CALCS.OBJ CALCP.OBJ CR_LIB.LIB
but for larger applications a better mechanism is to use the MAKE command in
conjunction with a "makefile". Notice that if you are using the C++ system
you will also need to incorporate the base class library found in the
directory CPLUS2 (please see the README.1ST file for installation details).
If you are using Borland C++ you may need to set up a configuration file
TURBOC.CFG to reflect the correct paths and options for your compiler.
Coco/R options and pragmas
==========================
As implied above, various didactic outputs and useful variations may be invoked
by the use of compiler pragmas in the input grammar, or by the use of a
command line option. Compiler pragmas take the form
$String
and the optional command line parameter takes the form
/String or -String
where String contains one or more of the letters ACDFGLPSTXZ in either upper
or lower case.
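As an illustration of the option syntax, the following C sketch decodes such a
parameter into a set of flags. It is an assumption-laden example rather than
the code Coco/R uses: the function name parse_options and the flags array are
hypothetical, and the -Dvarname=value form is deliberately not handled.

```c
#include <ctype.h>
#include <string.h>

/* Decodes an option parameter such as "/CS" or "-cs".
   For every option letter found, flags[letter - 'A'] is set to 1.
   Returns 1 if arg is a well-formed option string, 0 otherwise. */
int parse_options(const char *arg, int flags[26])
{
    static const char known[] = "ACDFGLPSTXZ";
    const char *p;

    if (arg[0] != '/' && arg[0] != '-') return 0;  /* not an option at all */
    if (arg[1] == '\0') return 0;                  /* no letters supplied  */

    for (p = arg + 1; *p != '\0'; p++) {
        int c = toupper((unsigned char)*p);        /* case is not significant */
        if (strchr(known, c) == NULL) return 0;    /* unknown option letter   */
        flags[c - 'A'] = 1;
    }
    return 1;
}
```

With this sketch "/CS" and "-cs" both set the C and S flags, while a file name
such as TEST.ATG is rejected because it does not start with "/" or "-".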
The C D L P T X and Z options are generally useful
C - (Compiler) Generate complete compiler driving module, including source
listing featuring interleaved error message reporting. To use this
option the file COMPILER.FRM (or grammar.FRM) must be available.
D - (Debug) Generate source line numbers (#line) for each semantic action.
This causes the semantic actions in the generated program to be labelled
with reference to the original .ATG file, so that one can use a symbolic
debugger on the .ATG file.
L - (Listing) Force listing
Normally the listing of the grammar is suppressed if the compilation
is error free; any errors are reported in a fairly cryptic form.
P - (Parser only) Suppress generation of the scanner.
Regeneration of the scanner is often tedious, and results in no changes
from the one first generated. This option must be used with care. It
can also be used if a hand-crafted scanner is to be supplied (see the
notes on the use of hand-crafted scanners in the file COCOL).
T - (Tests) Suppress generation of scanner and parser.
If this option is exercised, the generation of the scanner and parser
is suppressed, but the attributed grammar is parsed and checked for
grammatical inconsistencies, LL(1) violations and so on.
X - Generate parsers and scanners in the form of C++ classes.
Z - Use .CPP/.HPP as extensions in preference to .C/.H.
The following options are really intended to help with debugging/teaching
applications. Their effect may best be seen by judicious experimentation.
A - Trace automaton
F - Give First and Follow sets for each non-terminal in the grammar
G - Print top-down graph
S - Print symbol table
Grammar checks
==============
Coco/R performs several tests to check if the grammar is well-formed. If one
of the following error messages is produced, no compiler parts are generated.
NO PRODUCTION FOR X
The nonterminal X has been used, but there is no production for it.
X CANNOT BE REACHED
There is a production for nonterminal X, but X cannot be derived from the
start symbol.
X CANNOT BE DERIVED TO TERMINALS
For example, if there is a production X = "(" X ")" .
X - Y, Y - X
X and Y are nonterminals with circular derivations.
TOKENS X AND Y CANNOT BE DISTINGUISHED
The terminal symbols X and Y are declared to have the same structure,
e.g.
integer = digit { digit } .
real = digit { digit } ["." { digit } ].
In this example, a digit string could be recognized either as an integer
or as a real.
The following messages are warnings. They may indicate an error but they may
also describe desired effects. The generated compiler parts may still be
valid. If an LL(1) error is reported for a construct X, one must be aware
that the generated parser will choose the first of several possible
alternatives for X.
X NULLABLE
X can be derived to the empty string, e.g. X = { Y } .
LL(1) ERROR IN X:Y IS START OF MORE THAN ONE ALTERNATIVE
Several alternatives in the production of X start with the terminal Y
e.g.
Statement = ident ":=" Expression | ident [ ActualParameters ] .
LL(1) ERROR IN X:Y IS START AND SUCCESSOR OF NULLABLE STRUCTURE
Nullable structures are [ ... ] and { ... }
e.g.
qualident = [ ident "." ] ident .
Statement = "IF" Expression "THEN" Statement [ "ELSE" Statement ] .
The ELSE at the start of the else part may also be a successor of a
statement. This LL(1) conflict is known under the name "dangling else".
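The practical effect of the dangling else warning can be seen in a hand-written
recursive descent sketch of the Statement production. This is not output of
Coco/R; the token names, the function names and the depth bookkeeping are all
hypothetical and exist only to show that, when the optional part is entered
whenever ELSE is the next token, the ELSE is attached to the nearest IF.

```c
enum tok { T_IF, T_THEN, T_ELSE, T_EXPR, T_STMT, T_EOF };

static const enum tok *cur;  /* next token to be consumed */
static int else_depth;       /* nesting depth of the IF that claimed the ELSE */

/* Statement = "IF" Expression "THEN" Statement [ "ELSE" Statement ]
             | simple statement . */
static void statement(int depth)
{
    if (*cur == T_IF) {
        cur++;                        /* "IF"       */
        if (*cur == T_EXPR) cur++;    /* Expression */
        if (*cur == T_THEN) cur++;    /* "THEN"     */
        statement(depth + 1);
        if (*cur == T_ELSE) {         /* option taken as soon as ELSE is seen */
            else_depth = depth;
            cur++;
            statement(depth + 1);
        }
    } else if (*cur == T_STMT) {
        cur++;                        /* simple statement */
    }
}

/* Parses the token sequence and returns the depth of the IF to which the
   ELSE was attached (0 = outermost IF), or -1 if there was no ELSE. */
int parse_else_depth(const enum tok *input)
{
    cur = input;
    else_depth = -1;
    statement(0);
    return else_depth;
}
```

For the input IF e THEN IF e THEN s ELSE s the function returns 1, that is,
the ELSE belongs to the inner IF, which is usually the desired effect.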
The Parser Interface
====================
A parser generated by Coco/R defines various routines that may be called from
an application. As for the scanner, the form of the interface depends on the
host system. The parser generated by Coco/R for C has the following simple
interface:
#define MinErrDist 2
void Parse();
/* Parses the source */
int Successful();
/* Returns 1 if no errors have been recorded while parsing */
void LexString(char *Lex, int Size);
/* Retrieves at most Size characters from the most recently parsed
token into Lex */
void LexName(char *Lex, int Size);
/* Retrieves at most Size characters from the most recently parsed
token into Lex, converted to upper case if IGNORE CASE was specified */
void LookAheadString(char *Lex, int Size);
/* Retrieves at most Size characters from the lookahead token into Lex */
void LookAheadName(char *Lex, int Size);
/* Retrieves at most Size characters from the lookahead token into Lex,
converted to upper case if IGNORE CASE was specified */
void SynError(int errNo);
/* Reports syntax error denoted by errNo */
void SemError(int errNo);
/* Reports semantic error denoted by errNo */
For the C++ version, it effectively takes the form below. (There is actually
an underlying class hierarchy, and the declarations are really slightly
different from those presented here).
class grammarParser
{ public:
grammarParser(AbsScanner *S, CRError *E);
// Constructs parser associated with scanner S and error reporter E
void Parse();
// Parses the source
int Successful();
// Returns 1 if no errors have been recorded while parsing
private:
void LexString(char *Lex, int Size);
// Retrieves at most Size characters from the most recently parsed
// token into Lex
void LexName(char *Lex, int Size);
// Retrieves at most Size characters from the most recently parsed
// token into Lex, converted to upper case if IGNORE CASE was specified
long LexPos();
// Retrieves the position of the most recently parsed token
void LookAheadString(char *Lex, int Size);
// Retrieves at most Size characters from the lookahead token into Lex
void LookAheadName(char *Lex, int Size);
// Retrieves at most Size characters from the lookahead token into Lex,
// converted to upper case if IGNORE CASE was specified
long LookAheadPos();
// Retrieves the position of the lookahead token
void SynError(int errNo);
// Reports syntax error denoted by errNo
void SemError(int errNo);
// Reports semantic error denoted by errNo
// ... Prototypes of functions for parsing each non-terminal in grammar
};
The functionality provides for the parser to
- initiate the parse for the goal symbol by calling Parse().
- investigate whether the parse succeeded by calling Successful().
- report on the presence of syntactic and semantic errors by calling SynError
and SemError.
- obtain the lexeme value of a particular token in one of four ways
(LexString, LexName, LookAheadString and LookAheadName). Calls to
LexString are most common; the others are used for special variations.
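A typical use of LexString is inside a semantic action that needs the text of
the token just parsed, for example to convert a number. The sketch below is
self-contained only because it supplies its own stand-in for LexString; the
variable current_lexeme, the stand-in body and the function number_value are
hypothetical, and it is assumed that Size includes room for the terminating
null character.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in state: text of the most recently parsed token (hypothetical). */
const char *current_lexeme = "";

/* Stand-in for the generated LexString, so that this sketch compiles on its
   own. In a real application the generated parser provides this routine. */
void LexString(char *Lex, int Size)
{
    if (Size <= 0) return;
    strncpy(Lex, current_lexeme, (size_t)Size);
    Lex[Size - 1] = '\0';   /* strncpy does not terminate on truncation */
}

/* What a semantic action for a "number" token might do: fetch the lexeme
   and convert it to a numeric value. */
long number_value(void)
{
    char text[32];
    LexString(text, (int)sizeof text);
    return strtol(text, NULL, 10);
}
```

In a grammar, the body of number_value would normally appear directly in the
semantic action attached to the number token.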
A tailored frame file can be supplied, from which Coco/R can generate a main
program if the $C pragma/option is used. Examples of this can be found in the
kit as well.
The Scanner Interface
=====================
The scanner generated by Coco/R for C has the following interface (the C++
version is somewhat different)
int S_src; /* source file */
int S_Line, S_Col; /* line and column of current symbol */
int S_Len; /* length of current symbol */
long S_Pos; /* file position of current symbol */
int S_NextLine; /* line of lookahead symbol */
int S_NextCol; /* column of lookahead symbol */
int S_NextLen; /* length of lookahead symbol */
long S_NextPos; /* file position of lookahead symbol */
int S_CurrLine; /* current input line (may be higher than line) */
long S_lineStart; /* start position of current line */
int S_Get();
/* Gets next symbol from source file */
void S_Reset();
/* Reads and stores source file internally */
/* Assert: S_src has been opened */
void S_GetString(long pos, int len, char *s);
/* Retrieves exact string of max length len at position pos in source
file */
void S_GetName(long pos, int len, char *s);
/* Retrieves a string of max length len at position pos in source file.
Each character in the string will be capitalized if IGNORE CASE is
specified */
unsigned char S_CurrentCh(long pos);
/* Returns current character at specified file position */
Notes
-----
It is rarely necessary to make use of any of this interface directly. The
parser interface discussed above exports most of the functionality that is
required when actions are required to retrieve token information.
The variables S_Line, S_Col, S_Pos, S_Len refer to the most recently
parsed token.
The variables S_NextLine, S_NextCol, S_NextPos, S_NextLen refer to the
most recently scanned token (the look-ahead token retrieved by the most recent
call to S_Get).
Tab characters (Ascii 9) are assumed to correspond to 8 character tab stops.
Although Borland C's editor allows the user to change the tab size to any
number (default 3), Coco/R uses 8 character long tabs for compatibility with
UNIX and DOS. If you wish to change the tab size, set the defined constant
TAB_SIZE in the frame file scan_c.frm to the size you prefer. Using an
incorrect tab size will cause the scanner to report the wrong column of a
token (S_Col, S_NextCol).
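The effect of the tab size on column numbers can be shown with a short C
sketch. It is not the scanner's actual code: the function column_of is
hypothetical, and columns are assumed here to be counted from 1.

```c
#define TAB_SIZE 8   /* width of a tab stop, as assumed by the scanner */

/* Returns the column (counted from 1) of the character at index offset in
   line, advancing to the next tab stop for each tab character. */
int column_of(const char *line, int offset)
{
    int col = 0;   /* number of columns occupied so far */
    int i;
    for (i = 0; i < offset && line[i] != '\0'; i++) {
        if (line[i] == '\t')
            col += TAB_SIZE - (col % TAB_SIZE);   /* jump to next tab stop */
        else
            col++;
    }
    return col + 1;
}
```

A token that follows a single leading tab is therefore reported in column 9;
an editor displaying tabs as 3 characters would show the same token in
column 4, which is the kind of mismatch described above.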
The main module is responsible for opening the source file S_src prior to
calling the parser. If you are using MS-DOS add O_BINARY to the open mode
options. Don't let the compiler convert CR/LF to LF, as this will cause an
invalid file position for reporting errors.
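Opening the source file in binary mode might look like the C sketch below. The
function name open_source is hypothetical. O_BINARY is not defined on Unix
systems, so it is defined as 0 there, and some MS-DOS compilers declare open()
in <io.h> rather than in <fcntl.h>.

```c
#include <fcntl.h>

#ifndef O_BINARY
#define O_BINARY 0   /* systems with no text/binary distinction */
#endif

/* Opens the source file for reading without CR/LF translation.
   Returns a file descriptor, or -1 if the file cannot be opened. */
int open_source(const char *path)
{
    return open(path, O_RDONLY | O_BINARY);
}
```

The descriptor returned would then be assigned to S_src before the parser is
called, after checking that it is not -1.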
S_Reset is called by the parser to initialize the scanner. S_Reset reads the
entire source into a large internal buffer, thus improving the efficiency
of the scanner very markedly.
S_Get is called repeatedly from the parser, to get the next token from the
source text.
S_GetString and S_GetName can be used to obtain the text of a token starting
at position pos and having length len.
For the C++ version, the interface is effectively that shown below, although
there is actually an underlying class hierarchy, so that the declarations are
not exactly the same as those shown. Once again, it is rarely necessary to
make use of any of this interface directly.
class grammarScanner
{ public:
grammarScanner(int SourceFile, int ignoreCase);
// Constructs scanner for grammar and associates this with a
// previously opened SourceFile. Specifies whether to IGNORE CASE
int Get();
// Retrieves next token from source
void GetString(Token *Sym, char *Buffer, int Max);
// Retrieves at most Max characters from Sym into Buffer
void GetString(long Pos, char *Buffer, int Max);
// Retrieves at most Max characters from Pos into Buffer
void GetName(Token *Sym, char *Buffer, int Max);
// Retrieves at most Max characters from Sym into Buffer
// Buffer is capitalized if IGNORE CASE was specified
long GetLine(long Pos, char *Line, int Max);
// Retrieves at most Max characters (or until next line break)
// from position Pos in source file into Line
};
Automatically generated error explanations are written to a file
grammarE.H by Coco/R in the following form:
"EOF expected",
"ident expected",
"string expected",
"number expected",
...
This text can then be merged into a program to produce textual error
messages. This is done automatically if the $C pragma (/C command line
option) is used.
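One way in which such a list of strings might be merged into a hand-written
program is as the initializer of an array indexed by error number. The sketch
below is an assumption about how this could be done, not a copy of what the
frame files do: the names error_text and error_message are hypothetical, the
four strings are simply those shown above, and error numbers are assumed to
start at 0.

```c
#include <stddef.h>

/* In a real application the initializer would come from the generated
   grammarE.H file; here the first four entries are written out by hand. */
static const char *const error_text[] = {
    "EOF expected",
    "ident expected",
    "string expected",
    "number expected"
};

/* Returns the message text for errNo, or a fallback for unknown numbers. */
const char *error_message(int errNo)
{
    size_t count = sizeof error_text / sizeof error_text[0];
    if (errNo < 0 || (size_t)errNo >= count) return "unknown error";
    return error_text[errNo];
}
```

Note that each string must be followed by a comma; adjacent string literals
without one are silently joined by the C compiler, which would shift every
later message by one position.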
Bootstrapping Coco
==================
The parser and scanner used by Coco/R were themselves generated by a
bootstrap process; if Coco/R is given the grammar CR.ATG as input, it will
reproduce the files CRS.C, CRS.H, CRP.C, CRP.H, CRE.H and CRC.H. It can
also regenerate its own main program from the file SOURCES\CR.FRM if the $C
pragma is used.
This means that Coco/R can be extended and corrected by changing its
grammar and recompiling itself. If you feel tempted to do this, please
make sure that you have kept copies of the original system in case you
destroy or corrupt the scanner and parser!
The TASTE package
=================
The distribution kit contains, in the "taste" and "taste_cp" directories,
three related applications of Coco/R: a compiler/interpreter, a
cross-reference generator, and a pretty-printer, for a simple Pascal-like
block structured language. New users will find much of interest in these
applications, which exemplify the use of symbol table construction, code
generation, error handling and so on. Versions are given for both straight
C and also for C++, where the various support modules are all defined as a
simple set of hierarchical classes.
Trademarks
==========
All trademarks are acknowledged.
=END=