Quick start notes on the use of Coco/R (C/C++ version) 
====================================================== 
 
These notes apply directly to the MS-DOS versions of Coco/R (C/C++). 
 
We know that you can't wait to begin! 
 
Installation 
============ 
 
Please read the file README.1ST for details of how to install the system. 
 
Getting going 
============= 
 
Examples of input for Coco/R can be found in the case study source files in 
this kit.  It is suggested that you experiment with these before developing 
your own applications. 
 
For each application, the user has to prepare a text file to contain the 
attributed grammar.  Points to be aware of are that 
 
 - it is sensible to work within a "project directory" (say C:\WORK) and not 
   within the "system directory" (C:\COCO); 
 
 - text file preparation must be done with an ASCII editor, and not with a 
   word processor; 
 
 - by convention the file is named with a primary name that is based on the 
   grammar's goal symbol, and with an "ATG" extension, for example CALC.ATG. 
 
Running Coco/R 
============== 
 
To start Coco/R, type COCOR, adding the name of the file that contains your 
attribute grammar: 
 
          COCOR   TEST.ATG 
 
A second parameter can be supplied to set compiler options, for example: 
 
          COCOR   TEST.ATG   /CS 
 
or, if you prefer the Unix form: 
 
          COCOR   -CS  TEST.ATG 
 
For those who need reminding, the command 
 
          COCOR 
 
with no parameters will print a help screen something like the following, and 
then abort: 
 
     Coco/R      Compiler-Compiler V1.xx (C version) 
     Released by Frankie Arzu 
 
     Usage: COCOR  [(/|-)Options] <Grammar.atg> 
     Example: COCOR -C -S Test.atg 
 
     Options: 
     A  - Trace automaton               C  - Generate compiler module 
     D  - Include source debugging information (#line) 
     F  - Give Start and Follower sets  G  - Print top-down graph 
     L  - Force listing                 O  - Terminal conditions use OR only 
     P  - Generate parser only          S  - Print symbol table 
     T  - Grammar tests only - no code generated 
     X  - Generate C++ with classes 
     Z  - Force extensions to .hpp and .cpp files 
 
     Environment variables: 
     CRFRAMES:  Search directory for frames file. If not specified, 
                frames must be in the working directory. 
     CRHEXT:    Extension for the '.h' generated files. If not specified, 
                '.h' for C, '.hpp' for C++ (Dos and Unix). 
     CRCEXT:    Extension for the '.c' generated files. If not specified, 
                '.c' for C, '.cpp' for C++ (Dos and Unix). 
     You can also set up these options by using -Dvarname=value 
 
 
Input to Coco/R 
=============== 
 
Coco/R takes five (or six) files as input, and produces six (or seven) files 
as output.  These output files can then be combined with a main program and 
any other auxiliary files needed, so as to produce a complete compiler. 
 
The input files needed are 
 
   grammar.ATG  -  an attributed grammar  (grammar used here for illustration) 
   PARSER_H.FRM 
   PARSER_C.FRM -  the frame file for parser generation 
   SCAN_H.FRM 
   SCAN_C.FRM   -  the frame file for scanner generation 
 
optionally 
 
   grammar.FRM  -  an application specific frame file for complete compiler 
                   generation 
 
A "generic" version of this last frame file is given as 
 
   COMPILER.FRM  -  the generic frame file for complete compiler generation 
 
and this is intended to act as a model for your own applications, a process 
that will be helped by studying various application specific frame files 
supplied in the kit.  (The other frame files are effectively standardized and 
should require little if any alteration; they are fairly resilient, and any 
particular configuring for specific applications will require some experience 
of the internal workings of Coco/R itself.) 
 
When using Coco/R, the frame files are assumed to exist in directories 
specified by the environment variable CRFRAMES.  To set this variable under 
MS-DOS, use the SET command, for example 
 
   SET CRFRAMES=C:\COCO\FRAMES 
 
You may like to add this line to your AUTOEXEC.BAT file, so that it takes 
effect every time you start your computer. 
 
Unix users with a Bourne-style shell would set this variable something like 
 
   CRFRAMES=/usr/lib/coco/frames; export CRFRAMES 
 
and possibly add this to the .profile file or equivalent. 
 
As from version 1.08 you can also set these variables using a command line 
option, for example 
 
              -DCRFRAMES=/usr/lib/coco/frames 
 
The frame file for the compiler itself is named as grammar.FRM, where grammar 
is the grammar name.  This is searched for in the directory of the input file. 
If it is not found, a search is made for the generic COMPILER.FRM in the 
directories specified in the environment variable CRFRAMES.  The basic 
compiler frame file (COMPILER.FRM) that comes with the kit will allow simple 
applications to be generated immediately, but it is sensible to copy this 
basic file to the project directory, and then to rename and edit it to suit 
the application. 
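
The search order just described can be sketched in C as follows.  This is only 
an illustration, not the code used inside Coco/R: the function name 
find_compiler_frame is hypothetical, CRFRAMES is assumed to name a single 
directory, and a forward slash is assumed to be acceptable as a path 
separator. 

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if the named file can be opened for reading, else 0. */
static int file_exists(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}

/* Hypothetical helper (not part of Coco/R): writes the path of the compiler
   frame file into out and returns 1, or returns 0 if none is found.
   frames_dir is the value of CRFRAMES, or NULL if it is not set. */
int find_compiler_frame(const char *input_dir, const char *grammar,
                        const char *frames_dir, char *out, size_t out_size)
{
    int n;

    assert(input_dir != NULL && grammar != NULL);
    assert(out != NULL && out_size > 0);

    /* 1. grammar.FRM in the directory of the input file */
    n = snprintf(out, out_size, "%s/%s.FRM", input_dir, grammar);
    if (n > 0 && (size_t)n < out_size && file_exists(out))
        return 1;

    /* 2. the generic COMPILER.FRM in the CRFRAMES directory */
    if (frames_dir != NULL && frames_dir[0] != '\0') {
        n = snprintf(out, out_size, "%s/COMPILER.FRM", frames_dir);
        if (n > 0 && (size_t)n < out_size && file_exists(out))
            return 1;
    }

    out[0] = '\0';
    return 0;
}
```

A caller would typically pass getenv("CRFRAMES") as the third argument, for 
example find_compiler_frame(".", "CALC", getenv("CRFRAMES"), buf, sizeof buf). 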
 
Output from Coco/R 
================== 
 
The generated files are placed in the same directory as the grammar file. 
 
Coco/R for C generates the files 
 
  grammarS.C and .H    generated FSA scanner 
  grammarP.C and .H    generated recursive descent parser 
  grammarC.H           token numbers used in scanner and parser 
  grammarE.H           error numbers and corresponding message texts 
  grammar.LST          compilation history (if the /L option is used) 
 
and, optionally, a file 
 
  grammar.C            generated main module for the complete compiler 
 
where grammar is the name of the attributed grammar (this grammar is sensibly 
stored in the file grammar.ATG). 
 
Coco/R for C++ version generates similar files with extensions .CPP and .HPP. 
 
If .C/.H and/or .CPP/.HPP extensions are not acceptable to your compiler, 
the extensions may alternatively be specified by defining the further 
environment variables CRHEXT and CRCEXT, for example: 
 
          SET CRHEXT=HHH 
 
Hopefully, the system should produce code acceptable to most C/C++ compilers. 
A list of those with which it is known to work appears in the file DOCS\COCO. 
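
The naming scheme for the generated files can be illustrated with a short C 
sketch.  The function output_name is hypothetical and is not taken from the 
Coco/R sources; it assumes that CRHEXT and CRCEXT hold the extension without 
a leading dot, as in the SET example above. 

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Coco/R): builds "<grammar><suffix>.<ext>"
   in out.  suffix is 'S', 'P', 'C' or 'E', or '\0' for the main module.
   override_ext is the value of CRHEXT or CRCEXT (NULL if not set);
   default_ext is used otherwise.  Returns 1 on success, 0 if out is too
   small. */
int output_name(const char *grammar, char suffix, const char *override_ext,
                const char *default_ext, char *out, size_t out_size)
{
    const char *ext = default_ext;
    char sfx[2];
    int n;

    assert(grammar != NULL && default_ext != NULL && out != NULL);

    if (override_ext != NULL && strlen(override_ext) > 0)
        ext = override_ext;            /* environment setting wins */

    sfx[0] = suffix;
    sfx[1] = '\0';
    n = snprintf(out, out_size, "%s%s.%s", grammar, sfx, ext);
    return n > 0 && (size_t)n < out_size;
}
```

For the grammar CALC, a call such as output_name("CALC", 'S', 
getenv("CRHEXT"), "H", buf, sizeof buf) would give CALCS.H when CRHEXT is not 
set, and CALCS.HHH after SET CRHEXT=HHH. 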
 
Compiling the generated compiler 
================================ 
 
Once the components of the application have been generated, they are ready to 
be compiled by your C or C++ compiler.  It is assumed that you are familiar 
with the process of compiling such programs. 
 
For a very simple MS-DOS application using the Borland C++ system, one might 
be able to use commands like 
 
   BCC -ml -IC:\COCO\CPLUS2 -c CALC.CPP CALCS.CPP CALCP.CPP 
   BCC -ml -LC:\COCO\CPLUS2 -eCALC.EXE CALC.OBJ CALCS.OBJ CALCP.OBJ CR_LIB.LIB 
 
but for larger applications a better mechanism is to use the MAKE command in 
conjunction with a "makefile".  Notice that if you are using the C++ system 
you will also need to incorporate the base class library found in the 
directory CPLUS2 (please see the README.1ST file for installation details). 
 
If you are using Borland C++ you may need to set up a configuration file 
TURBOC.CFG to reflect the correct paths and options for your compiler. 
 
Coco/R options and pragmas 
========================== 
 
As implied above, various didactic output and useful variations may be invoked 
by the use of compiler pragmas in the input grammar, or by the use of a 
command line option.  Compiler pragmas take the form 
 
       $String 
 
and the optional command line parameter takes the form 
 
       /String   or    -String 
 
where String contains one or more of the letters ACDFGLPSTXZ in either upper 
or lower case. 
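
As an illustration of the rule above, the following C sketch checks an option 
string and records which letters it contains.  The function parse_options is 
hypothetical and is not the routine used by Coco/R itself; in particular it 
makes no attempt to handle the -Dvarname=value form. 

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Letters accepted in an option string, as listed above. */
static const char VALID_OPTIONS[] = "ACDFGLPSTXZ";

/* Hypothetical helper (not part of Coco/R): records each option letter found
   in arg (for example "/CS", "-cs" or "$CS") in flags[], indexed by
   letter - 'A'.  Returns 1 if arg is a well-formed option string, else 0. */
int parse_options(const char *arg, int flags[26])
{
    assert(arg != NULL && flags != NULL);

    if (*arg != '/' && *arg != '-' && *arg != '$')
        return 0;                      /* not an option string */
    if (arg[1] == '\0')
        return 0;                      /* at least one letter is needed */

    for (++arg; *arg != '\0'; ++arg) {
        int c = toupper((unsigned char)*arg);   /* either case is accepted */
        if (strchr(VALID_OPTIONS, c) == NULL)
            return 0;                  /* unknown option letter */
        flags[c - 'A'] = 1;
    }
    return 1;
}
```

After a successful call with "/CS", flags['C' - 'A'] and flags['S' - 'A'] are 
set and all other entries are left as they were. 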
 
The C, D, L, P, T, X and Z options are generally useful: 
 
C  - (Compiler) Generate complete compiler driving module, including source 
     listing featuring interleaved error message reporting.  To use this 
     option the file COMPILER.FRM (or grammar.FRM) must be available. 
 
D  - (Debug) Generate source line numbers (#line) for each semantic action. 
     This causes the semantic actions in the generated program to be labelled 
     with reference to the original .ATG file, so that one can use a symbolic 
     debugger on the .ATG file. 
 
L  - (Listing) Force listing 
     Normally the listing of the grammar is suppressed if the compilation 
     is error free; any errors are reported in a fairly cryptic form. 
 
P  - (Parser only)  Suppress generation of the scanner. 
     Regeneration of the scanner is often tedious, and results in no changes 
     from the one first generated.  This option must be used with care.  It 
     can also be used if a hand-crafted scanner is to be supplied (see the 
     notes on the use of hand-crafted scanners in the file COCOL). 
 
T  - (Tests) Suppress generation of scanner and parser. 
     If this option is exercised, the generation of the scanner and parser 
     is suppressed, but the attributed grammar is parsed and checked for 
     grammatical inconsistencies, LL(1) violations and so on. 
 
X  - Generate parsers and scanners in the form of C++ classes. 
 
Z  - Use .CPP/.HPP as extensions in preference to .C/.H. 
 
The following options are really intended to help with debugging/teaching 
applications.  Their effect may best be seen by judicious experimentation. 
 
A  - Trace automaton 
 
F  - Give First and Follow sets for each non-terminal in the grammar 
 
G  - Print top-down graph 
 
S  - Print symbol table 
 
 
Grammar checks 
============== 
 
Coco/R performs several tests to check if the grammar is well-formed.  If one 
of the following error messages is produced, no compiler parts are generated. 
 
   NO PRODUCTION FOR X 
     The nonterminal X has been used, but there is no production for it. 
 
   X CANNOT BE REACHED 
     There is a production for nonterminal X, but X cannot be derived from the 
     start symbol. 
 
   X CANNOT BE DERIVED TO TERMINALS 
     For example, if there is a production X = "(" X ")" . 
 
   X - Y, Y - X 
     X and Y are nonterminals with circular derivations. 
 
   TOKENS X AND Y CANNOT BE DISTINGUISHED 
     The terminal symbols X and Y are declared to have the same structure, 
     e.g. 
 
       integer = digit { digit } . 
       real = digit { digit } ["." { digit } ]. 
 
     In this example, a digit string appears ambiguously to be recognized as 
     an integer or as a real. 
 
 
The following messages are warnings.  They may indicate an error but they may 
also describe desired effects.  The generated compiler parts may still be 
valid.  If an LL(1) error is reported for a construct X, one must be aware 
that the generated parser will choose the first of several possible 
alternatives for X. 
 
X NULLABLE 
   X can be derived to the empty string, e.g. X = { Y } . 
 
LL(1) ERROR IN X:Y IS START OF MORE THAN ONE ALTERNATIVE 
   Several alternatives in the production of X start with the terminal Y 
   e.g. 
 
      Statement = ident ":=" Expression | ident [ ActualParameters ] . 
  
LL(1) ERROR IN X:Y IS START AND SUCCESSOR OF NULLABLE STRUCTURE 
   Nullable structures are [ ... ] and { ... } 
   e.g. 
 
      qualident = [ ident "." ] ident . 
      Statement = "IF" Expression "THEN" Statement [ "ELSE" Statement ] . 
 
   The ELSE at the start of the else part may also be a successor of a 
   statement.  This LL(1) conflict is known under the name "dangling else". 
 
The Parser Interface 
==================== 
 
A parser generated by Coco/R defines various routines that may be called from 
an application.  As for the scanner, the form of the interface depends on the 
host system.  The parser generated by Coco/R for C has the following simple 
interface: 
 
    #define MinErrDist 2 
 
    void Parse(); 
    /* Parses the source */ 
 
    int Successful(); 
    /* Returns 1 if no errors have been recorded while parsing */ 
 
    void LexString(char *Lex, int Size); 
    /* Retrieves at most Size characters from the most recently parsed 
       token into Lex */ 
 
    void LexName(char *Lex, int Size); 
    /* Retrieves at most Size characters from the most recently parsed 
       token into Lex, converted to upper case if IGNORE CASE was specified */ 
 
    void LookAheadString(char *Lex, int Size); 
    /* Retrieves at most Size characters from the lookahead token into Lex */ 
 
    void LookAheadName(char *Lex, int Size); 
    /* Retrieves at most Size characters from the lookahead token into Lex, 
       converted to upper case if IGNORE CASE was specified */ 
 
    void SynError(int errNo); 
    /* Reports syntax error denoted by errNo */ 
 
    void SemError(int errNo); 
    /* Reports semantic error denoted by errNo */ 
 
For the C++ version, it effectively takes the form below. (There is actually 
an underlying class hierarchy, and the declarations are really slightly 
different from those presented here). 
 
  class grammarParser 
  { public: 
      grammarParser(AbsScanner *S, CRError *E); 
      // Constructs parser associated with scanner S and error reporter E 
 
      void Parse(); 
      // Parses the source 
 
      int Successful(); 
      // Returns 1 if no errors have been recorded while parsing 
 
    private: 
      void LexString(char *Lex, int Size); 
      // Retrieves at most Size characters from the most recently parsed 
      // token into Lex 
 
      void LexName(char *Lex, int Size); 
      // Retrieves at most Size characters from the most recently parsed 
      // token into Lex, converted to upper case if IGNORE CASE was specified 
 
      long LexPos(); 
      // Retrieves the position of the most recently parsed token 
 
      void LookAheadString(char *Lex, int Size); 
      // Retrieves at most Size characters from the lookahead token into Lex 
 
      void LookAheadName(char *Lex, int Size); 
      // Retrieves at most Size characters from the lookahead token into Lex, 
      // converted to upper case if IGNORE CASE was specified 
 
      long LookAheadPos(); 
      // Retrieves the position of the lookahead token 
 
      void SynError(int errNo); 
      // Reports syntax error denoted by errNo 
 
      void SemError(int errNo); 
      // Reports semantic error denoted by errNo 
 
      // ... Prototypes of functions for parsing each non-terminal in grammar 
  }; 
 
The functionality provides for the parser to 
 
 - initiate the parse for the goal symbol by calling Parse(). 
 - investigate whether the parse succeeded by calling Successful(). 
 - report on the presence of syntactic and semantic errors by calling SynError 
   and SemError. 
 - obtain the lexeme value of a particular token in one of four ways 
   (LexString, LexName, LookAheadString and LookAheadName).  Calls to 
   LexString are most common; the others are used for special variations. 
 
A tailored frame file can be supplied, from which Coco/R can generate a main 
program if the $C pragma/option is used.  Examples of this can be found in the 
kit as well. 
 
The Scanner Interface 
===================== 
 
The scanner generated by Coco/R for C has the following interface (the C++ 
version is somewhat different): 
 
   int  S_src;         /* source file */ 
   int  S_Line, S_Col; /* line and column of current symbol */ 
   int  S_Len;         /* length of current symbol */ 
   long S_Pos;         /* file position of current symbol */ 
   int  S_NextLine;    /* line of lookahead symbol */ 
   int  S_NextCol;     /* column of lookahead symbol */ 
   int  S_NextLen;     /* length of lookahead symbol */ 
   long S_NextPos;     /* file position of lookahead symbol */ 
   int  S_CurrLine;    /* current input line (may be higher than line) */ 
   long S_lineStart;   /* start position of current line */ 
 
   int S_Get(); 
   /* Gets next symbol from source file */ 
 
   void S_Reset(); 
   /* Reads and stores source file internally */ 
   /* Assert: S_src has been opened */ 
 
   void S_GetString(long pos, int len, char *s); 
   /* Retrieves exact string of max length len at position pos in source 
      file */ 
 
   void S_GetName(long pos, int len, char *s); 
   /* Retrieves a string of max length len at position pos in source file. 
      Each character in the string will be capitalized if IGNORE CASE is 
      specified */ 
 
   unsigned char S_CurrentCh(long pos); 
   /* Returns current character at specified file position */ 
 
 
Notes 
----- 
 
It is rarely necessary to make use of any of this interface directly.  The 
parser interface discussed above exports most of the functionality that is 
needed when semantic actions have to retrieve token information. 
 
The variables S_Line, S_Col, S_Pos, S_Len are apposite for the most recently 
parsed token. 
 
The variables S_NextLine, S_NextCol, S_NextPos, S_NextLen are apposite for the 
most recently scanned token (the look-ahead token retrieved by the most recent 
call to S_Get). 
 
Tab characters (ASCII 9) are assumed to correspond to 8 character tab stops. 
Although Borland C's editor allows the user to change the tab size to any 
number (default 3), Coco/R uses 8 character long tabs for compatibility with 
UNIX and DOS.  If you wish to change the tab size, set the defined constant 
TAB_SIZE in the frame file scan_c.frm to the size you prefer.  Using an 
incorrect tab size will cause the scanner to report the wrong column of a 
token (S_Col, S_NextCol). 
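
The way a tab size affects the reported column can be sketched as follows. 
The function next_column is hypothetical and columns are assumed here to be 
counted from 1; the frame file may well use a different convention. 

```c
#include <assert.h>

#define TAB_SIZE 8   /* same name as the constant in scan_c.frm */

/* Hypothetical helper (not part of Coco/R): returns the column, counted
   from 1, of the character that follows ch when ch is read at column col. */
int next_column(int col, char ch)
{
    assert(col >= 1);
    if (ch == '\t')
        return ((col - 1) / TAB_SIZE + 1) * TAB_SIZE + 1;  /* next tab stop */
    return col + 1;
}
```

With TAB_SIZE set to 8, a token that follows a single tab at the start of a 
line is reported at column 9, whereas an editor using a tab size of 3 would 
display it at column 4. 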
 
The main module is responsible for opening the source file S_src prior to 
calling the parser.  If you are using MS-DOS add O_BINARY to the open mode 
options.  Don't let the compiler convert CR/LF to LF, as this will cause an 
invalid file position for reporting errors. 
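
A main module might open the source file along the following lines.  This is 
a sketch only: the function open_source is hypothetical, and O_BINARY is 
defined as 0 where the host system does not supply it (as on Unix, where no 
CR/LF translation takes place). 

```c
#include <assert.h>
#include <fcntl.h>
#if defined(__MSDOS__) || defined(_WIN32)
#include <io.h>        /* open() on DOS and Windows compilers */
#else
#include <unistd.h>
#endif

#ifndef O_BINARY
#define O_BINARY 0     /* no text/binary distinction on this system */
#endif

/* Hypothetical helper (not part of Coco/R): opens the source file for
   reading without CR/LF translation.  Returns the file descriptor, or -1
   if the file cannot be opened. */
int open_source(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY | O_BINARY);
}
```

The descriptor returned would be assigned to S_src before the parser is 
called, after checking that it is not -1. 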
 
S_Reset is called by the parser to initialize the scanner.  S_Reset reads the 
entire source into a large internal buffer, thus improving the efficiency 
of the scanner very markedly. 
 
S_Get is called repeatedly from the parser, to get the next token from the 
source text. 
 
S_GetString and S_GetName can be used to obtain the text of a token starting 
at position pos and having length len. 
 
For the C++ version, the interface is effectively that shown below, although 
there is actually an underlying class hierarchy, so that the declarations are 
not exactly the same as those shown.  Once again, it is rarely necessary to 
make use of any of this interface directly. 
 
  class grammarScanner 
  { public: 
      grammarScanner(int SourceFile, int ignoreCase); 
      // Constructs scanner for grammar and associates this with a 
      // previously opened SourceFile.  Specifies whether to IGNORE CASE 
 
      int Get(); 
      // Retrieves next token from source 
 
      void GetString(Token *Sym, char *Buffer, int Max); 
      // Retrieves at most Max characters from Sym into Buffer 
 
      void GetString(long Pos, char *Buffer, int Max); 
      // Retrieves at most Max characters from Pos into Buffer 
 
      void GetName(Token *Sym, char *Buffer, int Max); 
      // Retrieves at most Max characters from Sym into Buffer 
      // Buffer is capitalized if IGNORE CASE was specified 
 
      long GetLine(long Pos, char *Line, int Max); 
      // Retrieves at most Max characters (or until next line break) 
      // from position Pos in source file into Line 
 
  }; 
 
Automatically generated error explanations are written to a file 
grammarE.H by Coco/R in the following form: 
 
           "EOF expected", 
           "ident expected", 
           "string expected", 
           "number expected", 
         ... 
 
This text can then be merged into a program to produce textual error 
messages.  This is done automatically if the $C pragma (/C command line 
option) is used. 
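
One way of merging such a list by hand is to include it in the initializer of 
an array of strings.  The sketch below is an assumption about how this might 
be done, not the code found in the frame files: the function error_text is 
hypothetical, the four messages stand in for the generated file, and error 
numbers are assumed to index the list from 0. 

```c
#include <assert.h>
#include <stddef.h>

/* In a real application the list would come from the generated file:
       static const char *ErrorMsg[] = {
       #include "grammarE.H"
       };
   The entries below stand in for it so that the sketch is complete. */
static const char *ErrorMsg[] = {
    "EOF expected",
    "ident expected",
    "string expected",
    "number expected",
};

/* Hypothetical helper (not part of Coco/R): returns the message text for
   errNo, or a fixed string if errNo lies outside the list. */
const char *error_text(int errNo)
{
    size_t count = sizeof ErrorMsg / sizeof ErrorMsg[0];

    assert(count > 0);
    if (errNo < 0 || (size_t)errNo >= count)
        return "unknown error";
    return ErrorMsg[errNo];
}
```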
 
Bootstrapping Coco 
================== 
 
The parser and scanner used by Coco/R were themselves generated by a 
bootstrap process; if Coco/R is given the grammar CR.ATG as input, it will 
reproduce the files CRS.C, CRS.H, CRP.C, CRP.H, CRE.H and CRC.H.  It can 
also regenerate its own main program from the file SOURCES\CR.FRM if the $C 
pragma is used. 
 
This means that Coco/R can be extended and corrected by changing its 
grammar and recompiling itself.  If you feel tempted to do this, please 
make sure that you have kept copies of the original system in case you 
destroy or corrupt the scanner and parser! 
 
 
The TASTE package 
================= 
 
The distribution kit contains, in the "taste" and "taste_cp" directories, 
three related applications of Coco/R: a compiler/interpreter, a 
cross-reference generator, and a pretty-printer, for a simple Pascal-like 
block structured language.  New users will find much of interest in these 
applications, which exemplify the use of symbol table construction, code 
generation, error handling and so on.  Versions are given for both straight 
C and also for C++, where the various support modules are all defined as a 
simple set of hierarchical classes. 
 
Trademarks 
========== 
 
All trademarks are acknowledged. 
 
=END=