The code has been fairly significantly warmed over between 0.6 and 0.8. Here's
an outline of the new design and goals.
- "Target" and "MainWin" have been abstracted a bit more, so that the core
code will eventually run on the console, as well. This means gtk/gnome
references are (or will be) isolated in gnomeMainWin.cc. Also, the native
"event" structure is no longer an XEvent. This also allows us to have "grammar"
and other events which are unrelated to X. It is up to a "Target" to convert
to XEvent, or whatever.
- EventStreams are generated on-the-fly, rather than at start-up. This allows us
to use BNF grammars, where non-terminals can be substituted by the ViaVoice
"translation" facility. E.g. it can fill in "27" in "move <number> lines down".
- Voice.cc has been reworked into a fairly generic wrapper on top of ViaVoice.
Previously quite a bit of xvoice logic was down in the SMAPI callbacks, and
different xvoice facilities were special cased out (e.g. check the vocab name
for "windowmanagershortcuts", and do something different with it). Now the
calls and callbacks are generic. The wrapped API looks something like this.
Grammars are enabled or disabled by calling enableCommandGrammar() or
disableCommandGrammar() with the name of the grammar to enable, and a callback.
The callback is invoked if 1) enabling the grammar failed, 2) enabling the grammar
succeeded, or 3) a word was recognized.
The vocab API is basically the same, except the vocab words are passed in, and
Vocabs are currently never undefined. Also, the vocab API automatically does
lookups on words (analogous to "translations" for grammars), so the callback
can use a simple switch to handle recognised words.
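To make the shape of the wrapped grammar API concrete, here is a minimal
sketch. Only enableCommandGrammar() and disableCommandGrammar() are names from
the text above. The GrammarEvent enum, the callback signature, the VoiceSketch
class and its deliver() method are assumptions made up for illustration, and
no SMAPI calls are made. The real declarations in Voice.cc may well differ.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical: the three reasons the guide gives for invoking a callback.
enum class GrammarEvent { EnableFailed, EnableSucceeded, WordRecognized };

// Hypothetical callback signature (grammar name plus translation text).
using GrammarCallback = std::function<void(GrammarEvent,
                                           const std::string& grammar,
                                           const std::string& translation)>;

// Stand-in for the wrapper. It only tracks which callback belongs to
// which grammar; it does not talk to any speech engine.
class VoiceSketch {
public:
    void enableCommandGrammar(const std::string& name, GrammarCallback cb) {
        assert(static_cast<bool>(cb));  // a callback is required
        callbacks_[name] = std::move(cb);
    }

    void disableCommandGrammar(const std::string& name) {
        callbacks_.erase(name);
    }

    // Hypothetical: stands in for the engine reporting something about a
    // grammar. Returns false if that grammar has no registered callback.
    bool deliver(const std::string& name, GrammarEvent ev,
                 const std::string& translation) {
        auto it = callbacks_.find(name);
        if (it == callbacks_.end()) return false;
        GrammarCallback cb = it->second;  // copy: cb may disable itself
        cb(ev, name, translation);
        return true;
    }

private:
    std::map<std::string, GrammarCallback> callbacks_;
};
```

The point of the sketch is the per-grammar callback table: each caller
registers its own handler, so nothing in the wrapper needs to special-case a
particular grammar name.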
A fair question is "Why another layer of callbacks? Is this actually any
simpler than hooking straight to SMAPI?"
The advantages, as I see them, are
1) The wrapper provides per-vocabulary callbacks, rather than the per-process
callbacks provided by SMAPI. This is much more useful.
2) The wrapper deals with all the details like finding grammar files, handling
low-level errors, looking up translations, etc., making the wrapped API
much simpler.
3) This design allows different guis to provide different commands w/o
modifying Voice.cc. It's extremely easy for a gui to register commands
specific to its windows.
4) It isolates the SMAPI dependencies, in case we ever want to try a different
voice engine.
In implementing callbacks for vocabs, I discovered that SMAPI doesn't actually
report the name of the vocab when a word is recognised (as it does with grammars).
This is bizarre. As a workaround, Voice.cc currently searches all installed
vocabs for the word. This is probably fine as long as we have only short vocabs.
I suspect this will always be the case. Anything longer will likely end up in a
grammar. If this proves false, Voice.cc should be changed to hash the vocab names
for fast look-ups.
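As a rough illustration of that trade-off, the sketch below shows a linear
search over all vocabs next to one possible hashed variant. The Vocab struct,
findVocabLinear() and VocabIndex are hypothetical names, not code from
Voice.cc, and the hashed version indexes by word (one reading of the "hash"
suggestion above) so that a recognised word maps straight to its vocab name.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical representation of an installed vocab: a name plus its words.
struct Vocab {
    std::string name;
    std::vector<std::string> words;
};

// The workaround described above: scan every vocab for the recognised word.
// Returns the name of the first vocab containing it, or "" if none does.
std::string findVocabLinear(const std::vector<Vocab>& vocabs,
                            const std::string& word) {
    for (const Vocab& v : vocabs)
        for (const std::string& w : v.words)
            if (w == word) return v.name;
    return "";
}

// One possible faster variant: build a word -> vocab name index once,
// then answer each lookup with a single hash probe.
class VocabIndex {
public:
    explicit VocabIndex(const std::vector<Vocab>& vocabs) {
        for (const Vocab& v : vocabs)
            for (const std::string& w : v.words)
                byWord_.emplace(w, v.name);  // keeps the first vocab seen
    }

    // Returns the owning vocab name, or "" if the word is unknown.
    std::string find(const std::string& word) const {
        auto it = byWord_.find(word);
        return it == byWord_.end() ? std::string() : it->second;
    }

private:
    std::unordered_map<std::string, std::string> byWord_;
};
```

With only a handful of short vocabs the linear scan is likely good enough;
the index only pays off if vocabs grow or lookups become frequent.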
What follows is the CODEGUIDE from 0.6, edited where things have changed.
---------------------------------------------------------------------------------
This document gives a general overview of the architecture of the XVoice
program. It is intended for use by those who wish to modify and enhance the
program. It does not give great technical detail concerning the program's
operation - that can be gleaned from the source code and comments. Xvoice sits
on top of the ViaVoice for Linux engine. This engine provides an API to the
program. Calls to the speech API are prefixed by "Sm".
These calls are marginally documented in the ViaVoice for Linux documentation.
To quote one IBM employee, "The User's Guide is so out of date (and poorly
written to begin with) it really needs to be re-written. Band-aids for now...".
Calls may be synchronous or asynchronous. When they are synchronous, a reply
structure must be provided. The engine initiates callbacks in the client
(XVoice) code. These callbacks are to be found in "Voice.cc". A callback is
simply the execution of a client-side function in response to a previous
asynchronous call made by the client.
The gui is written in GTK and is implemented in the gnomeMainWin.cc file. It is
straightforward GTK code - refer to GTK documentation if you want to modify the
gui. The MainWindow class also performs many tasks which can be initiated by
user actions upon the gui - turning on the microphone etc.
Under normal operation, Xvoice receives callbacks for recognised phrases from
the speech-to-text engine. Parameters to the callbacks contain the text which
was the result of the speech recognition. This text is then sent to the current
target application.
Simulated X events are used to send the text. In BuildEvent and EventStream
there are functions for creating X events and sending them to X applications.
Applications known to xvoice, and the grammars they require, are defined in
xvoice.xml. If you copy from the sample xvoice.xml, you can probably add
new commands without knowing the details of what's going on. Here are details,
in case you need them:
The gross structure of the file is xml. Vocabularies and applications are
defined in xml blocks, like <application> </application>. Each such block
consists of a BNF grammar which is written to a .bnf file in ~/.xvoice/grammars, and
compiled with the ViaVoice grammar compiler (to .fsg and .fst files).
The BNF data consist of grammar elements followed by translations (which occur
after the "->"). ViaVoice sends the translations to xvoice when it recognises a
grammar element. xvoice expects translations in xml format.
So... xvoice ONLY parses XML. ViaVoice ONLY parses BNF. xvoice.xml is read by
xvoice so it can create the BNF grammar files. The grammar files are read by
ViaVoice so it can recognise phrases and send translations to xvoice. The
translations are in XML so xvoice can easily convert them to XEvents, grammar
events, etc.
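The "->" convention can be illustrated with a small helper that splits one
rule line into its grammar element and its translation. This is only a
sketch: splitRule() and trim() are hypothetical helpers, not functions from
the xvoice source, and real BNF rules handled by the ViaVoice grammar
compiler have more syntax than this handles.

```cpp
#include <string>
#include <utility>

// Trim leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
    const char* ws = " \t";
    std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Split one rule line at the first "->" into
// (grammar element, translation). If there is no "->",
// the translation part is empty.
std::pair<std::string, std::string> splitRule(const std::string& line) {
    std::string::size_type arrow = line.find("->");
    if (arrow == std::string::npos)
        return std::make_pair(trim(line), std::string());
    return std::make_pair(trim(line.substr(0, arrow)),
                          trim(line.substr(arrow + 2)));
}
```

For example, splitRule("page down -> <placeholder/>") yields the pair
("page down", "<placeholder/>"), where <placeholder/> stands for whatever XML
translation xvoice.xml actually defines.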
The code which handles the parsing process for Xvoice is in ParseEventStream.
There are two main modes of operation, command mode and dictation mode, which
can be enabled independently.
When in command mode the recognised phrases are returned to the RecoPhraseCB
callback. The callback looks up the translation and passes it to the registered
callback (usually appHandler()). The translations are then either passed to
Voice.cc (for grammar events), or passed to the Target (for mouse/key events).
When in dictation mode, the recognised text is sent to the RecoTextCB callback.
A small number of checks are done for special sequences and the text is passed
directly to the Target.
While in other modes, dynamic vocabularies may still be active. These are
necessary in order to ensure that when the user says "stop dictation" the app
moves out of dictation mode. The appropriate callback is RecoWordCB.
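The interplay between dictation mode and the always-active vocabularies can be
sketched roughly as follows. Everything here is a hypothetical stand-in: the
Modes struct, RecordingTarget, onVocabWord() and onDictatedText() only mimic
the roles that RecoWordCB and RecoTextCB play in the description above, and do
not reproduce their real signatures.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a Target: it just records what it was sent.
struct RecordingTarget {
    std::vector<std::string> received;
    void send(const std::string& text) { received.push_back(text); }
};

// Hypothetical mode state; the two modes toggle independently.
struct Modes {
    bool command = false;
    bool dictation = false;
};

// Plays the role of the vocab-word callback: a vocab word can change
// modes even while dictation is running. Returns true if handled.
bool onVocabWord(Modes& m, const std::string& word) {
    if (word == "stop dictation") {
        m.dictation = false;
        return true;
    }
    return false;
}

// Plays the role of the dictated-text callback: text goes straight to
// the target, but only while dictation mode is on.
bool onDictatedText(const Modes& m, RecordingTarget& t,
                    const std::string& text) {
    if (!m.dictation) return false;
    t.send(text);
    return true;
}
```

Because the vocab handler runs independently of the dictation handler, "stop
dictation" is acted on as a command rather than typed into the target.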
The Target.cc file models X applications using a simple class.
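A minimal sketch of the Target idea, assuming a native event type that is not
an XEvent and a Target that decides how to render it. The names EventKind,
Event, Target::deliver() and LogTarget are invented for illustration; the real
class in Target.cc is not reproduced here, and an X-based target would convert
to XEvents where this one just builds strings.

```cpp
#include <string>
#include <vector>

// Hypothetical native event: deliberately independent of X.
enum class EventKind { Key, Mouse, Grammar };

struct Event {
    EventKind kind;
    std::string payload;
};

// Readable name for an event kind, used by LogTarget below.
const char* kindName(EventKind k) {
    switch (k) {
        case EventKind::Key:     return "key";
        case EventKind::Mouse:   return "mouse";
        case EventKind::Grammar: return "grammar";
    }
    return "unknown";
}

// Hypothetical abstract target: each implementation converts the native
// event into whatever its destination understands.
class Target {
public:
    virtual ~Target() = default;
    virtual void deliver(const Event& ev) = 0;
};

// Console-style target that renders events as text lines.
class LogTarget : public Target {
public:
    void deliver(const Event& ev) override {
        lines.push_back(std::string(kindName(ev.kind)) + ":" + ev.payload);
    }
    std::vector<std::string> lines;
};
```

Keeping the conversion inside the target is what would let the same core code
drive an X application or a plain console.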