\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename flite.info
@settitle Flite: a small, fast speech synthesis engine
@finalout
@setchapternewpage odd
@c %**end of header

@c This document was modelled on the numerous examples of texinfo
@c documentation available with GNU software, primarily the hello
@c world example, but many others too.  I happily acknowledge their
@c aid in producing this document -- awb

@set EDITION 1.2
@set VERSION 1.2
@set UPDATED 19th February 2003

@ifinfo
This file documents @code{Flite}, a small, fast run-time speech
synthesis engine.

Copyright (C) 2001-2003 Carnegie Mellon University

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

@ignore
Permission is granted to process this file through TeX, or otherwise and
print the results, provided the printed document carries copying
permission notice identical to this one except for the removal of this
paragraph (this paragraph not being relevant to the printed manual).

@end ignore
Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that this permission notice may be stated in a translation approved
by the authors.
@end ifinfo

@titlepage
@title Flite: a small, fast speech synthesis engine
@subtitle System documentation
@subtitle Edition @value{EDITION}, for Flite version @value{VERSION}
@subtitle @value{UPDATED}
@author by Alan W Black and Kevin A. Lenzo

@page
@vskip 0pt plus 1filll
Copyright @copyright{} 2001-2003 Carnegie Mellon University, all rights
reserved.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that this permission notice may be stated in a translation approved
by Carnegie Mellon University.
@end titlepage

@node Top, , , (dir)

@menu
* Abstract::            initial comments
* Copying::             How you can copy and share the code
* Acknowledgements::    List of contributors
* Installation::        Compilation and Installation
* Flite Design::        Background and key design decisions
* APIs::                 Standard functions
* Converting FestVox Voices:: building flite voices from FestVox ones

@end menu

@node Abstract, Copying, , Top
@chapter Abstract

This document provides a user manual for flite, a small, fast
run-time speech synthesis engine.

This manual is nowhere near complete.

Flite offers text to speech synthesis in a small and efficient binary.
It is designed for embedded systems like PDAs as well as large server
installations which must serve synthesis to many ports.  Flite is part
of the suite of free speech synthesis tools which includes Edinburgh
University's Festival Speech Synthesis System
@url{http://www.cstr.ed.ac.uk/projects/festival} and Carnegie
Mellon University's FestVox project @url{http://festvox.org}, which
provides tools, scripts, and documentation for building new synthetic
voices.

Flite is written in ANSI C, and is designed to be portable
to almost any platform, including very small hardware.

Flite is really just a synthesis library that can be linked into other
programs.  The distribution includes two simple voices: an old diphone
voice and an example limited domain voice which uses the newer unit
selection techniques we have been developing.  Neither of these voices
should be considered a production voice; they serve as examples, and
new voices will be released as they are developed.

The latest versions, comments, new voices etc for Flite are available
from its home page which may be found at
@example
@url{http://cmuflite.org}
@end example

@node Copying, Acknowledgements, Abstract, Top
@chapter Copying

Flite is free software.  It is distributed under an X11-like license.
Apart from the few exceptions noted below (which still have
similarly open licenses), the general license is:
@example
                 Language Technologies Institute                      
                    Carnegie Mellon University                        
                     Copyright (c) 1999-2003                          
                       All Rights Reserved.                           
                                                                      
 Permission is hereby granted, free of charge, to use and distribute  
 this software and its documentation without restriction, including   
 without limitation the rights to use, copy, modify, merge, publish,  
 distribute, sublicense, and/or sell copies of this work, and to      
 permit persons to whom this work is furnished to do so, subject to   
 the following conditions:                                            
  1. The code must retain the above copyright notice, this list of    
     conditions and the following disclaimer.                         
  2. Any modifications must be clearly marked as such.                
  3. Original authors' names are not deleted.                         
  4. The authors' names are not used to endorse or promote products   
     derived from this software without specific prior written        
     permission.                                                      
                                                                      
 CARNEGIE MELLON UNIVERSITY AND THE CONTRIBUTORS TO THIS WORK         
 DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING      
 ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT   
 SHALL CARNEGIE MELLON UNIVERSITY NOR THE CONTRIBUTORS BE LIABLE      
 FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES    
 WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN   
 AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,          
 ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF       
 THIS SOFTWARE.                                                       
@end example

@node Acknowledgements, Installation, Copying, Top
@chapter Acknowledgements

The initial development of flite was primarily done by awb while
travelling (perhaps the name is doubly appropriate, as a substantial
amount of the coding was done at over 30,000ft).  During most of that
time awb was funded by the Language Technologies Institute at
Carnegie Mellon University.

Kevin A. Lenzo was involved in the design, conversion techniques and
representations for the voice distributed with flite (as well as being
the actual voice itself).

Other contributions are:
@itemize @bullet
@item David Huggins-Daines (dhd@@cepstral.com):
much of the clunits code, porting to multiple platforms, substantial
code tidy up and configure/autoconf guidance.
@item Cepstral, LLC (@url{http://cepstral.com}):
for supporting DHD's time on flite and for passing back important
fixes and enhancements, including SAPI support.
@item Willie Walker <william.walker@@sun.com> and the Sun Speech Group:
lots of low level bugs (and fixes).
@item Portuguese Foundation for Science and Technology (FCT) Praxis XXI program:
the SAPI interface provided by Cepstral, LLC was partially funded by
the above program.
@item Henry Spencer:
for the regex code.
@item University of Edinburgh:
for releasing Festival for free, making a companion runtime synthesizer
a practical project.  Much of the design of flite relies on the
architecture decisions made in the Festival Speech Synthesis System and
the Edinburgh Speech Tools.

The duration cart tree and intonation (accent and F0) models were
derived from the models in the Festival distribution, which in turn
were trained from the Boston University FM Radio Data Corpus.

@item Carnegie Mellon University:
the included lexicon is derived from CMULEX and the letter to sound
rules are constructed using the Lenzo and Black techniques for
building LTS decision graphs.
@item Craig Reese: IDA/Supercomputing Research Center and Joe Campbell: Department of Defense
who wrote the ulaw conversion routines in src/speech/cst_wave_utils.c
@end itemize


@node Installation, Flite Design, Acknowledgements, Top
@chapter Installation

Flite consists simply of a set of C files.  GNU configure is
used to configure the engine and will work on most
major architectures.

In general, the following should build the system:
@example
tar zxvf flite-XXX.tar.gz
cd flite-XXX
./configure
make
@end example
However, you will need to explicitly call GNU make
(@code{gmake}) if @code{make} is not GNU make on your system.

The configuration process builds a file @file{config/config} which under
some circumstances may need to be edited, e.g. to add unusual options or
to deal with cross compilation.

On Linux systems, we also support shared libraries, which are useful for
keeping space down when multiple different applications are linked to the
flite libraries.  For development we strongly discourage the use of shared
libraries, as it is too easy either to set them up incorrectly or to
accidentally pick up the wrong version.  For installation, however, they
are definitely encouraged.  That is, if you are just going to make and
install, they are good, but unless you know what @var{LD_LIBRARY_PATH}
does, it may be better to use static libraries (the default) if you are
changing C code or building your own voices.
@example
./configure --enable-shared
make
@end example
This will build both shared and static versions of the libraries but
will link the executables to the @emph{shared} libraries, so you will
need to install the libraries in a place where your dynamic linker will
find them (cf. @file{/etc/ld.so.conf}) or set @var{LD_LIBRARY_PATH}
appropriately.

@node Flite Design, APIs, Installation, Top
@chapter Flite Design

@section Background

Flite was primarily developed to address one of the most common
complaints about the Festival Speech Synthesis System: Festival is
large and slow.  Even though software bloat is common amongst most
products, and that bloat has helped machines get faster and have more
memory and larger disks, Festival is still criticized for its size.

Although this complaint is sometimes unfair, it is valid: although much
work was done to ensure Festival can be trimmed and run fast, it still
requires substantial resources per utterance.  After some investigation
into whether Festival itself could be trimmed down, it became clear that,
because a core set of functions was sufficient for synthesis, a new
implementation containing only those necessary aspects would be easier
than trimming down Festival itself.

Given that a new implementation was being considered, a number of
problems with Festival could also be addressed at the same time.
Festival is not thread-safe, and although it runs under Windows, in
server mode it relies on the Unix-centric view of fast forks with
copy-on-write shared memory for servicing clients.  This is a perfectly
safe and practical solution for Unix systems, but under Windows, where
threads are the more common mechanism for servicing multiple events
and forking is expensive, a non-thread-safe program cannot be used as
efficiently.

Festival is written in C++, which was a good decision at the time and
is perfectly suitable for a large program.  However, what was discovered
over the years of development is that C++ is not a portable language.
Different C++ compilers are quite different and it takes a significant
amount of work to ensure compatibility of the code base over multiple
compilers.  What makes this worse is that new versions of each compiler
are incompatible with previous ones and changes are required.  At first
this looked like we were producing bad quality code, but after 7 years
it is clear that the compilers themselves are also still maturing.
Thus Festival and the Edinburgh Speech Tools will continue to require
constant support as new versions of compilers are released.

A second problem with C++ is the size and efficiency of the code
produced.  Proponents of C++ may rightly argue that Festival and the
Edinburgh Speech Tools aren't properly designed, but irrespective of
whether that is true, the code is much larger and slower than it needs
to be for what it does.  Throughout the design there is a constant
trade-off between elegance and efficiency, which unfortunately at times
in Festival requires untidy solutions of copying data out of objects,
processing it and copying it back, because direct access (particularly
in some signal processing routines) is just too inefficient.

Another major criticism of Festival is the use of Scheme as the
interpreter language.  Even though it is a simple-to-implement language
that is adequate for Festival's needs and can be easily included in the
distribution, people still hate it.  Often these people do learn to use
it and appreciate that run-time configurability is very desirable and that
new voices may be added without recompilation.  Scheme also has garbage
collection, which makes leaky programs much harder to write, and as some
of the intended audience for developing in Festival will not be
hard-core programmers, a safe programming language seems very desirable.

After taking all of the above into consideration, it was decided to
develop Flite as a new system written in ANSI C.  C is much more
portable than C++, as well as offering much lower level control of the
size of the objects and data structures it uses.

Flite is not intended as a research and development platform for speech
synthesis; Festival is, and will continue to be, the best platform for
that.  Flite is instead designed as a run-time engine for when an
application needs to be delivered.  It specifically addresses two
communities.  The first is small devices such as PDAs and telephones,
where memory and CPU power are limited and which in some cases do
not even have a conventional operating system.

The second community is those running synthesis servers for many
clients.  Here, although large fixed databases are acceptable, the
memory required per utterance and the speed at which utterances can be
synthesized are crucial.

However, in spite of the decision to build a new synthesis engine, we see
this as being tightly coupled with the existing free software synthesis
tools of Festival and the FestVox voice building suite.  Flite offers
a companion run-time engine.  Our intended mode of development is
to build new voices in FestVox and debug and tune them in Festival.
Then, for deployment, the FestVox format voice may be (semi-)automatically
compiled into a form that can be used by Flite.

In case some people feel that development of a small run-time
synthesizer is not an appropriate thing to do within a University and is
more suited to commercial development, we offer a few points which, to
our minds, justify this work.

We have long felt that research in speech and language should have an
identifiable link to ultimate commercial use.  By providing a platform
that can be used in consumer products and that falls within the same
framework as our research, we can better understand which research
issues actually matter for improving our work.

Considering small, useful synthesizers forces a more explicit
definition of what is necessary in a synthesizer, and of how we can
trade size, flexibility and speed against the quality of synthesized
output.  Defining that relationship is a research issue.

We are also advocates of speech technology within other research areas,
and the ability to offer support on new platforms such as PDAs and
wearables allows for more interesting speech applications that will
open up new and interesting areas of research.  Thus having a platform
that others around us can more easily integrate into their research
makes our work more satisfying.

@section Key Decisions

The basic architecture of Festival is good.  It is well proven.  Paul
Taylor, Alan W. Black and Richard Caley spent many hours debating low
level aspects of representation and structure that would be
adequate for current theories but also allow for future ones.
The heterogeneous relation graphs (HRG) are theoretically adequate,
computationally efficient and well proven.  Thus, both because HRGs have
such a background and because Flite is to be compatible with voices and
models developed in Festival, Flite uses HRGs as its basic utterance
representation structure.
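The following self-contained C sketch illustrates the central HRG idea
that one linguistic object can be viewed through several relations at
once.  It is not Flite's own item or relation code: the types and
functions (@code{hrg_contents}, @code{hrg_item}, @code{hrg_relation},
@code{hrg_rel_append}, @code{hrg_rel_length}) are hypothetical, and only
list-shaped relations are modelled, whereas real HRG relations may also
be trees.  Each relation is a doubly linked list of items, and items in
different relations share one contents structure, so a feature set
through one relation is visible through the other.

@verbatim
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not Flite's cst_item/cst_relation code. */

/* The shared linguistic object (e.g. a word) and its features. */
typedef struct hrg_contents {
    char name[32];
    float duration;             /* one example feature */
} hrg_contents;

/* One node in one relation; items in different relations may
   point at the same contents. */
typedef struct hrg_item {
    hrg_contents *contents;
    struct hrg_item *prev, *next;
} hrg_item;

/* A list-shaped relation (real HRGs also have tree relations). */
typedef struct hrg_relation {
    const char *name;
    hrg_item *head, *tail;
} hrg_relation;

hrg_contents *hrg_contents_new(const char *name)
{
    hrg_contents *c = calloc(1, sizeof *c);   /* zeroed: name is terminated */
    strncpy(c->name, name, sizeof c->name - 1);
    return c;
}

/* Append a new item viewing contents c to the end of relation r. */
hrg_item *hrg_rel_append(hrg_relation *r, hrg_contents *c)
{
    hrg_item *i = calloc(1, sizeof *i);
    i->contents = c;
    i->prev = r->tail;
    if (r->tail != NULL)
        r->tail->next = i;
    else
        r->head = i;
    r->tail = i;
    return i;
}

int hrg_rel_length(const hrg_relation *r)
{
    const hrg_item *i;
    int n = 0;
    for (i = r->head; i != NULL; i = i->next)
        n++;
    return n;
}
@end verbatim

For example, appending the same @code{hello} contents to both a
@code{Word} and a @code{Phrase} relation and then setting its duration
through the @code{Phrase} item changes what the @code{Word} item sees.
Freeing is omitted for brevity.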

Most of a synthesizer is in its data (lexicons, unit database etc.); the
actual synthesis code is pretty small.  In Festival most of that data
exists in external files which are loaded on demand.  This is obviously
slow and memory expensive (you need both a copy of the data on disk and
in memory).  As one of the principal targets for Flite is very small
machines, we wanted to allow that core data to be in ROM, and be
appropriately mapped into RAM without any explicit loading.  This can be
done by various memory mapping functions (in Unix it's called mmap) and
is the core technique used in shared libraries (called DLLs in some parts
of the world).  Thus the data should be in a format in which it can be
directly accessed.  If you are going to directly access data, you need
to ensure the byte layout is appropriate for the architecture you are
running on: byte order and address width become crucial if you want to
avoid any extra conversion code at access time (like byte swapping).
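To make that cost concrete, the following sketch shows the kind of
access-time conversion that storing data in the host's native layout
avoids.  The function names (@code{host_is_little_endian},
@code{swap16}, @code{read_le16}) are hypothetical and not part of
Flite; the sketch simply assumes data that was written little-endian and
is read on a machine of either byte order.

@verbatim
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers illustrating byte-order handling. */

/* Returns 1 if this machine stores the least significant byte first. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Swap the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Read a little-endian 16-bit value from a byte buffer on any host.
   Doing this for every access is the overhead that data compiled
   into the host's native layout does not need. */
uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}
@end verbatim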

At first it was considered that synthesis data would be converted into
binary files which could be mmap'ed into the runtime system, but
building appropriate binary files for each architecture is quite a job.
However, the C compiler does this in a standard way.  Therefore the mode
of operation for data within Flite is to convert it to C code (actually
C structures) and use the C compiler to generate the appropriate binary
structures.

Using the C compiler is a good portable solution, but as these
structures can be very big this can tax the C compiler somewhat.  Also,
because this data is not going to change at run time, it can all be
declared @code{const}, which means (in Unix) it will be in the text
segment and hence read only (this can be ROM on platforms which have
that distinction).  For structures to be const, all their subparts
must also be const, thus all relevant parts must be in the same file;
hence the unit database files can be quite big.
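As an illustration only (the actual files produced by Flite's
conversion scripts are not reproduced here), a generated database might
look like the following sketch.  Every structure, and every array it
points to, is @code{const} and defined in the same file, so a typical
Unix compiler and linker will place the whole database in read-only
storage and the program uses it directly, with no parsing or loading
step.  The names (@code{toy_unit}, @code{toy_unit_db}, @code{toy_db},
@code{toy_find_unit}) and the unit fields are hypothetical.

@verbatim
#include <stddef.h>
#include <string.h>

/* Hypothetical generated database, not real Flite converter output. */

typedef struct toy_unit {
    const char *name;              /* e.g. a diphone name */
    unsigned short start_frame;
    unsigned short num_frames;
} toy_unit;

typedef struct toy_unit_db {
    const char *voice_name;
    int num_units;
    const toy_unit *units;         /* points at const data below */
} toy_unit_db;

/* The compiler lays this out for the target machine, so no byte
   swapping or parsing is needed at run time. */
static const toy_unit toy_units[] = {
    { "pau-hh", 0, 12 },
    { "hh-ax", 12, 9 },
    { "ax-l", 21, 10 },
};

const toy_unit_db toy_db = {
    "toy_voice",
    (int)(sizeof(toy_units) / sizeof(toy_units[0])),
    toy_units
};

/* Linear lookup by name; returns NULL if the unit is absent. */
const toy_unit *toy_find_unit(const toy_unit_db *db, const char *name)
{
    int i;
    for (i = 0; i < db->num_units; i++)
        if (strcmp(db->units[i].name, name) == 0)
            return &db->units[i];
    return NULL;
}
@end verbatim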

Of course, this all presumes that you have a C compiler robust enough to
compile these files, hardware smart enough to treat flash ROM as memory
rather than disk, or an operating system smart enough to demand-page
executables.  Certain "popular" operating systems and compilers fail in
at least one of these respects, and therefore we have provided the
flexibility to use memory-mapped file I/O on voice databases, where
available, or simply to load them all into memory.
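A minimal POSIX sketch of that choice, assuming a system that provides
@code{mmap}: map a voice data file read-only where possible, and
otherwise read the whole file into an ordinary heap buffer.  This is not
Flite's own file-mapping code; the @code{mapped_data} type and the
@code{data_open} and @code{data_close} functions are hypothetical.

@verbatim
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical sketch: map a data file, or load it if mapping fails. */
typedef struct mapped_data {
    void *mem;
    size_t size;
    int is_mapped;   /* 1 if mem came from mmap, 0 if from malloc */
} mapped_data;

/* Returns 0 on success, -1 on failure (including empty files). */
int data_open(const char *path, mapped_data *d)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    d->size = (size_t)st.st_size;

    d->mem = mmap(NULL, d->size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (d->mem != MAP_FAILED) {
        d->is_mapped = 1;
    } else {
        /* Fallback: read the whole file into memory. */
        size_t got = 0;
        d->is_mapped = 0;
        d->mem = malloc(d->size);
        while (d->mem != NULL && got < d->size) {
            ssize_t n = read(fd, (char *)d->mem + got, d->size - got);
            if (n <= 0) {
                free(d->mem);
                d->mem = NULL;
                break;
            }
            got += (size_t)n;
        }
    }
    close(fd);   /* a mapping stays valid after the descriptor is closed */
    return d->mem != NULL ? 0 : -1;
}

void data_close(mapped_data *d)
{
    if (d->is_mapped)
        munmap(d->mem, d->size);
    else
        free(d->mem);
    d->mem = NULL;
}
@end verbatim

Callers use @code{d.mem} the same way in both cases; only
@code{data_close} needs to know which path was taken.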

@chapter Structure

The flite distribution consists of two distinct parts:
@itemize @bullet
@item The flite library containing the core synthesis code
@item Voice(s) for flite.  These contain three sub-parts
@itemize @bullet
@item Language models:
text processing, prosody models etc.
@item Lexicon and letter to sound rules
@item Unit database and voice definition
@end itemize
@end itemize

@section cst_val

This is a basic simple object which can contain ints, floats, strings
and other objects.  It also allows for lists using the Scheme/Lisp,
car/cdr architecture (as that is the most efficient way to represent
arbitrary trees and lists).

The @code{cst_val} structure is carefully designed to take up only 8 bytes (or
16 on a 64-bit machine).  The union structures it can
contain are designed so that there are no conflicts.  However, it depends on
the fact that a pointer to a @code{cst_val} is guaranteed to lie on an even
address boundary (which is true for all architectures I know of).  Thus
the distinction between cons (i.e. list) objects and atomic
values can be determined by the odd/evenness of the least significant bit
of the first word in a @code{cst_val}.  In some circles this is considered
hacky, in others elegant.  This was done in flite to ensure that the most
common structure is 8 bytes rather than 12, which saves significantly on
memory.
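The following sketch shows the low-bit trick on a simplified structure.
It is not Flite's actual @code{cst_val} definition (which, among other
things, also carries a reference count for atoms); the names
@code{toy_val}, @code{toy_val_consp}, @code{toy_int_val} and
@code{toy_cons_val} are hypothetical.  It assumes, as the text does,
that allocated objects lie on even addresses, and also that a pointer
and a @code{uintptr_t} share the same representation, which holds on
common platforms.

@verbatim
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simplified value type, not Flite's cst_val. */

/* Atom type tags are odd; a cons cell's first word is an aligned
   (hence even) pointer or NULL, so the low bit tells them apart. */
#define TOY_TYPE_INT    1
#define TOY_TYPE_FLOAT  3
#define TOY_TYPE_STRING 5

typedef struct toy_val toy_val;

struct toy_val {
    union {
        struct {                  /* list cell */
            toy_val *car;
            toy_val *cdr;
        } cons;
        struct {                  /* atomic value */
            uintptr_t type;       /* overlays car: always odd */
            union {
                int ival;
                float fval;
                const char *sval;
            } v;
        } atom;
    } c;
};

/* Read the first word through the atom view and test its low bit. */
int toy_val_consp(const toy_val *v)
{
    return (v->c.atom.type & 1u) == 0;
}

toy_val *toy_int_val(int i)
{
    toy_val *v = malloc(sizeof *v);
    v->c.atom.type = TOY_TYPE_INT;
    v->c.atom.v.ival = i;
    return v;
}

toy_val *toy_cons_val(toy_val *car, toy_val *cdr)
{
    toy_val *v = malloc(sizeof *v);
    v->c.cons.car = car;
    v->c.cons.cdr = cdr;
    return v;
}
@end verbatim

Both views fit in two pointer-sized words, which is where the 8-byte
(16-byte on a 64-bit machine) size comes from; no separate type field
is needed for cons cells.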

All @code{cst_val}s except those of type cons are reference counted.  A
few functions generate new lists of @code{cst_val}s which the caller
must explicitly delete (notably the lexicon lookup function that
returns a list of phonemes).
Everything that is added to an utterance will be deleted (and/or
dereferenced) when the utterance is deleted.
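A simplified sketch of those ownership rules follows.  It uses a plain
structure rather than the compact layout described above, and the names
(@code{rc_val}, @code{rc_string}, @code{rc_ref}, @code{rc_cons},
@code{rc_delete}, @code{toy_lex_lookup}) are hypothetical rather than
Flite's API.  Atoms carry a reference count, cons cells do not, and the
lookup function hands back a freshly built list that the caller owns and
must delete.

@verbatim
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of reference-counted atoms and uncounted
   cons cells; not Flite's cst_val implementation. */
typedef struct rc_val rc_val;
struct rc_val {
    int is_cons;
    int ref_count;       /* used by atoms only */
    rc_val *car, *cdr;   /* used by cons cells only */
    char *str;           /* atom payload */
};

int live_atoms = 0;      /* atoms allocated but not yet freed */

rc_val *rc_string(const char *s)
{
    rc_val *v = calloc(1, sizeof *v);
    v->str = malloc(strlen(s) + 1);
    strcpy(v->str, s);
    v->ref_count = 1;
    live_atoms++;
    return v;
}

/* Take an extra reference to an atom (no-op for cons cells). */
rc_val *rc_ref(rc_val *v)
{
    if (v != NULL && !v->is_cons)
        v->ref_count++;
    return v;
}

rc_val *rc_cons(rc_val *car, rc_val *cdr)
{
    rc_val *c = calloc(1, sizeof *c);
    c->is_cons = 1;
    c->car = car;
    c->cdr = cdr;
    return c;
}

/* Cons cells are freed outright (after their contents); an atom is
   freed only when its last reference is dropped. */
void rc_delete(rc_val *v)
{
    if (v == NULL)
        return;
    if (v->is_cons) {
        rc_delete(v->car);
        rc_delete(v->cdr);
        free(v);
    } else if (--v->ref_count == 0) {
        free(v->str);
        free(v);
        live_atoms--;
    }
}

/* Hypothetical lexicon lookup: returns a NEW list that the caller
   owns and must pass to rc_delete.  Phone strings are illustrative. */
rc_val *toy_lex_lookup(const char *word)
{
    if (strcmp(word, "hello") != 0)
        return NULL;
    return rc_cons(rc_string("hh"),
           rc_cons(rc_string("ax"),
           rc_cons(rc_string("l"),
           rc_cons(rc_string("ow"), NULL))));
}
@end verbatim

A caller that wants to keep one phone after deleting the list takes its
own reference first with @code{rc_ref}; deleting the list then frees
every other atom and all the cons cells.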

As in Festival, user types can be added to the @code{cst_val}s.  In
Festival this can be done on the fly, but because this requires
updating some list each time a new type is added, it wouldn't be
thread safe.  Thus user types are defined explicitly in
@file{src/utils/cst_val_user.c}.  This is not as neat as defining them
on the fly or using a registration function, but it is thread safe, and
these user types won't change often.
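The following sketch shows why a fixed table is thread safe.  The set of
user types is a @code{const} array decided at compile time, so lookups
never write shared state and need no locking, whereas a registration
function would have to modify a global list while other threads might be
reading it.  The layout of the real table in
@file{src/utils/cst_val_user.c} is not reproduced here;
@code{user_type_def}, @code{user_type_index}, @code{user_type_delete} and
the table entries are hypothetical.

@verbatim
#include <stdlib.h>
#include <string.h>

/* Hypothetical compile-time table of user types. */
typedef void (*user_delete_fn)(void *obj);

typedef struct user_type_def {
    const char *name;
    user_delete_fn delete_fn;   /* how to free objects of this type */
} user_type_def;

static void delete_plain(void *obj) { free(obj); }

/* Fixed at compile time and never modified, so any thread may read
   it without locking.  Adding a type means editing this file. */
static const user_type_def user_types[] = {
    { "utterance", delete_plain },
    { "wave",      delete_plain },
    { "voice",     delete_plain },
};

#define NUM_USER_TYPES ((int)(sizeof(user_types) / sizeof(user_types[0])))

/* Returns the index for a type name, or -1 if it is unknown. */
int user_type_index(const char *name)
{
    int i;
    for (i = 0; i < NUM_USER_TYPES; i++)
        if (strcmp(user_types[i].name, name) == 0)
            return i;
    return -1;
}

/* Free an object using the delete function recorded for its type. */
void user_type_delete(int type, void *obj)
{
    if (type >= 0 && type < NUM_USER_TYPES)
        user_types[type].delete_fn(obj);
}
@end verbatim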

@node APIs, Converting FestVox Voices, Flite Design, Top
@chapter APIs

Flite is a library that we expect will be embedded into other
applications.  Included with the distribution is a small example
executable that allows synthesis of strings of text and text files
from the command line.

@section flite binary

The example flite binary may be suitable for very simple applications.
Unlike Festival, its start up time is very short (less than 25ms on a PIII
500MHz) making it practical (on larger machines) to call it each
time you need to synthesize something.
@example
flite TEXT OUTPUTTYPE
@end example
If @code{TEXT} contains a space it is treated as a string of text and
converted to speech; if it does not contain a space, @code{TEXT} is
treated as a file name and the contents of that file are converted to
speech.  The option @code{-t} specifies that @code{TEXT} is to be
treated as text (not a file name) and @code{-f} forces treatment as a file.
Thus
@example
flite -t hello 
@end example 
will say the word "hello" while
@example
flite hello 
@end example 
will say the content of the file @file{hello}.  Likewise
@example
flite "hello world."
@end example 
will say the words "hello world" while
@example
flite -f "hello world"
@end example 
will say the contents of the file @file{hello world}.  If no argument is
specified, text is read from standard input.

The second argument @code{OUTPUTTYPE} is the name of a file the output
is written to; if it is @code{play}, the output is played to the audio
device directly.  If it is @code{none}, the audio is created but
discarded; this is used for benchmarking.  If @code{OUTPUTTYPE} is
omitted, @code{play} is assumed.  You can also explicitly set the
output type with the @code{-o} flag.
@example
flite -f doc/alice -o alice.wav
@end example
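The argument rules described in this section can be summarized in a few
lines of C.  This is a sketch of the documented behaviour, not the
source of the @code{flite} binary; the enums and the functions
@code{classify_input} and @code{classify_output} are hypothetical.

@verbatim
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the documented argument rules. */
typedef enum { INPUT_STDIN, INPUT_TEXT, INPUT_FILE } input_kind;
typedef enum { OUT_PLAY, OUT_NONE, OUT_FILE } output_kind;

/* arg is the TEXT argument, or NULL if none was given;
   force_text and force_file reflect the -t and -f options. */
input_kind classify_input(const char *arg, int force_text, int force_file)
{
    if (arg == NULL)
        return INPUT_STDIN;
    if (force_text)
        return INPUT_TEXT;
    if (force_file)
        return INPUT_FILE;
    /* Default: anything containing a space is text, else a file name. */
    return strchr(arg, ' ') != NULL ? INPUT_TEXT : INPUT_FILE;
}

/* outtype is OUTPUTTYPE (or the -o value), or NULL if omitted. */
output_kind classify_output(const char *outtype)
{
    if (outtype == NULL || strcmp(outtype, "play") == 0)
        return OUT_PLAY;
    if (strcmp(outtype, "none") == 0)
        return OUT_NONE;
    return OUT_FILE;   /* anything else names the output file */
}
@end verbatim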

@section C example

Each voice in Flite is held in a structure, a pointer to which is
returned by the voice registration function.  In the standard
distribution, the example diphone voice is @code{cmu_us_kal}.

Here is a simple C program that uses the flite library:
@example
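#include <stdio.h>
#include <stdlib.h>
#include "flite.h"

cst_voice *register_cmu_us_kal();

int main(int argc, char **argv)
@{
    cst_voice *v;

    if (argc != 2)
    @{
        fprintf(stderr,"usage: flite_test FILE\n");
        exit(-1);
    @}

    /* Must be called before any other flite function. */
    flite_init();

    /* Get the voice built into libflite_cmu_us_kal. */
    v = register_cmu_us_kal();

    /* Synthesize the whole file and send it to the audio device. */
    flite_file_to_speech(argv[1],v,"play");

    return 0;
@}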
@end example
Assuming the shell variable @code{FLITEDIR} is set to the flite directory,
the following will compile the program (with appropriate changes for
your platform if necessary).
@example
gcc -Wall -g -o flite_test flite_test.c -I$FLITEDIR/include -L$FLITEDIR/lib \
    -lflite_cmu_us_kal -lflite_usenglish -lflite_cmulex -lflite -lm
@end example

@section Public Functions

Although you are of course welcome to call lower level functions,
there are a few key functions that will satisfy most users
of flite.
@table @code
@item void flite_init(void);
This must be called before any other flite function can be called.  As
of Flite 1.1, it actually does nothing at all, but there is no guarantee
that this will remain true.
@item cst_wave *flite_text_to_wave(const char *text,cst_voice *voice);
Returns a waveform (as defined in @file{include/cst_wave.h}) synthesized
from the given text string by the given voice.
@item float flite_file_to_speech(const char *filename, cst_voice *voice, const char *outtype);
Synthesizes all the sentences in the file @var{filename} with the
given voice.  At present, @var{outtype} can only reasonably be
@code{play} or @code{none}.
@item float flite_text_to_speech(const char *text, cst_voice *voice, const char *outtype);
Synthesizes the text in the string pointed to by @code{text} with the
given voice.  @code{outtype} may be the name of a file to which the
generated waveform is written, @code{play} to send it to the audio
device, or @code{none} to discard it.  The return value is the
number of seconds of speech generated.
@item cst_utterance *flite_synth_text(const char *text,cst_voice *voice);
Synthesizes the given text with the given voice and returns an
utterance for further processing and access.
@item cst_utterance *flite_synth_phones(const char *phones,cst_voice *voice);
Synthesizes the given phones with the given voice and returns an
utterance for further processing and access.
@end table
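The value returned by @code{flite_text_to_speech} is a duration, which
for a waveform is just the number of samples divided by the sample rate.
The sketch below uses a hypothetical stand-in, @code{toy_wave}, rather
than the real @code{cst_wave} from @file{include/cst_wave.h}, whose
fields are not described in this manual.

@verbatim
/* Hypothetical stand-in for a waveform, not Flite's cst_wave. */
typedef struct toy_wave {
    int sample_rate;   /* samples per second */
    int num_samples;   /* samples per channel */
} toy_wave;

/* Duration in seconds; 0 if the sample rate is not set. */
float toy_wave_seconds(const toy_wave *w)
{
    if (w->sample_rate <= 0)
        return 0.0f;
    return (float)w->num_samples / (float)w->sample_rate;
}
@end verbatim

For example, 24000 samples at 16000 Hz give 1.5 seconds.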

@node Converting FestVox Voices, , APIs, Top
@chapter Converting FestVox Voices

As of version 1.2, initial scripts have been added to aid the conversion
of FestVox voices to Flite.  In general the conversion cannot be
automatic.  For example, any voice-specific Scheme code needs to be
hand-converted to C to work in Flite, which can be a major task.

Simple conversion scripts are given as examples of the stages you need
to go through.  These are designed to work on standard (English) diphone
sets, and simple limited domain voices.  The conversion technique will
almost certainly fail for large unit selection voices due to
limitations in the C compiler (more discussion below).

Conversion is basically taking the description of units (clunit
catalogue or diphone index) and constructing some C files that can be
compiled to form a usable database.  Using the C compiler to generate
the object files has the advantage that we do not need to worry about
byte order, alignment and object formats as the C compiler for the
particular target platform should be able to generate the right code.

Before you start, ensure you have successfully built and run your FestVox
voice in Festival.  Flite is not designed as a voice building/debugging
tool; it is just a delivery vehicle for finalized voices, so you should
first ensure you are satisfied with the quality of the voice in Festival
before you start converting it for Flite.

The following basic stages are required:
@itemize @bullet
@item Set up the directories and copy the conversion scripts
@item Build the LPC files
@item Build the MCEP files (for ldom/clunits)
@item Convert LPC (MCEP) into STS (short term signal) files
@item Convert the catalogue/diphone index
@item Compile the generated C code
@end itemize

The conversion assumes the environment variable @code{FLITEDIR}
is set, for example
@example
   export FLITEDIR=/home/awb/projects/flite/
@end example
The basic flite conversion takes place within a FestVox voice directory.
Thus all of the conversion scripts expect that the standard files are
available.  The first task is to build some new directories and copy in
the build scripts.  The scripts are copied rather than linked from the
Flite directories as you may need to change these for your particular
voices.
@example
   $FLITEDIR/tools/setup_flite
@end example
This will read @file{etc/voice.defs}, which should have been created by
the FestVox build process (except in very old versions of FestVox).

If you don't have a @file{etc/voice.defs} you can construct one
with @code{festvox/src/general/guess_voice_defs} in the FestVox
distribution, or write one by hand along the lines of
@example
FV_INST=cmu
FV_LANG=us
FV_NAME=ked_timit
FV_TYPE=clunits
FV_VOICENAME=$FV_INST"_"$FV_LANG"_"$FV_NAME
FV_FULLVOICENAME=$FV_VOICENAME"_"$FV_TYPE
@end example

The main script for building the Flite voice is @file{bin/build_flite},
which will eventually build sufficient C code in @file{flite/} that can
be compiled with the constructed @file{flite/Makefile} to give you a
library that can be linked into applications and also an example
@file{flite} binary with the constructed voice built-in.

You can run all of these stages, except the final make, together by
running the build script with no arguments:
@example
   ./bin/build_flite
@end example
But as things may not run smoothly, we will go through the 
stages explicitly.

The first stage is to build the LPC files; this may have already been
done as part of the diphone building process (though probably not in
the ldom/clunit case).  In our experience it is very important that the
recordings be of similar power, as mismatched power can often cause
overflows in the resulting flite (and sometimes Festival) voices.  Thus,
for diphone voices, it is important to run the power normalization
techniques described in the FestVox documentation.  The Flite LPC build
process also builds a parameter file of the ranges of the LPC parameters
used in later coding of the files, so even if you have already built your
LPC files you should still run this stage:
@example
   ./bin/build_flite lpc
@end example
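One reason the ranges matter: a compact way to store coefficients is to
quantize each one linearly into a 16-bit integer, using the minimum and
range measured over the whole database.  The sketch below assumes
exactly that scheme for illustration; it does not describe the format
of Flite's parameter file, and the names @code{param_range},
@code{find_range}, @code{quantize} and @code{dequantize} are
hypothetical.

@verbatim
#include <float.h>

/* Hypothetical linear 16-bit quantization using a global range. */
typedef struct param_range {
    float min;
    float range;   /* max - min over the whole database */
} param_range;

param_range find_range(const float *vals, int n)
{
    param_range r;
    float lo = FLT_MAX, hi = -FLT_MAX;
    int i;
    for (i = 0; i < n; i++) {
        if (vals[i] < lo) lo = vals[i];
        if (vals[i] > hi) hi = vals[i];
    }
    r.min = lo;
    r.range = hi - lo;
    return r;
}

/* Map [min, min + range] linearly onto 0..65535, clipping outliers. */
unsigned short quantize(float x, param_range r)
{
    float scaled;
    if (r.range <= 0.0f)
        return 0;
    scaled = (x - r.min) / r.range * 65535.0f + 0.5f;
    if (scaled < 0.0f) scaled = 0.0f;
    if (scaled > 65535.0f) scaled = 65535.0f;
    return (unsigned short)scaled;
}

float dequantize(unsigned short q, param_range r)
{
    return r.min + (q / 65535.0f) * r.range;
}
@end verbatim

Under such a scheme, values outside the measured range are clipped at
encoding time, which is one reason to regenerate the range file whenever
the LPC files change.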

For ldom and clunit voices (but not for diphone voices) we also
need the Mel-frequency Cepstral Coefficients.  These are assumed to
have already been created and to be in @file{mcep/}, as they are
necessary for running the voice in Festival.  This stage simply constructs
information about the range of the mcep parameters.
@example
   ./bin/build_flite mcep
@end example

The next stage is to construct the STS files.  Short Term Signals (STS)
are built for each pitch period in the database.  These are ASCII files
(one for each utterance file in the database) containing LPC coefficients
and ulaw-encoded residuals for each pitch period.  They are built using a
binary executable built as part of the Flite build
(@file{flite/tools/find_sts}).
@example
   ./bin/build_flite sts
@end example
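To show what ``ulaw encoded'' means, here is a compact version of
standard G.711 mu-law companding, which stores each 16-bit sample in 8
bits with finer steps near zero.  It is an illustrative sketch; Flite's
own conversion routines are in @file{src/speech/cst_wave_utils.c} and
may differ in detail.  The function names @code{linear_to_ulaw} and
@code{ulaw_to_linear} are used here for readability.

@verbatim
#include <stdint.h>

/* Illustrative G.711 mu-law sketch, not a copy of cst_wave_utils.c. */
#define ULAW_BIAS 0x84    /* 132: added before finding the segment */
#define ULAW_CLIP 32635   /* keeps sample + bias within 15 bits */

/* Compress a 16-bit linear sample to an 8-bit mu-law code. */
uint8_t linear_to_ulaw(int16_t sample)
{
    int s = sample;
    int sign = 0, exponent, mantissa, mask;

    if (s < 0) {
        sign = 0x80;
        s = -s;
    }
    if (s > ULAW_CLIP)
        s = ULAW_CLIP;
    s += ULAW_BIAS;

    /* Segment (exponent) = position of the highest set bit in 7..14. */
    for (exponent = 7, mask = 0x4000; (s & mask) == 0 && exponent > 0;
         exponent--, mask >>= 1)
        ;
    mantissa = (s >> (exponent + 3)) & 0x0F;
    return (uint8_t)~(sign | (exponent << 4) | mantissa);
}

/* Expand an 8-bit mu-law code back to a 16-bit linear sample. */
int16_t ulaw_to_linear(uint8_t code)
{
    int u = (uint8_t)~code;
    int exponent = (u >> 4) & 0x07;
    int mantissa = u & 0x0F;
    int s = (((mantissa << 3) + ULAW_BIAS) << exponent) - ULAW_BIAS;
    return (int16_t)((u & 0x80) ? -s : s);
}
@end verbatim

For example, a sample of 1000 encodes to @code{0xCE} and decodes back
to 988; the quantization step grows with amplitude, so large samples
lose more absolute precision than small ones.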
Note that the flite code expects waveform files to be in Microsoft RIFF
format and cannot deal with files in other formats.  Some earlier
versions of the Edinburgh Speech Tools used NIST as the default header
format.  This is likely to cause flite and its related programs not to
work, so do ensure your waveform files are in RIFF format
(@code{ch_wave -info wav/*} will tell you the format).  The following
will convert all your wave files:
@example
   mv wav wav.nist
   mkdir wav
   cd wav.nist
   for i in *.wav
   do
      ch_wave -otype riff -o ../wav/$i $i
   done
@end example
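If you prefer to check headers from C rather than with @code{ch_wave},
the first few bytes are enough to tell the two formats apart.  The
following is a minimal sketch: the function names are hypothetical, and
it looks only at the magic strings, without validating the rest of the
header.

@verbatim
#include <stddef.h>
#include <string.h>

/* Hypothetical header checks based only on the magic strings. */

/* A RIFF WAVE file starts with "RIFF", a 4-byte length, then "WAVE". */
int looks_like_riff_wave(const unsigned char *hdr, size_t n)
{
    return n >= 12
        && memcmp(hdr, "RIFF", 4) == 0
        && memcmp(hdr + 8, "WAVE", 4) == 0;
}

/* A NIST SPHERE header begins with the text "NIST_1A". */
int looks_like_nist(const unsigned char *hdr, size_t n)
{
    return n >= 7 && memcmp(hdr, "NIST_1A", 7) == 0;
}
@end verbatim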

The next stage is to convert the index to the required C format.  For
diphone voices this takes the @file{dic/*.est} index files, for
clunit/ldom voices it takes the @file{festival/clunit/VOICE.catalogue}
and @file{festival/trees/VOICE.tree} files.  This process uses a binary
executable built as part of the Flite build process
(@file{flite/tools/flite_sort}) to sort the indices into the order
flite requires at run time.  (Using Unix @code{sort} may or may not
give the same result, due to differing definitions of lexicographic
order, so we use the very same C function that will be used in flite to
ensure a consistent order.)
@example
   ./bin/build_flite idx
@end example
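Why one comparison function matters can be seen with the C library's
@code{qsort} and @code{bsearch}: an index sorted under one ordering and
searched under another may make a binary search miss entries that are
present.  The sketch below assumes plain byte-wise @code{strcmp} order,
which does not depend on locale settings; it is not the comparison
function @code{flite_sort} actually uses, and @code{name_cmp},
@code{sort_names} and @code{find_name} are hypothetical.

@verbatim
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: one comparator shared by sorting and searching.
   strcmp compares bytes as unsigned char, independent of locale. */
int name_cmp(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

void sort_names(const char **names, size_t n)
{
    qsort(names, n, sizeof names[0], name_cmp);
}

/* Returns the index of key in the sorted names, or -1 if absent. */
long find_name(const char **names, size_t n, const char *key)
{
    const char **hit = bsearch(&key, names, n, sizeof names[0], name_cmp);
    return hit != NULL ? (long)(hit - names) : -1;
}
@end verbatim

Under byte-wise order @samp{Ax-l} sorts before @samp{ax-l} because
uppercase letters have smaller codes in ASCII; a locale-aware
@code{sort} may order such keys differently.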
All the necessary C files should now have been built in @file{flite/}
and you may compile them by
@example
   cd flite
   make
@end example
This should give a library and an executable called @file{flite} that
can be run as
@example
   ./flite "Hello World"
@end example
This assumes a general voice; an ldom voice will only be able to say
things in its domain.  This @file{flite} binary offers the same options
as the standard @file{flite} binary compiled in the Flite build,
but with your voice rather than the distributed voices.

Almost certainly this process will not run smoothly for you.  Building
voices is still a very hard thing to do and problems will probably
exist.

This build process does not deal with customization for the given
voices.  Thus you will need to edit @file{flite/VOICE.c} to set
intonation ranges and duration stretch for your particular voice.

For example, for our @file{cmu_us_sls_diphone} voice (a US English female
diphone voice), we had to change the default parameters from
@example
    feat_set_float(v->features,"int_f0_target_mean",110.0);
    feat_set_float(v->features,"int_f0_target_stddev",15.0);

    feat_set_float(v->features,"duration_stretch",1.0); 
@end example
to
@example
    feat_set_float(v->features,"int_f0_target_mean",167.0);
    feat_set_float(v->features,"int_f0_target_stddev",25.0);

    feat_set_float(v->features,"duration_stretch",1.0); 
@end example

Note this conversion is limited.  Because it depends on the C compiler
to do the final conversion into binary object format (a good idea in
general for portability), you can easily generate files too big for the
C compiler to deal with.  We have spent some time investigating this
so that the largest possible voices can be converted, but it is still too
limited for our larger voices.  In general the limitation seems to be
best quantified by the number of pitch periods in the database.  After
about 100k pitch periods the files get too big to handle.  There are
probably solutions to this but we have not yet investigated them.  This
limitation doesn't seem to be an issue with the diphone voices as they
are typically much smaller than unit selection voices.

@section Lexicon Conversion

@section Language Conversion

This is by far the weakest part, as it is the most open-ended.  There
are basic tools in the @file{flite/tools/} directory that include Scheme
code to convert various Scheme structures to C, including CART tree
conversion and Lisp list conversion.  The other major source of help
here is the existing language examples in @file{flite/lang/usenglish/}.
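As a rough illustration of what a converted CART tree can look like, the
sketch below flattens a tiny tree into a @code{const} array of nodes and
walks it with a small interpreter.  The node layout, feature names,
thresholds and predictions are all hypothetical; Flite's real CART
representation and the output of the conversion tools differ in detail.

@verbatim
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened CART tree, not Flite's cst_cart format.
   A node asks "is feature < value?" or, if feat is NULL, is a leaf
   whose value is the prediction.  Children are array indices. */
typedef struct cart_node {
    const char *feat;
    float value;       /* threshold for questions, prediction for leaves */
    int yes, no;
} cart_node;

static const cart_node dur_tree[] = {
    /* 0 */ { "syl_numphones", 3.0f, 1, 2 },
    /* 1 */ { "word_numsyls",  2.0f, 3, 4 },
    /* 2 */ { NULL, 1.2f, 0, 0 },
    /* 3 */ { NULL, 0.9f, 0, 0 },
    /* 4 */ { NULL, 1.0f, 0, 0 },
};

/* Feature lookup callback supplied by the caller. */
typedef float (*feat_fn)(const char *name, const void *ctx);

float cart_predict(const cart_node *tree, feat_fn get, const void *ctx)
{
    int n = 0;
    while (tree[n].feat != NULL)
        n = (get(tree[n].feat, ctx) < tree[n].value) ? tree[n].yes
                                                     : tree[n].no;
    return tree[n].value;
}

/* Example feature source: features of one syllable. */
typedef struct syl_feats { float numphones, numsyls; } syl_feats;

float syl_feat(const char *name, const void *ctx)
{
    const syl_feats *s = ctx;
    if (strcmp(name, "syl_numphones") == 0) return s->numphones;
    if (strcmp(name, "word_numsyls") == 0) return s->numsyls;
    return 0.0f;
}
@end verbatim

Keeping the tree as @code{const} data follows the same approach as the
voice databases: the compiler lays it out once, and nothing is parsed at
run time.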

@chapter Porting to new platforms

byte order, unions, compiler restrictions

@chapter Future developments

@contents

@bye