Text-To-Speech Working on Single Channel Cards (I think)
Matthew N Cheely
mcheely at Glue.umd.edu
Wed Apr 14 21:45:39 EDT 2004
It came up earlier that some people were having trouble with
text-to-speech (TTS) in ut2004, because their soundcards could only handle
one stream at a time. The symptoms were that you would get no TTS during
the game, but when the game quit, all of a sudden you'd hear all the
TTS from the game. I am pretty sure I have figured out how to solve this
problem, but I can't take the last step of testing it with unreal, because
I have since discovered that ut2004 and my alsa libraries don't seem to
play nice together for some reason. So, I'm going to explain in detail
here on the list how to get it set up, and you guys can let me know if it
works.
Okay, so you've done everything listed at
and you've got the symptoms listed above.
The first thing to do is to set up a virtual audio device for your alsa
driver that can do software mixing, using the dmix plugin, which is part
of alsa. We do this by creating/editing /etc/asound.conf. We use
/etc/asound.conf so that our settings will apply systemwide, since both
the festival server run by speechd as root and our user programs will need
to be able to use the configuration.
Here is my /etc/asound.conf file.
#unmixed device for programs that have trouble with dmix (xine)
#handles software mixing of output
#handles software mixing of capture
#fuses input and output mixing into a duplex device
#this has something to do with sample rate conversion
#aoss emulation stuff:
#end aoss stuff
Most people should just be able to copy this file to their
/etc/asound.conf and mixing should work nicely. The file basically
just creates a bunch of virtual audio devices (pcms):
unmixed => This pcm is just an alias for direct hardware access. We tell a
program to use the 'unmixed' device if we want it to bypass all our
software mixing plugins.
dmixer => This device uses the dmix plugin in alsa to perform software
mixing. As long as all our programs' sound outputs are passing through
this device, we'll hear all of them at once.
dsnooper => This device uses the dsnoop plugin, which is like dmix,
except it mixes the soundcard's input channels so we can capture from more
than one device at a time (the CD and MIC, for example).
duplex => This device uses the asym plugin to give us a single device
which we can capture from and output to with mixing. It redirects any
output sent to it to the dmixer device, and if it is asked for capture
information, it uses the dsnooper device.
!default => This is the device alsa programs will use by default if we
don't specify a device. It is just an alias for the duplex device. Why
didn't we just name the duplex device !default? Honestly, I'm not exactly
sure, but I think it has something to do with sample rate conversion. I
tried it both ways, and it only works for me this way.
ctl.!default => This is the mixer device for setting volumes and stuff. It
just tells things to go straight to the hardware for that.
dsp0 and mixer0 are special devices which can be used by programs designed
to work with oss sound drivers (/dev/dsp and /dev/mixer). This way you can
use the pcm plugin functionality of dmix and dsnoop with oss programs.
There's a catch, though. This doesn't work with the oss emulation kernel
modules. You have to use a special 'aoss' script. To use it, make sure
the oss emulation modules aren't loaded or compiled into your kernel. Then
you just put 'aoss' in front of whatever program you want to run, e.g.:
aoss mpg123 foo.mp3
Note that this won't work with programs that use libc's fopen() function to
access /dev/dsp, only programs that use glibc's open(). If you don't
understand this, just know that it doesn't work with everything.
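The device layout described above can be sketched roughly as follows. This
is a reconstruction from the descriptions in this post, not the original
file. The device names and the hw, dmix, dsnoop and asym plugin types come
from the text; everything else is an assumption: the card and device
numbers (card 0, "hw:0,0"), the ipc_key values (any unique integers), the
period and buffer sizes, and the use of the plug plugin for !default and
dsp0 (chosen because plug does sample rate conversion).

```
# unmixed device for programs that have trouble with dmix (xine)
pcm.unmixed {
    type hw
    card 0                  # assumption: first sound card
    device 0
}

# handles software mixing of output
pcm.dmixer {
    type dmix
    ipc_key 1024            # assumption: any unique integer
    slave {
        pcm "hw:0,0"
        period_size 1024    # assumption: tune for your card
        buffer_size 4096
    }
}

# handles software mixing of capture
pcm.dsnooper {
    type dsnoop
    ipc_key 2048            # assumption: must differ from the dmix key
    slave {
        pcm "hw:0,0"
    }
}

# fuses input and output mixing into a duplex device
pcm.duplex {
    type asym
    playback.pcm "dmixer"
    capture.pcm "dsnooper"
}

# this has something to do with sample rate conversion
pcm.!default {
    type plug               # assumption: plug handles format/rate conversion
    slave.pcm "duplex"
}

ctl.!default {
    type hw
    card 0
}

# aoss emulation stuff:
pcm.dsp0 {
    type plug               # assumption: same wrapping as !default
    slave.pcm "duplex"
}

ctl.mixer0 {
    type hw
    card 0
}
# end aoss stuff
```

If your card is not the first one in the system, the card and "hw:0,0"
entries are the lines to change.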
So, anyway, as you can see, I've just attached the dsp0 device to my
Alright, so the sound card should be configured for software mixing. Try
it out by running a couple of alsa programs like aplay or alsaplayer at
the same time.
Now we have to get festival working with alsa. It's an oss program, and it
doesn't use fopen() so we should be able to use the aoss script. However,
this doesn't work when festival is in server mode for some weird reason I
have yet to figure out. It works fine in interactive mode, but that's no
good to us, because speechd necessarily uses the server mode. If anybody
out there who's a better programmer than me (most anybody) cares to look
at the code and see if they can figure it out, it would be excellent. I
tried asking people on the festival mailing list about this, but got no
response.
Never fear, however! Festival luckily has several audio output modes, one
of which will run the program of your choice. We just have to set it up to
use this mode and the appropriate program. To make sure this happens
whenever it runs, we add the following lines to /etc/festival.scm:
(Parameter.set 'Audio_Command "aplay -f S16_LE -t raw -r $SR $FILE")
(Parameter.set 'Audio_Method 'Audio_Command)
This tells festival to use the command "aplay" (which uses the alsa
driver) to play our soundfile, a signed-16-bit-little-endian (-f S16_LE)
raw (-t raw) file. Festival will provide the sample rate and file name
in the $SR and $FILE positions.
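To see what festival ends up running, the substitution can be mimicked by
hand. This is only a sketch: it assumes festival hands SR and FILE to the
shell that runs Audio_Command, and the function name expand_audio_command,
the rate 16000 and the path /tmp/festival_example.raw are made up for
illustration. It prints the command instead of running it, so it works
without a sound card.

```shell
# Hypothetical helper: show the aplay command line that results once
# the sample rate and file name are filled in.
expand_audio_command() {
    SR="$1"      # sample rate festival would supply
    FILE="$2"    # temporary raw audio file festival would supply
    printf 'aplay -f S16_LE -t raw -r %s %s\n' "$SR" "$FILE"
}

# Example values only; festival chooses the real ones at run time.
expand_audio_command 16000 /tmp/festival_example.raw
```

Running the printed command by hand against a real raw file is a quick way
to check that aplay can reach the mixing device before festival is
involved.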
Now you should be good to go. I'd test this first with alsaplayer or
aplay and /dev/speech, before trying it in ut2004. The reason is that
ut2004 may fall back to OSS devices without you knowing it, which it
does silently if ALSA fails for any reason.
Here are some handy links for you. I got most of this information from
somewhere on the following pages. There are examples of more complex stuff
you can do with dmix and dsnoop. One thing I learned from researching all
this - ALSA is bad-ass.
Alsa Wiki: has good pages on dmix and .asoundrc (asound.conf)
Alsa Documentation: has a page on .asoundrc, info on specific soundcards, etc.
Festival Homepage: (seems to be down today)