Live Speech and Dialog in a Virtual Machine.



Eric Riebling


The “Interaction in Virtual Worlds” VM lets you “play” with an existing (English) speech recognizer that supports live decoding, and experience an open source speech dialog system in a virtual world. The VM contains everything you need except a “viewer” for the OpenSim-based virtual world that the VM hosts.


Virtual Machine in OVA format.


Supported as part of the "Speech Recognition Virtual Kitchen", see the FORUM.

IP Agreement


Virtualbox 4.3.16, Singularity viewer (or equivalent)

Required Acknowledgment


Interaction in Virtual Worlds README

This README corresponds to Mario2-IVW.ova, current as of 20140911.

There are two experiments that can be performed with this virtual machine. The first is real-time speech recognition (‘live decoding’) using the Kaldi online decoder. The second is a full “interaction in virtual worlds” speech dialog system, which you can fully control.

Table of Contents

I. Installation
II. Running the system
III. Customizing the system
IV. Bot Actions


I. Installation

  1. Install Oracle VirtualBox, along with the matching Extension Pack. The VM works best with version 4.3.16.
  2. Set up a host-only network in the VirtualBox graphical user interface, via “File” → “Settings” (or “Preferences”) → “Network”. Click on the “Host-only Networks” tab; if no networks are listed yet, click the network card icon with the green plus sign on the right. The resulting new default network should appear with the name ‘vboxnet0’.
  3. Import the Mario2-IVW.ova file into VirtualBox and run it.
  4. Make sure that your microphone can be used from inside the virtual machine. First check that it works outside the VM (levels should be present but not distorting), then check the levels within the VM (via the VM’s “System Settings” → “Sound”). The total level is a combination of these.
  5. Install the Singularity Viewer from
  6. While running Singularity, create a new grid. Click the “Grid Manager” button, click the “Create” button, then in the field “Grid Name”, name it whatever you want, and enter in the “Login URI”: Click “OK” to close this window.
  7. The username is “World Master”, and the password is “avatar”. You will use these, as well as the grid name you just chose, to log into the virtual world, but not until you have started the server within the VM, as described next.
  8. The password for the virtual machine is ‘?1zza4All’, should you need it.

NOTE: if you are trying to run this from a Windows host, there is a bug in VirtualBox that prevents the microphone from working with an Ubuntu guest operating system. See this Forum post for more information:

II. Running the system

Once the VM is running…

  1. Please make sure that your microphone can be reached in the virtual machine.
  2. To check and test “online/live decoding”, do the following:
    1. Open a terminal
    2. Do “cd /kaldi-trunk/egs/voxforge/online_demo”
    3. Do “./ --test-mode live”
    4. Try speaking into the microphone. The quality is terrible, but you should see speech recognition output appear.

    You might see some error messages on the console, but it can still work in spite of these:

    `ALSA lib pcm_dsnoop.c:612:(snd_pcm_dsnoop_open) unable to open slave
     ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
     ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
     ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
     ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround71
     ALSA lib setup.c:565:(add_elem) Cannot obtain info for CTL elem (MIXER,'IEC958 Playback Default',0,0,0): No such file or directory
     ALSA lib setup.c:565:(add_elem) Cannot obtain info for CTL elem (MIXER,'IEC958 Playback Default',0,0,0): No such file or directory
     ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
     ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
     ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
     ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
     ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
     ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
     ALSA lib pcm_dmix.c:957:(snd_pcm_dmix_open) The dmix plugin supports only playback stream`
  3. To start the “Interaction in Virtual Worlds” system, double-click the headphones icon on the desktop labeled “START”. This starts 4 processes:
    1. The OpenSim world server
    2. The Kaldi Online Decoder speech recognizer
    3. The Stanford CoreNLP Parser
    4. The Kaldi/Parser client
  4. You can read what the commands are in, if you want to run them separately or display the terminals. These processes take a long time to start up. You will know that all of them have started when the terminal window “Kaldi Online Decoder” displays several lines of “ALSA lib pcm” warning messages.
  5. Now you can log in to Singularity as the World Master avatar, using the grid you created in step 6 of Installation and the credentials from step 7.
  6. Once the world loads, open MonoDevelop (third icon down the taskbar on the left of the VM screen), click on the IVW solution, and run it with the play button. This starts the SampleBot project within the IVW solution and logs the “Friend Bot”, the computer-controlled character that you can talk to, into the virtual world. It also connects to the Kaldi Parser Wrapper and receives text from the speech recognizer. The IVW solution contains another project, “DogBot”, which has lots of sample code that can be used to extend the system.
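
The “ALSA lib” warnings mentioned in steps 2 and 4 can bury the actual recognition output when you capture a terminal log. The following is a minimal Python sketch, not part of the VM, that separates the two. The function name split_console_output is hypothetical, and the sketch assumes that every warning line starts with the literal text “ALSA lib”, as in the listing above.

```python
def split_console_output(lines):
    """Split console lines into ALSA warnings and everything else.

    Assumption: warning lines begin with "ALSA lib" (as in the README
    listing); any other non-empty line is treated as regular output.
    """
    warnings, other = [], []
    for line in lines:
        # Drop surrounding whitespace and stray backticks from copied logs.
        text = line.strip().strip("`").strip()
        if not text:
            continue
        if text.startswith("ALSA lib"):
            warnings.append(text)
        else:
            other.append(text)
    return warnings, other
```

Seeing several entries in the warnings list is consistent with the “all processes have started” cue described in step 4; the other list holds whatever else was printed.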

Warning: If the parser times out (both the parser and the wrapper will throw an error if they’re both running), then please restart: first the parser, then the wrapper. (See below)
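
Because the processes start slowly and must be restarted in order, it can help to wait until a service actually accepts connections before starting whatever depends on it. Below is a minimal Python sketch of such a check. The helper name wait_for_port and its default timings are assumptions for illustration; the only port this README documents is 9999 (the Kaldi wrapper), and it is not known whether a probe connection disturbs that service, so treat this as an experiment rather than a supported procedure.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=1.0):
    """Return True once a TCP connection to (host, port) succeeds.

    Retries every `interval` seconds and returns False if no connection
    could be made within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection as a readiness probe.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, calling wait_for_port("localhost", 9999) after restarting the parser and then the wrapper would block until the wrapper is listening again, or give up after two minutes.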

III. Customizing the system

Here are the locations of all the code:
The START icon runs, which is located on the Desktop. Here, you can find the commands to run parts of the system individually.

  1. Kaldi Online Decoder: /kaldi-trunk/egs/voxforge/online_demo
    We use this code, but plug in models trained on TED talks: tedlium
    This is the speech recognition system.
  2. OpenSim: ~/Desktop/opensim-0.8/
    This sets up the virtual world. If you cannot log in to Singularity, it is probably because OpenSim has not finished starting; wait and try again.
  3. Stanford Parser: ~/Desktop/parse/
    This is the server for the parser.
  4. Kaldi Wrapper: ~/Desktop/parse/
    This code takes the output of the Kaldi Online Decoder, feeds it into the Stanford Parser, and makes the results available to the bot code as a TCP socket service on localhost port 9999.
    Make sure that the Kaldi and the Stanford servers are running before you start this code. If this script throws a Timeout error, please restart the Stanford Parser and then restart this client.
  5. Communicating with the Bot: ~/Desktop/Bot Development/SampleBot
    Accessible in MonoDevelop (IVW solution, SampleBot project), this code logs the bot into the virtual world and polls the Kaldi wrapper. Make sure that the Kaldi online decoder and wrapper are running before you start this. NOTE: if this code is suspended in the debugger long enough, the bot will disappear from the virtual world. (The virtual world protocol requires ‘keep-alive’ messages behind the scenes.)
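
To see what the Kaldi wrapper publishes without running the bot, you can connect to its TCP service on localhost port 9999 yourself. The following Python sketch is only an illustration: the function name read_wrapper_text is hypothetical, and the wire format is an assumption (UTF-8 text, read until the server closes the connection or a timeout passes). The actual SampleBot code is a Mono project and may talk to the service differently.

```python
import socket


def read_wrapper_text(host="localhost", port=9999, timeout=2.0):
    """Connect once and return whatever text the service sends.

    Assumption: the payload is UTF-8 text. Reading stops when the peer
    closes the connection or when no data arrives for `timeout` seconds.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            try:
                chunk = sock.recv(4096)
            except socket.timeout:
                break  # nothing more arrived in time
            if not chunk:
                break  # peer closed the connection
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling read_wrapper_text() while the decoder and wrapper are running, and then speaking into the microphone, should show whether parsed text is reaching the socket. If the connection is refused, the wrapper is probably not running yet.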

IV. Bot Actions

At the moment, the bot is very limited in its capabilities. It can tell you where certain objects are located. It looks at its surroundings and sees if the object’s name can be found, and if it can, it tells you how many meters away it is from the bot. This is limited to a search radius of 20 meters; you can experiment with this.
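
The lookup described above can be sketched in a few lines of Python. This is not the bot's actual code (which lives in the Mono projects); the function name find_objects, the (name, position) tuples, and the case-insensitive substring match are all assumptions made for illustration. Only the 20 meter radius comes from this README.

```python
import math

SEARCH_RADIUS_M = 20.0  # search radius stated in this README


def find_objects(query, bot_position, objects, radius=SEARCH_RADIUS_M):
    """Return (name, distance) pairs for nearby objects matching `query`.

    `objects` is an iterable of (name, (x, y, z)) tuples. A match is a
    case-insensitive substring match on the name (an assumption), and
    results are sorted nearest first.
    """
    matches = []
    for name, position in objects:
        distance = math.dist(bot_position, position)  # Euclidean, in meters
        if query.lower() in name.lower() and distance <= radius:
            matches.append((name, distance))
    return sorted(matches, key=lambda match: match[1])
```

Changing the radius argument is the equivalent of the experiment suggested above: a larger value lets the bot report objects that are farther away.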

There are a LOT more things you could add to extend this system, some of which exist in the DogBot project (in Mono). Feel free to play and drop us a line. More details about some of the included technologies can be found here:

 Advanced Topics

Kaldi Online Decoder Model Files

Kaldi language/acoustic model graphs produced by training
examples (“egs” such as egs/tedlium) consist of several files:

HCLG.fst, matrix, model, phones.txt, tree, words.txt

This list of files makes up a ‘model’ in the Kaldi online decode example. Models are located in named folders under


They come from the output of running other Kaldi experiments, such as


Here is a mapping of model files and their origins for the final stage of the ‘tedlium’ experiment. On the left are the names in the online
decode models folder; on the right is where each file originates:

HCLG.fst    egs/tedlium/s5/exp/tri3_denlats/dengraph/HCLG.fst
matrix      egs/tedlium/s5/exp/tri3_mmi_b0.1/final.mat
model       egs/tedlium/s5/exp/tri3_mmi_b0.1/final.mdl
phones.txt  egs/tedlium/s5/exp/tri3_denlats/dengraph/phones.txt
tree        egs/tedlium/s5/exp/tri3_mmi_b0.1/tree
words.txt   egs/tedlium/s5/exp/tri3_denlats/dengraph/words.txt

The above training example has the name “tri3_mmi_b0.1” as the final stage (training tends to build upon previous stages), and each stage gets a new folder name under exp/. You can usually get the name of the final stage by looking at the end of
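
The file mapping in the table can be applied with a short script. The Python sketch below copies the six files from a finished tedlium recipe directory into a new model folder under the names the online decoder expects. The function name assemble_model and both directory arguments are hypothetical; the relative paths come from the table, with phones.txt assumed to sit in the same dengraph folder as HCLG.fst and words.txt.

```python
import shutil
from pathlib import Path

# Target name in the model folder -> path relative to egs/tedlium/s5.
MODEL_SOURCES = {
    "HCLG.fst": "exp/tri3_denlats/dengraph/HCLG.fst",
    "matrix": "exp/tri3_mmi_b0.1/final.mat",
    "model": "exp/tri3_mmi_b0.1/final.mdl",
    # Assumed to be in the same dengraph folder as HCLG.fst and words.txt.
    "phones.txt": "exp/tri3_denlats/dengraph/phones.txt",
    "tree": "exp/tri3_mmi_b0.1/tree",
    "words.txt": "exp/tri3_denlats/dengraph/words.txt",
}


def assemble_model(recipe_dir, model_dir):
    """Copy the six model files into `model_dir` under their new names.

    Raises FileNotFoundError before copying anything if a source file is
    missing. Returns the sorted list of file names now in `model_dir`.
    """
    recipe_dir, model_dir = Path(recipe_dir), Path(model_dir)
    missing = [rel for rel in MODEL_SOURCES.values()
               if not (recipe_dir / rel).is_file()]
    if missing:
        raise FileNotFoundError("missing source files: " + ", ".join(missing))
    model_dir.mkdir(parents=True, exist_ok=True)
    for target, rel in MODEL_SOURCES.items():
        shutil.copy2(recipe_dir / rel, model_dir / target)
    return sorted(path.name for path in model_dir.iterdir())
```

For a different experiment or final stage, only the paths in MODEL_SOURCES would need to change; the target names on the left stay the same.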
