There are a couple of experiments that can be performed with this virtual machine. The first is real-time speech recognition ("live decoding") using the Kaldi online decoder. The second is a full "interaction in virtual worlds" speech dialog system, which you can fully control.
Table of Contents
I. Installing the system
II. Running the system
III. Customizing the system
IV. Bot Actions
I. Installing the system
- Install Oracle VirtualBox (http://virtualbox.org/), along with the matching Extension Pack. The VM works best with version 4.3.16.
- Set up a host-only network in the VirtualBox graphical user interface, via "File" → "Settings" (or "Preferences") → "Network". Click the "Host-only Networks" tab; if no networks are listed yet, click the network-card icon with the green plus sign on the right. The resulting new default network should appear with the name "vboxnet0".
- Import the Mario2-IVW.ova file into VirtualBox and run it.
- Make sure that your microphone can be used from inside the virtual machine. Check that it works outside the VM first (levels present, but not distorting), and then check it within the VM (via the VM's "System Settings" → "Sound"). The total level is a combination of these.
- Install the Singularity Viewer from http://www.singularityviewer.org/downloads.
- While running Singularity, create a new grid: click the "Grid Manager" button, click "Create", enter any name you like in the "Grid Name" field, and enter http://192.168.56.101:9000 in the "Login URI" field. Click "OK" to close the window.
- The username is "World Master", and the password is "avatar". You will use these, along with the grid name you just chose, to log into the virtual world — but not until you have started the server within the VM, as described in the next section.
- The password for the virtual machine itself is "?1zza4All", should you need it.
NOTE: if you are trying to run this from a Windows host, there is a bug in VirtualBox that prevents the microphone from working with an Ubuntu guest operating system. See this Forum post for more information: http://speechkitchen.org/forums/topic/new-readme-for-mario2-ivw-vm/#post-748
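Before launching Singularity, you can sanity-check the host-only network by probing the OpenSim login port from the host. A minimal sketch: the IP and port are the Login URI values from the steps above, and `port_open` is a hypothetical helper, not part of the VM.

```python
# Probe the OpenSim login port (IP/port taken from the Login URI above).
# Run this on the host once the server inside the VM has been started.
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print(port_open("192.168.56.101", 9000))  # True once OpenSim is up
```

If this prints False after the server has had time to start, recheck the vboxnet0 setup before suspecting the VM itself.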
II. Running the system
Once the VM is running…
- Make sure that your microphone is accessible from inside the virtual machine.
- To check and test “online/ live decoding”, do the following:
- Open a terminal
- Do "cd /kaldi-trunk/egs/voxforge/online_demo"
- Do "./run.sh --test-mode live"
- Try speaking into the microphone. The quality is terrible, but you should see speech recognition output appear.
You might see some error messages on the console, but it can still work in spite of these:
```
ALSA lib pcm_dsnoop.c:612:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround71
ALSA lib setup.c:565:(add_elem) Cannot obtain info for CTL elem (MIXER,'IEC958 Playback Default',0,0,0): No such file or directory
ALSA lib setup.c:565:(add_elem) Cannot obtain info for CTL elem (MIXER,'IEC958 Playback Default',0,0,0): No such file or directory
ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pcm_dmix.c:957:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
```
- To start the "Interaction in Virtual Worlds" demo, double-click the desktop icon with the headphones that says "START". This starts four processes:
- The OpenSim world server
- The Kaldi Online Decoder speech recognizer
- The Stanford CoreNLP Parser
- The Kaldi/Parser client
- The commands are listed in startBackend.sh if you want to run them separately or display their terminals. These processes take a long time to start; you will know they are all up when the "Kaldi Online Decoder" terminal window displays several lines of "ALSA lib pcm" warning messages.
- Now you can log the World Master avatar into Singularity, as described in the installation steps above.
- Once the world loads, open MonoDevelop (third icon down the taskbar on the left of the VM screen), click on the IVW solution, and run it with the play button. This starts the SampleBot project within the IVW solution and logs "Friend Bot", the computer-controlled character you can talk to, into the virtual world. It also connects to the Kaldi Parser Wrapper and receives text from the speech recognizer. The IVW solution contains another project, "DogBot", with lots of sample code that can be used to extend the system.
Warning: If the parser times out (both the parser and the wrapper will throw an error when this happens), restart the parser first and then the wrapper. (See below.)
III. Customizing the system
Here are the locations of all the code:
The START icon runs startBackend.sh, which is located on the Desktop. Here, you can find the commands to run parts of the system individually.
- Kaldi Online Decoder: /kaldi-trunk/egs/voxforge/online_demo
We use this code, but plug in models trained on TED talks (the tedlium recipe).
This is the speech recognition system.
- OpenSim: ~/Desktop/opensim-0.8/
This sets up the virtual world. If you cannot log into Singularity, it is probably because this has not finished starting yet.
- Stanford Parser: ~/Desktop/parse/corenlp.py
This is the server for the parser.
- Kaldi Wrapper: ~/Desktop/parse/wrapKaldiLive.py
This code takes the output of the Kaldi Online Decoder, feeds it into the Stanford Parser, and makes the results available to the bot code as a TCP socket service on localhost port 9999.
Make sure that the Kaldi and Stanford servers are running before you start this code. If this script throws a timeout error, restart the Stanford Parser and then restart this client.
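The wrapper's output side can be sketched as a small line-serving TCP service. This is a stub for illustration only: `parse` stands in for the CoreNLP call, `start_server` is a hypothetical name, and the real code is wrapKaldiLive.py.

```python
# Sketch of the wrapper's output side: serve text lines to bot clients
# over TCP. The real wrapper feeds Kaldi output through the Stanford
# Parser first; parse() below is only a stand-in for that call.
import socket
import threading

def parse(utterance):
    # Stand-in for the Stanford CoreNLP call the real wrapper makes.
    return utterance.strip().lower()

def start_server(lines, host="127.0.0.1", port=0):
    """Start a one-shot server thread; return (thread, bound_port).
    port=0 picks a free port; the real wrapper listens on 9999."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    bound = srv.getsockname()[1]

    def run():
        conn, _ = srv.accept()
        with conn:
            for line in lines:
                conn.sendall((parse(line) + "\n").encode())
        srv.close()

    t = threading.Thread(target=run, daemon=True)
    t.start()
    return t, bound
```

Any client that can read newline-delimited text from a socket (such as the bot code below, or even `nc localhost 9999`) can consume this stream.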
- Communicating with the Bot: ~/Desktop/Bot Development/SampleBot
Accessible in MonoDevelop (IVW solution, SampleBot project), this code logs the bot into the virtual world and polls the Kaldi wrapper. Make sure that the Kaldi online decoder and the wrapper are running before you start it. NOTE: if this code is suspended in the debugger for long enough, the bot will disappear from the virtual world (the virtual-world protocol requires "keep-alive" messages behind the scenes).
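The polling loop amounts to reading newline-delimited text from the wrapper's socket on localhost port 9999. A Python sketch for illustration only: the real client is the C# SampleBot, and `poll_wrapper` is a hypothetical name.

```python
# Read newline-delimited recognizer/parser output from the Kaldi
# wrapper's TCP socket and hand each line to a callback.
import socket

def poll_wrapper(host="127.0.0.1", port=9999, handle=print):
    """Connect to the wrapper and pass each decoded line to handle()."""
    with socket.create_connection((host, port)) as conn:
        buf = b""
        while True:
            chunk = conn.recv(4096)
            if not chunk:          # server closed the connection
                break
            buf += chunk
            while b"\n" in buf:    # split out complete lines only
                line, buf = buf.split(b"\n", 1)
                handle(line.decode())
```

The C# bot does the same thing from within the virtual-world event loop, dispatching each line to the bot's command handling.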
IV. Bot Actions
At the moment, the bot is very limited in its capabilities: it can tell you where certain objects are located. It looks at its surroundings, and if it finds an object with a matching name, it tells you how many meters away that object is. The search is limited to a radius of 20 meters; you can experiment with this.
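The lookup amounts to a radius-limited name search over nearby objects. A minimal sketch in Python: the real logic lives in the C# SampleBot, and `find_object` and its arguments are hypothetical names.

```python
# Radius-limited object lookup, as described above: match a spoken name
# against nearby objects and report the distance to the nearest match.
import math

SEARCH_RADIUS = 20.0  # metres, matching the bot's current limit

def find_object(name, bot_pos, objects, radius=SEARCH_RADIUS):
    """Return the distance (m) to the nearest object whose name contains
    `name` (case-insensitive), or None if none lies within the radius."""
    best = None
    for obj_name, pos in objects.items():
        if name.lower() not in obj_name.lower():
            continue
        d = math.dist(bot_pos, pos)       # Euclidean distance, Python 3.8+
        if d <= radius and (best is None or d < best):
            best = d
    return best
```

Widening the search is then just a matter of passing a larger `radius`, which mirrors the experiment suggested above.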
There are a LOT more things you could add to extend this system, some of which exist in the DogBot project (in Mono). Feel free to play and drop us a line. More details about some of the included technologies can be found here:
Kaldi Online Decoder Model Files
Kaldi language/acoustic model graphs produced by training examples ("egs", such as egs/tedlium) consist of several files: HCLG.fst, matrix, model, phones.txt, tree, words.txt.
This list of files makes up a ‘model’ in the Kaldi online decode example. Models are located in named folders under
They come from the output of running other Kaldi experiments, such as
Here is the mapping of model files to their origins for the final stage of the "tedlium" experiment. On the left are the names in the online-decode models folder; on the right is where each file originates:

```
HCLG.fst    egs/tedlium/s5/exp/tri3_denlats/dengraph/HCLG.fst
matrix      egs/tedlium/s5/exp/tri3_mmi_b0.1/final.mat
model       egs/tedlium/s5/exp/tri3_mmi_b0.1/final.mdl
phones.txt  egs/tedlium/s5/exp/tri3_denlats/dengraph/phones.txt
tree        egs/tedlium/s5/exp/tri3_mmi_b0.1/tree
words.txt   egs/tedlium/s5/exp/tri3_denlats/dengraph/words.txt
```
The above training example has "tri3_mmi_b0.1" as its final stage (training tends to build upon previous stages, and each stage gets a new folder under exp/). You can usually get the name of the final stage by looking at the end of run.sh.
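If you want to assemble such a model folder yourself, the mapping above can be expressed as a short script. The source paths are copied from the listing; `assemble_model` and the destination folder name are hypothetical.

```python
# Assemble an online-decode model folder from a trained tedlium stage,
# using the file mapping listed above.
import shutil
from pathlib import Path

# online-decode name -> origin path (relative to the Kaldi root)
MODEL_FILES = {
    "HCLG.fst":   "egs/tedlium/s5/exp/tri3_denlats/dengraph/HCLG.fst",
    "matrix":     "egs/tedlium/s5/exp/tri3_mmi_b0.1/final.mat",
    "model":      "egs/tedlium/s5/exp/tri3_mmi_b0.1/final.mdl",
    "phones.txt": "egs/tedlium/s5/exp/tri3_denlats/dengraph/phones.txt",
    "tree":       "egs/tedlium/s5/exp/tri3_mmi_b0.1/tree",
    "words.txt":  "egs/tedlium/s5/exp/tri3_denlats/dengraph/words.txt",
}

def assemble_model(kaldi_root, dest_dir):
    """Copy each trained file into dest_dir under its online-decode name."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for name, src in MODEL_FILES.items():
        shutil.copy(Path(kaldi_root) / src, dest / name)
```

Pointing the online demo at the resulting folder, instead of the default voxforge models, is how the TED-talk models are plugged in.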