Eesen is a toolkit to build speech recognition (ASR) systems in a completely end-to-end fashion.



Yajie Miao


Eesen is a toolkit to build speech recognition (ASR) systems in a completely end-to-end fashion. The goal of Eesen is to simplify the existing complicated, expertise-intensive ASR pipeline into a straightforward learning problem. Acoustic modeling in Eesen involves training a single recurrent neural network (RNN) that models the sequence-to-sequence mapping from speech to transcripts. Eesen discards the following elements required by the existing ASR pipeline:

  • Hidden Markov models (HMMs)
  • Gaussian mixture models (GMMs)
  • Decision trees and phonetic questions
  • Dictionary, if characters are used as the modeling units
  • ...
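
To illustrate why a pronunciation dictionary becomes unnecessary when characters are the modeling units, here is a minimal Python sketch. It is not taken from Eesen's code: the function names `build_char_vocab` and `encode`, the sample transcripts, and the choice to reserve label 0 are all assumptions made for illustration. The point is only that the training targets can be derived directly from the transcript text.

```python
def build_char_vocab(transcripts):
    """Map every character seen in the transcripts to an integer label.

    Labels start at 1; label 0 is left unused here on the assumption that
    the network reserves it for a special symbol (hypothetical choice).
    """
    chars = sorted({ch for text in transcripts for ch in text})
    return {ch: idx for idx, ch in enumerate(chars, start=1)}


def encode(text, vocab):
    """Turn a transcript into a character-level label sequence."""
    return [vocab[ch] for ch in text]


# Hypothetical training transcripts; no word-to-phoneme lookup is involved.
transcripts = ["a cat", "a cab"]
vocab = build_char_vocab(transcripts)
labels = encode("a cat", vocab)
print(vocab)
print(labels)
```

With phonemes as modeling units, the same step would need a dictionary to map each word to its phoneme sequence, which is why the dictionary is listed above as only conditionally discarded.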

Eesen is developed on the basis of the popular Kaldi toolkit. However, Eesen is fully self-contained and does not depend on Kaldi to function.

Eesen is released as an open-source project under the permissive Apache License, Version 2.0. We welcome community participation and contributions.

For more information, please refer to our manuscript: EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding.


Git repository



IP Agreement

Apache 2.0


Kaldi toolkit: Kaldi

Required Acknowledgment
