Editing LTI LangID Corpus
If you have a problem filling in this form, contact lti.catalogue AT gmail.com.
Fields marked with (*) are required.
The email is only for internal purposes.
The submission will not be considered until you confirm that you own this email address.
You will receive a confirmation email upon submitting. The confirmation email may be filtered as junk, so check your spam folder. If you do not receive an email, please contact us.
This information will appear in the catalogue.
Natural Language Processing/Computational Linguistics
Information Retrieval, Text Mining and Analytics
Spoken Interfaces and Dialogue Processing
Keywords (comma separated, internal and not shown to public)
Direct Download Link
(If you provide a direct download link, please also provide the IP agreement, and the Required Acknowledgement)
<p>This is a corpus of training and test data for language identification. The initial release contains data for modeling 781 languages, with samples (some very tiny) for an additional 310 languages.</p>
Availability (e.g. source code, binary only, XML file, etc.)
<p>Raw data, with scripts to unpack and split the data into training and test sets.</p>
Support Status (e.g. as-is, maintained, etc.)
<p>Maintained. Release 2 is in preparation.</p>
Prerequisites (e.g. Windows XP, Java 1.6, etc.)
Required Acknowledgement (e.g. paper to cite)
<p>Please cite</p> <p> <strong>Ralf D. Brown</strong>, "Non-linear Mapping for Improved Identification of 1300+ Languages." In <em>Proceedings of the Conference on Empirical Methods in Natural Language Processing</em> (EMNLP-2014).</p> <p>or</p> <p> <a href="http://www.cs.cmu.edu/~ralf/langid.html">http://www.cs.cmu.edu/~ralf/langid.html</a></p>
might be helpful)
<p>Text is licensed under Creative Commons or is in the public domain. The included scripts are licensed under GNU GPL version 3.</p>
Contact (e.g. e-mail)
<p>ralf @ cs.cmu.edu</p>
Additional Comments (internal and not shown to public)