AMALGrAM: an open-source statistical analyzer for multiword expressions and supersenses in context.
Nathan Schneider, Noah A. Smith, et al.
AMALGrAM (A Machine Analyzer of Lexical Groupings And Meanings) analyzes English sentences for multiword expressions (MWEs) and noun and verb supersenses. For example, given the sentence
I do n't think he 's afraid to take a strong stand on gun control , what with his upbringing in El Paso .
the analysis
I do|`a n't think|cognition he 's|stative afraid to take_a_ strong _stand|cognition on gun_control|ARTIFACT , what_with his upbringing|ATTRIBUTE in El_Paso|LOCATION .
will be predicted, grouping "take a stand", "gun control", "what with", and "El Paso" as MWEs and labeling several lexical expressions with supersenses (UPPERCASE for nouns, lowercase for verbs). The model and algorithms implemented in the tool are described in Schneider et al. (TACL 2014, NAACL-HLT 2015), and resources required to use it are available at http://www.ark.cs.cmu.edu/LexSem/.
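To make the notation concrete, here is a minimal sketch of how such an analysis line could be read back into lexical units. It is not part of AMALGrAM; the function name `parse_analysis` and the dictionary layout are hypothetical. The conventions are inferred from the example only: `_` joins the words of an MWE, a trailing `_` opens a gap, a leading `_` resumes the MWE after the gap, and `|` separates a lexical expression from its label. The sketch also assumes that tokens themselves contain neither `_` nor `|`.

```python
EXAMPLE = ("I do|`a n't think|cognition he 's|stative afraid to take_a_ strong "
           "_stand|cognition on gun_control|ARTIFACT , what_with his "
           "upbringing|ATTRIBUTE in El_Paso|LOCATION .")


def parse_analysis(line):
    """Group a tagged line into units: {"words": [...], "label": str or None}."""
    units = []
    open_unit = None  # gappy MWE that is still waiting for its continuation
    for chunk in line.split():
        text, _, label = chunk.partition("|")
        words = [w for w in text.split("_") if w]
        if text.startswith("_") and open_unit is not None:
            # Continuation of a gappy MWE: attach to the unit opened earlier.
            open_unit["words"].extend(words)
            if label:
                open_unit["label"] = label
            if not text.endswith("_"):
                open_unit = None
            continue
        unit = {"words": words, "label": label or None}
        units.append(unit)
        if text.endswith("_"):
            open_unit = unit  # a trailing "_" announces a gap
    return units


if __name__ == "__main__":
    for unit in parse_analysis(EXAMPLE):
        if len(unit["words"]) > 1:
            print(" ".join(unit["words"]), unit["label"])
```

Units with more than one word are the MWEs. Words inside a gap ("strong" in the example) come out as separate units of their own, listed after the MWE that surrounds them.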
More generally, the codebase supports supervised discriminative learning and structured prediction of statistical sequence models over discrete data, i.e., taggers. It implements the structured perceptron (Collins, EMNLP 2002) for learning and the Viterbi algorithm for decoding. Cython is used to make decoding reasonably fast, even with millions of features.
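As an illustration of the learning and decoding setup described above, the following is a minimal pure-Python sketch of a first-order tagger trained with the structured perceptron and decoded with Viterbi. It is not AMALGrAM's implementation: the feature template, the function names, and the toy data are assumptions made for this example, and it uses the plain (unaveraged) perceptron update with no Cython acceleration.

```python
from collections import defaultdict


def features(words, i, prev_tag, tag):
    """Hypothetical feature template: one emission and one transition indicator."""
    return [("word", words[i].lower(), tag), ("trans", prev_tag, tag)]


def local_score(words, i, prev_tag, tag, weights):
    return sum(weights.get(f, 0.0) for f in features(words, i, prev_tag, tag))


def viterbi(words, tags, weights):
    """Return the highest-scoring tag sequence under the first-order model."""
    if not words:
        return []
    # best[i][t] = (score of the best path ending in tag t at position i, backpointer)
    best = [{} for _ in words]
    for t in tags:
        best[0][t] = (local_score(words, 0, "<s>", t, weights), None)
    for i in range(1, len(words)):
        for t in tags:
            best[i][t] = max(
                (best[i - 1][p][0] + local_score(words, i, p, t, weights), p)
                for p in tags
            )
    # Pick the best final tag, then follow backpointers to the start.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


def train(data, tags, epochs=5):
    """Structured perceptron: on a mistake, reward gold features, penalize predicted ones."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, tags, weights)
            if pred == gold:
                continue
            for seq, delta in ((gold, 1.0), (pred, -1.0)):
                prev = "<s>"
                for i, t in enumerate(seq):
                    for f in features(words, i, prev, t):
                        weights[f] += delta
                    prev = t
    return weights


# Toy data, invented for illustration only.
TAGS = ["D", "N", "V"]
DATA = [
    (["the", "dog", "barks"], ["D", "N", "V"]),
    (["a", "cat", "sleeps"], ["D", "N", "V"]),
    (["the", "cat", "barks"], ["D", "N", "V"]),
]

if __name__ == "__main__":
    w = train(DATA, TAGS)
    print(viterbi(["the", "dog", "barks"], TAGS, w))
```

Decoding is the inner loop of training, since every perceptron update requires a full Viterbi pass over the sentence. That is why decoding speed matters once the feature set grows large.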
The software is released under the GPLv3 license.
Requirements: Python 2.7 with Cython.
The software has been tested on Unix platforms.
The original MWE identification system is described in:
- Discriminative Lexical Semantic Segmentation with Gaps: Running the MWE Gamut [errata]
Nathan Schneider, Emily Danchik, Chris Dyer, and Noah A. Smith.
In Transactions of the Association for Computational Linguistics, 2(April):193–206, 2014.
The supersense annotations and the combined MWE+supersense tagger are described in:
- A Corpus and Model Integrating Multiword Expressions and Supersenses
Nathan Schneider and Noah A. Smith.
In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, June 2015.