AMALGrAM: an open-source statistical analyzer for multiword expressions and supersenses in context.


Nathan Schneider, Noah A. Smith, et al.


AMALGrAM (A Machine Analyzer of Lexical Groupings And Meanings) analyzes English sentences for multiword expressions (MWEs) and noun and verb supersenses. For example, given the sentence

I do n't think he 's afraid to take a strong stand on gun control , what with his upbringing in El Paso .

the analysis

I do|`a n't think|cognition he 's|stative afraid to take_a_ strong _stand|cognition on gun_control|ARTIFACT , what_with his upbringing|ATTRIBUTE in El_Paso|LOCATION .

will be predicted, grouping "take a stand", "gun control", "what with", and "El Paso" as MWEs and labeling several lexical expressions with supersenses (UPPERCASE for nouns, lowercase for verbs). The model and algorithms implemented in the tool are described in Schneider et al. (TACL 2014, NAACL-HLT 2015), and resources required to use it are available at
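
The bracket-free notation in the example can be read back into lexical expressions with a few lines of Python. The following is only a sketch inferred from the single example above, not the tool's own reader: the function name `parse_analysis` is hypothetical, a trailing or leading underscore is assumed to mark the two halves of an MWE with a gap, and tokens that themselves contain `_` or `|` are not handled.

```python
EXAMPLE = ("I do|`a n't think|cognition he 's|stative afraid to take_a_ strong "
           "_stand|cognition on gun_control|ARTIFACT , what_with his "
           "upbringing|ATTRIBUTE in El_Paso|LOCATION .")


def parse_analysis(line):
    """Split one analyzed sentence into (tokens, label) pairs.

    tokens is a tuple of the words in one lexical expression (more than one
    word means an MWE); label is the string after '|' or None if absent.
    """
    expressions = []
    pending = None  # gappy MWE whose second half has not been seen yet
    for chunk in line.split():
        body, sep, label = chunk.rpartition("|")
        if not sep or not body:
            body, label = chunk, None  # no label attached to this chunk
        resumes = len(body) > 1 and body.startswith("_")
        continues = len(body) > 1 and body.endswith("_")
        tokens = [t for t in body.split("_") if t] or [body]
        if resumes and pending is not None:
            # Second half of a gappy MWE: attach to the expression opened earlier.
            pending["tokens"].extend(tokens)
            if label is not None:
                pending["label"] = label
            if not continues:
                pending = None
        else:
            entry = {"tokens": tokens, "label": label}
            expressions.append(entry)
            if continues:
                pending = entry  # words inside the gap stay separate expressions
    return [(tuple(e["tokens"]), e["label"]) for e in expressions]


if __name__ == "__main__":
    for tokens, label in parse_analysis(EXAMPLE):
        if len(tokens) > 1:
            print(" ".join(tokens), label)
```

On the example sentence this recovers the four groupings named above, with "strong" kept as a separate expression inside the gap of "take a stand".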

More generally, the codebase supports supervised discriminative learning and structured prediction of statistical sequence models over discrete data, i.e., taggers. It implements the structured perceptron (Collins, EMNLP 2002) for learning and the Viterbi algorithm for decoding. Cython is used to make decoding reasonably fast, even with millions of features.
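
To make the learning and decoding steps concrete, here is a minimal pure-Python sketch of a first-order tagger trained with the structured perceptron and decoded with Viterbi. It is an illustration under stated assumptions, not the codebase's implementation: the feature templates, the function names `features`, `viterbi`, and `train`, and the toy data are all hypothetical, there is no weight averaging, and nothing here uses Cython.

```python
from collections import defaultdict

START = "<s>"  # hypothetical start-of-sentence pseudo-tag


def features(words, i, prev_tag, tag):
    """Hypothetical feature templates: one emission and one transition feature."""
    return [("emit", words[i].lower(), tag), ("trans", prev_tag, tag)]


def local_score(weights, words, i, prev_tag, tag):
    return sum(weights[f] for f in features(words, i, prev_tag, tag))


def viterbi(words, tags, weights):
    """Return the highest-scoring tag sequence under the current weights."""
    if not words:
        return []
    # best[i][t] = (score of the best path ending in tag t at position i, previous tag)
    best = [{} for _ in words]
    for t in tags:
        best[0][t] = (local_score(weights, words, 0, START, t), None)
    for i in range(1, len(words)):
        for t in tags:
            best[i][t] = max(
                (best[i - 1][p][0] + local_score(weights, words, i, p, t), p)
                for p in tags
            )
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


def train(data, tags, epochs=5):
    """Structured perceptron: on a mistake, reward gold features, penalize predicted ones."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, tags, weights)
            if pred == gold:
                continue
            for i in range(len(words)):
                gold_prev = gold[i - 1] if i else START
                pred_prev = pred[i - 1] if i else START
                for f in features(words, i, gold_prev, gold[i]):
                    weights[f] += 1.0
                for f in features(words, i, pred_prev, pred[i]):
                    weights[f] -= 1.0
    return weights


if __name__ == "__main__":
    TAGS = ["D", "N", "V"]
    DATA = [("the dog barks".split(), ["D", "N", "V"]),
            ("a cat sleeps".split(), ["D", "N", "V"])]
    w = train(DATA, TAGS)
    print(viterbi("the cat barks".split(), TAGS, w))
```

Decoding costs time proportional to the sentence length times the square of the tag-set size, which is why the inner loop is the natural place for a compiled speed-up in a tagger with many features.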

IP Agreement

The software is released under the GPLv3 license.


Requires Python 2.7 with Cython.

Tested on Unix platforms.

Required Acknowledgment

The original MWE identification system is described in:

The supersense annotations and the combined MWE+supersense tagger are described in:

