MontyLingua is a free*, commonsense-enriched, end-to-end natural language understander for English. Feed raw English text into MontyLingua, and the output is a semantic interpretation of that text. It is well suited to information retrieval and extraction, request processing, and question answering. From English sentences, it extracts subject/verb/object tuples; adjectives, noun phrases, and verb phrases; and people's names, places, events, dates and times, and other semantic information. MontyLingua makes traditionally difficult language processing tasks trivial!
MontyLingua performs the following tasks over text:
- MontyTokenizer - Tokenizes raw English text (sensitive to abbreviations), and resolves contractions, e.g. "you're" ==> "you are"
- MontyTagger - Part-of-speech tagging based on Brill94, enriched with common sense.
- MontyChunker - Lightning fast regular expression chunker
- MontyExtractor - Extracts phrases and subject/verb/object triplets from sentences
- MontyLemmatiser - Strips inflectional morphology, i.e. changes verbs to infinitive form and nouns to singular form
- MontyNLGenerator - Uses MontyLingua's concise predicate-arg representation to generate naturalistic English sentences and text summaries
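To make two of the stages above concrete, here is a minimal, self-contained Python sketch of contraction resolution and regular-expression chunking. It does not use MontyLingua and is not its implementation: the `CONTRACTIONS` table, the function names, and the noun-phrase pattern are all assumptions chosen for illustration, and the chunker expects text that has already been part-of-speech tagged with Penn Treebank style tags. Abbreviation handling is left out.

```python
import re

# Hypothetical lookup table for illustration only; MontyLingua's own
# contraction rules are not described in this document.
CONTRACTIONS = {
    "you're": "you are",
    "i'm": "I am",
    "don't": "do not",
    "can't": "cannot",
}

_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text):
    """Replace each known contraction with its expanded form."""
    def replace(match):
        found = match.group(0)
        expanded = CONTRACTIONS[found.lower()]
        # Keep a leading capital, e.g. "You're" becomes "You are".
        if found[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _CONTRACTION_RE.sub(replace, text)


def tokenize(text):
    """Expand contractions, then split into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", expand_contractions(text))


# Assumed noun-phrase shape: optional determiner, any number of adjectives,
# then one or more singular or plural common nouns.
NP_PATTERN = re.compile(r"(?:<DT>)?(?:<JJ>)*(?:<NNS?>)+")


def chunk_noun_phrases(tagged):
    """Return noun phrases from a list of (word, tag) pairs.

    The tags are joined into one string such as "<DT><JJ><NN>" so that a
    single regular expression can match whole tag sequences.
    """
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)

    # Character offset in tag_string at which each token's tag starts.
    starts = []
    offset = 0
    for _, tag in tagged:
        starts.append(offset)
        offset += len(tag) + 2  # two extra characters for "<" and ">"

    phrases = []
    for match in NP_PATTERN.finditer(tag_string):
        first = starts.index(match.start())
        count = match.group(0).count("<")
        phrases.append(" ".join(word for word, _ in tagged[first:first + count]))
    return phrases
```

For example, `tokenize("I think you're right.")` yields the tokens `I`, `think`, `you`, `are`, `right`, and `.`. Matching over the tag string rather than the words is what keeps this style of chunker fast: one compiled pattern scans the whole sentence in a single pass.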
Python documentation and API (HTML)
Java documentation and API (HTML)
READ THIS if you are running MontyLingua on Mac OS X or Unix
The distribution ZIP includes datafiles built for Windows. If you are running MontyLingua on Unix or Mac OS X and the phrase "I love you" is tagged incorrectly, the datafiles need to be rebuilt. This is simple:
- Delete all files of the form FASTLEXICON_n.MDF, where n is a number.
- Re-run the MontyLingua program from either Python or Java, and the correct datafiles will be rebuilt. If you are running Java and run out of memory during the rebuild, use Java's -mx or -Xmx option to increase the maximum memory size. You only need to rebuild these datafiles once.
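The deletion step can be scripted. The following is a minimal sketch, assuming the FASTLEXICON_n.MDF files sit directly inside a single directory; the function name `remove_fastlexicon_files` is hypothetical and not part of MontyLingua, and the directory to pass in depends on where you unpacked the distribution.

```python
import re
from pathlib import Path

# Matches FASTLEXICON_<number>.MDF, the naming form given in the step above.
LEXICON_FILE = re.compile(r"^FASTLEXICON_\d+\.MDF$")


def remove_fastlexicon_files(directory):
    """Delete every FASTLEXICON_n.MDF file directly inside `directory`.

    Returns the names of the deleted files in sorted order. Other files,
    including any other .MDF files, are left untouched.
    """
    removed = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and LEXICON_FILE.match(path.name):
            path.unlink()
            removed.append(path.name)
    return removed
```

Call it with the directory that holds the datafiles, then re-run MontyLingua so the files are rebuilt. The match is case-sensitive, so adjust the pattern if your copies are named differently.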