Tuesday, April 21, 2009

MontyLingua - A Natural Language Processing Tool

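  The Free Software Blog mentioned it: MontyLingua is a popular natural language processing tool that draws on human common sense to understand English, and many people around the world use it for language research. The official website gives this introduction: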
MontyLingua is a free*, commonsense-enriched, end-to-end natural language understander for English. Feed raw English text into MontyLingua, and the output will be a semantic interpretation of that text. Perfect for information retrieval and extraction, request processing, and question answering. From English sentences, it extracts subject/verb/object tuples, extracts adjectives, noun phrases and verb phrases, and extracts people's names, places, events, dates and times, and other semantic information. MontyLingua makes traditionally difficult language processing tasks trivial!
MontyLingua performs the following tasks over text:
  1. MontyTokenizer - Tokenizes raw English text (sensitive to abbreviations), and resolves contractions, e.g. "you're" ==> "you are"
  2. MontyTagger - Part-of-speech tagging based on Brill94, enriched with common sense.
  3. MontyChunker - Lightning fast regular expression chunker
  4. MontyExtractor - Extracts phrases and subject/verb/object triplets from sentences
  5. MontyLemmatiser - Strips inflectional morphology, i.e. changes verbs to infinitive form and nouns to singular form
  6. MontyNLGenerator - Uses MontyLingua's concise predicate-arg representation to generate naturalistic English sentences and text summaries
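
To make the first step in the list more concrete, here is a minimal Python sketch of tokenizing with contraction resolution. It is not MontyLingua's code: the `CONTRACTIONS` table and the `expand_contractions` and `tokenize` functions are hypothetical names invented for this illustration, and the real MontyTokenizer also handles abbreviations, which this toy version does not.

```python
import re

# Hypothetical, deliberately tiny contraction table (an assumption for this
# sketch; MontyLingua's own rules are not reproduced here).
CONTRACTIONS = {
    "you're": "you are",
    "i'm": "I am",
    "don't": "do not",
}

def expand_contractions(text):
    """Replace contractions found in CONTRACTIONS with their expanded forms."""
    def replace(match):
        word = match.group(0)
        expanded = CONTRACTIONS.get(word.lower())
        if expanded is None:
            return word  # unknown contraction: leave it untouched
        if word[0].isupper():
            # Keep a leading capital, e.g. "You're" -> "You are"
            return expanded[0].upper() + expanded[1:]
        return expanded
    return re.sub(r"[A-Za-z]+'[A-Za-z]+", replace, text)

def tokenize(text):
    """Expand contractions, then split into word and punctuation tokens."""
    expanded = expand_contractions(text)
    # A word (optionally with an inner apostrophe) or one punctuation mark.
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?|[^\sA-Za-z0-9]", expanded)
```

Calling `tokenize("You're right.")` splits the sentence into the words of "You are right" plus the final period as its own token. Contractions missing from the table are kept as single tokens rather than guessed at.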

Python documentation and API (HTML) [.html]
Java documentation and API [.html]

READ THIS if you are running ML on Mac OS X, or Unix

The distribution ZIP includes datafiles built for Windows. If you are running MontyLingua on Unix or Mac OS X and the phrase "I love you" is tagged incorrectly, the datafiles need to be rebuilt. This is simple:
  1. Delete all files of the form FASTLEXICON_n.MDF, where n is a number.
  2. Re-run the MontyLingua program, from either Python or Java, and the correct datafiles will be rebuilt. If you are running Java and run out of memory during the rebuild, use the -MX or -Xmx option in Java to increase the memory size. You only need to rebuild these datafiles once.
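
Step 1 above can be scripted. The following is a small sketch, assuming the FASTLEXICON_n.MDF files sit directly inside a single directory that you pass in. The function name `remove_fastlexicon_files` and its `dry_run` flag are made up for this example and are not part of MontyLingua.

```python
import re
from pathlib import Path

# Matches FASTLEXICON_n.MDF where n is one or more digits, as described above.
FASTLEXICON_PATTERN = re.compile(r"^FASTLEXICON_\d+\.MDF$")

def remove_fastlexicon_files(directory, dry_run=True):
    """Return the matching file names in `directory`; delete them unless dry_run."""
    matched = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and FASTLEXICON_PATTERN.match(path.name):
            matched.append(path.name)
            if not dry_run:
                path.unlink()  # actually remove the stale datafile
    return matched
```

Run it once with `dry_run=True` to see which files would be removed, then again with `dry_run=False` to delete them. After that, re-run MontyLingua so it rebuilds the datafiles as described in step 2.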
