云博 - Backup Contempo Theme: 04/21/09

Tuesday, April 21, 2009

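MontyLingua - A Natural Language Processing Tool

Free Software Blog mentioned this one. MontyLingua is a popular natural language processing tool that draws on human common sense to understand English, and many people around the world use it for language research. The official website gives this introduction: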

MontyLingua is a free*, commonsense-enriched, end-to-end natural language understander for English. Feed raw English text into MontyLingua, and the output will be a semantic interpretation of that text. Perfect for information retrieval and extraction, request processing, and question answering. From English sentences, it extracts subject/verb/object tuples, extracts adjectives, noun phrases and verb phrases, and extracts people's names, places, events, dates and times, and other semantic information. MontyLingua makes traditionally difficult language processing tasks trivial!

MontyLingua performs the following tasks over text:

MontyTokenizer - Tokenizes raw English text (sensitive to abbreviations), and resolves contractions, e.g. "you're" ==> "you are"
MontyTagger - Part-of-speech tagging based on Brill94, enriched with common sense.
MontyChunker - Lightning fast regular expression chunker
MontyExtractor - Extracts phrases and subject/verb/object triplets from sentences
MontyLemmatiser - Strips inflectional morphology, i.e. changes verbs to infinitive form and nouns to singular form
MontyNLGenerator - Uses MontyLingua's concise predicate-arg representation to generate naturalistic English sentences and text summaries
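
To make the first task in the list more concrete, here is a minimal sketch of contraction resolution followed by tokenization. It is not MontyLingua's code or API: the names CONTRACTIONS, expand_contractions and tokenize, and the tiny lookup table, are hypothetical and exist only for illustration. The sketch also ignores the abbreviation handling the real tokenizer is described as having.

```python
import re

# Hypothetical lookup table for illustration only; the table MontyLingua
# actually uses is not shown in the text above.
CONTRACTIONS = {
    "you're": "you are",
    "i'm": "I am",
    "don't": "do not",
    "can't": "cannot",
}

def expand_contractions(text):
    """Replace each known contraction with its expanded form."""
    def replace(match):
        word = match.group(0)
        expanded = CONTRACTIONS.get(word.lower())
        if expanded is None:
            return word  # unknown contraction: leave it untouched
        if word[0].isupper():
            # Keep the capital letter of a sentence-initial contraction.
            expanded = expanded[0].upper() + expanded[1:]
        return expanded
    return re.sub(r"[A-Za-z]+'[A-Za-z]+", replace, text)

def tokenize(text):
    """Expand contractions, then split into word and punctuation tokens."""
    expanded = expand_contractions(text)
    return re.findall(r"\w+|[^\w\s]", expanded)

print(tokenize("You're late."))  # ['You', 'are', 'late', '.']
```

A dictionary lookup keeps the example short. Ambiguous forms such as "it's" ("it is" or "it has") are deliberately left out, because resolving them needs context that a plain table cannot provide.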

Python documentation and API (HTML)
Java documentation and API (HTML)

READ THIS if you are running ML on Mac OS X, or Unix

The distribution ZIP includes datafiles designed for Windows. If you are running MontyLingua on Unix or Mac OS X, and the phrase "I love you" is tagged incorrectly, then the datafiles need to be rebuilt. This is simple:

1. Delete all files of the form FASTLEXICON_n.MDF, where n is a number.
2. Re-run the MontyLingua program, either from Python or Java, and the correct datafiles will be rebuilt. If running Java and you run out of memory during the rebuild process, use the -MX or -Xmx option in Java to increase the memory size. You will only need to rebuild these datafiles once.
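
The deletion step above can be scripted. The following is a minimal sketch, assuming the datafiles sit directly in the MontyLingua directory and follow the FASTLEXICON_n.MDF naming exactly as written. The function names and the dry_run parameter are hypothetical and are not part of MontyLingua.

```python
import re
from pathlib import Path

# Matches FASTLEXICON_<number>.MDF exactly, as described in the instructions.
LEXICON_PATTERN = re.compile(r"^FASTLEXICON_\d+\.MDF$")

def find_lexicon_files(directory):
    """Return the FASTLEXICON_n.MDF files found directly inside directory."""
    return sorted(
        path for path in Path(directory).iterdir()
        if path.is_file() and LEXICON_PATTERN.match(path.name)
    )

def delete_lexicon_files(directory, dry_run=True):
    """Delete the matching datafiles and return their names.

    With dry_run=True (the default) nothing is deleted; the names are
    only reported, so the selection can be checked first.
    """
    names = []
    for path in find_lexicon_files(directory):
        if not dry_run:
            path.unlink()
        names.append(path.name)
    return names

if __name__ == "__main__":
    # "." is a placeholder; point this at the MontyLingua directory.
    print(delete_lexicon_files(".", dry_run=True))
```

Running it once with dry_run=True and checking the printed names before switching to dry_run=False avoids removing anything unintended. After the files are gone, re-running MontyLingua rebuilds them as described.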

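Ubuntu 9.04 Netbook Remix to Be Released This Thursday

Canonical, the commercial sponsor of the free, open-source operating system Ubuntu, announced yesterday that Ubuntu 9.04 Netbook Remix will be released on Thursday, April 23, alongside Ubuntu 9.04 Desktop Edition and Ubuntu 9.04 Server Edition. According to Canonical's chief operating officer Jane Silber, the latest enhancements bring Remix closer to what netbook enthusiasts need: faster boot times, improved power management, and easy switching between networks, for the best possible netbook user experience. This will also be the first time that users can download the complete Ubuntu Netbook Remix directly from the Ubuntu website to a USB drive, then install and run it on the most popular netbooks currently on the market.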
