Wednesday, August 5, 2009

WordNet’s morphological processing

WordNet documentation

Although only base forms of words are usually stored in WordNet, searches may be done on inflected forms. A set of morphology functions, Morphy, is applied to the search string to generate a form that is present in WordNet.

Morphology in WordNet uses two types of processes to try to convert the string passed into one that can be found in the WordNet database. There are lists of inflectional endings, based on syntactic category, that can be detached from individual words in an attempt to find a form of the word that is in WordNet. There are also exception list files, one for each syntactic category, in which a search for an inflected form is done. Morphy tries to use these two processes in an intelligent manner to translate the string passed to the base form found in WordNet. Morphy first checks for exceptions, then uses the rules of detachment. The Morphy functions are not independent from WordNet. After each transformation, WordNet is searched for the resulting string in the syntactic category specified.

The Morphy functions are passed a string and a syntactic category. A string is either a single word or a collocation. Since some words, such as axes, can have more than one base form (axe and axis), Morphy works in the following manner. The first time that Morphy is called with a specific string, it returns a base form. For each subsequent call to Morphy made with a NULL string argument, Morphy returns another base form. Whenever Morphy cannot perform a transformation, whether on the first call for a word or on subsequent calls, NULL is returned. A transformation to a valid English string will also return NULL if the base form of the string is not in WordNet.
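The calling convention described above can be sketched in Python. This is only an illustration of the protocol, not the WordNet library interface: the class name MorphySession, the lookup callback, and the toy table are all hypothetical, and Python's None stands in for NULL.

```python
class MorphySession:
    """Mimics the call pattern: a string starts a new search, None continues it."""

    def __init__(self, lookup):
        # lookup(string, pos) -> list of base forms (hypothetical callback)
        self._lookup = lookup
        self._pending = iter(())

    def __call__(self, string, pos):
        if string is not None:
            # First call for a word: compute all of its base forms.
            self._pending = iter(self._lookup(string, pos))
        # Return the next base form, or None when none are left.
        return next(self._pending, None)


# Toy data standing in for WordNet (assumed for illustration).
TABLE = {("axes", "noun"): ["axe", "axis"]}
morphy = MorphySession(lambda s, p: TABLE.get((s, p), []))

first = morphy("axes", "noun")   # "axe"
second = morphy(None, "noun")    # "axis"
third = morphy(None, "noun")     # None: no more base forms
```

A caller would loop until None comes back, which is how all base forms of a word such as axes are collected.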

The morphological functions are found in the WordNet library. See morph(3WN) for information on using these functions.

Rules of Detachment

The following table shows the rules of detachment used by Morphy. If a word ends with one of the suffixes, it is stripped from the word and the corresponding ending is added. Then WordNet is searched for the resulting string. No rules are applicable to adverbs.

POS   Suffix  Ending
NOUN  "s"     ""
NOUN  "ses"   "s"
NOUN  "xes"   "x"
NOUN  "zes"   "z"
NOUN  "ches"  "ch"
NOUN  "shes"  "sh"
NOUN  "men"   "man"
NOUN  "ies"   "y"
VERB  "s"     ""
VERB  "ies"   "y"
VERB  "es"    "e"
VERB  "es"    ""
VERB  "ed"    "e"
VERB  "ed"    ""
VERB  "ing"   "e"
VERB  "ing"   ""
ADJ   "er"    ""
ADJ   "est"   ""
ADJ   "er"    "e"
ADJ   "est"   "e"
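As a minimal sketch, the table can be written as data and applied to a word to produce candidate strings. The names DETACHMENT_RULES and candidates are made up for this example; the real Morphy code also checks each candidate against WordNet, which is left out here.

```python
# The rules of detachment from the table: (suffix, ending) pairs per category.
DETACHMENT_RULES = {
    "noun": [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
             ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")],
    "verb": [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
             ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
    "adj": [("er", ""), ("est", ""), ("er", "e"), ("est", "e")],
    "adv": [],  # no rules are applicable to adverbs
}


def candidates(word, pos):
    """Strip each matching suffix, add its ending, and collect the results."""
    out = []
    for suffix, ending in DETACHMENT_RULES[pos]:
        if word.endswith(suffix):
            candidate = word[:len(word) - len(suffix)] + ending
            if candidate not in out:
                out.append(candidate)
    return out


print(candidates("boxes", "noun"))   # ['boxe', 'box']
print(candidates("hoping", "verb"))  # ['hope', 'hop']
print(candidates("quickly", "adv"))  # []
```

Several rules can match the same word, so the output is a list of candidates; only those that WordNet actually contains would be returned as base forms.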

Exception Lists

There is one exception list file for each syntactic category. The exception lists contain the morphological transformations for strings that are not regular and therefore cannot be processed in an algorithmic manner. Each line of an exception list contains an inflected form of a word or collocation, followed by one or more base forms. The list is kept in alphabetical order and a binary search is used to find words in these lists. See wndb(5WN) for information on the format of the exception list files.
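The lookup described above can be sketched as follows. The sample lines are illustrative rather than copied from a real exception file, the function names are hypothetical, and the exact file format is the one documented in wndb(5WN), which this sketch only approximates as whitespace-separated fields.

```python
from bisect import bisect_left

# Illustrative entries: an inflected form followed by one or more base forms.
SAMPLE_NOUN_EXC = """\
axes axe axis
geese goose
men man
mice mouse
"""


def parse_exceptions(text):
    """Split each line into its inflected form and list of base forms."""
    keys, bases = [], []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            keys.append(fields[0])
            bases.append(fields[1:])
    return keys, bases  # keys are assumed to be in alphabetical order already


def lookup_exception(keys, bases, word):
    """Binary-search the sorted inflected forms; return base forms or []."""
    i = bisect_left(keys, word)
    if i < len(keys) and keys[i] == word:
        return bases[i]
    return []


keys, bases = parse_exceptions(SAMPLE_NOUN_EXC)
print(lookup_exception(keys, bases, "axes"))  # ['axe', 'axis']
print(lookup_exception(keys, bases, "dogs"))  # []
```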

Single Words

In general, single words are relatively easy to process. Morphy first looks for the word in the exception list. If it is found the first base form is returned. Subsequent calls with a NULL argument return additional base forms, if present. A NULL is returned when there are no more base forms of the word.

If the word is not found in the exception list corresponding to the syntactic category, an algorithmic process using the rules of detachment looks for a matching suffix. If a matching suffix is found, a corresponding ending is applied (sometimes this ending is a NULL string, so in effect the suffix is removed from the word), and WordNet is consulted to see if the resulting word is found in the desired part of speech.
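The order of the two steps for a single noun might look like the sketch below. The exception entries and the small lexicon are toy stand-ins for the WordNet files, and noun_base_forms is a name invented for this example.

```python
NOUN_RULES = [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
              ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")]

# Toy stand-ins (assumed): a noun exception list and the set of nouns in WordNet.
NOUN_EXCEPTIONS = {"axes": ["axe", "axis"], "geese": ["goose"]}
NOUN_LEXICON = {"axe", "axis", "goose", "box", "dog"}


def noun_base_forms(word):
    """Exception list first; otherwise apply the rules and check the lexicon."""
    if word in NOUN_EXCEPTIONS:
        return list(NOUN_EXCEPTIONS[word])
    forms = []
    for suffix, ending in NOUN_RULES:
        if word.endswith(suffix):
            candidate = word[:len(word) - len(suffix)] + ending
            # Keep only candidates found in the desired part of speech.
            if candidate in NOUN_LEXICON and candidate not in forms:
                forms.append(candidate)
    return forms


print(noun_base_forms("axes"))   # ['axe', 'axis'] (from the exception list)
print(noun_base_forms("boxes"))  # ['box'] (from the "xes" rule)
print(noun_base_forms("blorps")) # [] (no candidate is in the lexicon)
```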

Collocations

As opposed to single words, collocations can be quite difficult to transform into a base form that is present in WordNet. In general, only base forms of words, even those comprising collocations, are stored in WordNet, such as attorney general. Transforming the collocation attorneys general is then simply a matter of finding the base forms of the individual words comprising the collocation. This usually works for nouns; non-conforming nouns, such as customs duty, are therefore presently entered in the noun exception list.

Verb collocations that contain prepositions, such as ask for it, are more difficult. As with single words, the exception list is searched first. If the collocation is not found, special code in Morphy determines whether a verb collocation includes a preposition. If it does, a function is called to try to find the base form in the following manner. It is assumed that the first word in the collocation is a verb and that the last word is a noun. The algorithm then builds a search string with the base forms of the verb and noun, leaving the remainder of the collocation (usually just the preposition, but more words may be involved) in the middle. For example, if passed asking for it, the database search would be performed with ask for it, which is found in WordNet and is therefore returned from Morphy. If a verb collocation does not contain a preposition, then the base form of each word in the collocation is found and WordNet is searched for the resulting string.
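A rough sketch of the verb collocation steps follows. The preposition list, the helper functions, and the toy lexicon are all assumptions made for illustration; in particular, using the verb base form for every word in the no-preposition case is a simplification, since the text does not say which category is used there.

```python
# Hypothetical preposition list; the list Morphy really uses may differ.
PREPOSITIONS = {"to", "at", "of", "on", "off", "in", "out", "up",
                "down", "from", "with", "into", "for", "about", "between"}

# Toy base-form helpers: unknown words are returned unchanged.
VERB_BASES = {"asking": "ask", "took": "take"}
NOUN_BASES = {}


def verb_base(word):
    return VERB_BASES.get(word, word)


def noun_base(word):
    return NOUN_BASES.get(word, word)


def verb_collocation_base(phrase, lexicon):
    """Build the search string as described and return it if it is in lexicon."""
    words = phrase.split()
    if len(words) < 2:
        return None
    if any(w in PREPOSITIONS for w in words[1:]):
        # First word is taken as a verb, last as a noun, the middle is kept.
        parts = [verb_base(words[0])] + words[1:-1] + [noun_base(words[-1])]
    else:
        parts = [verb_base(w) for w in words]
    candidate = " ".join(parts)
    return candidate if candidate in lexicon else None


LEXICON = {"ask for it", "take place"}  # toy stand-in for WordNet verbs
print(verb_collocation_base("asking for it", LEXICON))  # ask for it
print(verb_collocation_base("took place", LEXICON))     # take place
```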

Hyphenation

Hyphenation also presents special difficulties when searching WordNet. It is often a subjective decision as to whether a word is hyphenated, joined as one word, or is a collocation of several words, and which of the various forms are entered into WordNet. When Morphy breaks a string into "words", it looks for both spaces and hyphens as delimiters. It also looks for periods in strings and removes them if an exact match is not found. A search for an abbreviation like oct. returns the synset for { October, Oct }. Not every pattern of hyphenated and collocated string is searched for properly, so it may be advantageous to specify several search strings if the results of a search attempt seem incomplete.
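The two behaviors mentioned here, splitting on spaces and hyphens and dropping periods when there is no exact match, can be sketched as below. The function names and the lowercase toy lexicon are assumptions for this example only.

```python
import re


def split_words(string):
    """Break a string into words, treating spaces and hyphens as delimiters."""
    return [w for w in re.split(r"[ \-]+", string) if w]


def search_with_period_fallback(string, lexicon):
    """Try an exact match first; if that fails, retry with periods removed."""
    if string in lexicon:
        return string
    stripped = string.replace(".", "")
    return stripped if stripped in lexicon else None


print(split_words("mother-in-law"))      # ['mother', 'in', 'law']
print(split_words("attorneys general"))  # ['attorneys', 'general']

LEXICON = {"oct", "october"}  # toy stand-in for WordNet entries
print(search_with_period_fallback("oct.", LEXICON))  # oct
```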

Special Processing for nouns ending with 'ful'

Morphy contains code that searches for nouns ending with ful and performs a transformation on the substring preceding it. It then appends 'ful' back onto the resulting string and returns it. For example, if passed the noun boxesful, it will return boxful.
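A minimal sketch of this special case, assuming a helper that returns the base form of a noun (or None when there is none); ful_noun_base and the toy helper are hypothetical names.

```python
def ful_noun_base(word, noun_base):
    """Transform the part before 'ful', then append 'ful' again."""
    if not word.endswith("ful") or len(word) <= len("ful"):
        return None
    stem = word[:-len("ful")]
    base = noun_base(stem)  # noun_base: word -> base form or None (assumed)
    return base + "ful" if base else None


# Toy noun helper for illustration.
toy_noun_base = {"boxes": "box", "cups": "cup"}.get

print(ful_noun_base("boxesful", toy_noun_base))  # boxful
```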

BUGS

Since many noun collocations contain prepositions, such as line of products, an algorithm similar to that used for verbs should be written for nouns. In the present scheme, if Morphy is passed lines of products, the search string becomes line of product, which is not in WordNet.

Morphy will allow non-words to be converted to words, if they follow one of the rules described above. For example, it will happily convert plantes to plants.

ENVIRONMENT VARIABLES (UNIX)

WNHOME
Base directory for WordNet. Default is /usr/local/WordNet-3.0.
WNSEARCHDIR
Directory in which the WordNet database has been installed. Default is WNHOME/dict.
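How a program might resolve these two variables with their documented defaults can be sketched as follows. The function name is hypothetical, and the environ parameter exists only so the lookup can be tried without changing the real environment.

```python
import os


def wordnet_search_dir(environ=os.environ):
    """Return WNSEARCHDIR, falling back to WNHOME/dict with the default WNHOME."""
    wnhome = environ.get("WNHOME", "/usr/local/WordNet-3.0")
    return environ.get("WNSEARCHDIR", wnhome + "/dict")


print(wordnet_search_dir({}))                    # /usr/local/WordNet-3.0/dict
print(wordnet_search_dir({"WNHOME": "/opt/wn"})) # /opt/wn/dict
```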

REGISTRY (WINDOWS)

HKEY_LOCAL_MACHINE\SOFTWARE\WordNet\3.0\WNHome
Base directory for WordNet. Default is C:\Program Files\WordNet\3.0.

FILES

pos.exc
morphology exception lists

SEE ALSO

wn(1WN), wnb(1WN), binsrch(3WN), morph(3WN), wndb(5WN), wninput(7WN).

What is WordNet?

WordNet is a lexical database of nouns, verbs, adverbs, and adjectives organized in a relational structure. The words are grouped into sets of synonyms called synsets, which make it easy to find related words and usages. The synsets can be navigated and explored using a web browser. This basic structure makes WordNet well suited to computational linguistics and language processing, in applications such as voice recognition and dictation software. WordNet is free to use and has downloadable versions for UNIX, Linux, PC, and Mac computers.

WordNet was developed by Professor George A. Miller at the Cognitive Science Laboratory at Princeton University. The WordNet database is also available from MIT Press, and a Windows interface called WordNet TreeWalk helps users who are not familiar with UNIX-based pages and command lines. Using WordNet online is as simple as using a browser search function: a simple form is presented on the page, and any word may be searched from there. The page includes links to the WordNet home page, the glossary, and a help section with information on how to use WordNet. WordNet is made up of lexicographer files, code to convert the files into a database, and protocols for searching the database. Synsets are also related to other synsets, which gives a wider list of related terms for the searched word.

Two kinds of relations are used in WordNet: semantic and lexical. Lexical relations hold between semantically related word forms, and semantic relations hold between related word meanings. After a word is searched, these relations point to related words and synsets; different types of words are arranged differently in the database. WordNet users come from many fields, since language is used in every business and industry, but WordNet is primarily used in research on and development of cognitive hierarchies that give insight into the way humans use language and how the brain interprets it. Specifically, it applies to research in ontology, which deals with what entities exist and how they are related; ontology was once part of metaphysics and is derived from the Greek philosophies of the Golden Age of civilization. Recently this idea of relating that which exists has been applied to information, as in the case of WordNet, and ISPs.

WordNet is supported and used by various organizations and associations for many purposes. The National Science Foundation is currently supporting the Evocation project at WordNet, which aims to connect all the parts of speech in the database. This will allow phrases, exclamations, and the like to be related to the existing nouns, verbs, and adjectives. The Advanced Research and Development Activity (ARDA) is also a large supporter of the WordNet database and has given monetary support for the project over the last several years.
