Roric-Ling About WordNet

About WordNet

WordNet is a proposal for a more effective combination of traditional lexicographic information and modern high-speed computation. It is an on-line lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets (synsets), each representing one underlying concept. The synsets are linked by semantic relations of several kinds, which are represented as pointers between synsets.

WordNet is primarily an interactive lexical database developed over the last 15 years at Princeton University by a group of researchers led by George Miller. At the same time, WordNet can be viewed as a semantic dictionary, since words are located according to their conceptual affinities with other words, unlike classical dictionaries, where words are ordered alphabetically. Although it resembles a thesaurus, WordNet is much more useful to Artificial Intelligence applications because it is enriched with an impressive set of relations among words and word meanings. WordNet distinguishes between semantic relations and lexical relations, but the emphasis is on semantic relations between meanings. Unlike standard dictionaries, then, WordNet organizes lexical information in terms of word meanings rather than word forms. WordNet maps word forms to word senses, using the syntactic category as a parameter. Words that belong to the same syntactic category and can be used to express the same meaning are grouped into a single set, called a synset. The "building block" of WordNet is therefore a synonym set (synset) of all the words that express a given concept. Polysemous words belong to more than one synset. For instance, WordNet defines two different meanings for the English word computer, which therefore belongs to two distinct synsets, as follows:

{computer, data processor, electronic computer, information processing system}

and

{calculator, reckoner, figurer, estimator, computer}
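The mapping from word forms to word senses described above can be sketched in a few lines of Python. This is only an illustration built on the two synsets quoted above, not WordNet's actual storage format: the synset identifiers ("noun-1", "noun-2") and the function names are hypothetical, and both synsets are assumed to belong to the noun category.

```python
from collections import defaultdict

# Toy data: the two synsets quoted above.
# The identifiers "noun-1" and "noun-2" are hypothetical labels, not WordNet IDs.
SYNSETS = {
    "noun-1": ("noun", ["computer", "data processor", "electronic computer",
                        "information processing system"]),
    "noun-2": ("noun", ["calculator", "reckoner", "figurer", "estimator",
                        "computer"]),
}


def build_index(synsets):
    """Map each (word form, syntactic category) pair to the synsets containing it."""
    index = defaultdict(list)
    for synset_id, (category, words) in synsets.items():
        for word in words:
            index[(word, category)].append(synset_id)
    return dict(index)


WORD_INDEX = build_index(SYNSETS)


def senses(word, category):
    """Return the identifiers of all synsets that contain the word in this category."""
    return WORD_INDEX.get((word, category), [])


def is_polysemous(word, category):
    """A word form is polysemous if it belongs to more than one synset."""
    return len(senses(word, category)) > 1


if __name__ == "__main__":
    print(senses("computer", "noun"))
    print(is_polysemous("computer", "noun"))
    print(is_polysemous("reckoner", "noun"))
```

Keying the index on the pair (word form, syntactic category) mirrors the statement that the syntactic category acts as a parameter of the mapping: the same word form looked up under a different category yields a different, possibly empty, list of senses.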

In its most widely used version (ver. 1.6), WordNet contains 129,509 English words organized in 99,643 synsets, and the network has 229,152 nodes. Words and concepts are linked through a total of 299,711 semantic relations. All of these numbers are approximate, however, since WordNet continues to grow. Version 1.7 is now available as well, at

http://www.cogsci.princeton.edu/~wn/obtain/

The most ambitious feature of WordNet is probably its attempt to organize lexical information by meaning and, in this respect, WordNet resembles a thesaurus more than a dictionary. It can be viewed equally as an on-line thesaurus and as a semantic network.

The rich set of semantic relations established among synsets is what makes this semantic network so powerful and useful for various types of applications. Examples of semantic relations in WordNet are synonymy, which is used to form synsets; hypernymy and hyponymy, corresponding to the is-a relation and to its reverse respectively; meronymy, corresponding to the part-of relation; the causal relation, which applies to verbs; and others. Using the is-a relation, nouns and verbs are structured in WordNet as hierarchies. Adjectives and adverbs are organized according to a different structure, the cluster. As its authors note [Miller et al., 90], the advantage of imposing this syntactic categorization on WordNet "is that fundamental differences in the semantic organization of these syntactic categories can be clearly seen and systematically exploited." Nouns are organized in lexical memory as topical hierarchies, verbs are organized by a variety of entailment relations, and adjectives and adverbs are organized as N-dimensional hyperspaces. Additionally, the typical properties of a specific concept are stated in a gloss attached to each concept. The gloss includes a definition, one or more supplementary explanations and one or more examples.
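The hierarchy induced by the is-a relation can be illustrated with a small Python sketch. The concepts below are invented for illustration and are not an extract from WordNet; the function names are hypothetical, and the sketch assumes that every concept has at most one hypernym, which is a simplification.

```python
# Toy is-a (hypernymy) pointers: each concept points to its more general concept.
# Illustrative data only, not taken from WordNet; one hypernym per concept is assumed.
HYPERNYM = {
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "mammal",
    "mammal": "animal",
}


def hypernym_chain(concept):
    """Follow is-a pointers upward and return the ancestors, nearest first."""
    chain = []
    seen = {concept}
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        if concept in seen:  # guard against accidental cycles in the toy data
            break
        seen.add(concept)
        chain.append(concept)
    return chain


def is_a(specific, general):
    """True if `general` is reachable from `specific` through is-a pointers."""
    return general in hypernym_chain(specific)


def hyponyms(concept):
    """Reverse is-a: the concepts whose direct hypernym is `concept`."""
    return sorted(c for c, parent in HYPERNYM.items() if parent == concept)


if __name__ == "__main__":
    print(hypernym_chain("dog"))
    print(is_a("dog", "mammal"))
    print(hyponyms("canine"))
```

Hyponymy is obtained here by inverting the hypernymy pointers rather than by storing a second table, which reflects the description of hyponymy as the reverse of is-a. Other relations, such as meronymy (part-of), could be held in further tables of the same shape.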

WordNet has been recognized as a valuable resource in the human language technology and knowledge processing communities. Many researchers who use WordNet, especially in the field of Artificial Intelligence, view it primarily as a lexical knowledge base and use it as such. Knowledge processing has gained new dimensions in the U.S. due to the existence of WordNet. Its applicability has been cited in more than 200 papers, and systems have been implemented using it. Many groups of researchers have expressed their interest in WordNet applications in various fields, such as Information Retrieval, Information Extraction, Word Sense Disambiguation, Text Inference, Natural Language Generation, Learning, Knowledge Acquisition and others.

The human language research community has encouraged the development of WordNets for languages other than English, while also exploring the possibility of generating such large lexical databases automatically. The main reason for this is the desire and the necessity to create a uniform ontological infrastructure across languages that will simplify machine translation from one language to another and will facilitate the reuse of the reasoning schemes and algorithms developed in conjunction with the American WordNet.

IST-2000-26454