GenSynsets is a tool designed to facilitate the development of WordNets for languages other than English. It implements the algorithms described in the paper "On the Semiautomatic Generation of WordNet Type Synsets and Clusters with Special Reference to Romanian". GenSynsets can be used for any foreign language for which bilingual dictionaries are available in electronic format. The program has been tested on Romanian, and the corresponding output (an XML file) can be explored on the Web.
Installation
GenSynsets is written in the Java programming language and runs on the Java 2 platform.
GenSynsets therefore requires an operating system for which an implementation of Java 2 exists (Windows 95/98/ME/NT/2000, most Unix versions, Mac OS X).
The system on which GenSynsets is to be installed must also be sufficiently powerful in terms of processor speed and memory. On a PC, a processor of at least 133 MHz and at least 32 MB of RAM are necessary.
Installation steps:
Install Java 2 on your system. If Java 2 is already installed, skip this step. Installation kits for the Windows, Linux, and Solaris platforms can be found at java.sun.com. You can install either the entire JDK development kit or only the JRE runtime environment. Version 1.3 or later is recommended.
Add the directory containing the java executable to the PATH environment variable, for example:
SET PATH=c:\path;%PATH%
where c:\path is to be replaced with the actual path of the directory that contains the java executable. Following this operation (and after restarting your computer), as a result of the command
java -version
your system's response should look like this:
C:\>java -version
java version "1.3.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0-C)
Java HotSpot(TM) Client VM (build 1.3.0-C, mixed mode)
C:\>
no matter which directory the command was issued from.
Unzip the archive gensynsets.zip and place its contents wherever you wish within the existing directory structure.
Remark: The directions above focus on installation under the Windows operating system. To install and run GenSynsets on any other system you need, in essence, to install the Java 2 environment on that system, unzip the archive gensynsets.zip, and then run the Java class GenSynsets.
Usage
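GenSynsets is designed to be used from the command line. In the general form below, square brackets mark optional arguments and noun|adj means that exactly one of the two values must be given. The classpath separator shown (;) is the one used on Windows; on Unix-like systems use a colon (:) instead. The general form of the command line is: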
java -classpath .;jwordnet.jar GenSynsets -pos noun|adj [-enrich] [-cs charset] [-l SynList] WnDictPath E_F F_E OutFile
where:
File formats
bilingual dictionaries
In the English-to-foreign dictionary (the E_F argument), each entry has the form:
eword fword1;fword2,fword3;fword4,fword5
Two different separators are used to distinguish among fword1, fword2, etc. A semicolon separates different meanings of the given word (eword). A comma separates synonyms that refer to one and the same meaning of eword.
In the foreign-to-English dictionary (the F_E argument), each entry has the form:
fword eword1;eword2,eword3;eword4,eword5
Two different separators are used to distinguish among eword1, eword2, etc. A semicolon separates different meanings of the given word (fword). A comma separates synonyms that refer to one and the same meaning of fword.
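The two-level separator convention described above can be illustrated with a small parser. This is only a sketch, not the code GenSynsets itself uses: the class name DictEntry and the method parse are hypothetical, and the sketch assumes that the headword is a single token separated from its translations by whitespace, which the description above does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of GenSynsets: parses one dictionary entry.
class DictEntry {
    final String headword;
    // Outer list: one element per meaning; inner list: synonyms for that meaning.
    final List<List<String>> senses;

    DictEntry(String headword, List<List<String>> senses) {
        this.headword = headword;
        this.senses = senses;
    }

    static DictEntry parse(String line) {
        // Assumption: the headword ends at the first run of whitespace.
        String[] parts = line.trim().split("\\s+", 2);
        if (parts.length < 2) {
            throw new IllegalArgumentException("No translations in entry: " + line);
        }
        List<List<String>> senses = new ArrayList<>();
        // A semicolon separates different meanings of the headword.
        for (String sense : parts[1].split(";")) {
            List<String> synonyms = new ArrayList<>();
            // A comma separates synonyms within one meaning.
            for (String word : sense.split(",")) {
                String w = word.trim();
                if (!w.isEmpty()) {
                    synonyms.add(w);
                }
            }
            if (!synonyms.isEmpty()) {
                senses.add(synonyms);
            }
        }
        return new DictEntry(parts[0], senses);
    }

    public static void main(String[] args) {
        DictEntry e = parse("eword fword1;fword2,fword3;fword4,fword5");
        // prints: eword -> [[fword1], [fword2, fword3], [fword4, fword5]]
        System.out.println(e.headword + " -> " + e.senses);
    }
}
```

The same routine applies unchanged to entries of both dictionaries, since they differ only in which language supplies the headword.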
the output file
The output is delivered as an XML file. The files produced by GenSynsets can therefore be easily transformed, by means of XSLT, into other formats (XML, HTML, etc.) and used by other applications. The structure of the file is:
where:
For more details see the DTD file (fsynsets.dtd) on which the output XML file is based.
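As an illustration of the XSLT route mentioned above, the following sketch applies a stylesheet to an XML document using the standard javax.xml.transform API. The element names synsets, synset and word are placeholders chosen for this example only; they are assumptions and are not taken from fsynsets.dtd, so a real stylesheet must use the element names defined there. The class name SynsetXslt is likewise hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of GenSynsets.
class SynsetXslt {
    // Placeholder input: element names are assumptions, not those of fsynsets.dtd.
    static final String SAMPLE_XML =
        "<synsets><synset><word>casa</word><word>locuinta</word></synset></synsets>";

    // Stylesheet that writes each synset as one comma-separated line of text.
    static final String SAMPLE_XSLT =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:for-each select=\"//synset\">"
        + "<xsl:for-each select=\"word\">"
        + "<xsl:value-of select=\".\"/>"
        + "<xsl:if test=\"position() != last()\">, </xsl:if>"
        + "</xsl:for-each>"
        + "<xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the document and returns the result as a string.
    static String transform(String xml, String xslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException ex) {
            throw new IllegalStateException("XSLT transformation failed", ex);
        }
    }

    public static void main(String[] args) {
        System.out.print(transform(SAMPLE_XML, SAMPLE_XSLT));
    }
}
```

To process an actual output file, the string-based sources can be replaced with StreamSource objects built from java.io.File instances, and the stylesheet adapted to the structure given in fsynsets.dtd.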