The annotated texts are saved in XML format, which the linguistic community has adopted as the standard way of representing corpora. Unlike morpho-syntactic annotation, for which the XCES standard exists, syntactic annotation does not yet have a standard set of XML tags. In the absence of such a standard, DGA (Dependency Grammar Annotator) uses a minimal set of tags inspired by XCES. As a result, the XML files produced by DGA can easily be transformed, by means of XSLT, into XML files based on a different vocabulary (tag set) that meets the user's requirements or conforms to a future standard.
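The text names XSLT as the tool for such conversions. To illustrate the same idea without an XSLT processor, the minimal sketch below uses Python's standard `xml.etree.ElementTree` to rewrite a DGA `<s>` element into a different vocabulary. The target vocabulary (a `<sentence>` element containing `<word>` elements with `id`, `form`, `pos`, `head` and `deprel` attributes) and the function name `dga_to_hypothetical` are hypothetical, chosen only to show a tag-set change; they are not part of DGA or of any published standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical target vocabulary: DGA child element of <tok> -> attribute name.
FIELD_MAP = {"ordno": "id", "orth": "form", "ctag": "pos"}

def dga_to_hypothetical(dga_xml: str) -> str:
    """Rewrite one DGA <s> element into a hypothetical <sentence>/<word> tag set."""
    src = ET.fromstring(dga_xml)
    out = ET.Element("sentence")
    for tok in src.findall("tok"):
        word = ET.SubElement(out, "word")
        # Copy the word-level fields, turning child elements into attributes.
        for dga_tag, attr in FIELD_MAP.items():
            value = tok.findtext(dga_tag)
            if value is not None:
                word.set(attr, value.strip())
        # The syntactic information sits one level deeper, inside <syn>.
        syn = tok.find("syn")
        if syn is not None:
            head = syn.findtext("head")
            rel = syn.findtext("reltype")
            if head is not None:
                word.set("head", head.strip())
            if rel is not None:
                word.set("deprel", rel.strip())
    return ET.tostring(out, encoding="unicode")

sample = (
    "<s><tok><orth>John</orth><ordno>1</ordno><ctag>Noun</ctag>"
    "<syn><head>2</head><reltype>Subject</reltype></syn></tok></s>"
)
print(dga_to_hypothetical(sample))
```

Because DGA keeps every field in its own element, a conversion like this is a purely mechanical mapping; an XSLT stylesheet would express the same mapping declaratively with one template per source tag.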
To illustrate this tag set, we present the following fragment of an XML file, which represents the annotation of the sentence "John has apples" (see What is DGA).
<s>
  <tok>
    <orth>John</orth>
    <ordno>1</ordno>
    <ctag>Noun</ctag>
    <syn>
      <head>2</head>
      <reltype>Subject</reltype>
    </syn>
  </tok>
  <tok>
    <orth>has</orth>
    <ordno>2</ordno>
    <ctag>Verb</ctag>
    <syn>
      <head>4</head>
      <reltype>Predicate</reltype>
    </syn>
  </tok>
  <tok>
    <orth>apples</orth>
    <ordno>3</ordno>
    <ctag>Noun</ctag>
    <syn>
      <head>2</head>
      <reltype>Object</reltype>
    </syn>
  </tok>
</s>
Each sentence is enclosed in the tag <s> ... </s>. Each word of the sentence, together with all the information concerning its annotation, is enclosed in the tag <tok> ... </tok>. Within this tag, <orth> ... </orth> holds the orthographic form of the word as it occurs in the annotated text, and <ordno> ... </ordno> holds the position of the word within the sentence (counting starts at 1 from the beginning of the sentence). The tag <ctag> ... </ctag> specifies the part of speech, while <syn> ... </syn> contains the syntactic information. Within <syn> ... </syn>, the tag <head> ... </head> identifies the head word by its position number within the sentence, and the tag <reltype> ... </reltype> specifies the type of dependency relation between the two words (the word to which the annotation belongs and its head word).
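To make the tag semantics concrete, here is a minimal sketch using only Python's standard library. It reads one `<s>` element into per-token records and resolves each `<head>` number to the orthographic form of the head word. The `Token` class and the functions `parse_sentence` and `dependencies` are hypothetical names, and the code assumes that every `<tok>` carries an `<ordno>`. In the sample fragment, the head number 4 given for "has" has no matching `<ordno>` among the three tokens shown, so the sketch returns `None` for that head rather than guessing.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple

@dataclass
class Token:
    ordno: int               # position of the word in the sentence, from <ordno>
    orth: str                # orthographic form, from <orth>
    ctag: str                # part of speech, from <ctag>
    head: Optional[int]      # <ordno> of the head word, from <syn>/<head>
    reltype: Optional[str]   # dependency relation type, from <syn>/<reltype>

def parse_sentence(s_elem: ET.Element) -> List[Token]:
    """Read every <tok> child of one <s> element into a Token record."""
    tokens = []
    for tok in s_elem.findall("tok"):
        syn = tok.find("syn")
        head_text = syn.findtext("head") if syn is not None else None
        rel_text = syn.findtext("reltype") if syn is not None else None
        tokens.append(Token(
            ordno=int(tok.findtext("ordno")),  # assumes <ordno> is always present
            orth=(tok.findtext("orth") or "").strip(),
            ctag=(tok.findtext("ctag") or "").strip(),
            head=int(head_text) if head_text is not None else None,
            reltype=rel_text.strip() if rel_text is not None else None,
        ))
    return tokens

def dependencies(tokens: List[Token]) -> Iterator[Tuple[str, Optional[str], Optional[str]]]:
    """Yield (dependent, relation, head word); head word is None if its number is not in the fragment."""
    by_ordno = {t.ordno: t.orth for t in tokens}
    for t in tokens:
        if t.head is not None:
            yield t.orth, t.reltype, by_ordno.get(t.head)

SAMPLE = (
    "<s>"
    "<tok><orth>John</orth><ordno>1</ordno><ctag>Noun</ctag>"
    "<syn><head>2</head><reltype>Subject</reltype></syn></tok>"
    "<tok><orth>has</orth><ordno>2</ordno><ctag>Verb</ctag>"
    "<syn><head>4</head><reltype>Predicate</reltype></syn></tok>"
    "<tok><orth>apples</orth><ordno>3</ordno><ctag>Noun</ctag>"
    "<syn><head>2</head><reltype>Object</reltype></syn></tok>"
    "</s>"
)

for dep, rel, head in dependencies(parse_sentence(ET.fromstring(SAMPLE))):
    print(f"{dep} --{rel}--> {head}")
```

Storing the head as a position number rather than as a copy of the head word keeps each `<tok>` self-contained while still letting a reader rebuild the full dependency tree with a single lookup table, as `dependencies` does here.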