Renovating a wordclass tagset: from WOTAN to WOTAN-2

Hans van Halteren
Dept. of Language and Speech, University of Nijmegen
hvh@let.kun.nl

ACH/ALLC 1999, University of Virginia, Charlottesville, VA

In 1994, a new wordclass tagset for Dutch was designed (WOTAN; Berghmans, 1994),
for use in the upgrade of a tagged corpus of more than a million words
(including the Eindhoven corpus; uit den Boogaart, 1975) and the subsequent
derivation of an automatic tagger. WOTAN was based on the most popular
descriptive grammar of Dutch (ANS; Geerts et al., 1984), from which the encoded
distinctions were selected using two criteria: a) importance to potential users,
as estimated from interviews, and b) feasibility of (semi-)automatic derivation
from the existing tagging, given the lack of time for extensive manual changes.
WOTAN was judged to be a good compromise and has since been used in several
tagging projects and experiments in the Netherlands and Belgium.

Yet WOTAN had its shortcomings, and these led to the creation of a successor. WOTAN-2
adds some important distinctions originally left out because they needed manual
intervention, and aims for compatibility with the EAGLES guidelines, the
(extensively) revised version of the ANS (Haeseryn et al., 1997), the CELEX
database and the AMAZON syntactic parser. Another, more uncertain, influence is
the tagset to be used for the Spoken Dutch Corpus, which is presently under
construction.

The poster will present:
- the differences between WOTAN and WOTAN-2;
- the influence of the (sometimes contradictory) compatibility issues on the tagset;
- additions to (or deviations from) the EAGLES proposal necessitated by decisions for WOTAN-2;
- the upgrade of the WOTAN-tagged Eindhoven corpus to a WOTAN-2 version.

References

Berghmans, J. (1994). WOTAN, een automatische grammatikale tagger voor het Nederlands. Dept. of Language and Speech, University of Nijmegen.

uit den Boogaart (1975). Woordfrequenties in geschreven en gesproken Nederlands. Utrecht: Oosthoek, Scheltema & Holkema.

Geerts, G., Haeseryn, W., de Rooij, J., and van der Toorn, M. (1984). Algemene Nederlandse Spraakkunst (ANS). Groningen: Wolters-Noordhoff; Leuven: Wolters.

Haeseryn, W., Romijn, K., Geerts, G., de Rooij, J., and van der Toorn, M. (1997). Algemene Nederlandse Spraakkunst (ANS). Groningen: Martinus Nijhoff; Deurne: Wolters Plantyn.