New Paths in Middle High German Lexicography:
Dictionaries Interlinked Electronically

Johannes Fournier
University of Trier, Germany

ALLC/ACH 2000, University of Glasgow, Glasgow, 2000
Editors: Jean Anderson, Amal Chatterjee, Christian J. Kay, Margaret Scott
Encoder: Sara A. Schmidt
Computational / Corpus Linguistics

I. Topic:

Since September 1997, a small team of lexicographers and computer scientists
at the University of Trier (Germany) have been developing an integrated
electronic dictionary of Middle High German applying TEI Guidelines. The
resulting integrated digital dictionary is expected to be finished by August
2000 and to be published on CD-ROM as well as on the Internet. It is meant
not only to facilitate the simultaneous use of the dictionaries concerned, but
also to offer advanced query options that provide essentially new insights for
those involved in vocabulary studies, metalexicography, and the composition
of a new MHG dictionary.

II. Digitization as a necessity:

The most important dictionaries of the MHG language were written in the last
century and need to be replaced urgently by a new major work. This necessity
arises not only from the enormous increase in the number of editions of MHG
texts since the end of the 19th century, but also from changed insights into
the structure of vocabulary and new ways of describing word usage.
Consequently, five years ago, two teams of lexicographers at the Universities
of Trier and Goettingen started to lay the foundations for a new MHG
Dictionary by creating an electronic archive of texts and quotations. It
will probably take up to 25 years, however, for the whole dictionary to be
finished, so scholars of all disciplines who have to deal with MHG sources
will still have to use the older dictionaries for quite a while.

The dictionaries that already exist, i.e. the "Mittelhochdeutsches
Woerterbuch" by Georg Friedrich Benecke/Wilhelm Mueller/Friedrich Zarncke
(1854-1866), the "Mittelhochdeutsches Handwoerterbuch" with its supplement,
the "Nachtraege", by Matthias Lexer (1872-1878), and the "Findebuch zum
mittelhochdeutschen Wortschatz" by Kurt Gaertner et al. (1992), are very
closely interconnected and can only be used in conjunction with one another,
because they must be considered, briefly speaking, as a kind of series
of supplements to supplements to supplements. Therefore they were ideal
candidates for the composition of an integrated digital dictionary. One of
the major aims of the digitization is to make the lexicographical
information of the dictionary entries accessible via a database and thus to
enable sophisticated searches over all four dictionaries independently of
headwords. Applying TEI Guidelines to machine readable versions of the
printed dictionaries seemed the easiest and fastest way of creating the
digital "compound dictionary".

III. (Semi-)Automatically generated markup according to TEI Guidelines:

The MHG dictionaries consist of eight volumes with about 1,100 printed pages,
containing more than 80,000 headwords. It is therefore obvious that
TEI-compliant markup of the dictionary entries had to be generated automatically
as far as possible. For the purposes of encoding we used TUSTEP, the
Tuebingen System of Text Processing Programs, with its variety of
parameter-controlled functions for user-defined text data processing that
facilitate structured entry input.

Some parts of the TEI design scheme were especially relevant for the
dictionary encoding. Some advantages and problems when applying TEI have to
be discussed in detail, such as the hierarchical embedding of elements
within the articles, the use of global attributes for the markup of a wide
range of lexicographical information, and the recoverability of articles. It
should also be mentioned that TEI Guidelines should be improved with regard
to the encoding of dictionaries of older stages of a language, for the
description of such languages poses some problems seldom encountered when
describing modern languages.

It is apparent, however, that most problems which arose when using TEI did
not stem from the application of TEI Guidelines as such, but were primarily
due to the fact that the dictionary entries often appeared to lack clear
structure and were rather discursive in style. This has often made automatic
SGML encoding a difficult task. In many cases only manual markup led to
TEI-compliant documents. Nevertheless, the results achieved so far fully justify
the decision in favour of TEI Guidelines.

IV. New ways of using dictionaries:

Through the electronic version, the MHG dictionaries can be used much more
easily and conveniently: hyperlinks connect all the corresponding headwords;
following a cross-reference takes only a mouse click; pop-up
menus contain the relevant information about all sources of citation;
bookmarks and notes can be created easily. PostScript files of all
dictionary pages are interlinked with the electronic articles so that the
compound digital dictionary can be used and cited as a work of reference in
exactly the same way as its printed precursors.

Far more important is access to a database containing the relevant
information for the entire contents of the four dictionaries within the
composite whole. Access via that database not only offers full-text
retrieval but also retrieval of selected information, e.g. of parts of
speech, of word forms in MHG quotations, of definitions or of strings in the
etymology sections of dictionary entries. Highly important for advanced and
complex query options is the linking of a list of all dictionary sources
with the electronic dictionary itself: all sources have been classified in
detail according to geographical provenance, chronology and genre,
categories that can be used to limit database queries to a small,
self-defined corpus of texts cited within the entries. Which words were
directly borrowed from Italian, but not through Latin or French? Which words
are only quoted from sources concerning legal issues? Which MHG words denote
the same concepts? These are some of the questions that can now be answered
without great expense of time. Moreover, the integrated electronic
dictionary is especially important for the lexicographers involved in the
creation of the new MHG dictionary, where the older dictionaries are used as
pointers to words for which references rarely exist.

V. Institutional frame:

Some years ago, the Deutsche Forschungsgemeinschaft (DFG = German Research
Council) initiated a program for the so-called "Retrospective Digitization
of Library Materials". The main goal of the program is to facilitate
access to library holdings that may be rare or highly important for
scholarly interests by providing electronic versions of these holdings. From
the beginning, the program encouraged the use of SGML for full-text
encoding.

Since September 1997, the DFG has been funding the creation of an integrated
digital dictionary of Middle High German to be published on CD-ROM as well
as on the Internet. It is intended to serve as a prototype for the
digitization of other historical dictionaries, including the digitization of
the famous "Deutsches Woerterbuch" of Jacob and Wilhelm Grimm.
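
The source-restricted queries described in section IV can be illustrated with a minimal sketch. It is not the project's actual implementation, which relied on TUSTEP and SGML tooling. The element names (entry, form, orth, gramGrp, pos, sense, def, cit, bibl) follow the TEI conventions for dictionaries, but the three sample entries, the source sigla SRC_A to SRC_C, their classification values, and the function names are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Toy data: three invented entries marked up with TEI-style dictionary
# elements. The citations are hypothetical, not taken from the dictionaries.
SAMPLE = """
<div>
  <entry id="e1">
    <form><orth>reht</orth></form>
    <gramGrp><pos>noun</pos></gramGrp>
    <sense><def>law, right</def>
      <cit><bibl n="SRC_A"/></cit>
      <cit><bibl n="SRC_B"/></cit>
    </sense>
  </entry>
  <entry id="e2">
    <form><orth>minne</orth></form>
    <gramGrp><pos>noun</pos></gramGrp>
    <sense><def>love</def>
      <cit><bibl n="SRC_C"/></cit>
    </sense>
  </entry>
  <entry id="e3">
    <form><orth>rihten</orth></form>
    <gramGrp><pos>verb</pos></gramGrp>
    <sense><def>to judge</def>
      <cit><bibl n="SRC_A"/></cit>
      <cit><bibl n="SRC_C"/></cit>
    </sense>
  </entry>
</div>
"""

# Hypothetical source list, classified by genre, region and century
# (the three categories named in the text; the values are assumptions).
SOURCES = {
    "SRC_A": {"genre": "legal", "region": "Upper German", "century": 13},
    "SRC_B": {"genre": "legal", "region": "Central German", "century": 14},
    "SRC_C": {"genre": "courtly epic", "region": "Upper German", "century": 13},
}


def parse_entries(xml_text):
    """Turn the marked-up entries into flat records that can be queried."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for entry in root.iter("entry"):
        records.append({
            "headword": entry.findtext("form/orth"),
            "pos": entry.findtext("gramGrp/pos"),
            "definitions": [d.text for d in entry.iter("def")],
            "sources": [b.get("n") for b in entry.iter("bibl")],
        })
    return records


def headwords_by_pos(entries, pos):
    """Retrieve selected information: all headwords with a given part of speech."""
    return [e["headword"] for e in entries if e["pos"] == pos]


def headwords_only_in_genre(entries, sources, genre):
    """Headwords whose citations all come from sources of a single genre."""
    result = []
    for e in entries:
        genres = {sources[siglum]["genre"] for siglum in e["sources"]}
        if genres == {genre}:
            result.append(e["headword"])
    return result


ENTRIES = parse_entries(SAMPLE)
```

With this toy data, headwords_only_in_genre(ENTRIES, SOURCES, "legal") answers the question "Which words are only quoted from sources concerning legal issues?". The same pattern could be extended to filter by region or century. Keeping the source classification in a separate table, linked to the entries only by siglum, mirrors the linking of the source list with the dictionary that the text describes.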