Books into Bytes: The "Deutsches Wörterbuch" on CD-ROM
and on the Internet

Ruth Christmann
University of Trier, Germany

ALLC/ACH 2000, University of Glasgow, Glasgow, 2000
Editors: Jean Anderson, Amal Chatterjee, Christian J. Kay, Margaret Scott
Encoder: Sara A. Schmidt

I. Starting position, targets:

Jacob and Wilhelm Grimm's "Deutsches Wörterbuch" (DWB) comprises the most
extensive documentation of the German language. Its outstanding position is
confirmed by its history: for more than one hundred years - the longest period
of publication for a German dictionary - generations of lexicographers have
contributed about 350,000 entries to the DWB, which is divided into 16 volumes
(bound as 32), containing a total of 67,744 columns.

The DWB reflects more than one hundred years of political, cultural, and
institutional history. Moreover, it shows the influence of varying preferences
of numerous philologists concerning practical lexicography as well as changed
insights into philology and linguistics.

Digitizing the DWB not only means preserving the outstanding achievements of
German lexicography but also opens up new possibilities for using the rich
dictionary material. Since November 1998, a project at the University of Trier
has been creating a computerized version of the DWB to be published on CD-ROM
and also made available via the Internet. The aim is to provide user-friendly
search and display software that offers the best possible options for data
retrieval, so that the electronic DWB will appeal to anyone interested in the
German language. In this way, the supply of electronic dictionaries of German,
which is poor by international standards, will be decisively improved.

II. Technical issues:

In line with the development of international standards for text encoding, the
TEI Guidelines are used for the structured markup of the dictionary.
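
To illustrate what such structured markup makes available, the following
minimal sketch reads one invented entry and extracts a few fields. It is an
illustration under stated assumptions, not the project's encoding: the element
names (entry, form, orth, gramGrp, gen, etym, sense, cit, quote, bibl, author)
merely follow TEI dictionary conventions, the entry content is a placeholder,
the helper name extract_entry is hypothetical, and XML is used instead of SGML
only so that Python's standard library can parse it.

```python
import xml.etree.ElementTree as ET

# Invented placeholder entry. The element names follow TEI dictionary
# conventions but are NOT taken from the project's actual SGML encoding.
SAMPLE_ENTRY = """
<entry>
  <form><orth>BEISPIEL</orth></form>
  <gramGrp><gen>n.</gen></gramGrp>
  <etym>placeholder etymology</etym>
  <sense n="1">
    <def>placeholder definition</def>
    <cit>
      <quote>placeholder quotation one</quote>
      <bibl><author>Autor A</author></bibl>
    </cit>
  </sense>
  <sense n="2">
    <cit>
      <quote>placeholder quotation two</quote>
      <bibl><author>Autor B</author></bibl>
    </cit>
  </sense>
</entry>
"""


def extract_entry(xml_text):
    """Return headword, gender, etymology and quotations of one entry.

    Hypothetical helper: it assumes the simplified structure shown in
    SAMPLE_ENTRY and returns empty strings for missing elements.
    """
    entry = ET.fromstring(xml_text)
    # Collect every quotation in document order, whatever sense it is in.
    quotations = [
        {
            "author": cit.findtext("bibl/author", default=""),
            "text": cit.findtext("quote", default=""),
        }
        for cit in entry.iter("cit")
    ]
    return {
        "headword": entry.findtext("form/orth", default=""),
        "gender": entry.findtext("gramGrp/gen", default=""),
        "etymology": entry.findtext("etym", default=""),
        "quotations": quotations,
    }


if __name__ == "__main__":
    record = extract_entry(SAMPLE_ENTRY)
    print(record["headword"], record["gender"])
    for quotation in record["quotations"]:
        print(quotation["author"], "-", quotation["text"])
```

Once entries are reduced to records of this kind, fields such as grammatical
gender or the author of a quotation can be searched independently of the
headword.
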
This markup prepares the way for the production of the CD-ROM version with
special SGML tools. The starting point for the application of CoST (Copenhagen
SGML Tool) is our pool of SGML-encoded files, which have been created from the
TUSTEP (TUebingen System of Text Processing Programs) data and validated by an
SGML parser. CoST is a general-purpose SGML post-processing tool. It is a
structure-controlled SGML application, that is, it operates on the
element-structure information-set (ESIS) representation of SGML documents. CoST
provides a flexible set of low-level primitives upon which sophisticated
applications can be built. These include a powerful query language for
navigating the document tree and extracting ESIS information, an event-driven
programming interface, and a specification mechanism which binds properties to
nodes based on queries. On the one hand, CoST generates a set of HTML pages for
displaying the dictionaries in conventional web browsers; on the other hand, it
transforms the SGML data into Tcl/Tk command scripts for the graphical user
interface of the CD-ROM. Tcl, the Tool Command Language, is a very simple
programming language. It provides basic language features such as variables,
procedures, and control structures, and runs on almost any modern operating
system, such as Unix, Macintosh, and Windows 95/98/NT. Tk is a Tcl extension,
written in C, designed to give the user a relatively high-level interface to
their windowing environment. Finally, CoST is used to set up a database that
contains all the
information from the dictionary entries that is needed to query the individual
components of interest to those studying the German language from its very
beginnings, such as etymology, language, and quotations, in addition, of course,
to traditional full-text retrieval. The database is accessible from both
platforms: web browsers connect to it via CGI scripts, and the CD-ROM GUI uses
an integrated Tcl interface.

III. Software demonstration:

The software demonstration will show how valuable information can
be extracted from an electronic version of the DWB by using full-text retrieval,
links, and a database that facilitates complex queries.

The possibility of using different retrieval options enables the user of a CD-ROM
or Internet version of the DWB to search for certain phenomena in up to 33
substantial volumes, independently of headwords. The information hidden within
the different entries is made even more explicit in various ways: via
hyperlinks, the index volume will be connected with the references appearing in
the dictionary. The information needed in order to quote from the sources of the
DWB can thus be accessed very easily via pop-up windows. As is common in
electronic dictionaries, a prepared list of headwords will make it possible to
look up any headword simply by activating the corresponding link and, moreover,
to access individual parts of the longer articles separately by way of the
original organization of the entries. Specific information, such as the
grammatical gender of headwords or sublemmata or the occurrence of certain words
in quotations from particular authors or works, will be obtainable from the
database generated from the TEI-compliant markup, as mentioned above.

One of the major aims of the project is, however, not only to show the user
different ways of using the DWB for lexicographical, historical, or
linguistic studies and to present the DWB in an appealing way, but also to
increase the use of the DWB in general, i.e. as a book in several volumes to be
read by those interested in the German language and the history of German words.
For such purposes, it is essential to give the user easy access to the
electronic version of the DWB, to the information stored within the entries, and
to the printed version of the DWB as presented on screen via PostScript files.
Encoding the entries according to the TEI Guidelines is as important as
presenting the characters of the different languages exactly as they appear in
print.

The procedure of digitizing the DWB is very closely connected to that of one of
the other retrodigitization projects at the University of Trier, Digital Middle
High German Dictionaries Interlinked. As the Middle High German Dictionaries are
quoted very often within the DWB, it is also intended to include links to the
electronic version of these dictionaries. Their CD-ROM is meant as a prototype
for the retrodigitization of other historical dictionaries; a presentation of
the Dictionaries Interlinked may therefore conclude the software demonstration
to show how the DWB will look in its final state.

At the time of the conference, at least two major volumes of the DWB, including
the index volume, will be fully encoded, transferred to CD-ROM, and accessible
for searches. Questions to be discussed when presenting the DWB in its
prospective digital version might focus on the application of SGML/TEI to a
dictionary as heterogeneously structured as the DWB, on the necessity for
developing new entities for character representation, and on the importance of a
digital DWB for future research.