Mapping from objects to markup: a springboard for
multiple-strategy electronic publishingGaryF.SimonsSummer Institute of Linguisticsgary.simons@sil.org1997ACH/ALLC 1997editorthe secretarial staff in the Department of French Studies at
Queen's UniversityGregLessardencoderSaraA.Schmidtobject-oriented databasestext markupelectronic publishingOne of the challenges of electronic publishing is getting the information
into the right format for the particular publishing strategy being pursued.
Another is keeping up with the fast pace of change as new technologies are
developed that offer more or better ways of electronically publishing
information.This paper reports on the experience of the Summer Institute of Linguistics
in developing electronic publishing solutions for its LinguaLinks product
(SIL 1996). LinguaLinks is an electronic performance support system designed
to assist field workers with a wide range of tasks related to language
learning, language analysis, and language development.The paper first introduces the LinguaLinks model of performance support and
CELLAR--the object-oriented database system that is used to implement it.
Our approach to electronic publishing is to first build the information as a
structure of objects in the database, and then to use multiple CELLAR
stylesheets to map the information onto multiple markup schemes. The object
database thus serves as a springboard that allows us to vault the
information into any number of formats for publishing.The paper illustrates this approach to electronic publishing by focusing on
one application area that LinguaLinks supports, namely, lexical database
development. It first shows how the tutorial and reference documents that
give help on how to build a dictionary are mapped onto different markup
schemes for publication as a Folio Views infobase, a Windows help system,
and an HTML Web document. It then shows how the dictionaries that are built
by using LinguaLinks are mapped onto HTML markup to provide a display format
on the Web and onto TEI markup to provide a richer format for information
interchange and archiving.1. The LinguaLinks model of electronic performance supportThe notion of electronic performance support systems (Gery 1991) is one that
is gaining momentum throughout industry. An EPSS seeks to support a
knowledge worker in performing his or her job by providing an electronic
working environment that integrates the software tools needed to do the job
with the tutorial and reference materials that are needed to know how to do
the job. This goes well beyond the typical help system of a software program
to include not just information on how to use the program but also a library
of background information on the problem domain.LinguaLinks is designed to support tasks in the domains of anthropology,
language learning, linguistics, literacy, and sociolinguistics. Work
continues to add more and more resources in subsequent versions of the
product. In version 1.0, one of the areas that is most deeply developed is
lexical database management. This component includes a data management tool
that helps the user to build a lexical database and then to use the
information in that database to produce dictionaries for publication.LinguaLinks takes advantage of the object-oriented paradigm to provide
performance support that is tailored to what the user is trying to do. One
of the hardest problems in offering electronic performance support is
knowing just what the user is trying to do. For instance, if a word
processing program were being used to write a dictionary, it would be very
difficult to implement performance support that could sense the context
within a dictionary entry and offer appropriate help. By building a
dictionary in an object-oriented database, however, two kinds of performance
support fall out naturally.First, the software developers have performed an object- oriented analysis
(Coad and Yourdon 1991, Booch 1994) with domain experts in order to build
the conceptual model for the database. The definitions of object classes
(including their attributes) that make up this model provide performance
support by guiding the user to create dictionary entries that are structured
like ones the domain experts would have built; these definitions also
prevent ill-formed entries from being constructed.Second, the software always knows what the user is working on by observing
what object and attribute is currently selected or being editing. Thus if
the etymology is being edited, the system knows to offer tutorial and
reference material on how to write etymologies when the user asks for help.
When the user switches focus to the part of speech, then the focus of help
also switches to that aspect of dictionary making, and so on for all the
parts of the conceptual model of a dictionary entry.Section 2 of the paper gives an overview of the object- oriented database
system that is used to implement performance support in LinguaLinks. Section
3 describes how the object database is used as a springboard to markup for
electronic publishing. Section 4 then shows how the tutorial and reference
materials are mapped into markup in order to support multiple strategies for
providing performance support. Section 5 in turn shows how the dictionaries
built by users can be mapped into markup for multiple publishing
strategies.2. An overview of the CELLAR object-oriented database systemThe database system used to implement LinguaLinks is called CELLAR--for
Computing Environment for Linguistic, Literary, and Anthropological
Research. Developed by the Summer Institute of Linguistics, it is an
object-oriented database system for storing multilingual textual
information. A full discussion of the user requirements that motivated the
development of the system is given in Simons 1996. Rettig, Simons, and
Thomson (1993) discuss some of the significant ways in which CELLAR extends
the traditional object model.An application in CELLAR is a declarative model of the problem domain. A
complete domain model has the following four components:Conceptual model. Declares all the
object classes in the problem domain and their attributes, including
integrity constraints on attributes that store values and built-in
queries on those that compute their values on-the-fly.Visual model. Declares one or more
ways in which objects of each class can be formatted for display to
the user.Encoding model. Declares one or more
ways in which objects of each class can be encoded in plain text
files so that users can import data from external sources or export
them.Manipulation model. Declares one or
more tools which translate the interactive gestures of the user into
direct manipulation of objects in the knowledge base.3. Using CELLAR as a springboard to electronic publishing markupThe strategy for implementing multiple markup systems involves the visual
modeling component of CELLAR. For the single conceptual model of the
information, multiple visual models are defined. This approach has been
illustrated in another paper (Simons, in press) where multiple ways of
displaying tagged texts and critical texts are generated from the same
underlying database.In this application, each visual model generates a display format that
happens to be a markup scheme for a particular electronic publishing system.
For each class of object, a view is defined that declares how the
information in that object is to be mapped onto the display format. All of
the views that cooperate to define a single visual model are collected
together into a stylesheet.In this section, the full paper will present some source code examples that
illustrate how the views can be made to map objects onto markup.4. Strategies for publishing helps on building dictionariesThe tutorial and reference materials that provide performance support in
LinguaLinks are objects in the underlying CELLAR database. One strategy we
have followed for electronically publishing them is to present them in a
view that looks like a conventional document. But this is just one strategy.
We have followed three others as well:1. In order to make jumps to the helps virtually instantaneous, we
have mapped them onto RTF markup and compiled them as a Windows help
system.2. In order to offer full-text boolean search and retrieval access
to the library of helps, we have mapped them onto Folio Flat File
format and compiled them into a Folio Views infobase.3. In order to offer access to these helps on the World Wide Web,
we have mapped them onto HTML markup.In this section, the full paper will present examples (with markup and screen
shots) of each of these.5. Strategies for publishing the resulting dictionariesSimilarly, the lexical database that the user builds in LinguaLinks is a
collection of objects in the underlying CELLAR database. One strategy for
electronically publishing a dictionary is to deliver it as a CELLAR database
with views that present it in conventional display formats. But this
strategy will not reach a wide audience. We have thus implemented two other
strategies as well:1. In order to produce a dictionary that can be read on the World
Wide Web by any browser, we can map the objects onto HTML markup. 2. In order to produce a dictionary data file that can be archived
without any loss of structural information, we can map the objects
onto the TEI markup for print dictionaries (Sperberg-McQueen and
Burnard 1994).In this section, the full paper will present examples (with markup and screen
shots) of each of these.ReferencesGradyBoochObject-oriented analysis and design with
applications2nd ed.Redwood City, CABenjamin/Cummings Publishing Co1994PeterCoadEdwardYourdonObject-oriented analysis2nd edEnglewood Cliffs, NJPrentice-Hall, Inc.1991GloriaJ.GeryElectronic performance support systemsBostonWeingarten Publications1991MarcRettigGarySimonsJohnThomsonExtended objectsCommunications of the ACM 36819-241993SILLinguaLinks: electronic helps for language field work(Version 1.0). CD-ROMDallas, TXSummer Institute of Linguistics1996See also GaryF.SimonsThe nature of linguistic data and the requirements of a
computing environment for linguistic researchDutch studies on Near Eastern languages and
literatures21111-1281996Also to appear in John Lawler and Helen Dry (eds.), Using computers in linguistics: a practical guide.
Routledge.GaryF.SimonsConceptual modeling versus visual modeling: a
technological key to building consensusComputers and the Humanities1996In press.C.M.Sperberg-McQueenLouBurnardGuidelines for electronic text encoding and
interchangeChicago and OxfordText Encoding Initiative1994