Solutions for the Delivery of Thematically-Tagged Text

Terry Butler, University of Alberta, Canada
Greg Coulombe, University of Alberta, Canada
Sue Fisher, University of Alberta, Canada

ALLC/ACH 2000, University of Glasgow, Glasgow, 2000
Editors: Jean Anderson, Amal Chatterjee, Christian J. Kay, Margaret Scott
Encoder: Sara A. Schmidt
Keywords: Text Encoding

Introduction

The Orlando Project has developed prototype delivery software which gives end
users access to our literary history textbase. Richly tagged SGML data is
automatically converted to XML and presented to users through a custom
application, which runs locally on the user's machine and communicates with a
back-end XML server. The design of the user interface has been developed
through a formal user needs analysis, conducted with a local Pilot Users
Group. In the process, we have learned a great deal about how to exploit the
richness of a heavily tagged textbase, and how to present this information
selectively to end users, meeting their information requirements without
overburdening them with complexity.

The Goals of our Project

The Orlando Project is applying state-of-the-art software technology to
traditional fields of study in the humanities. We are writing a literary
history of women's writing in Britain, both as a conventional published text
and as an SGML-tagged textbase. At present (November 1999) we have documents
on 850 British women writers and on 590 other writers. For each author we
have a pair of interdependent documents: a biography and a writing life
history. This material is supplemented by 13,600 events, which are discrete,
dated items providing further essential and enriching political, social, and
cultural background to the work. Events vary in their depth of coverage, but
every one of them is related in some way to the literary history which we
are writing. Here are three examples of
events:

1863: Selective chronology: British Women writers: Florence Nightingale
privately printed an anonymous pamphlet, Note on the supposed protection
afforded against venereal disease by recognizing and putting it under police
regulation. [keyword: law and legislation] [keyword: body/health - venereal
disease]

August 1863: Selective chronology: British Women writers: Florence
Nightingale corresponded with Harriet Martineau, outlining the case against
the Contagious Diseases Acts. (Vicinus 441)

by 1871: Comprehensive chronology: Social climate: The Royal Commission on
the Contagious Diseases Acts rejected a suggestion that soldiers and sailors
be required to submit to the same regular examinations required of the
prostitutes they frequented. The commission believed "there is no comparison
to be made between prostitutes and the men who consort with them. With the
one sex the offence is committed as a matter of gain; with the other it is an
irregular indulgence of a natural impulse." This illustrates the double
standard that held women to be sexually unresponsive and men to be prey to
strong desire; paradoxically, this belief coexisted with the notion that
women were emotional and irrational, while men were more enlightened and
controlled.

Delivery Plans

The Orlando Project received SSHRC funding in 1994. Our grant proposal at
that time argued that SGML was the only feasible means to capture and encode
the complex thematic approach to literary history which the project
required. As to the ultimate means of delivering this information, we
anticipated that the technology landscape would be utterly changed five
years on. We believed that there would be ways to deliver SGML to end users
at the very end of the 20th century (we were also aware, in 1994, that there
were perfectly acceptable ways of converting and delivering SGML
information). We have found that XML is the means to the end which we hoped
would appear. XML is a rapidly developing World Wide Web Consortium (W3C)
standard, which will permit the direct delivery of tagged information to end
users. An XML audit of our textbase, carried out in 1998, showed us that
(for delivery purposes) our textbase could be transformed from SGML to XML
without any loss of its intellectual value. We are able today to deliver our
richly tagged information to a client program, running either in an
XML-capable browser such as Internet Explorer 5 or in a custom application
which supports XML through third-party software such as IBM's XML toolkit.

User Needs Assessment

Having received a great deal of positive encouragement from the scholarly
community, indicating that the information we are developing is of
considerable interest to them, we began a formal process of user needs
assessment. A richly tagged textbase such as ours can be exploited by end
users in a wide variety of ways:

- subject-specific searches can create customized chronologies and research texts for reading
- imposing chronological limits can highlight issues and create connections which standard "period" labels obscure
- consistent tagging allows sections of one or more documents to be compared "side by side", to reveal new insights about authors and their context

The most important issue for us was to "bridge" between the complex tag
set which we have created and the terminology and information
expectations which will characterise our end users. The strengths of our
tags are their rigour and the highly detailed descriptions of their
meaning. Their deficiency (from the end user's point of view) is that this
knowledge is locked up in a single tag name, which may be opaque (such as
our Cultural Formation tag) or dangerously obvious (such as our Name tag,
which has a precise meaning and occupies a specific niche in a
constellation of about a dozen "personal name" tags). In order to drive
the development of the software from the users' point of view (rather
than our own), we struck a Pilot Users Group. This group of about a dozen
people was drawn from representative communities which we expect will be
interested in accessing our information, including:

- professors, graduate students, and undergraduate students
- scholars in fields such as English literature and History
- librarians and information scientists

The program for this group was devised in order to elicit their
expectations and desires for our software, without raising the question
of what the software would look like or how it would work. We began with
meetings where the group was given only written and oral accounts of our
Project's goals and content; we elicited the group's own descriptions and
terminology for our areas of interest. In the fall of 1999, building upon
our team's sense of what kinds of access we could provide to end users,
the Pilot Users Group was asked to comment on an on-screen mock-up of our
delivery software. These sessions were conducted as formal focus groups
[Greenbaum; Jordan]; the sessions were recorded, and team notetakers wrote
down the comments and suggestions of the Pilot Users Group. Because the
on-screen software was truly a "throwaway" mock-up, we were able to
genuinely encourage the users to critique it and explore their preferences
and expectations. We also surveyed the computer equipment and level of
experience of the group; we will expand this survey to make sure we create
delivery software which our target users can run, and which they will be
able to learn to use effectively.

Software Architecture

Our prototype delivery software is being written in a client/server
fashion. The client end is a Java program which uses XML-aware code to
request XML documents from the server, process them (by sorting,
selecting, and sub-setting), and then display them using XSL (the
Extensible Stylesheet Language). Although it is technically possible to
carry out this part of the process inside an XML-capable browser, the
nature of our textbase and the kinds of interaction which we wish to
provide are rather unlike the Web-page metaphor. Our textbase can be
queried to draw together coherent sub-sections from many documents at
once, which can be presented to the user in various forms, such as a
customised chronology or a synoptic view of relevant sections from the
lives or works of many authors. For this reason we feel that an
independent delivery program is desirable. A similar
consideration applies to linking within our textbase. We are
implementing a much richer form of linking than the Web at present
provides; much of the linking which end users will be able to explore
will be generated automatically from the carefully and consistently
tagged text. Users who are viewing text of interest will be able to
pursue that interest by traversing automatic links which open up from
our elaborately tagged text.

The server side of this architecture will make our tagged textbase
available as an XML document collection, and will respond to user queries
by selecting XML documents and sending them to the client program. We have
explored various technologies to provide this searching and delivery on
the back end, including Java programs and CGI scripts (using both Perl and
SGREP to handle the searching). The obvious advantage of this approach is
that the server can be implemented in more than one way (and be revised
and extended as new technologies appear), while the front-end client
program remains the same (or is extended and improved on an independent
trajectory). We are making extensive use of standard technologies, such as
XML, XSL, and HTTP (for communication between client and server). This
will aid the process of generalising this software to meet the needs of
other projects that wish to present SGML or XML text to their users
without "rendering it down" to display-only formats like HTML.

Issues

- XML is an emerging standard. The software support for XML is beginning to
appear; our strategy will be more effective as XML becomes ubiquitous and a variety of robust XML-capable tools emerge.
- The current effort is a "prototype"; the exercise of deploying it will have both successes and failures, from which we will learn.
- We have been very careful to avoid using the "Web" metaphor: our textbase can be delivered in ways which are much more dynamic and more informative than a Web delivery metaphor would imply. This ambition is to some extent undercut by the expectations of our Pilot Users Group, who came to the material with "Web on the brain". A classic case of this was the specific comment that we ought not to use a certain shade of blue for text if it was not a link, because "blue means link".

References

Thomas L. Greenbaum. The Handbook for Focus Group Research. Second edition. Sage Publications, 1997.
Patrick W. Jordan et al. Usability Evaluation in Industry. Taylor & Francis, 1996.
Steve McConnell. Rapid Development. Microsoft Press, 1996.
Terry Butler and Sue Fisher. Orlando Project: Issues when Moving from SGML to XML for Delivery of Content-Rich Encoded Text. Presentation at Markup Technologies '98, Chicago, Nov. 12-13, 1998.
Terry Butler. Can a Team Tag Consistently? Experiences on the Orlando Project. Presentation given at ACH-ALLC 1999, Charlottesville, VA, June 1999.