Metainformation Strategies for Electronic Resources

Susan Schreibman
University College Dublin, Eire

ALLC/ACH 2000, University of Glasgow, Glasgow, 2000
Editors: Jean Anderson, Amal Chatterjee, Christian J. Kay, Margaret Scott
Encoder: Sara A. Schmidt
Text Encoding

This paper will address the theoretical and practical issues in devising and
implementing a project-specific metainformation scheme for electronic resources.
While one can argue that a scheme like the Text Encoding Initiative provides for
encoding which greatly enhances plain text retrieval, in practice without
extensive use of the keyword or indexing elements, retrieval of information is
limited to what is explicit in the text. Searching for what is explicit in the
text, even if that text has been encoded logically (as opposed to physically),
does not provide the kind of functionality most humanists expect from digital
archives.

This paper, then, is an exploration of the advantages and disadvantages of creating
a meta-metainformation or classification scheme for electronic resources. For
this talk I will draw heavily on theoretical models (both pre- and post-computer
indexing models) from library and information studies. I will also adopt the
position that creators of electronic resources are encoding their primary
material in an SGML or XML-based metainformation scheme, such as the Text
Encoding Initiative. I will further assume that the project directors have already
made certain specific decisions in encoding what is explicit in the text in
accordance with the project's goals. In other words, I am assuming that a
digital project is already taking advantage of the tagging structure afforded in
a scheme like the TEI, which provides for the encoding of titles of texts and of
place, personal, geographic and organisation names, etc., as deemed important to a
particular project.

There can be no doubt that this type of tagging greatly enhances retrieval, for
example by distinguishing the occurrence of WB Yeats as a title as opposed to a
personal name, or facilitating the searching of all strings within a
<placename> element. And although this type of encoding of electronic
resources gives users unprecedented access in locating very specific strings of
text, in practice users are frustrated by limited and relatively simplistic
search and retrieval strategies. In most electronic resources, users are limited
to retrieving only what is explicit in the text, i.e. strings of text, some of
which have been encoded logically. In the case of images, the situation is even
more problematic. Unless a project has developed a header consisting of detailed
metainformation, most images can only be retrieved by image title. Boolean and
proximity searches go only a small way towards solving the problem of moving
beyond single-word searches, and do not provide the conceptually and theoretically
rigorous searches most scholars in the humanities want and expect from
electronic resources.

Specifically, this paper will address the practical and theoretical issues raised
by devising a classification or indexing scheme which facilitates search and
retrieval by going beyond encoding what is explicit in the text. To this end,
several points will be raised:
- although encoding what is implicit in the text facilitates retrieval
  of concepts not possible by explicit encoding, this process is much more
  subjective;
- how this subjectivity influences retrieval;
- the concept of granularity, and the problems of encoding to various levels;
- the problems of encoding implicit metainformation which is transparent
  to users.

While at past ALLC/ACH conferences many papers have discussed the difficulties
in consistent encoding of explicit text in large projects in which many people
participate in the encoding process, the possibilities for inconsistent encoding
of implicit text multiply exponentially. Yet I would argue that without the
development of classification or indexing schemes, digital archives remain
hidden behind front ends which may look resplendent, but which barely reveal
their complexity and richness.

To this end, the rest of the paper will be divided into three parts. Part I will
provide an overview of some of the major metainformation schemes which were
developed in a pre-digital environment, such as AACR2, the Dewey Decimal
Classification, and the Library of Congress Subject Headings. Topics to be
covered will include:
- the theoretical impetus behind these schemes;
- how and why these schemes were conceived and made extensible;
- why these schemes cannot be transferred to a digital environment
  without adaptation.

The second part of the paper will explore current applications of some of these
schemes to a digital environment, such as the Art and Architecture Thesaurus and
the Thesaurus for Graphic Materials. Specifically, I will address how these
schemes have been adapted from indexing codex-based texts to indexing
digital ones. The special case of indexing images will also be
discussed.

The third part of this paper will explore metainformation schemes devised for
several specific digital archives, including The Blake Archive and The Thomas
MacGreevy Archive, both published at the Institute for Advanced Technology in
the Humanities at the University of Virginia. In the case of the Thomas
MacGreevy Archive, I will demonstrate how we, working within the TEI, developed
a metainformation scheme which facilitated very specific genre searching for
both texts and images.
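
The distinction drawn above between retrieving what is explicit in the text and retrieving by an assigned classification can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample records, the wrapper elements `corpus` and `doc`, the index terms, and the function names are hypothetical, and the element names are only loosely modelled on TEI. It does not reproduce the scheme of the Thomas MacGreevy Archive or of any other project.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI-like records (not taken from any real archive).
SAMPLE = """
<corpus>
  <doc id="d1">
    <keywords><term>book review</term><term>poetry</term></keywords>
    <text>A notice of <title>W. B. Yeats</title>, printed in <placeName>Dublin</placeName>.</text>
  </doc>
  <doc id="d2">
    <keywords><term>art criticism</term></keywords>
    <text><persName>W. B. Yeats</persName> spoke in <placeName>London</placeName> on modern painting.</text>
  </doc>
</corpus>
"""


def find_explicit(root, tag, string):
    """Ids of documents whose text contains `string` inside a <tag> element."""
    hits = []
    for doc in root.iter("doc"):
        for el in doc.find("text").iter(tag):
            if string in "".join(el.itertext()):
                hits.append(doc.get("id"))
                break  # one match per document is enough
    return hits


def find_by_term(root, term):
    """Ids of documents that an indexer has classified under `term`."""
    return [
        doc.get("id")
        for doc in root.iter("doc")
        if term in [t.text for t in doc.iter("term")]
    ]


root = ET.fromstring(SAMPLE)

# Explicit encoding: the same string, distinguished by the element around it.
print(find_explicit(root, "title", "Yeats"))
print(find_explicit(root, "persName", "Yeats"))

# Implicit encoding: "book review" never occurs in the text of d1 itself;
# it is retrievable only because an indexer assigned that term.
print(find_by_term(root, "book review"))
```

The first two queries depend only on what is marked up in the text, whereas the third depends entirely on an indexer's judgement, which is where the questions of subjectivity and consistency raised in this paper arise.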