Supporting Digital Scholarship: a Project Funded by the Andrew W. Mellon Foundation

John Unsworth, University of Virginia, USA
Worthy Martin, University of Virginia, USA
Thornton Staples, University of Virginia, USA
Ken Price, University of Virginia, USA

ALLC/ACH 2000, University of Glasgow, Glasgow, 2000
Editors: Jean Anderson, Amal Chatterjee, Christian J. Kay, Margaret Scott
Encoder: Sara A. Schmidt

Summary:

To date, digital library efforts have focused on library-based production of
digital primary resources. This project will, for the first time, address
second-generation digital library problems, where the focus is on scholarly
analysis, reprocessing, and the creation of digital primary resources. With
$1m in support from the Andrew W. Mellon Foundation over three years
(2000-2002), the University of Virginia's Institute for Advanced Technology
in the Humanities (IATH) and the University of Virginia Libraries' Digital
Library Research and Development Group will address three closely related
problems:

1. scholarly use of digital primary resources;
2. library adoption of "born-digital" scholarly research; and
3. co-creation of digital resources by scholars, publishers, and libraries.

We predict that these problems will confront all research universities within
the next decade and - because both faculty-driven humanities research
computing and digital library activities have been underway at the
University of Virginia for most of the past decade - we believe we are
uniquely positioned to address these problems now. The outcome of this
project will be methods and guidelines for, and examples of, the management
of digital objects across the scholarly information continuum, from creation
(by libraries or scholars) to use in research, re-presentation in
scholarship, and re-integration into library collections as scholarly
publications and tools for research.

Institutional Background:

Since its inception in 1992, the Institute has focused intensive support and
advanced computer resources on long-term humanities research proposed by
faculty at the University of Virginia and elsewhere. To date, the Institute
has supported more than forty fellows in architecture, landscape
architecture, architectural history, art history, religious studies,
classics, anthropological linguistics, medieval and 19th-century British
literature, 19th-century American literature, American history, classical
history, history of science, archaeology, film, and music, among other
disciplines.

The majority of this research - indeed, most of the Institute's work -
involves intensive collaboration among groups of scholars, and between
scholars and the Institute's technical experts. The Pompeii Forum project,
for example, sends an interdisciplinary group of researchers to Pompeii each
summer, where they conduct a systematic survey of the Forum
using an extremely accurate surveying device known as a laser total station,
feeding data from that device into a laptop in the field. These
measurements are then brought back to the Institute, where they are
processed into two-dimensional plans and three-dimensional CAD models.
Further field-research provides an extensive photographic survey of the
buildings at Pompeii, and these photographs are used in conjunction with
advanced photogrammetric software to create accurate, photo-realistic
surfaces for the three-dimensional CAD models. Finally, using modeling tools
custom-built at the Institute, the researchers are able to combine
individual building models into a model of the entire site and even render
the walls transparent, in order to see both sides at once, thus producing an
analysis of the Forum more detailed, more accurate, and more flexible than
any other to date.

The University of Virginia Libraries have established a number of electronic
data centers that work closely with the Institute's staff and fellows: the
Electronic Text Center, the Geospatial and Statistical Data Center, the
Digital Media Center, and the Special Collections Digital Center. Library
digital centers have provided support to many of the same faculty involved
in research with the Institute, and staff from these centers meet regularly
with IATH staff and others in a digital library interest group. Most
recently, the Libraries have established a Digital Library Research and
Development Group, charged with long-range planning of digital library
architectures, systems, and procedures. Having begun to assemble a broad
digital collection, they recognize that no library management system yet
exists to handle it and they have dedicated themselves to developing an
appropriate solution to the problem.

Further information about the library digital centers and about Digital
Library Research and Development is available on the Web.

Project Goals:

Much of what has taken place in digital library contexts to date has aimed at
producing large collections of digital data, often - in fact usually -
without the involvement of the intended audience for that data, scholars and
researchers. In this project, we aim to foreground the scholarly user -
something we believe we are uniquely positioned to do - and from this
perspective we will look at the issues of collections development, data
management, metadata, and digital library systems. We expect to complete a
number of trials in these areas, and although we do not believe the scope of
this project is sufficient to provide universal or definitive solutions, we
do expect to arrive at a better understanding of the problems that will be
involved in the next generation of digital library activities.

So much hyperbole attends the current phase of digital library development
that it may seem surprising to suggest there are things scholars need to do
that digital libraries cannot support. Three scenarios are presented here as
examples of some of those unsolved, second-generation digital library
problems:

Scenario 1: Scholarly use of digital primary resources

A literary scholar researching the history of a
particular poem knows that its author also painted the subject of
the poem. She can find information about the poem and the painting
in the digital library, and can even retrieve a digital image of the
painting. The scholar knows that other dual-media works were
produced by this author, and she suspects that the author's
arrangements of his paintings in exhibitions might well be
significant in understanding the related literary works: therefore,
the scholar would like to use the digital library to find out when
the painting in question was exhibited and, for a given exhibition
date, would like to know what painting was to its left and what
painting was to its right - and then see those paintings together in
a virtual reconstruction of the exhibit.

In this example, we consider the possibility that the scholar of the very
near future will want to do something more than browse or perform
keyword searches in the digital library. The promise of the digital
library is that it will enable scholars to frame questions that would
have been inconceivable without this technology. And yet, in practice,
we find that digital libraries support only very narrowly defined
investigative activities. Partly this is because we tend to treat
objects in the digital library as though they had no other temporal or
spatial contexts - as though they had always and only existed, discrete
and timeless, in our information systems. Partly, too, these limitations
are a sign that the digital library is mainly concerned, at this point,
with providing simple access to the discrete digital object, rather than
with supporting context, comparison, or analysis - the building blocks
of scholarship.

We could begin to grapple with this problem by producing several
proof-of-concept example projects, in which data and metadata expressly
support more complex kinds of "behaviors" in the digital library, and
are associated with other objects in the digital library (e.g., Java
applets) that actualize those behaviors on the end-user's machine. This
follows the Fedora model that the library is already developing,
specifically that aspect of Fedora that permits "client access to
multiple views, or disseminations, of the object's data through the
transparent activation of external mechanisms that execute these content
type behaviors".

Scenario 2: Library adoption of "born-digital" scholarly research

An archaeologist spends decades producing detailed
digital records of an important classical archaeological site. The
records include CAD reconstructions of individual buildings,
topographical maps, photographs, and maps locating particular
artifacts in areas and layers of excavation, and large-scale
computer models of the entire site. Upon retirement, the
archaeologist offers his entire collection of digital records to the
library (since no publisher has ever known what to do with them) -
but he offers them on the condition that the library treat these
records as a special collection, catalogue them, and make them
available through the web to other researchers and students of
archaeology.

This example makes plain the problems that libraries will inevitably face
as they come to collect digital resources produced by scholars outside
of library (and quite possibly, publishing) frameworks. The problem is
likely to be especially acute in the areas of architecture and
archaeology, where data is likely to have been produced by researchers
in digital form, and where we have few (if any) established conventions
for collecting, normalizing, cataloguing, providing, or preserving such
data. A single map or CAD drawing could represent hundreds of hours of
research, data gathering, and expert analysis - as valuable, in
principle, as a monograph or a journal - and yet libraries might well be
unable to accept it, for lack of appropriate systems and procedures.

As a pilot project in this area, we can recruit large existing
collections of digital architectural and archaeological data (from The
Pompeii Forum, Victorian London, The Waters of the City of Rome,
Jefferson's Architecture, and other IATH projects), and use that data to
experiment with cataloguing, collections, and preservation issues raised
in such contexts. At the end of three years, we would expect to have
brought several such collections into the library.

Scenario 3: Co-creation of digital resources by scholars, publishers, and libraries

A historian, working together with technical experts
in the library's Geospatial and Statistical Data Center, uses census
data, eyewitness accounts, military records and contemporary GIS
information to generate a time-indexed, geo-referenced
reconstruction of troop movements in a famous civil-war battle. The
research is going to be published by a university press, and the
press has contributed original vector data for the underlying map.
At different points in this process, the press, the historian, the
historian's graduate research assistants, and library experts all
need to share editorial control of the evolving data set. At the end
of the process, the data set needs to be published by the press,
collected in the library, and connected to textual records of the
event.

Increasingly, we believe, scholars and libraries and publishers will
enter into collaborative arrangements involving the production of
digital primary resources by the library, a scholarly treatment of those
resources, and electronic publication of the result. We have already
seen many instances of this pattern in IATH research projects. In
retrospect, it seems perfectly reasonable that the institution owning
the primary resources (a rare book, a painting, a statue, a map) would
want to produce its initial digital representation; once that digital
representation exists, it seems inevitable that scholars will want to do
what they have always done - edit, contextualize, re-present, and
analyze the (now digital) object. And, if not inevitable, it seems at
least likely that the result of this scholarly engagement with digital
primary resources will be the stuff of scholarly publishing. There are
many unanswered questions, though, behind these three reasonable
assumptions: should it be a goal to have a single authoritative version
of the digital object? If so, how might scholars and/or publishers
register corrections or revisions to the original, if the original is
produced (and presumably owned) by a library or museum? If several
scholars disagree on the verisimilitude of the digital representation,
how will their range of opinions be recorded and connected to that
representation? If electronic editions of the artifact become the norm,
instead of an authoritative version with apparatus, then how should
those editions be derived and denoted?

At IATH, we already have several projects that raise this sort of problem
- the Valley of the Shadow, the Walt Whitman Archive, the Victorian
London project, and others. We have a document management system
(Astoria) that will help to address some of the practical procedural
issues involved in managing multiple authorship; we will experiment with
integrating that system into the library's production strategies, to
address those situations in which a single authoritative version is
necessary or desirable, but we would also expect to experiment with
managing and coordinating multiple divergent editions of a single base
object, or multiple perspectives on an object.

In order to address the many problems - some technical, some social, some
intellectual - raised in these three scenarios, we need to move beyond
the simple production and cataloguing of digital collections, and begin
to recognize that, in the library of the future as in libraries of the
past and present, most materials will be produced by many hands, not
few; most materials will incorporate many perspectives, not one; and
most materials will need to support specialized and pointed research as
well as general, blunt queries.

Recognizing these things, we will undertake a collaborative investigation
of advanced digital library problems, including library absorption of
scholar-produced digital resources, library/scholar co-creation of such
resources, and analytical use of digital humanities data. Within this
investigation, our emphasis will be on metadata practices, library
systems, and production protocols that support scholarly use. And though
we don't promise to solve all the problems that might be raised in this
area, we will establish guidelines that will be useful to others,
produce examples that others can imitate, and learn which problems are
easy to solve and which are difficult.

Content:

We will focus in particular on visual and spatial data, with an emphasis on
architecture and archaeology, but also considering visual arts, especially
in complex spatial and temporal contexts. There are a number of research
projects already underway in scholarly contexts that are producing and
freely distributing digital data in architecture and archaeology. The
problem these disciplines face is that there is no well-established
institutional mechanism for collecting, preserving, or publishing digital
objects of this sort (CAD drawings, digital topo-maps, 3D models, even
digitized photo or slide collections). Moreover, the strategies for
cataloguing and describing art objects do not work very well with the
more hierarchical and complex information structures that characterize
architectural and archaeological data. With respect to visual arts, we are
particularly interested in developing and applying metadata structures that
would support comparison, contextualizing, and analysis of art works, and in
producing some sample applications that would demonstrate to other libraries
and scholars the value of spatial and temporal metadata.

Part of the budget for this project will go, in small one-year awards,
directly to ongoing faculty research. A library/IATH committee will
administer these funds, and they will be used to support experimentation, in
the context of faculty research, the results of which would generalize
readily to other contexts. Normalizing data, standardizing metadata,
capturing new data in accordance with recently specified best practices -
all of these are appropriate activities for this committee to fund.

Intellectual Property:

The University of Virginia will grant to The Andrew W. Mellon Foundation a
non-exclusive, royalty-free right to access, use, and distribute for
educational, social and/or charitable purposes, the software technologies,
tools, and related documents developed as a result of this project and to
incorporate such software technologies, tools, and related documents in
other projects supported by The Andrew W. Mellon Foundation.

Systems, Procedures, and Standards:

This project raises technical challenges at the level of information systems
design and at the level of standards design and implementation, and it
requires a coordinated investigation of these issues by IATH and the
Library's Digital Library Research and Development Group. Fairly high-level
staff will be needed for this: at IATH, we need to support cutting-edge
technical work in architecture, archaeology, mapping, and other visual-data
fields. We also need to hire a second person at IATH, to concentrate on
database and document management systems, as the production end of a
continuum that delivers data to library systems. In the Library, we would
like to add a position to the Digital Library Research and Development Group
to implement systems and standards for producing, managing, and
disseminating visual and spatial data in library contexts, to ensure that
those library systems respond appropriately to the needs of research users,
and to work with IATH and others on the difficult issues of adoption and
co-creation, mentioned above.

Software tools and environments for producing, managing, and publishing large
image collections are of interest to IATH, and even more so to the Library,
inasmuch as many of our research projects involve the creation and use of
extensive image collections. IATH's principal interests in this area would
be workflow and data management; on the library side, Thornton Staples has
been working with the Cornell Digital Library Group (Carl Lagoze) on
technical issues involved in the creation of digital repositories, and on
the implementation of Lagoze's Flexible and Extensible Digital Object and
Repository Architecture (Fedora).

Both IATH and the Library are very interested in applications of SGML, XML,
and HyTime to the problem of describing art, architecture, and
archaeological sites. In particular, we believe there is significant work
yet to be done in the description of art collections, the treatment of
three-dimensional objects as information structures, and capturing the
passage of time as an element of these collections and structures. Thornton
Staples has been working in this area, developing something he calls the
General Descriptive Modeling Scheme (GDMS), an Extensible Markup Language
(XML) document type definition (DTD) that is intended to be used to create
textual models describing real-world phenomena (such as creations, events,
places and people) and giving a context for describing the content of, and
relations among, digital objects.

All of these interests could have a direct relevance to Mellon's projected
work in the areas of art, architecture, and archaeology, in ARTSTOR. In
order for collections of two- and three-dimensional image data to be useful
for teaching and research, the ARTSTOR collections will need to be embedded
in data structures that can support annotation, multiple spatial and
temporal arrangements of works and sites, and the representation of change
over time.

Personnel:

Library and Institute staff directly involved in design aspects of this
project throughout the three years, as part of their regular duties, include
Worthy Martin, Technical Director, IATH and Associate Professor, Computer
Science; Daniel Pitti, Project Director, IATH; Thornton Staples, Director,
Digital Library Research and Development; John Unsworth, Director, IATH, and
Associate Professor, English.

Other Library personnel who would contribute some part of their time to
implementation, as part of their regular library employment, would include
Edward Gaynor (Special Collections); Rick Provine (Digital Media Center);
David Seaman (Electronic Text Center); Ross Wayland (Digital Library
Research and Development); Patrick Yott (GeoSpatial Information Center).

IATH fellows whose ongoing research will be directly involved in this project
include: Ed Ayers et al., Valley of the Shadow; David Blair, WaxWeb; John
Dobbins, Kirk Martini et al., The Pompeii Forum Project; Morris Eaves, Robert
Essick, Joseph Viscomi, The Blake Archive; Lavahn Hoh, The Circus in Europe
and America; Jerome McGann, The Rossetti Archive; Michael Levenson et al.,
Monuments and Dust (Victorian London); Kathy Poole, Boston Back Bay Fens; Ken
Price et al., Walt Whitman Archive; Ben Ray, The Salem Witch Trials;
Katherine Rinne, Waters of the City of Rome; Marion Roberts, Salisbury
Cathedral; Ken Schwartz, Charlottesville Urban Design; Richard Guy Wilson,
Jefferson's Architecture.

Management Plan:

This project will be jointly managed by John Unsworth and Thornton Staples,
with close cooperation among IATH personnel, faculty fellows, and library
staff. Fellows will provide digital objects (maps, photographs, models,
etc.) and the metadata to accompany those objects, as well as some
functional specifications for scholarly use of those objects. The Digital
Library Research and Development Group will work with IATH and its fellows
to establish guidelines for the production of digital data and metadata to
be collected and disseminated by library systems, and they will advise IATH
and its fellows on the systems design and development issues that attend the
adoption of information produced by IATH fellows. IATH staff will support
data production to agreed-upon standards, will consult with fellows and
library staff on the specification of those standards, and will work with
library staff to prototype the functionality requested and specified by the
scholars who produce (and intend to use) the data.

Work Plan:

Year One: Primary objectives in the first half of
this year will be hiring, training, and information-gathering (which would
include external consultation as well as a thorough analysis of our own data
and systems). In the second half of the year, we will finalize a first
version of the General Descriptive Modeling Scheme, while working with
individual projects to establish and document standard procedures for
producing descriptive, structural, and administrative metadata.

Year Two: In the second year, we will attempt to
deposit information from the Waters of Rome, Boston Back Bay Fens, and the
Pompeii Forum projects into the Digital Library, and we will experiment with
Java applets provided via Fedora as disseminators for the comparison and
analysis of visual art objects based on metadata, probably using projects on
Blake, Rossetti, and Salisbury Cathedral.

Year Three: In the third year, we will focus on the
difficult issues involved in co-creation of scholarly resources, both
technical and social. We will experiment with multi-author/single version
solutions (in the Valley project, Jefferson's Architecture, and Victorian
London), and we will also look at multi-author/multiple edition solutions
(with some of the same projects, plus Whitman, Salem, and others).

Dissemination:

Information about the problems encountered and lessons learned in the
experiments described here will be reported at the conferences that project
participants normally attend - annual meetings of the Association of
Research Libraries, the Research Libraries Group, the Digital Library
Federation, the Association of American University Presses, the Modern
Language Association, the Association for Computers and the Humanities, the
Association for Literary and Linguistic Computing, the annual XML/SGML
conference, the Markup Technologies conference; these results would be
appropriate to publish in the journals associated with some of these
professional associations as well. Our presentations at these and other
conferences would be supported in many cases by the travel portion of the
budget.

In addition to these venues, the Web itself is obviously an important medium
in which to publish project results and documentation - for example,
Document Type Definitions, production manuals, best practices, and reports
on our failures and successes. Reusable technical products of the research
such as DTDs or software will be freely distributed, updated, and documented
through the Web. Finally, the web-based content that is produced in the
different scholarly projects that participate in this research can provide
links to "how-to" information.