'DNA' and Non-traditional Authorship Attribution: An Inclusive Model

Joseph Rudman
Carnegie Mellon University
jr20@andrew.cmu.edu

ALLC/ACH 2002, University of Tübingen, Tübingen, 2002
Editor: Harald Fuchs; encoder: Sara A. Schmidt

Anything a person writes contains the code of his intellectual
DNA, or whatever you want to call it.
Webb 1994

The greater the number of features and the more the features
belong to different categories (e.g., syntactic structures, type of
grammatical subject, inflexions, vocabulary, spelling, and so on)
the stronger the case for shared authorship.
Eagleson 1989

INTRODUCTION:

For many years it has been obvious from the literature that most
non-traditional authorship attribution studies using one, or only a small
number, of style markers do not carry the weight of scientific validity with
the majority of other authorship attribution practitioners, the
specialists in the field of the study, or the general public. (In addition
to Eagleson, see Banks and Rudman -- also, Rudman 1998)

During a talk on the "Style-Marker Mapping Project" at the ALLC-ACH 2000
conference in Glasgow, I mentioned, in passing, an attribution model based
on a "DNA" concept. (Rudman 2000) It was illustrative and not "on topic."
However, the audience picked up on this, and some of the ensuing questions
and discussion kept moving away from the Style-Marker Mapping
Project.

This paper presents a non-traditional authorship attribution model based on a
"DNA" analogy. This paper emphasizes that it is only an analogy -- a
framework to explain the techniques of the "Inclusive Model" -- there are
obvious fundamental differences between DNA and style.

Because some of the terms in this paper could be unfamiliar to the expected
audience, a clear and concise definition is given the first time each such
term is used.

I. BACKGROUND AND DEFINITIONS

If we look at style as a living organism, style-markers are its genetic
material -- making the Style-Marker Mapping Project (Rudman, 2000) analogous
to the human genome project. I would like to extend this biological analogy:
the Inclusive Authorship Attribution Model is analogous to DNA
analysis.

The earliest reference to DNA and style that I have seen is Bailey's
comparison of the tools used to decode the underlying makeup of the two --
X-ray diffraction for DNA, the computer for style. Bailey does not move
towards a DNA model for stylistics. (Bailey)

The lead quote by Webb is also quoted in Forsyth's dissertation. Yet Forsyth
does not use the intent of the quote to move into a DNA model. (Forsyth)

I have been leaning towards a more inclusive attribution model that would
utilize a large number of style-markers since the mid-1980s. Other
researchers also have recognized the need to expand the number of style
markers in attribution studies. As the DNA structure was decoded and the
comparison methods refined, it became the analogous model of choice. I first
mentioned the model at the ALLC-ACH Oxford conference in 1992. (Banks and
Rudman) The thrust of that presentation was towards a statistical method of
combining the results of different statistical tests on various
style-markers. This section briefly traces the evolution of the DNA model
through various publications and presentations.

Clear and concise definitions of the DNA autoradiogram are given. (Kirby) A
brief explanation of why this model is necessary closes this section.
(Willing)

II. THE MODEL

Outline a method of analysis which will allow organization of these
features [the entire range of linguistic features] so as to facilitate
comparison of any one use of language with any other (Carter, Crystal and Davy, and Darbyshire).
McMenamin 1993

A) How the Inclusive Model differs from other models (e.g.
multivariate models and Burrows' Delta Project). (Holmes, Burrows)

B) The DNA Analogy is Explicated.
It is shown how each locus of
the autoradiogram is equivalent to a different style-marker. The
determination of each style-marker locus is discussed.
Forsyth's suggestion at the Glasgow conference that a list of "proven"
style-markers should be provided and used is discussed.

C) Visual Representation
A method of visual representation of the
results of the model is shown.

D) The following two statistical methods of combining each
style-marker locus into a final answer are presented and discussed:

(1) If the style-markers that are used can be shown to be
independent of one another (e.g. word length distribution,
percentage of nouns starting sentences, type/token ratio) a
procedure based on Fisher's method for combining significance
probabilities from independent statistical tests can be used.
(Fisher)

(2) If the style-markers that are used are not independent of
each other (e.g. word length distribution, word length
correlation, percentage of latinate words) the statistical
method employed by DNA researchers can be used.

CONCLUSION

The methods of determining the DNA loci and the style-marker loci are different. A
single technique is employed to determine all of the DNA loci. Each
style-marker locus is determined, for the most part, by different
experimental techniques. And some of the style-marker loci are actually the
result of multivariate statistical analysis.

The Inclusive Authorship Attribution Model promises a degree of acceptability
not seen in most non-traditional attribution studies -- especially in types
of studies such as McMenamin's "'Population Model' where there are no
obvious authorship candidates, and texts from an entire population of
possible authors are considered against texts by one suspected author."
(McMenamin)

Preliminary Bibliography

Bailey, Richard W. "The Future of Computational Stylistics." ALLC Bulletin 7 (1979): 4-11. [First presented at the Association for Literary and Linguistic Computing Fifth International Meeting, Friday, December 15, 1978, King's College, University of London. Also in LITERARY COMPUTING AND LITERARY CRITICISM. Ed. Rosanne G. Potter. Philadelphia: University of Pennsylvania Press, 1989, 3-12.]

Balding, David J., and Peter Donnelly. "Inference in Forensic Identification." JOURNAL OF THE ROYAL STATISTICAL SOCIETY A 158 [Part 1.] (1995): 21-53.

Banks, David L., and Joseph Rudman. "Questionable Attribution in the Canon of Daniel Defoe: A Study of Techniques." ALLC-ACH'92 Conference. Oxford University, April 7, 1992.

Burrows, John. "Questions of Authorship: Attribution and Beyond. A Lecture Delivered on the Occasion of the Roberto Busa Award." ACH-ALLC01 Conference. New York University, New York, June 14, 2001.

Eagleson, Robert D. "Linguist for the Prosecution." Geraldine Barnes et al. WORDS AND WORDSMITHS. Sydney: The University of Sydney Press, 1989. 22-31.

Fisher, R. A. STATISTICAL METHODS FOR RESEARCH WORKERS. London: Hafner, 1969.

Forsyth, Richard S. "Stylistic Structures: A Computational Approach to Text Classification." Dissertation. University of Nottingham, 1995.

Holmes, David I. "Authorship Attribution and the Book of Mormon: A Case Study in Stylometric Techniques." Ph.D Thesis. University of London, King's College, May 1990.

Holmes, David I. "Vocabulary Richness and the Prophetic Voice." (A supplement to the main thesis.) Ph.D Thesis. University of London, King's College, November 1990.

Kirby, Lorne T. DNA FINGERPRINTING: AN INTRODUCTION. New York: W. H. Freeman, 1992.

McMenamin, Gerald R. FORENSIC STYLISTICS. Amsterdam: Elsevier, 1993.

Rudman, Joseph. "The Style-Marker Mapping Project: A Rationale and Progress Report." ALLC/ACH 2000 Conference, University of Glasgow, Scotland, July 25, 2000.

Rudman, Joseph. "The State of Authorship Attribution Studies: Some Problems and Solutions." COMPUTERS AND THE HUMANITIES 31.4 (1997): 351-365.

Webb, Charles. Interview in THE INDEPENDENT MAGAZINE. 35, 5 February 1994. [Quoted by Forsyth, 8.]

Willing, Richard. "Mismatch Calls DNA Tests Into Question." USA TODAY. 3A, 8 February 2000.
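
As an illustration of method D(1) in Section II, the following is a minimal sketch of Fisher's procedure for combining significance probabilities from independent tests. It is not the author's implementation. The function name `fisher_combined_p` and the three p-values are hypothetical, chosen only to show the arithmetic, and the sketch assumes that each style-marker test has already produced a p-value and that the markers are in fact independent. The statistic is -2 times the sum of the natural logs of the p-values, which under the null hypothesis follows a chi-square distribution with 2k degrees of freedom for k tests.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p_i)
    is referred to a chi-square distribution with 2k degrees of freedom.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)

    # For an even number of degrees of freedom (2k), the chi-square upper
    # tail has a closed form: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical p-values, one per style-marker locus (illustrative only).
marker_p_values = {
    "word length distribution": 0.04,
    "percentage of nouns starting sentences": 0.10,
    "type/token ratio": 0.20,
}

statistic, combined_p = fisher_combined_p(list(marker_p_values.values()))
print(f"chi-square statistic = {statistic:.3f} on {2 * len(marker_p_values)} df")
print(f"combined p-value = {combined_p:.4f}")
```

The closed-form tail keeps the sketch free of third-party dependencies. When the markers are not independent, as in case D(2), this combination is not valid and would overstate the evidence.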