That Was Then: Canonicity in the Trésor

Susy C. Santos, University of Manitoba, umsant06@UManitoba.CA
Paul A. Fortier, Centre on Aging, University of Manitoba, Fortier@cc.umanitoba.ca

ALLC/ACH 2002, University of Tübingen, Tübingen, 2002
Editor: Harald Fuchs. Encoder: Sara A. Schmidt.

The Trésor de la Langue Française (TLF) corpus was set up almost half a century ago. When one reads the description of
how this was done, the distance becomes evident. Professor Imbs quite openly
admits that the goal is to reflect "elite" usage of the French language;
texts were chosen after consultation of histories of literature, some of
which were quite dated even then (Imbs 1971, I, xv-xl). Considerations of
inclusiveness, of representativity, as discussed in Scholes (1992) or von
Hallberg (1984), do not seem to have concerned the committee which finalized
the corpus. One is entitled to wonder to what extent this corpus represents
the interests of scholars of French literature a half century later.

Purpose

It is legitimate to evaluate the extent to which the texts included in the
TLF database do represent important trends in French literature, as judged
by what interested scholars at the time it was constituted, and as reflected
by what has interested scholars of the present.

More specifically, it is possible to see whether the choices embodied in the
TLF reflect what scholars of the time judged important by comparing the
number of texts included in a given genre - the novel - with the number of
lines that the Oxford Companion to French Literature (Harvey &
Heseltine 1959) devotes to the authors chosen for the TLF.

Similarly, the MLA Bibliography provides online
data showing the number of publications in the modern languages and
literatures for the periods 1963-90 and 1991 to the present. A comparison
between the number of publications mentioning a novelist found in this
bibliography and the number of texts by the same novelist in the TLF will
show the extent to which choices made by the TLF group have been confirmed
by the interest of later scholars. Given the volume of data involved, these
questions must be dealt with using statistics.

Data

A subset of the TLF database was chosen for analysis: novels published
between 1789 and 1954 (see Table 1). The name of the novelist (Author) and
the number of novel texts included in the database for each writer (Texts)
were recorded, along with the publication date of the text included in the
database (Pub Date). When more than one novel by a given author is in the
TLF, Pub Date records the date of the earliest one published. Authors
better known for genres other than prose fiction were removed from the
test data, because they would be a source of
ambiguity.

These numbers were compared to three series of test data. The column OxC in
Table 1 records the number of lines devoted to the novelist and to the
included novels by that author which are found in the Oxford Companion to French Literature (Harvey &
Heseltine 1959), a volume contemporary with the formation of the TLF
database. Columns MLA 1 and MLA 2 record the number of articles mentioning
the novelist or work(s) found in the MLA online bibliography of learned
articles dealing with language and literature. MLA 1 covers the period
1963-1990 and MLA 2, 1991-2000.

For analysis the entire set of 128 frequencies concerning novels was used.
Subsequently subsets of roughly equal numbers of authors were generated,
covering the periods 1789-1859 (33), 1860-1907 (35), 1908-23 (25), and
1925-54 (35).

Table 1
Author          Pub Date  Texts OxC MLA 1 MLA 2
Abellio         1946      1090
About           1857      21410
Adam            1902      12514
Alain-Fournier  1913      193294
Ambriere        1946      1010
Aragon          1936      125445305
Arland          1929      10374
Ayme            1933      17389
Baillon         1927      1036
Balzac          1824      165771986781
Barbusse        1916      1165213
Barres          1888      5879372

Method

A glance at the frequencies of the texts recorded for individual authors
shows a large number of authors with one text, and a very small number of
authors with ten or more, a distribution pattern quite familiar to people
who work with word frequencies in natural languages. These data do not form
the familiar bell-shaped curve typical of the Gaussian or normal
distribution.

Since the data are not normally distributed, Pearson's product-moment
correlation analysis cannot legitimately be used on them. Similarly, these
data would produce a very high proportion of expected frequencies smaller than 5
in a contingency table for a chi-squared analysis, so this method cannot be
employed either. The usual way of handling such a problem (grouping the data) is
not appropriate, since it is the treatment of individual authors which is of
interest.

Spearman's rank correlation analysis requires neither normally distributed
data nor expected frequencies greater than five; it has been chosen as the
primary analytic technique and applied in pairwise fashion to the data, and
to the four subsets of the data. At the same time, jackknifed outlier
analysis provided by JMP-IN (Sall & Lehman 1996) has been used to
identify authors whose distribution varies the most from the trends in the
data.

Results

Taken as a whole, the data show a high degree of correlation among the number
of texts in the TLF database, the number of lines in the Oxford Companion, and the two sets of MLA Bibliographic data
(see Table 2). The probability that any of these correlations is
the result of chance alone is below .0001.

Table 2: Nonparametric Measure of Association
Variable  by Variable  Spearman Rho  Prob>|Rho|
OxC       Texts        0.5528        <.0001
MLA_1     Texts        0.4475        <.0001
MLA_1     OxC          0.6101        <.0001
MLA_2     Texts        0.4047        <.0001
MLA_2     OxC          0.5918        <.0001
MLA_2     MLA_1        0.9084        <.0001

The data divided into four sections show a higher correlation in the earlier
period than in the later, and outliers in the earlier two periods tend to be
the greats of French literature, like Balzac, Stendhal and Zola, whereas in
the later periods they tend frequently to be novelists whose literary
fortunes are less obvious, like Simenon or Giono.

Conclusion

The analysis of the number of novel texts included in the TLF
database shows that the selection is broadly what a different team of
scholars might have chosen had it drawn up the corpus in
the late 1950s. Similarly, the works included do correspond - particularly
for the period up to 1908 - to what scholars of our day find sufficiently
interesting to be included in their published studies.

It is thus reasonable to conclude that the TLF database is a valid
representation of important French literary texts for the period from 1789
to 1954. As more and more databases become commercially available, the
method presented here for validating the representativity of a database
using readily-available online bibliographical information would seem to
have a significance which goes beyond modern French literature.

Acknowledgements

The research reported here has been supported by the Social Sciences and
Humanities Research Council of Canada (SSHRCC) under grant number
410-98-1348.

Bibliography

Harvey, Paul, and J. E. Heseltine. The Oxford Companion to French Literature. Oxford: Oxford UP, 1959.
Imbs, Paul. Le Trésor de la Langue Française: Dictionnaire de la langue du XIXe et du XXe siècle. 16 vols. Paris: CNRS, 1971.
Sall, John, and Ann Lehman. JMP Start Statistics. Belmont, Ca.: SAS Institute, 1996.
Scholes, Robert. "Canonicity and Textuality." Introduction to Scholarship in Modern Languages and Literatures. Ed. Joseph Gibaldi. New York: MLA, 1992.
von Hallberg, Robert. Canons. Chicago: U of Chicago P, 1984.
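
The rank-correlation step described under Method can be illustrated with a minimal sketch. It is not the JMP-IN procedure used for the study. It assumes only the textbook definition of Spearman's rho, namely Pearson's correlation computed on ranks, with tied values sharing the mean of their rank positions; ties matter here because many authors have exactly one text. The function names and the six pairs of counts are hypothetical and are not taken from Table 1.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's correlation applied to tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (spread_x * spread_y)


# Hypothetical counts for six imaginary authors (NOT values from Table 1):
# number of texts in a corpus, and number of lines in a reference work.
texts = [1, 1, 1, 2, 4, 9]
lines = [0, 12, 30, 20, 60, 400]

rho = spearman_rho(texts, lines)
print(f"Spearman rho = {rho:.4f}")
```

Because only ranks enter the calculation, a single author with a very large count (such as the last pair above) influences the coefficient no more than its rank position warrants, which is why the measure suits highly skewed frequency data of this kind.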