LEXSTATS: A program for the statistical analysis of word frequency distributions

Harald Baayen
University of Nijmegen; Max Planck Institute for Psycholinguistics
baayen@mpi.nl

Fiona J. Tweedie
Department of Statistics, University of Glasgow
fiona@stats.gla.ac.uk

ACH/ALLC 1999, University of Virginia, Charlottesville, VA

Various computationally intensive statistical models are available for the
analysis of word frequency distributions (e.g., Carroll, 1967; Sichel, 1975;
Chitashvili and Baayen, 1993). These models give linguists and
lexicographers elegant means of obtaining characteristic textual measures
that do not depend on sample size, of extrapolating vocabulary growth to
sample sizes larger than the observed text, and of estimating the size of
the population vocabulary.

Thus far, these models have not been used widely, which is not surprising
given the absence of software implementing them. At the conference,
we will present the beta version of LEXSTATS, a user-friendly graphical
interface to a series of C programs that implement a wide range of word
frequency analyses. LEXSTATS and the underlying C code will become available
as freeware under the GNU software license.

We will illustrate LEXSTATS by applying it to word frequency distributions of
various kinds of texts as well as to word frequency distributions of a range
of morphological categories.

References

Carroll, J. B. (1967). On sampling from a lognormal model of word frequency
distribution. In H. Kučera and W. N. Francis, Computational Analysis of
Present-Day American English (pp. 406-424). Providence: Brown University Press.

Chitashvili, R. J. and Baayen, R. H. (1993). Word frequency distributions.
In G. Altmann and L. Hřebíček (Eds.), Quantitative Text Analysis (pp. 54-135).
Trier: Wissenschaftlicher Verlag Trier.

Sichel, H. S. (1975). On a distribution law for word frequencies. Journal of
the American Statistical Association, 70, 542-547.
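The three uses of word frequency models named in the opening paragraph (sample-size invariant characteristics, extrapolation of vocabulary growth, and estimation of the population vocabulary size) can be made concrete with a minimal Python sketch. It is not LEXSTATS and does not implement the lognormal (Carroll, 1967), Sichel (1975), or Chitashvili and Baayen (1993) models. As an assumption for illustration, it swaps in three simple, well-known statistics computed from the frequency spectrum V(m, N), the number of types that occur exactly m times in a text of N tokens: Yule's characteristic K (roughly constant across sample sizes when words are sampled independently), the Good-Toulmin estimator of vocabulary growth, and Chao's lower-bound estimate of the population vocabulary. All function names are hypothetical.

```python
from collections import Counter


def frequency_spectrum(tokens):
    """Return N (token count), V (type count) and the spectrum {m: V(m, N)}."""
    type_freqs = Counter(tokens)             # frequency of each word type
    spectrum = Counter(type_freqs.values())  # number of types occurring exactly m times
    return len(tokens), len(type_freqs), dict(spectrum)


def yules_k(n, spectrum):
    """Yule's characteristic K = 10^4 * (sum_m m^2 V(m, N) - N) / N^2."""
    s2 = sum(m * m * vm for m, vm in spectrum.items())
    return 1e4 * (s2 - n) / (n * n)


def good_toulmin(v, spectrum, t):
    """Estimate E[V(N + tN)] = V(N) + sum_m (-1)^(m+1) t^m V(m, N).

    The alternating series is only reliable for 0 <= t <= 1.
    """
    if not 0 <= t <= 1:
        raise ValueError("Good-Toulmin extrapolation requires 0 <= t <= 1")
    new_types = sum((-1) ** (m + 1) * t ** m * vm for m, vm in spectrum.items())
    return v + new_types


def chao1(v, spectrum):
    """Chao's lower-bound estimate of the population vocabulary size."""
    v1 = spectrum.get(1, 0)  # hapax legomena
    v2 = spectrum.get(2, 0)  # dis legomena
    if v2 == 0:
        # bias-corrected form evaluated at V(2, N) = 0
        return v + v1 * (v1 - 1) / 2
    return v + v1 * v1 / (2 * v2)


if __name__ == "__main__":
    tokens = "the cat sat on the mat the end".split()
    n, v, spec = frequency_spectrum(tokens)
    print(n, v, spec)                  # 8 6 {3: 1, 1: 5}
    print(yules_k(n, spec))            # 937.5
    print(good_toulmin(v, spec, 1.0))  # 12.0
    print(chao1(v, spec))              # 16.0
```

The Good-Toulmin series is unreliable for t > 1, so extrapolating further than twice the observed text length requires a parametric model of the kind cited in the opening paragraph.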