Binomials and the Computer: a Study in Corpus-Based Phraseology
Ourania Hatzidaki, University of Birmingham, UK
ALLC/ACH 2000, University of Glasgow, Glasgow, 2000
Editors: Jean Anderson, Amal Chatterjee, Christian J. Kay, Margaret Scott
Encoder: Sara A. Schmidt
Computational / Corpus Linguistics

This paper presents the results of a large-scale corpus-based study (Hatzidaki 1999) of an important feature of English phraseology, namely binomial pairs (e.g. chalk and cheese, up and down, prim and proper, through and through, Laurel and Hardy). The purpose of the research was two-fold: firstly, to conduct, on the basis of a large and varied corpus of English textual data, a thorough structural and functional analysis of a much-studied yet not fully explored phraseological phenomenon; and secondly, to test the hypothesis that systematically collected samples of authentic language data yield more accurate and comprehensive descriptions of the form and function of linguistic phenomena than reliance on introspection alone. The analysis yielded an extensive and rigorous taxonomy of the structural variants of binomials, as well as significant new information on their function in the communicative process. Binomials, namely sequences of "two or more words or phrases belonging to the same grammatical category, having some semantic relationship and joined by some syntactic device such as 'and' or 'or'" (Bhatia 1994:143), have long been objects of interest for idiomatologists and stylisticians. Existing studies of this phenomenon have focussed mainly on three areas: its marked occurrence in the works of certain literary authors such as Chaucer, Lydgate, Shakespeare, Swift and Shaw (see, respectively, Héraucourt 1939 and Potter 1972, Tilgner 1936, Nash 1958 and Gerritsen 1958, Milic 1967, and Ohmann 1962), as well as in English legal texts; the semantic and syntactic characteristics and idiosyncrasies of the various paired forms, especially the semantic relationship between the linked members of a binomial (synonymy, able and talented; antonymy, boys and girls; complementarity, bow and arrow; Malkiel 1959), or the notion of irreversibility, i.e. the tendency of binomials to occur in only one order, as in here and there but not *there and here, together with its possible causes (e.g. 'proximal before distal'; Cooper and Ross 1975); and the incidence of binomials in languages other than English (e.g. Fix 1985 for German; Abraham 1950 for French and Italian; Malkiel 1959 for Russian, Portuguese, Spanish, Ancient Greek and Latin; Gold 1991 for Yiddish; Koch 1983 for Arabic; Szpyra 1983 for Polish). In contrast to literary studies, where binomials are treated as a flexible and interesting stylistic device and a powerful means of expressing the authors' ideology and worldview, most studies of this feature in general language implicitly or explicitly regard binomials as a small and probably finite set of structurally and semantically idiosyncratic forms. Moreover, although many studies of the formal characteristics of binomials are available, there is no comprehensive account of the full structural variability of the binomial pairs used by the average speaker, no detailed information on the distribution of the different patterns, and no organized taxonomy of forms.
Finally, with the notable exception of studies of binomials as a distinctive feature of the language of the law, where they fulfil the requirements of legal draftsmanship for precision, clarity, unambiguity and all-inclusiveness (Mellinkoff 1963, Gustafsson 1984, Bhatia 1994), minimal attention has been given to the functions of binomials in non-literary language. Crucially, with very few exceptions (notably Gustafsson 1975), previous treatments of binomials have been intuition-based. A glance at a general corpus, however, immediately reveals a number of new and interesting facts about this feature. Firstly, numerous paired forms emerge which appear to be modelled on an abstract dualistic structure of the A + link + B type; very few of them, however, are familiar idiomatic locutions such as the oft-quoted rough and ready and out and out. The majority of the couplets in corpus data are novel sequences such as calm and united, gently and effectively and inflation and unemployment, whose formation seems to be governed by the specific lexicogrammatical, discoursal and pragmatic rules pertaining to the production of the texts in which they occur. Secondly, although couplets are extremely varied in their structural details, they all seem to fall into a set of identifiable lexicogrammatical patterns. Thirdly, the frequency of the various dualistic patterns fluctuates substantially across textual sources with different situational characteristics. These facts indicate that, in order to account effectively for binomial pairing as it is observed in a corpus of texts, a new and more flexible data-driven framework needs to be devised.
In the light of the data used in the present research, binomials are analyzed not as a list of structurally and semantically peculiar couplets but as an abstract mechanism which speakers have at their disposal for generating a very wide range of paired types that serve a variety of important communicative purposes. As a theoretical model for identifying and extracting binomials from the corpus and classifying their lexicogrammatical variants into a set of categories, we use the notion of the phraseological frame or formal idiom, as posited and developed by Moon (1998:154f) and Fillmore, Kay & O'Connor (1988:505f). In very broad terms, this is an abstract structural formula which, as Fillmore et al. put it, 'serves as host' (ibid.:506) to institutionalized expressions as well as to novel, spontaneously created forms. Binomials emerge as a major frame which can be represented by the general formula A link B. Our data analysis, which results in a detailed and comprehensive data-driven taxonomy of binomial patterns, involves four steps: firstly, the identification and extraction of the various binomial forms from our corpus of textual data; secondly, the devising of a prototypical system of abstract representations to which each attested pair is assigned on the basis of its lexicogrammatical attributes; thirdly, the detailed recording of any notable lexicosemantic preferences displayed by the patterns (for instance their semantic prosodies; Louw 1993 and Sinclair 1996); and, finally, the calculation of the frequency of occurrence of each pattern in the corpus. We also discuss in detail the important but rarely addressed issue of the function of binomials in the communicative process.
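The extraction, classification and counting steps of the A link B frame can be illustrated with a minimal Python sketch. It is not the procedure used in Hatzidaki (1999). It assumes a corpus already POS-tagged as (word, tag) pairs with a hypothetical tagset, treats only 'and' and 'or' as links, and accepts a pair only when the words on either side of the link carry the same tag, following Bhatia's 'same grammatical category' criterion. The names extract_binomials, pattern_frequencies and per_million are illustrative only.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple

# Assumed input: a POS-tagged text as (word, tag) pairs. The tag labels used
# here ("N", "ADJ", "ADV", "CONJ", "PUNCT") are hypothetical placeholders.
Token = Tuple[str, str]

# Assumed inventory of linking words for the A link B frame.
LINKS = {"and", "or"}


class Binomial(NamedTuple):
    a: str        # first member
    link: str     # linking word (lower-cased)
    b: str        # second member
    pattern: str  # abstract pattern label, e.g. "ADJ and ADJ"


def extract_binomials(tokens: List[Token]) -> List[Binomial]:
    """Find single-word A link B sequences whose members share a tag."""
    found: List[Binomial] = []
    for i in range(1, len(tokens) - 1):
        link = tokens[i][0].lower()
        if link not in LINKS:
            continue
        (a, a_tag), (b, b_tag) = tokens[i - 1], tokens[i + 1]
        if a_tag != b_tag:
            continue  # members must belong to the same grammatical category
        pattern = f"{a_tag} {link} {b_tag}"
        if a.lower() == b.lower():
            pattern += " (repetition)"  # e.g. "ages and ages"
        found.append(Binomial(a, link, b, pattern))
    return found


def pattern_frequencies(tokens: List[Token]) -> Counter:
    """Count how many extracted pairs instantiate each abstract pattern."""
    return Counter(bn.pattern for bn in extract_binomials(tokens))


def per_million(count: int, corpus_size: int) -> float:
    """Normalize a raw count so corpora of different sizes can be compared."""
    return count * 1_000_000 / corpus_size


if __name__ == "__main__":
    # Toy input built from example pairs quoted in the text (not corpus data).
    toy = [("calm", "ADJ"), ("and", "CONJ"), ("united", "ADJ"), (";", "PUNCT"),
           ("ages", "N"), ("and", "CONJ"), ("ages", "N"), (";", "PUNCT"),
           ("gently", "ADV"), ("and", "CONJ"), ("effectively", "ADV")]
    for pattern, n in pattern_frequencies(toy).most_common():
        print(f"{pattern}: {n} ({per_million(n, len(toy)):.0f} per million tokens)")
```

On the toy input, the sketch files calm and united under ADJ and ADJ, and ages and ages under a separate repetition pattern. Because it inspects only the tokens adjacent to the link, it misses multi-word members such as commercial and investment banks and is sensitive to tagging errors, so a full taxonomy would need richer pattern descriptions. Converting raw counts with per_million() is one conventional way to compare subcorpora of unequal size.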
Specifically, we examine the incidence of the various binomial patterns in each of the six subcorpora making up our corpus (a set of written publications in book form, both fiction and non-fiction; a broadcasting medium; a semi-specialized periodical publication; two daily newspapers, a broadsheet and a tabloid; and a set of spontaneous and semi-spontaneous spoken texts), and seek explanations for the very substantial differences in their distribution. The main purpose of this exercise is to establish the nature and extent of the correlation between the form and structure of binomial patterns on the one hand and the extralinguistic and situational factors pertaining to each subcorpus on the other, and thus to determine the precise functions served by each binomial pattern in communication. Our data strongly suggest that binomials are a phraseological device which makes a highly significant contribution to the communicative process. Our analysis demonstrates that, depending on their structure as well as on the type of text in which they occur, binomials serve a wide range of communicative functions. For instance, the abundant use of informationally dense binomials (e.g. government and parliament, political and monetary, commercial and investment banks) by journalists is shown to serve most effectively the institutional requirements of the mass media for factuality, informativeness, precision, conciseness and stylistic uniformity (Crystal & Davy 1969, Tuchman 1978, van Dijk 1988, and elsewhere), whilst simultaneously disguising the highly fragmented process by which news texts are produced (Bell 1991). On the other hand, the frequent use of repetitive, vague or informationally sparse pairs in conversation (ages and ages, here and there, try and get) reflects the efforts of conversationalists to cope with the exigencies of real-time communication.
In the context of unplanned talk, binomials provide a lexicalized and therefore elegant, well-integrated temporal space which speakers create automatically and with minimal cognitive effort whilst coping with delays in the formulation of thought and argument. In extemporaneous conversation, binomials act as a crucial discourse-cohesive device which helps keep speech 'glued together' (Johnstone 1987), whilst minimizing the effect of fragmentation (Chafe 1982) created by phenomena such as false starts and random repetition (Norrick 1987). At the same time, speakers may use binomials to express emphasis and emotional involvement and to create rhetorical presence (e.g. faster and faster, ringing and ringing). On the whole, the corpus-based structural and situational analysis of binomials not only offers new and significant information on a well-known linguistic phenomenon but also provides substantial empirical support for the hypothesis that phraseology plays a major part in the accomplishment of speakers' and writers' communicative goals (for a review of relevant studies, see Hatzidaki 1999).

Bibliography

Abraham, R. D. (1950). Fixed Order of Coordinates. Modern Language Journal 34: 276-287.
Bell, A. (1991). The Language of News Media. Oxford: Blackwell.
Bhatia, V. (1994). Cognitive Structuring in Legislative Provisions. In J. Gibbons (ed.), Language and the Law. London: Longman.
Chafe, W. L. (1982). Integration and Involvement in Speaking, Writing, and Oral Literature. In D. Tannen (ed.), Spoken and Written Language. New Jersey: Ablex.
Cooper, W. E. and J. R. Ross (1975). World Order. In R. E. Grossman, J. L. San and T. J. Vance (eds.), Papers from the Parasession on Functionalism. Chicago: Chicago Linguistic Society.
Crystal, D. and D. Davy (1969). Investigating English Style. London: Longmans.
Fillmore, C. J., P. Kay and M. C. O'Connor (1988). Regularity and Idiomaticity in Grammatical Constructions: The Case of Let Alone. Language 64(3): 501-538.
Fix, U. (1985). Wortpaare im heutigen Deutsch. Sprachpflege 34(8): 112-113.
Gerritsen, J. (1958). More Paired Words in Othello. English Studies 39: 212-214.
Gold, D. L. (1991). Reversible Binomials in Afrikaans, English, Esperanto, French, German, Hebrew, Italian, Judesmo, Latin, Lithuanian, Polish, Portuguese, Rumanian, Spanish and Yiddish. Orbis 36: 104-118.
Gustafsson, M. (1975). Binomial Expressions in Present-day English. Turku: Turun Yliopisto.
Gustafsson, M. (1984). The Syntactic Features of Binomial Expressions in Legal English. Text 4(1-3): 123-141.
Hatzidaki, O. (1999). Part and Parcel: A Linguistic Analysis of Binomials and its Application to the Internal Characterization of Corpora. Ph.D. thesis, University of Birmingham.
Héraucourt, W. (1939). Die Wertwelt Chaucers. Heidelberg: Carl Winters.
Johnstone, B. (1987). An Introduction. Text 7(3): 205-214.
Koch, B. J. (1984). Arabic Lexical Couplets and the Evolution of Synonymy. General Linguistics 23(1): 51-61.
Louw, B. (1993). Irony in the Text or Insincerity in the Writer? The Diagnostic Potential of Semantic Prosodies. In M. Baker, G. Francis and E. Tognini-Bonelli (eds.), Text and Technology. Amsterdam: John Benjamins.
Malkiel, Y. (1959). Studies in Irreversible Binomials. Lingua 8: 113-160.
Mellinkoff, D. (1963). The Language of the Law. Boston: Little, Brown & Co.
Milic, L. T. (1967). A Quantitative Approach to the Style of Jonathan Swift. The Hague: Mouton & Co.
Moon, R. (1998). Fixed Expressions and Idioms in English. Oxford: Clarendon Press.
Nash, W. (1958). Paired Words in Othello: Shakespeare's Use of a Stylistic Device. English Studies 39: 212-214.
Norrick, N. R. (1980). Semantic Relations and Motivation in Idioms. In E. Weigand and G. Tschauder (eds.), Perspektive: Textintern, Vol. 1. Tübingen: Niemeyer.
Ohmann, R. M. (1962). Shaw: The Style and the Man. Middletown: Wesleyan University Press.
Potter, S. (1972). Chaucer's Untransposable Binomials. In E. Ohmann, V. Väänänen and A. Kurvinen (eds.), Studies Presented to Tauno F. Mustanoja on the Occasion of his Sixtieth Birthday. Helsinki: Modern Language Society.
Sinclair, J. (1996). The Search for Units of Meaning. Textus 9: 75-106.
Tilgner, E. (1936). Die Aureate Terms als Stilelement bei Lydgate. Germanische Studien 182.
Tuchman, G. (1978). Making News. New York: The Free Press.
van Dijk, T. A. (1988). News Analysis. New Jersey: Lawrence Erlbaum Associates.