Corpus Methods for Interlingual Machine Translation (MT)

Michelle Vanni
Georgetown University, U.S. Dept. of Defense
vanni@guvax.georgetown.edu

ALLC/ACH 1996, University of Bergen, Bergen, Norway, 1996
Editors: Anne Lindebjerg, Espen S. Ore, Øystein Reigem
Encoder: Sara A. Schmidt
Keywords: machine translation, knowledge bases, corpora

That corpus analysis has become a fundamental element in the process of designing
natural language processing (NLP) systems is generally recognized:

    Efforts in the development of NLP and [information technology] are
    converging on the recognition of the importance of some sort of
    corpus-based research as part of the infrastructure for the development
    of advanced language processing applications (Atkins, Clear and Ostler
    1992:1).

In order to be effective, NLP systems must handle not only those linguistic
structures occurring in text which are predictable from explanatory models but
also those which are idiosyncratic, occur less frequently, and derive their meaning
from convention rather than composition. Corpus studies provide evidence
of such usages. Through close examination of categories of linguistic phenomena,
they also offer insight into new generalizations not considered by rational
theorists.

There is a noteworthy congruence between findings in corpus analysis studies and
those in MT research regarding the actual coverage of theoretical models which
view syntax and semantics independently. In support of the suggestion that these
two levels are instead interdependent, Sinclair (1991) states that a certain
structure may only be appropriate for a particular sense of a word and that,
conversely, one word sense may have associated with it only a finite set of
common syntactic patterns. Lexical studies in support of interlingual MT make a
similar point. It has been recognized (Levin and Nirenburg 1991, 1993, 1994a,
1994b) that two levels of representation, one which indicates semantic
properties from which syntactic behavior can be predicted (B.Levin 1993) and one
which expresses meaning as a set of relationships to concepts as defined in a
structured model of a particular semantic domain (Goodman and Nirenburg 1992),
must exist in an interlingual MT lexicon in order adequately to account for the
meaning of conventional linguistic expressions which have come to be known as
constructions (Fillmore 1988, Goldberg 1994).

While the MT research work uses cross-linguistic data to argue that neither of
the levels, alone, provides sufficient representation, monolingual data from
on-line corpora can be shown to support a similar conclusion: models of
processing which have been developed from rational theories account for only a
small percentage of what actually occurs in language, and further research
on patterns of actual language use is required in order to derive effective
grammars which handle the majority of linguistic phenomena occurring in
text.

In this paper, we use corpus methods to explore approaches to the analysis of
Italian verbs in related semantic fields and lexical variation associated with
three of a particular verb's morphological forms. Hypotheses regarding the
complementary argument structure of frequently occurring verbs in the domains of
sensation, cognition and emotion will be tested, and variation among the
structures in which present, imperfect and preterit forms appear will be
observed for the changes in semantic interpretation with which they may be
associated. Based on preliminary findings, an interlingual structure will be
proposed to account for these domains and forms.

References

Atkins, B.T.S., J. Clear and N. Ostler (1992). Corpus design criteria.
Literary and Linguistic Computing 7(1): 1-16.

Fillmore, C., P. Kay and M.C. O'Connor (1988). Regularity and idiomaticity in
grammatical constructions: the case of let alone. Language 64: 501-38.

Goldberg, A. (1994). Constructions: A Construction Grammar Approach to
Argument Structure. Chicago: University of Chicago Press.

Goodman, K. and S. Nirenburg (1992). KBMT-89: a Case Study in Knowledge-Based
Machine Translation. San Mateo: Morgan Kaufmann.

Levin, B. (1993). English Verb Classes and Alternations: A Preliminary
Investigation. Chicago: The University of Chicago Press.

Levin, L. and S. Nirenburg (1991). Semantics-driven and ontology-driven lexical
semantics. Lexical Semantics and Knowledge Representation: Proceedings of the
First SIGLEX Workshop, University of California at Berkeley, June 1991.

Levin, L. and S. Nirenburg (1993). Principles and idiosyncracies in MT lexicons.
Working Notes of AAAI-93 Spring Symposium Series: Building Lexicons for MT,
Stanford University.

Levin, L. and S. Nirenburg (1994a). The correct place of lexical semantics in
interlingual machine translation. Proceedings of COLING-94.

Levin, L. and S. Nirenburg (1994b). Construction-based MT lexicons. In
A. Zampolli, N. Calzolari and M. Palmer (eds.), Current Issues in Computational
Linguistics: Studies in Honor of Don Walker. Norwell, MA: Kluwer.

Sinclair, J. (1991). Corpus, Concordance, Collocation. Oxford: Oxford
University Press.