Korean Analysis and Transfer in Multilingual Machine Translation System

Sung-Kwon Choi
Systems Engineering Research Institute
skchoi@seri.re.kr

Tae-Wan Kim
Systems Engineering Research Institute
twkim@seri.re.kr

Soo-Hyun Lee
Systems Engineering Research Institute
shlee@seri.re.kr

Dong-In Park
Systems Engineering Research Institute
dipark@seri.re.kr

ACH/ALLC 1997
Editors: the secretarial staff in the Department of French Studies at
Queen's University; Greg Lessard
Encoder: Sara A. Schmidt
Keywords: machine translation, multilingualism, common grammatical
knowledge

Abstract

Multilingual machine translation means translation between more than two
languages. Existing multilingual machine translation systems can be
classified into transfer-based and interlingua-based systems. In the
former, the analysis and generation rules were written differently for
each language, so that the commonness of the languages was ignored and
the overall memory requirement increased. The latter had difficulty
implementing a universal linguistic model applicable to many languages.
To overcome the shortcomings of these existing systems, this paper
describes a multilingual MT system built on common rules, which capture
the commonness of languages and can be shared by many languages.

1 Introduction

The analysis and generation rules in the existing transfer-based multilingual
machine translation systems (SYSTRAN, EUROTRA, METAL, LOGOS, GETA, etc.) are
independent and differ according to the target language [Hutchins 1992].
This means that the existing multilingual machine translation systems do
not acknowledge the commonness of languages. As a result, they take the
form of a bundle of bilingual MT systems, which increases the size of the
system. There are also transfer-based multilingual machine translation
systems that use an interlingual method to reduce the number of transfer
processes (CETA, SALAT, DLT, KANT, etc.); however, they face the difficult
problem of completing a universal linguistic model [Lewis 1992]. From this
point of view, this paper describes a new multilingual machine translation
method, based on common rules and constraint rules, that overcomes these
problems of the existing systems. Common rules are rules that are common
to more than two languages. Because they are loaded into memory only once,
common rules reduce memory space, improve the consistency of grammatical
information and standardize the information structure of the lexicon. They
have a further merit for MT: when a new language is added to the existing
system and translated into the existing languages, new grammar modules can
be created easily by combining common rules. Constraint rules are rules
that control the linguistic characteristics of individual languages.

This paper consists of three parts. Section 2 introduces the construction
of the whole system. Section 3 describes the modules consisting of common
rules, that is, the common grammatical rules, the common lexicon
information structure, the common structural transfer rules, and the
common information transfer rules. Section 4 explains the analysis and
transfer of Korean through the parameterized common rules and the
constraint rules.

2 System construction

Figure 1 shows the construction of the multilingual machine translation
system based on the common rules and constraint rules:

Figure 1: Construction of multilingual machine translation system
The middle field of Figure 1 represents the common module. Each 'rn' is a
file of common rules belonging to the common module. These common rule
files are called by the grammar modules of the individual languages and,
together with the constraint rules for a language, constitute the grammar
rules of that language. For example, Korean, Japanese, English and German
in Figure 1 have the rule file r3 in common, but Korean and Japanese
additionally share the rule files r2 and r4 because they are typologically
closer to each other than to English and German.

3 Common rules

This section presents the construction of the common rules. Common rules
for analysis consist of the common grammar rules and the common lexicon
information structure, and those for transfer consist of the common
structural transfer rules and the common information transfer rules.

3.1 Common grammatical rules

To handle many languages in a multilingual machine translation system,
the common grammatical rules should explain the linguistic phenomena of as
many languages as possible. To account for configurational languages (e.g.
English) as well as nonconfigurational languages (e.g. Korean, Japanese,
German) whose word order is relatively free, we have written new grammar
rules that combine X-bar syntactic theory [Jackendoff 1977] and HPSG
[Pollard 1994]. The new grammar uses binary structures, except for
coordination, which uses a ternary structure.

Table 1. Common grammar rules

   head-final-structure   head-first-structure   head-middle-structure
1  PRED => ARG PRED       PRED => PRED ARG       COORD => ARG1 COORD ARG2
2  MODED => MOD MODED     MODED => MODED MOD
3  FUNCT => ARG FUNCT     FUNCT => FUNCT ARG

The common grammar rules of Table 1 are described in Appendix 1
according to the notation of the CAT2 machine translation system.

3.2 Common lexicon information structure

A lexicon information structure is needed in order to enter, manage and
correct the lexicon information of the multilingual machine translation
system consistently. It is desirable to build a multi-level rather than a
flat structure, so that the lexicon information structure can represent
all the possible linguistic information and so that this information can
be moved collectively. For this reason we have selected the feature
structure as the multilingual lexicon information structure and made the
attributes the same across languages. Appendix 2 shows an example of
the multilingual lexicon information structure.

3.3 Common structural transfer rules

There is also a part of the transfer process that many languages can
share. It is compositional transfer, which copies a node of the source
language to the corresponding node of the target language if the analysis
structure of the former and the generation structure of the latter are
the same. To transfer structures that differ between languages
compositionally, our multilingual machine translation system deletes the
functional words and then transforms the syntactic nodes into
'predicate-argument-modifier' nodes.
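
The transformation just described can be illustrated with a minimal sketch. It assumes head-final binary trees built with the rules PRED, MODED and FUNCT of Table 1; the class names, the function `to_semantic` and the English gloss tokens are hypothetical and chosen only for illustration. It is not the CAT2 implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SynNode:
    """Binary syntactic node; rule is PRED, MODED or FUNCT, or LEX for a word."""
    rule: str
    children: List["SynNode"] = field(default_factory=list)
    word: str = ""


@dataclass
class SemNode:
    """Node of the 'predicate-argument-modifier' tree."""
    head: str
    args: List["SemNode"] = field(default_factory=list)
    mods: List["SemNode"] = field(default_factory=list)


def lex(word: str) -> SynNode:
    return SynNode("LEX", word=word)


def to_semantic(node: SynNode) -> SemNode:
    if node.rule == "LEX":
        return SemNode(node.word)
    dependent, head = node.children  # head-final: the head is the right child
    if node.rule == "FUNCT":
        # FUNCT => ARG FUNCT: the functional word (right child) is deleted
        return to_semantic(dependent)
    sem_head = to_semantic(head)
    if node.rule == "PRED":
        # PRED => ARG PRED: the left child becomes an argument
        sem_head.args.insert(0, to_semantic(dependent))
    elif node.rule == "MODED":
        # MODED => MOD MODED: the left child becomes a modifier
        sem_head.mods.insert(0, to_semantic(dependent))
    else:
        raise ValueError(f"unknown rule: {node.rule}")
    return sem_head


# Toy tree for the gloss: government+SUBJ new plan+OBJ make
tree = SynNode("PRED", [
    SynNode("FUNCT", [lex("government"), lex("SUBJ")]),
    SynNode("PRED", [
        SynNode("FUNCT", [SynNode("MODED", [lex("new"), lex("plan")]),
                          lex("OBJ")]),
        lex("make"),
    ]),
])
semantic = to_semantic(tree)
print(semantic.head, [a.head for a in semantic.args])
```

In this sketch the functional words SUBJ and OBJ disappear and the result has the predicate 'make' with the arguments 'government' and 'plan', the latter carrying the modifier 'new'. A fuller version would presumably keep the deleted functional word as a feature on its argument rather than discard it.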
The noncompositional structural rules, which cannot be handled by the
common structural transfer rules, are recorded in the transfer lexicon
because they depend on individual lexemes. The transfer rules are applied
in priority order: first the noncompositional structural transfer rules,
second the common structural transfer rules, and last the lexical
transfer rules in the lexicon. The following rule is the common
structural transfer rule:

(1) common_structural_transfer_rule = {}.[+node] <=> {}.[+node].

Rule (1) says that all compositional transfer trees, that is, '+node',
are transferred unchanged from the source language to the target
language.

3.4 Common information transfer rules

The transfer process in multilingual machine translation can also be
simplified by separating the structure from the
information. In the existing transfer-based machine translation systems,
structural transfer has included information transfer, which has led to
duplicated information and increased memory space. Separating the
structure from the information removes this shortcoming. In this sense,
the common information transfer rules transfer the common information
available to many languages; that is, they are rules that copy the
semantic information from the source language to the target language. The
semantic information is produced during analysis by the mapping from a
form to its meaning. The following are the common information transfer
rules (in the notation of the CAT2 system):

(2) Common information transfer rules

Lexical_semantic_transfer = {head:{ehead:{sem:SEM}}}.[*]
    <=> {head:{ehead:{sem:SEM}}}.[*].
Transfer_of_semantic_roles = {role:ROLE}.[*]
    <=> {role:ROLE}.[*].

The lexical semantic transfer rule says that the lexical semantic
information of the source language is copied to that of the target
language on the same node level, and vice versa ('<=>' denotes
bidirectionality). The transfer of semantic roles rule copies the
semantic role information between the source language and the target
language.

4 Korean analysis and transfer by constraint rules

The grammar of an individual language consists of a universal rule and its
parameter [Chomsky 1981]. Language typology can be classified by the
parameter [Greenberg 1963]. There is an example of machine translation
[Dorr 1993] that uses the universal principle and its parameter. According
to Greenberg's parameterized word order, we can take the standard word
order of Korean to be as follows:

(3) Standard Word Order of Korean

SOV
Number-Noun
Demonstrative-Noun
Adjective-Noun
Possessive Pronoun-Noun
Relative clause-Noun

This standard word order gives a clue to the parameter of an individual
language. The next section presents the parameterized common grammatical
rules for Korean.

4.1 Korean analysis by parameterized common rules

According to the Korean standard word order, the head word must always
follow its argument or modifier. We can therefore select the head-final
common rules for Korean from the multilingual common grammatical rules in
Figure 1. The head-final rules in Figure 1 and the Head Feature Principle
(HFP), which percolates the information of a lexical head up to its
phrase, are as follows. (The coordination structure of Korean could be
treated as part of the 'Argument-Functional word structure', but we keep
it as a ternary structure for efficient analysis.)

Table 2. Parameterized common grammar rules for Korean

   head-final-structure   head-middle-structure
1  PRED => ARG PRED       COORD => ARG1 COORD ARG2
2  MODED => MOD MODED
3  FUNCT => ARG FUNCT

(4) Head_Feature_Principle = {head:HEAD}.[{},{head:HEAD}].

A Korean sentence analysed by the parameterized common grammar rules and
the HFP results in the following:

(5) cengpwunun saylowun kyeyhoykanul malyenhayessta.
government+SUBJ new plan+OBJ make+PAST+DECL
The government made a new plan.

In (5) the fine line shows the application of 'FUNCT => ARG FUNCT', the
dotted line that of 'MODED => MOD MODED' and the thick line that of
'PRED => ARG PRED'.

4.2 Korean analysis by grammatical constraint rules

When analysing Korean for machine translation, we must pay special
attention to the following [Oh 1994]:

(6) Korean Characteristics

Phonological peculiarity
sonyen-i, sonye-ka
boy-SUBJ, girl-SUBJ
boy, girl

Double objects
kunun seoulul yehayngul hayessta.
He-SUBJ Seoul-OBJ trip-OBJ make-PAST-DECL
He made a trip to Seoul.

Honorifics
kyoswunimkkeyse osipnita.
professor-SUBJ(HON) come-HON-DECL
The professor comes.

These peculiarities of Korean can be explained by the constraint rules.
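
As a rough illustration of how one such constraint might work, the sketch below checks the phonological peculiarity in (6): the subject marker is chosen according to whether the noun ends in a consonant or a vowel. The feature names phon, con, voc, frame and arg1 follow the notation of the phonological rule example in this section. The entry for 'ka', the vowel test on the romanized form and all function names are assumptions made for illustration, not part of the CAT2 grammar.

```python
# Feature structures as plain dicts, loosely following the notation
# word{phon:..., frame:{arg1:{phon:...}}}.
SUBJECT_MARKERS = {
    "i": {"phon": "voc", "frame": {"arg1": {"phon": "con"}}},
    # Assumed entry: 'ka' is taken to select a vowel-final argument.
    "ka": {"phon": "voc", "frame": {"arg1": {"phon": "voc"}}},
}

# Crude assumption: a romanized stem ending in one of these letters is
# vowel-final; anything else counts as consonant-final.
ROMANIZED_VOWELS = set("aeiou")


def last_phoneme(stem: str) -> str:
    return "voc" if stem[-1] in ROMANIZED_VOWELS else "con"


def subsumes(constraint: dict, features: dict) -> bool:
    """True if every feature required by constraint is present in features."""
    for key, value in constraint.items():
        if isinstance(value, dict):
            inner = features.get(key)
            if not isinstance(inner, dict) or not subsumes(value, inner):
                return False
        elif features.get(key) != value:
            return False
    return True


def attach_subject_marker(stem: str) -> str:
    noun = {"phon": last_phoneme(stem)}
    for marker, entry in SUBJECT_MARKERS.items():
        # The functional word predicts the last phoneme of its argument.
        if subsumes(entry["frame"]["arg1"], noun):
            return f"{stem}-{marker}"
    raise ValueError(f"no subject marker accepts {stem!r}")


print(attach_subject_marker("sonyen"))  # sonyen-i
print(attach_subject_marker("sonye"))   # sonye-ka
```

The point of the sketch is only that the common rule 'FUNCT => ARG FUNCT' stays unchanged while the language-specific restriction lives in the lexical entries of the functional words.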
Table 3 shows the relation between the common rules and their constraint
rules.

Table 3. Common rules and constraint rules

Korean characteristics       Common rules         Constraint rules
Phonological peculiarities   FUNCT => ARG FUNCT   Phonological rule
Double objects               PRED => ARG PRED     Argument exchange
Honorifics                   HFP                  Context information

- Phonological rule

Every morpheme records its last phoneme, which is subcategorized and
predicted by a functional word.

example) sonyen{phon:con} i{phon:voc,frame:{arg1:{phon:con}}}
boy{phon:con} SUBJ{phon:voc,frame:{arg1:{phon:con}}}

- Argument exchange

The subcategorization structures of the functional verb 'hata (= do/make)'
and those of the predicate noun are exchanged for each other in the
lexicon:

Table 4. Lexicon of 'hata (do/make)'

lex    hata
frame  arg1  ARG1
       arg2  ARG2
       arg3  cat    noun
             frame  arg1  ARG1
                    arg2  ARG2

example) kunun(arg1) seoulul(arg2) yehayngul(arg3(arg1,arg2)) ha(arg1,arg2,arg3)yessta.
He-SUBJ Seoul-OBJ Trip-OBJ make-PAST-DECL
He made a trip to Seoul.

- Context information

The context information of the sentence subject agrees with that of the
verb phrase.

example) kyoswunimkkeyse(context:honor) osi(context:honor)pnita.
professor-SUBJ(HON) come-HON-DECL.
The professor comes.

4.3 Transfer constraint rules

The syntactic tree of Korean is converted into a semantic tree through
tree transformation. The semantic tree has a 'predicate-argument-modifier'
arrangement. The HFP is also applied to the nodes of the semantic tree.
The transformation rules convert the Korean syntactic tree (5) into the
following semantic tree:

(7) cengpwunun saylowun kyeyhoykanul malyenhayessta.
government-SUBJ new plan-OBJ make-PAST-DECL.
The government made a new plan.

The semantic tree becomes the input of transfer. All semantic trees that
can be transferred compositionally are transferred to the target language
by the 'common structural transfer rules' and the 'common information
transfer rules'. There are, however, cases of compositional transfer to
which the common information transfer rules cannot be applied. Idiomatic
expressions with the functional verbs 'hata (do/make)' or 'toyta (be
done/be made)' are such cases. During the transformation from syntactic
tree to semantic tree, we delete 'hata' and copy its information to the
feature 'functional verb' of the predicate noun, so that the predicate
noun becomes the predicate of the sentence. But there is no multilingual
rule that can control the relation between the predicate noun of the
source language and the predicate noun of the target language, or between
the predicate noun of the source language and a verb or adjective of the
target language. For this reason we need a rule constraining the common
transfer rule. We therefore have the transfer constraint rules for the
common information transfer rules.

(8) Constraint rule of predicate noun

idiomatic expression vs idiomatic expression
Copy the information of the Korean functional verb to that of the
functional verb of the target language, if the lexeme of the target
language has a functional verb that is equivalent to the Korean idiomatic
expression with 'hata'.
ex.) sanpolul hata => take a walk, einen Spaziergang machen, sanpowo suru
ilul hata => sikotowo suru

idiomatic expression vs verb or adjective
Copy the information of the Korean functional verb to that of the lexeme
of the target language, if the lexeme of the target language has no
functional verb that is equivalent to the Korean idiomatic expression
with 'hata'.
ex.) ilul hata => work, arbeiten

5 Conclusion

In this paper we have proposed a new philosophy of multilingual machine
translation that accepts the commonness of languages in order to reduce
the memory space of the multilingual machine translation system and to
simplify the transfer process. This philosophy is realized by the common
rules for many languages and the constraint rules for the individual
languages. For example, the analysis of Korean is handled by the
parameterized common rules and the constraint rules, and the transfer
from Korean to other target languages is handled by the common structural
transfer rules, the common information transfer rules, and the transfer
constraint rules. The following table shows the size of the common and
constraint rules used for the analysis and transfer of Korean in the
translation of 300 Korean sentences into English or German.

Syntactic Analysis     Semantic Analysis      Transfer
Common  Constraint     Common  Constraint     Common  Constraint
955398433

- Further work

Although the multilingual machine translation by the common rules and the
constraint rules performs reasonably well, reducing the analysis rules
and simplifying the transfer process, there are still many problems to be
solved:

  - Truncation of the number of the parse trees
  - Conflict between the old and the new lexical information
  - Recognizing the idiomatic expressions and collocations
  - Disambiguation of polysemy

In order to solve these problems we are testing the following methods:

  - Usage of the probabilistic method
  - Information processing by the multiple inheritance
  - Implementation of the compound unit recognizer
  - Usage of the domain

References

[Oh 1994] Kil-Lok Oh, Key-Sun Choi, Sey-Young Park. Korean Language
Engineering. Tae-Young-Sa, 1994 (in Korean).

[Chomsky 1981] N. Chomsky. Lectures on Government and Binding. The Pisa
Lectures. Studies in Generative Grammar 9. Dordrecht Holland & Cinnaminson
U.S.A.: Foris Publication, 1981.

[Greenberg 1963] J.H. Greenberg. Some universals of grammar with
particular reference to the order of meaningful elements. In Joseph H.
Greenberg (ed.), Universals of Language, 2nd edition. Cambridge,
Massachusetts: The M.I.T. Press, 1963.

[Dorr 1993] B.J. Dorr. Machine Translation: A View from the Lexicon.
Cambridge, Massachusetts and London, England: MIT Press, 1993.

[Hutchins 1992] W.J. Hutchins, H.L. Somers. An Introduction to Machine
Translation. Academic Press, 1992.

[Jackendoff 1977] R.S. Jackendoff. X-bar Syntax: A Study of Phrase
Structure. Cambridge: MIT Press, 1977.

[Lewis 1992] D. Lewis. Computers and Translation. In Christopher Butler
(ed.), Computers and Written Texts. Blackwell, 1992, 75-114.

[Pollard 1994] C. Pollard, I. Sag. Head-Driven Phrase Structure Grammar.
Studies in Contemporary Linguistics. Chicago & London: The University of
Chicago Press, 1994.

[Sharp 1994] R. Sharp. CAT2 Reference Manual Version 3.6. IAI Working
Papers N.27. Saarbruecken, Germany, 1994.

Endnote

This paper summarizes an experiment with the multilingual machine
translation system CAT2 [Sharp 1994]. The CAT2 system currently runs on a
UNIX workstation. It is programmed in PROLOG and uses the 'constraint
bottom-up chart' parser. We are now translating Korean into English as
well as German and are testing translation from Korean into French,
Chinese, Russian, and Japanese as target languages.

Appendix 1. Multilingual common grammar rules written in CAT2 notation

Appendix 2. Multilingual Lexicon Information Structure