Computational Methods for the Study of Multilingual Corpora
Silvia Hansen, University of the Saarland, Germany
ALLC/ACH 2000, University of Glasgow, Glasgow, 2000
Editors: Jean Anderson, Amal Chatterjee, Christian J. Kay, Margaret Scott
Encoder: Sara A. Schmidt

Corpus linguistics is becoming increasingly important for translation studies (cf. Baker, 1995; Granger, 1999). In the past, the application of corpus linguistic methods was limited to the applied branch of this discipline. In particular, they were used in the fields of terminology, translation aids (e.g., to develop translation memories or machine translation programs), translation criticism and translation training (to improve the final product with the help of corpus-based contrastive analysis and the study of translationese). Recently, corpus linguistic methods have also been introduced in the theoretical and descriptive branches of translation studies. One issue that is receiving more and more attention is the question of translation as a particular text type (Baker, 1996; Laviosa-Braithwaite, 1996; Teich, 1999; Hansen, 1999).

In this paper, I present the analysis of a corpus of translated texts and its comparison with a corpus of originals produced in the target language in order to investigate the universal features of translations (cf. Baker, 1996). Furthermore, on the basis of the analysis of universal features, I analyse the source language texts in order to see what has happened during the translational process. Thus, my aim is to identify both the universal features of translation (by comparing the translation corpus with the originals of the target language) and, on this basis, the translation procedures (by comparing the translation corpus with the originals of the source language). In particular, I discuss the use of various standard corpus tools, such as concordance programs, aligners and taggers, for the analysis of parallel and comparable corpora.
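To make concrete the kind of output a concordance program provides, the following is a minimal keyword-in-context (KWIC) sketch in Python. It is an illustration only and not one of the tools used in this study; the function name kwic, the width parameter and the sample sentence are hypothetical.

```python
import re


def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of keyword.

    Hypothetical helper for illustration; not one of the tools used in the study.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Take up to `width` characters of context on each side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


# Invented sample sentence, used only to show the call.
sample = "The translation was revised. A translation is never final."
for line in kwic(sample, "translation", width=15):
    print(line)
```

Such a listing shows where a form occurs and how often, which is the surface-level evidence that standard corpus tools supply.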
However, the use of these tools is limited: only parts of speech and grammatical categories can be analysed with their help. Thus, it is not possible to say anything about translation procedures, translation strategies or the translational process, because the results gained through standard corpus tools are quantitative values (cf. Hansen & Teich, 1999). What we need is qualitative data, i.e., a linguistic description of the phenomena which occur in the translations, to test hypotheses concerning the translational process and the universal features of translations.

In order to use the information which is provided by the standard corpus tools and to carry out deeper investigations, we need tools which are able to analyse more abstract linguistic categories. For this reason, we use the tool TATOE, with which we annotate the corpus using Systemic Functional Linguistics (SFL; Halliday, 1978; Halliday, 1985). The systemic functional model, which allows the analysis of the relationships between the different linguistic levels (grammar, semantics, context), is used in various disciplines, e.g. language teaching, functional stylistics, grammatical text analysis, and computational linguistics (in this discipline especially for automatic text generation; cf. Teich, 1995; Bateman, 1997). TATOE enables us to define systemic functional categories and, on this basis, to annotate the texts. These annotations make a systemic functional analysis of the parallel and comparable corpora possible, and thus a cross-linguistic description of the phenomena which occur in the texts. On this basis, hypotheses concerning the translational process and the universal features of translations can be tested and new ones can be generated.

Literature

Baker, M. (1995). Corpora in translation studies: An overview and some suggestions for future research. Target 7(2), 223-243.
Baker, M. (1996). Corpus-based translation studies: The challenges that lie ahead. In H. Somers (ed.), Terminology, LSP and Translation: Studies in Language Engineering in Honour of Juan C. Sager. Amsterdam: Benjamins, 175-186.
Bateman, J. (1997). KPML Development Environment: multilingual linguistic resource development and sentence generation. Deutsches Forschungszentrum Informationstechnik (GMD), Bonn (Birlinghoven), Institut für Integrierte Publikations- und Informationssysteme (IPSI).
Granger, S. (1999). Proceedings of the Symposium 'Contrastive Linguistics and Translation Studies. Empirical Approaches', Louvain-la-Neuve, Belgium, February 1999.
Halliday, M. A. K. (1978). Language as social semiotic. London: Edward Arnold.
Halliday, M. A. K. (1985). An introduction to Functional Grammar. London: Edward Arnold.
Hansen, S. (1999). A Contrastive Analysis of Multilingual Corpora (English-German). Diploma thesis, University of the Saarland, Saarbrücken.
Hansen, S. & Teich, E. (1999). Kontrastive Analyse von Übersetzungskorpora: ein funktionales Modell. In J. Gippert (ed.), Sammelband der Jahrestagung der GLDV 99. Frankfurt a. Main, 311-322.
Laviosa-Braithwaite, S. (1996). The English Comparable Corpus (ECC): A Resource and a Methodology for the Empirical Study of Translation. PhD thesis, UMIST, Manchester.
Teich, E. (1995). Towards a methodology for the construction of multilingual resources for multilingual generation. In Proceedings of the IJCAI workshop on multilingual generation, International Joint Conference on Artificial Intelligence (IJCAI), Montreal, Canada, August 1995, 136-148.
Teich, E. (1999). Towards a model for the description of cross-linguistic divergence and commonality in translation. In E. Steiner & C. Yallop (eds.), Beyond content: Exploring translation and multilingual text production. Berlin: Mouton de Gruyter.