An Assistant Tool for Verse-making in Basque-based on
Two-Level MorphologyBertolArrietaUniversity of the Basque Country, Spain XavierArregiUniversity of the Basque Country, Spain IñakiAlegriaUniversity of the Basque Country, Spain 2000University of GlasgowGlasgowALLC/ACH 2000editorJeanAndersonAmalChatterjeeChristianJ.KayMargaretScottencoderSaraA.SchmidtIntroductionIn this paper we present a specialised word generator, which aims to be an
assistant tool for Basque troubadours. Such a tool allows verse-writers to
generate all the words that match with a given word termination. We coped
with some interesting aspects, i.e. the dimensions of the generated list and
the need to establish an order of relevance among the listed items.This work can be seen as a way of re-using computational linguistic tools in
the context of the Basque cultural means of expression. The technical
foundations of this tool lie in a two-level morphological processor. The way
in which words must be generated (starting from the end of the word) leads
us to inverse the generation process."Bertsolaritza": What Is It?"Bertsolaritza" (the Basque term for verse-making) is an oral or written
literary form with old tradition and great popularity in the Basque Country.
Similar forms are manifested in other countries like Cuba.While the written mode is similar to poetry, the oral mode has a peculiarity:
troubadours sing verses without previously knowing the theme. In other
words, a theme is given to the singers and in a few seconds they have to
think of a set of verses adjusted to the theme. These verses must hold to
the formal conditions (measurement and rhyme) of the discipline.This verse-making task is quite difficult, so great expertise is required.
Because of that, some schools are devoted to teaching how to improvise this
type of verses. From our view, the tool we are presenting may be quite
useful in the verse-schools. For some decades, an oral verse-making
competition has been organised in the Basque Country every four years. The
high diffusion of this event (thousand of Basques follow this competition
with great interest, live or on TV) is a clear demonstration of the
importance of this discipline. From this background was formed the idea of
designing the tool here presented. We hope that such an application will be
a useful assistance-tool in the task of finding rhymes, namely for those
inexperienced troubadours.Reversing Of The Morphological DescriptionTo make this tool we have re-used a morphological analyzer/generator for
Basque developed few years before (Alegria et al., 96) and integrated
several tools such as spelling correctors and ICALL systems (Maritxalar et
al., 97). The morphological description is based on the Koskenniemi's
two-level morphology model (Koskenniemi, 83).The two-level system is based on two main components:A lexicon where the morphemes (lemmas and affixes) and the
possible links among them (morphotactics) are defined. The lexicon
is divided into different sublexicons and each lexicon entry
specifies its morphotactical information by means of a continuation
class which is a set of sublexicons. Combining sublexicons (nodes)
and continuation classes (arcs) the graph of morphotactics is
defined.A set of rules which controls the mapping between the lexical
level and the surface level (changes at surface level when morphemes
are linked) due to the morphophonological transformations
(morphophonemics).In order to get our inverted morphological analyser/generator for Basque we
needed to reverse this morphological description. The goal is to build an
inverted morphological generator for Basque, which will control the order of
the proposals according to their suitability for being a rhyme. The inverted
morphological generator will obtain all the possible forms corresponding to
a known ending, instead of generating the possible forms corresponding to
the beginning. We took into account two choices to reverse the morphological description.1. The first one consists of manipulating the automata that is
created from the morphological description of the Basque. This
option initially looked good because we did not need to manipulate
the lexicon and the rules; we only manipulated the automata. But,
analysing this option in depth, we realised that our inverted
Deterministic Finite Automata (DFA) would actually become a
Non-Deterministic Finite Automata (NDFA) in an intermediate state of
the transformation process; and trying to re-convert the NDFA in a
DFA would cause a combining explosion.2. The second option consists of manipulating and reversing the
lexicon and the rules directly, before using the compilers
(Karttunen and Beesley, 92)(Karttunen, 93). This approach,
therefore, involves the implementation of the programs that invert
the lexicon, the morphotactics and the phonological rules
automatically.Considering the risks of the first choice, we decided to develop the second
method. This process was divided into three steps:1. Reversing the lexicon: This task
deals with the inversion of all the morphemes. The order of the
characters inside the morphemes is inverted. For instance 'big'
would be converted to 'gib'.2. Converting the continuation classes in
"backward classes": The basis of the morphotactics in the
two-level model is the continuation classes (Koskenniemi, 83). We
have programmed a script to convert the continuation classes in
"backward classes", so that we have a group of morphemes that can go
before an inverted morpheme. This looks easy, but it has some
problems. Lexicons containing final classes have to be defined as
root lexicons, and consequently the backward class of the original
root lexicons must be null.For example: Let ADJECT be
the continuation class in adjectives with two syllables or less.
Suppose that this class has a unique lexicon containing the
stems -er, and -est, and that the continuation class of these
stems is null. Once the conversion has been made, -er and -est
will be in the root lexicon and in their backward class will be
included the adjectives with two syllables or less.3. Reversing the rules: The rules
are expressed as
following:<correspondence>
<operator> <left context> _
<right context>To reverse the rules
only contexts have to be changed, interchanging between them and
reversing each one. The contexts are regular expressions and it is
necessary to distinguish between data (to be reversed) and regular
operators and reserved characters.For example, the
rule y:i <=> _ +: s #:; will be converted to y:i
<=> #: s +:_ ; Application To "Bertsolaritza": Finding Words That Rhyme With An
EndingOnce the inverted analyser/generator for Basque was developed, we tried to
reuse it in an application that got the rhymes based on the final part of a
word. We needed to invert the character sequence given by the user and then
launch the generation with our inverted morphological generator tool. The
output of the generation process - that is, all the words that match with
the given ending - must be inverted before showing them to the user in a
Tcl-Tk made screen. In this way our tool returns all the Basque words that
have the same final sequence of characters as the sequence given by the
user. So, the application finds all the words that rhyme with the
word-ending given by the user.In order to improve the usefulness of the application, we considered it
necessary to face the problem of the huge quantity of generated words that
match with the sequence given by the user. Two solutions were implemented: 1. Establishing a kind of categorisation or class-partition among
the morphemes, so that only one example (representative of the
class) is returned when all the elements of the class are suitable
to be shown. For instance, if the input is 'est', instead of
returning all the adjectives with the superlative form added (too
long!) big + est--> biggest
small + est --> smallest
thin + est --> thinnest
...the application will
return only one example and a short explanation:BIG+ est -->
biggest (ADJECTIVE + est)2. Returning words sorted in the order that verse- makers
appreciate more. The quality of the rhyme is better if the word is
not composed or declined. In the example above it would be better to
use rhymes like 'guest' than words declined like 'smallest'.Conclusions And Future ImprovementsBasque is a Pre-Indo-European language of unknown origin and quite different
from the surrounding European languages. The declension of the Basque
language has fourteen different forms for each singular, plural and
undefined form. All of these forms are added at the end of the words.
Besides, it is an agglutinative language which accepts morphemes being added
to other morphemes. These characteristics show us the relevance of the final
parts of the Basque words. That reason leads us to think that the inverted
morphological analyser/generator would be useful for different applications.
We have found an interesting use for such a generator in the world of the
"bertsolaritza". Given that final parts of words (rhymes) are very important
in verses, the inverted morphological analyser/generator can be an important
assistant tool for writing verses. Furthermore, an automatic method for
inverting the morphological description has been defined. Such a method can
be reused in any other language, always starting from a two-level
description.We are considering as future works, (i) returning words with assonance rhyme;
(ii) dealing with semantics in the selection module in order to improve the
order of presentation, and (iii) publishing the application as a web page.
AcknowledgementsWe would like to thank Xerox for letting us use their tools, and specially to
Lauri Karttunen.ReferencesI.AlegriaX.ArtolaK.SarasolaM.UrkiaAutomatic Morphological Analysis of BasqueLiterary and Linguistic Computing114193-2031996L.KarttunenK.R.BeesleyTwo-Level Rule CompilerXerox ISTL-NLTT-1992-2Xerox1992L.KarttunenFinite-State Lexicon CompilerXerox ISTL-NLTT-1993-04-02Xerox1993L.KarttunenConstructing Lexical TransducersProc. of COLING«941994406-411K.KoskenniemiTwo-level Morphology: A general Computational Model for
Word-Form Recognition and ProductionDissertationUniversity of Helsinki, Dept of General
Linguistics1983Publications No. 11M.Maritxalar A>Diaz de IlarrazaM.OronozFrom Psycholinguistic Modelling of Interlingua to a
Computational ModelProc. Of CONLL97 Workshop (ACL Conference)Madrid1997Lekuona et alBertsolaritzaJakinDonostia14 eta. 151980