"Grotefend", a tool for deciphering ancient syllabic
scripts

Heikki S. Särkkä
University of Joensuu, School of Translation Studies
SARKKA@cc.joensuu.fi

ALLC/ACH 1996, University of Bergen, Bergen, Norway, 1996
Editors: Anne Lindebjerg, Espen S. Ore, Øystein Reigem
Encoder: Sara A. Schmidt
Keywords: deciphering, syllabic scripts

The "Grotefend" program that will be introduced in this paper is intended to
facilitate the making of hypotheses concerning the decipherment of an unknown
syllabic script. In essence, the program consists of two modules, one that can
be used for assigning readings to the signs and the other for analysis of the
text. The two modules can be used independently of each other, that is, a text
can be analysed without making any assumptions about how any given sign should
be read. The program was written in Visual Basic at the Department of Computer
Science of the University of Joensuu by Tuomo Pusa and Kari Tanskanen.

At the outset, the user establishes the range of signs used in a given text and
gives each sign a number. This converts the text into a sequence of numbers and,
as the case may be, gaps representing word separators. The use of numbers is a
purely practical device obviating the need for choosing a standard visual
representation for each sign. The strings of numbers, together with any word
separators that can be recognized, are then entered as input data to the program. Once
this has been done, tentative readings can be given to the individual signs by
using an input table. Any time a reading is assigned to a given sign, the same
reading is automatically given to all the occurrences of that sign in the text.
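
The numbering and reading-assignment scheme described so far can be sketched in a few lines of Python. This is only an illustration, not the Visual Basic code of the program itself: the nested-list text layout, the `readings` table, and the function names `assign_reading` and `transliterate` are all assumptions made for the example.

```python
# Illustrative sketch only; the data layout and all names are assumptions,
# not taken from the Grotefend program.

# A text is a list of lines, a line is a list of words (where word separators
# can be recognized), and a word is a list of sign numbers.
text = [
    [[1, 2, 3], [4, 2]],
    [[3, 1], [1, 2, 3]],
]

# The input table: sign number -> tentative reading.
readings = {}


def assign_reading(sign, reading):
    """Assign (or change) the tentative reading of one sign."""
    readings[sign] = reading


def transliterate(text, readings):
    """Render the text, showing the reading where one exists, else the number."""
    rendered = []
    for line in text:
        words = ["-".join(readings.get(sign, str(sign)) for sign in word)
                 for word in line]
        rendered.append(" ".join(words))
    return rendered


assign_reading(2, "ka")
assign_reading(1, "a")
print(transliterate(text, readings))
```

Because the text itself stores only numbers, a reading entered once in the table shows up at every occurrence of that sign when the text is rendered.
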
Any reading can be changed afterwards without affecting the readings of other
signs.

The following reports can be generated by the program:

1. Total frequency of basic signs
2. Total frequency of basic signs that may occur word-initially
3. Total frequency of basic signs that may occur word-finally
4. Basic signs only occurring word-initially and their absolute frequency
5. Basic signs only occurring word-finally and their absolute frequency
6. Repeated strings. This is a list of strings of 2-5 basic signs occurring more than once in the text with their line numbers.
7. Repeated strings found on the assumption that the text runs boustrophedon. If repeated strings are found on lines X and Y (X and Y being line numbers), there is a probability that the text was written left to right and right to left on alternate lines if the difference X-Y is an odd number. The higher the occurrence of repeated strings under the said conditions and the longer they are, the higher the probability that the text in fact does run boustrophedon.

Apart from the above, the data generated can be used as a basis for generating
further, perhaps more interesting data such as the relative frequencies of
different signs, which, in turn, should be helpful for formulating hypotheses
about the genetic or structural kinship of the language concerned with languages
of known structural characteristics.

A fundamental problem that has to be addressed in calculating relative sign
frequencies is the nature of the unknown system, that is, whether we are dealing
with a syllabary consisting of vowels and open syllables only like the system of
Linear B or the two kana syllabaries of Japanese as opposed to, say, the system
of Akkadian cuneiform which uses closed syllables as well in a seemingly
unsystematic way that allows the same word to be written in numerous different
ways. Given a long enough text, the number of different signs is likely to give
us a clue to the nature of the system used. Strings of signs regularly repeating
in the text would be indicative of a stable graphemic system while few
repetitions would lead one to expect a "variable key" system comparable to
cuneiform that would be correspondingly more difficult to break in a way that
commands confidence.

Worth studying is the question of how long a text should be in order for us to be
able to draw valid inferences about the language. Not unnaturally, that depends
on the type of inferences we would like to make. At the most basic level, the
decipherer has to make sure that the textual material s/he is looking at
consists of samples of one and the same language. If we manage to determine the minimal
length of text that is needed for the identification of a given text as
representing a given language with a given degree of probability, we are in a
better position to collect a corpus of texts that are indeed written in the same
language. Any advance made in the decipherment on the basis of one text could
then be checked against other texts in the same language.

Problems of decipherment are further compounded by the fact that the script tells
us very little about the phonology of the language unless we know the degree of
fidelity with which phonemic contrasts are reflected by it. A case in point
here would be the differences between the older and the younger futhark in
Scandinavia.

Even in the face of the above uncertainties concerning the fit between the
graphology and phonology of a language, there are certain features that are a
priori likely to prove more fertile than others. If it is a question of a
syllabic script, word-initial vowels are an obvious starting point. Even if the
vowel 'a' is the most frequent vowel across a range of languages, absolute
frequencies will vary depending on the historical phonology of the language
concerned.

Depending on the language, identification of proper nouns may occasionally be
possible because of their greater length. The following regularity is suggested:
in a text consisting of otherwise shorter reoccurring units, a reoccurring unit
considerably longer than average is indicative of a proper noun. The rationale
behind this may be either that sentential strings like 'Marduk will help him'
are used as names or the fact that a longish repeated element consists of a noun
plus one or more epithets. On comparison with known languages from the same
area, identification of proper nouns in turn will give valuable clues to the
surrounding textual material both in terms of its semantic content and syntactic
function.

The reports given by the program are only an aid to researchers that takes away
some of the tedious spadework necessary for successful decipherment. As such,
however, the program is a tool that should speed up the process by allowing
scholars to direct their creative efforts towards more demanding tasks.

References

Yves Duhoux, Thomas G. Palaima, John Bennet. Problems in Decipherment. Louvain-la-Neuve: Peeters, 1989.