COLT on TACT: A demonstration of the TACTweb software as applied to the Bergen Corpus of London Teenage Language

Kristine Hasund
Dept. of English, University of Bergen

Gisle Andersen
Dept. of English, University of Bergen
gisle.andersen@eng.uib.no

1996
University of Bergen, Bergen, Norway
ALLC/ACH 1996

Editors: Anne Lindebjerg, Espen S. Ore, Øystein Reigem
Encoder: Sara A. Schmidt

Keywords: TACTweb search program

The Bergen Corpus of London Teenage Language (COLT) is the first large English
corpus focusing on the speech of teenagers. It was collected in 1993 and
consists of the spoken language of 13- to 17-year-old boys and girls from
different boroughs of London.

The aim of the COLT project is to compile a 500,000-word corpus of spoken teenage
language and make it available to students of English at the University of
Bergen, as well as to language researchers worldwide.

This poster presents the use of TACTweb on the COLT corpus. TACTweb, which
connects the text-retrieval program TACT to the World Wide Web, enables the user
to search in a database of spoken conversations for the location of words, word
combinations and word formation patterns. In the COLT database, TACTweb is
applied to give the distribution of an item in relation to certain
non-linguistic variables.

Searches in the corpus are made possible through the indexing of the texts in the
database. The COLT database has the following indices:

1. Reference number for each text file (e.g. <REF> B132401)
2. who= index for speaker identity (e.g. who=1)
3. id= index for speaker turn number (e.g. id=1). This index is the same as the one used in the BNC (the British National Corpus)
4. speaker's age (e.g. <AGE1> 14)
5. speaker's gender (e.g. <GEN1> f)
6. speaker's socioeconomic group (e.g. <SOC1> 2)
7. speaker's occupation (e.g. <OCC1> student)
8. location of conversation (e.g. <LOC> Hackney)
9. setting of conversation (e.g. <SET> classroom)
10. number of participants (e.g. <AUD> 5)

Four different types of display systems are available for searches in the
corpus:

KWIC - Key Words In Context

A KWIC display lists all the occurrences of a word with one line of context.
Here is an example that shows the occurrences of the word "Peter" (see Figure
1). The number in parentheses in the top line shows the total number of
occurrences of "Peter" in the entire corpus. The numbers at the front of
each line give the reference number, and then the turn number where the word
can be found. The target word appears in the middle of the line. Clicking on
the target word shows the full text, which allows a closer study of each
occurrence.

The KWIC display allows the user to quickly browse a large number of
occurrences to see how a particular word is used, or to search for a word
which has many occurrences.

Figure 1

Variable Context Display

Whereas the KWIC display gives only one line of context, the Variable Context
Display allows the user to control the amount of context in which a word is
to be displayed. For example, one can ask for the word "Peter" to be
displayed in a context of 3 lines before and 3 lines after the
occurrence.

Distribution

This display allows the user to search for the occurrence of a word as it is
distributed across the variables speaker identity, age, gender,
socio-economic group, location, setting, occupation, and number of
participants. For example, one can display how the word "shit" is distributed
according to age.

Word List

The Word List display gives a list of all the words that match a particular
pattern. For instance, it is possible to produce a list of all words ending
in a particular letter or sequence of letters. This is particularly useful
for a researcher who is interested in the productivity of certain morphemes,
such as -able:

unavailable (1)
unbelievable (4)
uncomfortable (2)
unfuckingtouchable (2)
unreliable (1)
unscrewable (1)
unsociable (1)
untouchable (1)
up-gradable (2)
vulnerable (4)

The purpose of the poster presentation is to demonstrate these and other
facilities, focusing on TACTweb as a useful tool for the linguistic
researcher. Moreover, an overview of ongoing research will be given.
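
For readers who want a concrete picture of what three of the displays described above compute (KWIC, Distribution and Word List), the following is a minimal Python sketch. It is not TACT or TACTweb code and does not reproduce their behaviour or output format; the function names, the in-memory data layout and the sample turns are all hypothetical and invented for illustration, with each turn carrying a reference number, a turn number and a speaker age in the spirit of the indices listed earlier.

```python
import re
from collections import Counter

# Hypothetical toy data: (reference number, turn id, speaker age, utterance).
# These turns are invented for illustration and are not taken from COLT.
TURNS = [
    ("B000001", 1, 14, "Peter said it was unbelievable"),
    ("B000001", 2, 15, "no it was not that comfortable"),
    ("B000002", 1, 14, "ask Peter if he is available"),
]


def tokens(text):
    """Split an utterance into lower-cased word tokens."""
    return re.findall(r"[a-z'-]+", text.lower())


def kwic(word, width=20):
    """Return one line per occurrence: reference, turn, keyword in context."""
    pattern = re.compile(r"\b%s\b" % re.escape(word), re.IGNORECASE)
    lines = []
    for ref, turn, _age, text in TURNS:
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            # Right-justify the left context so the keyword lines up.
            lines.append(f"{ref} {turn:>3} {left:>{width}}{m.group(0)}{right}")
    return lines


def distribution_by_age(word):
    """Count occurrences of a word per speaker age (zero counts omitted)."""
    counts = Counter()
    for _ref, _turn, age, text in TURNS:
        counts[age] += tokens(text).count(word.lower())
    return {age: n for age, n in counts.items() if n}


def word_list(suffix):
    """List all word types ending in the given suffix, with frequencies."""
    counts = Counter(t for _r, _t, _a, text in TURNS for t in tokens(text))
    return sorted((w, n) for w, n in counts.items() if w.endswith(suffix))
```

Calling `kwic("Peter")` gives one line per occurrence, prefixed by reference and turn number; `distribution_by_age("peter")` groups the counts by speaker age; and `word_list("able")` returns every word type ending in -able with its frequency. The same grouping idea would extend to the other non-linguistic variables (gender, location, setting and so on) by adding fields to each turn.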