The Academic vs. Subject Corpus: Development of Criteria for the Teaching of ESP According to Lexical Needs in Spanish Polytechnic Courses

Alejandro Curado Fuentes, Universidad de Extremadura, Spain

ALLC/ACH 2000, University of Glasgow, Glasgow, 2000
Editors: Jean Anderson, Amal Chatterjee, Christian J. Kay, Margaret Scott
Encoder: Sara A. Schmidt

The aim of this paper is to offer the details gathered from the lexical
analysis of English texts read in Information Science-related majors at Spanish universities. In this textual collection, lexical items are arranged according to word frequency and range across text types and genres, or within given subject fields and topics in the Information Science and technology disciplines. The strength of these lexical combinations, as measured statistically by M.I. (Mutual Information), is also pertinent to our study, and the degree of collocation is thus assessed in the light of common coreness. Whether these patterns are consistent across the corpus is a key characteristic to evaluate, so that a reference to the total number of texts and running words can be established. Finally, as the findings show that some lexical items are representative of only a limited number of texts, keywords must also be explored. The results of the three approaches mentioned (word frequency/range, collocations and keywords) are observed with a focus on both the text and the subject matter, following the ESP (English for Specific Purposes) priority of working with language and content together. As a consequence, a categorization is made according to a specified kind of context
- e.g. text types.

The environments of text and discourse, that is, text type and genre, are of prime importance for situating lexical items within the scope of academic linguistic competence. Text types are approached in terms of how a text is organized and how it reflects coherence and cohesion, while the second setting, genre, registers the writer's inclination and intention to produce discourse for a community (e.g. the academic community). There are two other parameters, subject and topic, on which the distribution of the lexical items in our corpus is based. For these, a content-based framework is provided, and the findings yield the core lexis according to thematic/conceptual fields.

a. Word frequency and range.

In this first division, the most frequent text type words are provided
according to how recurrent they are across six sets of ten texts, grouped as follows:

1. Definitions
2. Descriptions
3. Classifications
4. Exemplifications
5. Discussions
6. Conclusions

The samples are taken randomly to represent the rhetorical functions and sub-sections of genres that learners must come to grips with.

Note 1: Only these six types are chosen because others are subsumed by them: the discourse function signalling contrast, for instance, is contained in Discussions on five occasions, whereas illustrations are dealt with in Exemplifications. In turn, the two sections of research articles included, Discussions and Conclusions, are given priority over Abstract, Introduction, Method and Results, since the latter have already been drawn on, to a greater or lesser degree, in compiling the Descriptions, Classifications, Definitions and Exemplifications (see the distribution of the text type sub-corpus).

The relevant
vocabulary analyzed from this perspective is classified into argumentative, procedural and discourse/grammar items, which are examined within demarcated domains such as distinct subject fields and genres.

Note 2: How certain genres and text types can be characterized by core or subject-core lexis is described by, among others, Carter (1988, 1997).

Our immediate concern thus lies in representing visually all the interrelationships among the subject fields, so that text samples can be selected accordingly. We offer figures that give the number of sources belonging to four specified disciplines in Informatics-related majors, indicated by abbreviations (e.g. 'C.S.' stands for Computer Science, and so forth). In addition, capital letters in our corpus refer to the codes used for the subjects/topics within disciplines, as shown in the Appendices (Appendix 1).
As will be observed, in addition to all the labels A to F, each single subject field is also represented individually by some texts (not shared with other programmes of study).

There are more texts in the 'F' category, which is shared by the four scientific areas: Computer Science, Information Science, Audio-visual Communication and Optical/Wave Communication. In contrast, the subject 'Communication Theory', included in the Audio-visual Communication, Information Science and Telecommunication programmes of study, is formed by only four sources. In turn, Audio-visual Communication is the discipline with the smallest number of readings involved: only one text for each genre.

Note 3: The correspondence between number of texts and disciplines follows from the aim of assembling core language and subject matter: the fewer the samples, the more subject-specific the texts tend to be.

The selection of the texts is made using as yardsticks the overall
distribution and length of the texts in the corpus. As a result, if as many as five descriptions (out of a possible ten) are included in the 'F' or 'All disciplines' category, this is because such passages are quite common in these readings. In addition, these five samples are not as long as other types, such as the definitions in this division. Finally, as has been pointed out, keeping the balance of the entire corpus is a chief consideration.

So that the text type findings based on frequency and range are also framed by detailed knowledge of the subjects and topics covered, the distribution of text type sources within each sub-category or label must be provided. The maximum number of texts encompassed is three, e.g. in the case of Descriptions on the topic of 'Information infrastructure' (F6 category).
This distinction reflects both the larger number of readings in the corpus dealing with issues of this kind and the recurrence of this rhetorical function in sub-division F6. In contrast, where a given sub-category contains no samples, the reason is that the model was either less developed or not included at all in the content of the texts (e.g. Conclusions in 'Perspectives on information' [F1], 'Media theory' [D2], 'Media documentation' [C2], 'Automated Knowledge-based systems' [B3], etc.).

A final comment must be made regarding the importance of keeping a balance in the representation of three academic genres (textbooks, reports and research articles) in the construction of the corpus. The intent of this arrangement is to offer a balanced basis for text selection and analysis.
Since the purpose of such an organizing procedure is to provide adequate ground for lexical sifting, this text type sub-corpus should incorporate as many different language and content settings (i.e. contextual factors) as possible. In this sense, even text units as characteristic of a single genre as sections of research articles (Discussions and Conclusions, in this case) can be located in the other two genres, e.g. a discussion appearing in a textbook on Communication Theory [E1] or a conclusion taken from a report on Software Programming [A1], as figure 4 shows.

The end results should thus be suitable for the design of both written and oral lexical activities and tasks that reveal the importance of academic and subject lexis, based on the analysis of common texts across different disciplines.
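The three measures this study relies on (word frequency and range, M.I. collocation strength, and keywords) can be illustrated with a minimal Python sketch. It is not the tool used for the analysis reported here, and several points are assumptions: texts are taken as already tokenized lists of words, collocations are counted only as adjacent word pairs, M.I. is computed as log2(f(x,y) * N / (f(x) * f(y))), and, because the paper does not state how keywords are identified, a log-likelihood keyness test comparing a sub-corpus (e.g. one subject field) with a reference corpus (e.g. the whole collection) is assumed. All function names are hypothetical.

```python
import math
from collections import Counter

# Minimal sketch with hypothetical helper names. Texts are assumed to be
# lists of lowercase word tokens produced by some earlier tokenizer.

def frequency_and_range(texts):
    """Frequency = total occurrences of a word; range = number of texts it occurs in."""
    freq, rng = Counter(), Counter()
    for tokens in texts:
        freq.update(tokens)
        rng.update(set(tokens))  # each text counts at most once towards range
    return freq, rng

def bigram_counts(texts):
    """Count adjacent word pairs within each text (assumed collocation span of 1)."""
    pairs = Counter()
    for tokens in texts:
        pairs.update(zip(tokens, tokens[1:]))
    return pairs

def mutual_information(pair_freq, freq_x, freq_y, n_tokens):
    """M.I. = log2(f(x,y) * N / (f(x) * f(y))); higher values mean a stronger bond."""
    return math.log2(pair_freq * n_tokens / (freq_x * freq_y))

def log_likelihood(a, b, c, d):
    """Assumed keyness test. a, b = word counts in the study/reference corpus;
    c, d = total tokens in the study/reference corpus."""
    e1 = c * (a + b) / (c + d)  # expected count in the study corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:  # a zero count contributes nothing to the sum
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(study, reference, min_ll=3.84):
    """Words relatively more frequent in `study` than in `reference`, ranked by
    log-likelihood; 3.84 is the chi-square critical value for p < 0.05 with 1 d.f."""
    c, d = sum(study.values()), sum(reference.values())
    result = []
    for word, a in study.items():
        b = reference.get(word, 0)
        if a / c > b / d:  # keep positive keywords only
            ll = log_likelihood(a, b, c, d)
            if ll >= min_ll:
                result.append((word, ll))
    return sorted(result, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    texts = [
        ["the", "network", "the", "protocol"],
        ["the", "data", "network"],
        ["data", "mining"],
    ]
    freq, rng = frequency_and_range(texts)
    n = sum(freq.values())
    for (x, y), f_xy in bigram_counts(texts).items():
        print(x, y, round(mutual_information(f_xy, freq[x], freq[y], n), 2))
    print(keywords(Counter({"protocol": 20, "the": 80}),
                   Counter({"protocol": 5, "the": 95})))
```

On real data the input would be, for instance, the six sets of ten text type samples or the subject-field sub-corpora described above. M.I. is known to favour rare pairs, so in practice a minimum co-occurrence frequency is usually applied before ranking collocations.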