
In search of the critical lexical mass How 'general' is the GSL

By Shirley Rodriguez,2014-06-13 20:38

    In search of the critical lexical mass: How 'general' is the GSL?

    How 'academic' is the AWL?

    by

    Steve Neufeld & Ali Billuroğlu

    December 2005

    ABSTRACT

    The concept of vocabulary profiling texts as an aid to teaching and learning English is becoming more widespread due to the availability of computer-based tools. Two commonly used tools are RANGE, a PC-based vocabulary profiler for corpora developed under the auspices of Paul Nation, and the Compleat Lexical Tutor, a web-based suite of lexical analysis tools and resources developed by Tom Cobb, which includes a vocabulary profiler for individual texts. These two popular tools, which are free for anyone to use, rely on two main word lists in common use today: the General Service List (GSL) and the Academic Word List (AWL). Used in conjunction with each other, these two lists comprise between 85% and 90% of the actual words (tokens) in any academic text. However, close examination of vocabulary profiles created using three bands defined by splitting the GSL into the first thousand (K1) and the second thousand (K2) commonly used words, and adding on the AWL words as the third, shows that this breakdown does not yield a vocabulary profile that reflects a natural distribution of words based on common use. This article puts forward an argument that neither the GSL nor the AWL is genre-specific. Rather, their combined word members, redistributed to reflect the natural frequency of distribution of commonly used words, provide a vocabulary profile of a broad range of genres of written texts. It also reports additional research on the identification of contemporary words in common use, leading to the creation of a critical lexical mass of 2,716 word families that consistently provides 90% to 95% coverage of the tokens (not including proper nouns, acronyms or abbreviations) in academic corpora.

TABLE OF CONTENTS

    ABSTRACT
    1. THE DEVELOPMENT OF LISTS OF COMMONLY USED WORDS
    1.1. WHICH WORDS TO FOCUS ON?
    1.2. NAGGING DOUBTS ABOUT THE VALIDITY OF THE GSL
    1.3. ENGELS K2 DEFICIENCY
    2. THE GSL: IN NEED OF A FACE-LIFT OR MAJOR SURGERY?
    2.1. INCOMPLETE WORD FAMILIES
    2.1.1. American / British spelling
    2.1.1. Word forms
    2.1.2. Word families
    2.1.3. Plural and singular forms
    2.2. BASED ON THE ENGLISH OF THE 1920S AND 1930S
    2.2.1. The emergence and ascendancy of commonly used words
    2.2.2. The decline and fall of some commonly used words
    2.3. ADDING ON THE AWL
    3. ON THE ROAD TO A CRITICAL LEXICAL MASS
    3.1. BASIS OF COMPARISON
    3.2. RANKING
    3.3. RESULTING REDISTRIBUTION
    4. COMPARISON OF GSL/AWL TO BNL
    4.1. NATURAL DISTRIBUTION OF A VOCABULARY PROFILE
    4.2. THE ADDED VALUE OF THE BNL OVER OTHER LISTS
    4.2.1. Re-prioritized to address the Engels K2 deficiency and updated with contemporary lexis with easier staging for learners
    4.2.2. The BNL has active, dynamic and dedicated web-based support and maintenance
    4.2.3. Enhancements
    5. CONCLUSION
    REFERENCES
    APPENDIX: GSL FACE-LIFT
    APPENDIX: REDISTRIBUTING AWL IN A UNIFIED CRITICAL LEXICAL MASS
    APPENDIX: COLOUR CODING OF VOCABULARY PROFILE IN WORD MACROS


LIST OF FIGURES

    Figure 1. Distribution of commonly used words in different text types (p. 17).
    Figure 2. Graph of distribution of commonly used words in different text types (p. 17).
    Figure 3. Graph of text coverage in the BNC based purely on frequency of words (Chujo & Utiyama, 2005).
    Figure 4. Screenshot of the list comparer, with ranking column at the far right.
    Figure 5. Breakdown of component constituents of the BNL, illustrating how the BNL has emerged as a much better approximation of the natural vocabulary profile of English texts.
    Figure 6. Tabulated data of text coverage of a 730,000 word academic corpus, contrasting the natural distribution of words given by BNL to the GS/AWL, highlighting the Engels K2 deficiency.
    Figure 7. Line graph of text coverage of a 730,000 word academic corpus, contrasting the natural distribution of words given by BNL to the GS/AWL, highlighting the Engels K2 deficiency.


1. The development of lists of commonly used words

    Debates over learning vocabulary in a foreign language are likely to remain with us as long as there are teachers, revolving around the core issues of what it means to know a word, which words to learn, how many and in what order. It has long been a goal of researchers, inspired by pioneers such as Ogden (1930), Palmer and Lorge (Chujo & Utiyama, 2005), to determine the core lexis required for proficiency in English. Some have been particularly interested in the vocabulary size of native speakers of English, using this as a starting point for estimating the learning task that learners of English face. Recent studies show that an average educated native speaker knows around 20,000 word families (Goulden, Nation and Read, 1990, as stated in Nation & Waring, 1997; Zechmeister, Chronis, Cull, D'Anna and Healy, 1995, as stated in Nation, 2001). This daunting task might be something to consider if learners are in an English as a Second Language (ESL) context (Milton and Meara, 1995, as stated in Nation & Waring, 1997), where they are continuously exposed to the language and pick it up incidentally in addition to learning it explicitly. For learners of English as a foreign language, however, such a level of lexical competency is an extremely ambitious, if not simply unattainable, goal.

    1.1. Which words to focus on?

    Since there are so many words to learn, but neither sufficient time nor the necessary conditions for foreign language learners to acquire even minimal lexical competence, the key question is which words to teach first. From the point of view of frequency, Nation (1990, 2001) and Nation & Waring (1997) state that the most frequent 2000 words in English (West, 1953) are the most useful, for knowing these would allow learners a good degree of comprehension (around 80%) of what they hear or read. Research by Liu Na and Nation (1982, as stated in Nation & Waring, 1997), on the other hand, showed that knowing the 2000 words alone is not sufficient for overall comprehension, arguing that at least 95 percent coverage is needed for good comprehension of a text. Coxhead (2001) came up with a list of specialized vocabulary consisting of the 570 word families most frequently occurring in academic texts. It is asserted that knowing these words in the AWL (Academic Word List), in addition to the 2000 most frequent words (the GSL), would be a good basis for learning English for academic purposes (Nation, 2001). Nation also proposes that since these words of high frequency are clearly crucial, teachers and learners should place considerable emphasis on them, especially when time is limited, as in the case of intensive pre-sessional university preparatory programmes in non-English speaking countries.

    Until the advent of the Internet, the practical application of word lists in everyday teaching remained outside the realm of ordinary teachers. However, there are now excellent software programs, such as the web-based "Vocabulary Profiler" in the "Compleat Lexical Tutor" (Cobb, 2005) and the PC-based "RANGE" (downloadable from http://www.vuw.ac.nz/lals/staff/paul-nation/nation.aspx), that can assist teachers in effortlessly calculating text coverage. These two software tools are perhaps the most accessible and easy-to-use applications for in-service English language teaching. The former is a simplified version of the latter, restricted to use on single texts rather than a corpus and also restricted to measuring vocabulary levels by comparing the word lists made from the targeted text with the General Service List (first 1,000-word and second 1,000-word lists) and the Academic Word List. RANGE comes with these lists in the form of three baseword files but, unlike the "Vocabulary Profiler", RANGE users can also define and use their own word lists and are not restricted to using only three.
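    The calculation that such profilers perform can be illustrated with a minimal sketch. It is not the code of RANGE or the Vocabulary Profiler: the function names, the tokenisation rule and the three miniature word lists below are assumptions made purely for illustration, and a real profiler would load the full baseword files and match whole word families.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def vocabulary_profile(text, bands):
    """Return the share of tokens that falls in each band.

    `bands` is an ordered list of (name, set_of_word_forms) pairs.
    A token is credited to the first band containing it, otherwise
    to 'Other' (the off-list band).
    """
    tokens = tokenize(text)
    counts = Counter()
    for token in tokens:
        for name, words in bands:
            if token in words:
                counts[name] += 1
                break
        else:  # no band contained the token
            counts["Other"] += 1
    total = len(tokens)
    names = [name for name, _ in bands] + ["Other"]
    return {name: (counts[name] / total if total else 0.0) for name in names}


# Hypothetical miniature lists, NOT the real GSL/AWL baseword files.
K1 = {"the", "of", "a", "is", "in", "and", "this", "are", "on"}
K2 = {"sample", "tiny"}
AWL = {"research", "data", "analysis"}

profile = vocabulary_profile(
    "This research is a tiny sample of the data and the analysis in Cyprus.",
    [("K1", K1), ("K2", K2), ("AWL", AWL)],
)
for band, share in profile.items():
    print(f"{band:5s} {share:6.1%}")
```

    Passing the bands as an ordered list mirrors the way RANGE lets users supply their own word lists: a fourth or fifth band is simply another pair in the list.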

    1.2. Nagging doubts about the validity of the GSL

    Researchers have expressed doubts about the adequacy of the GSL because of its age and the relatively low coverage provided by the words not in the first 1000 words of the list (Engels, 1968). Engels was particularly critical of the limited vocabulary chosen by West (1953). While he concurred that the first 1000 words of the GSL were good selections based on their high frequency and wide range, he was of the opinion that the words beyond the first 1000 could not be considered "general service words" because their range and frequency are too low to justify inclusion in the list. He further suggested that the lower frequency words in the GSL should be revisited. The results of a subsequent study, detailed by Hwang and Nation (1995), support Engels' suggestion to a degree (see also Chujo & Utiyama, 2005). Nevertheless, despite the growing concern that the GSL needs replacing, both to redress its embedded errors and because of its age and its focus on written English from the early part of the twentieth century, it is still widely used. When an add-on list like the AWL becomes the third band of a vocabulary profile, however, the oddities in vocabulary profiles are accentuated. This supports Engels' original concern about the GSL as a 'general service list' and reaffirms his suggestion to go back to the drawing board and re-examine the issue of commonly used words from a new perspective. It also raises the question of whether one can actually define a lexis specific to an 'academic' genre, or can at best simply extend the lexical scope of the GSL to cover the more formal or educated English found in a wide range of genres, 'academic' being one of them.

    1.3. Engels K2 Deficiency

    With Engels in mind, consider the following table from Nation (2001):

    Levels      conversation   fiction   newspapers   academic text
    1st 1000    84.3%          82.3%     75.6%        73.5%
    2nd 1000    6.0%           5.1%      4.7%         4.6%
    AWL         1.9%           1.7%      3.9%         8.5%
    Other       7.8%           10.9%     15.7%        13.3%

    Figure 1. Distribution of commonly used words in different text types (p. 17).

    Engels' concern about the validity of the second 1,000 words in the GSL becomes much more apparent when dealing with more academic texts. Extrapolating from Engels' basic concern about the second thousand words leads one to ponder whether the decision to add on the AWL as a separate band beyond K1 and K2 does indeed reflect a natural vocabulary profile and distribution for academic texts. As can be seen in the table above, academic texts often yield a vocabulary profile in which the AWL accounts for double or treble the percentage of total tokens that K2 does. This runs counter to the basis of research into the frequency distribution of words in texts, in which the coverage provided by each successive set of commonly used words is less than that of the previous set. Obviously, the text coverage provided by the higher (less frequent) profile bands will be much less than that provided by the lower (more frequent) profile bands. This basic principle does not, therefore, support the addition of a set of words like the AWL, which provides much higher text coverage than the preceding band, i.e. K2, with only about half the number of words.
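    The imbalance can be made concrete with a short calculation on the Figure 1 percentages. The sketch below is illustrative only: the helper names are invented here, and the band sizes are nominal assumptions (1,000 head words for each GSL band, 570 for the AWL as reported by Coxhead).

```python
# Token coverage (%) per band, copied from Figure 1 (Nation, 2001).
FIGURE_1 = {
    "conversation": {"K1": 84.3, "K2": 6.0, "AWL": 1.9},
    "fiction": {"K1": 82.3, "K2": 5.1, "AWL": 1.7},
    "newspapers": {"K1": 75.6, "K2": 4.7, "AWL": 3.9},
    "academic text": {"K1": 73.5, "K2": 4.6, "AWL": 8.5},
}

# Nominal band sizes in head words (assumption: exactly 1,000 per GSL band).
BAND_SIZE = {"K1": 1000, "K2": 1000, "AWL": 570}


def awl_to_k2_ratio(genre):
    """How many times more tokens the AWL covers than K2 in a genre."""
    return FIGURE_1[genre]["AWL"] / FIGURE_1[genre]["K2"]


def coverage_per_100_words(genre, band):
    """Percentage points of token coverage contributed per 100 head words."""
    return FIGURE_1[genre][band] / BAND_SIZE[band] * 100


for genre in FIGURE_1:
    print(
        f"{genre:14s} AWL/K2 = {awl_to_k2_ratio(genre):.2f}  "
        f"K2 per 100 words = {coverage_per_100_words(genre, 'K2'):.2f}  "
        f"AWL per 100 words = {coverage_per_100_words(genre, 'AWL'):.2f}"
    )
```

    On these figures the AWL covers fewer tokens than K2 in conversation and fiction, but in academic text it covers nearly twice as many tokens with roughly half the head words.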

    Figure 2. Graph of distribution of commonly used words in different text types (p. 17). [Line graph with one curve each for conversation, fiction, newspapers and academic text across the 1st 1000, 2nd 1000 and AWL bands; vertical axis from 70% to 95%.]

    In Figure 2 above, the distribution curves for conversation and fiction follow the pattern of text coverage for the natural distribution of words in any text based on frequency alone (see Figure 3). However, the curve shows an unnatural inverse relationship for academic texts. This clearly indicates that the words in the list for the 2nd 1,000 most commonly used words are not the naturally occurring common words in that range in academic texts. Even the first 1,000 words appear to be wanting in terms of text coverage, covering much less than the 80% one would expect of a truly representative set of commonly used words. On closer examination, this could be attributed to the omission from the GSL of words such as television and video, an omission that betrays the debt the GSL owes to the English of the first half of the twentieth century and confirms the general consensus that the GSL may be in need of a face-lift or perhaps something more radical.
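    The 'natural' pattern described here, in which each successive band adds less coverage than the one before it, can be checked mechanically against the Figure 1 percentages. The following is a rough sketch only; the function names are invented for illustration and the check is simply a strict comparison of neighbouring bands.

```python
from itertools import accumulate

BANDS = ["1st 1000", "2nd 1000", "AWL"]

# Per-band token coverage (%) from Figure 1, in the order given by BANDS.
COVERAGE = {
    "conversation": [84.3, 6.0, 1.9],
    "fiction": [82.3, 5.1, 1.7],
    "newspapers": [75.6, 4.7, 3.9],
    "academic text": [73.5, 4.6, 8.5],
}


def cumulative(per_band):
    """Running total of coverage as each band is added."""
    return list(accumulate(per_band))


def follows_natural_distribution(per_band):
    """True if every band covers fewer tokens than the band before it."""
    return all(later < earlier for earlier, later in zip(per_band, per_band[1:]))


for genre, per_band in COVERAGE.items():
    totals = ", ".join(f"{value:.1f}%" for value in cumulative(per_band))
    verdict = "natural" if follows_natural_distribution(per_band) else "inverted"
    print(f"{genre:14s} cumulative: {totals}  ({verdict})")
```

    Only the academic-text column fails the check, because the AWL band (8.5%) exceeds the K2 band (4.6%) that precedes it.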


    Figure 3. Graph of text coverage in the BNC based purely on frequency of words (Chujo & Utiyama, 2005).

2. The GSL: in need of a face-lift or major surgery?

    The GSL has stood the test of time remarkably well. Although nearly 90 years have passed since the primary research that it is based on (Chujo & Utiyama, 2005), the GSL still provides over 80% token coverage of any written text, and upwards of 90% of spoken English.

    The GSL evolved over several decades before West's publication in 1953. Contrary to popular belief, the GSL is not a list based solely on frequency, but includes groups of words on a semantic basis (Nation & Waring, 2004; Dickins, J, n.d.). Today there is no version of the GSL in print; it exists only in virtual form on the Internet. Various versions are in circulation there, and attempts have been made to improve it (Bauman, 1995). However, for practical purposes, one of its most accessible formats is found on the Compleat Lexical Tutor (CLT) web site (Cobb, 2005), where it can be viewed, downloaded or used for vocabulary profiling of texts. Of most interest to the average English teacher, the CLT site includes a host of free, easy-to-use web-based tools for text analysis, using the GSL and AWL as the basis.

    The main tool for vocabulary profiling on the CLT site, the Vocabulary Profiler (http://www.lextutor.ca/vp/eng/), produces output in coloured form: blue for K1 (the first 1,000 words of English), green for K2 (the second 1,000 words), yellow for AWL (academic words based on the Academic Word List, see Coxhead, 2005), and red for words that are not in any of the lists. Since this is the obvious tool of choice for most teachers, the following highlights of deficiencies in the GSL have been generated directly from this source.

    2.1. Incomplete word families

    The following examples have been generated directly from the Vocabulary Profiler at the CLT site, September 2005 (http://www.lextutor.ca/vp/eng/). These selected examples illustrate the nature of certain deficiencies in the GSL in terms of allowing for variations between American and British spelling, gaps in word families, inconsistency in plural forms, and inconsistencies in word forms. The words in red are not present in the GSL.


    2.1.1. American / British spelling

    1. special specially specialist speciality specialities specialists specialize specializes specialization specialized specializations specialised specialise specialises specialisation specializations

    2. travel traveled traveler travelers traveling travelled traveller travellers travelling travels

    2.1.1. Word forms

    1. half halved halves

    2. rise rises rising rose risen

    3. length lengthening lengths lengthy lengthen lengthened

    4. pure purest purer purely purity impure impurity

    5. thirst thirsty

    6. tour tourism tourist touring toured tours

    7. wheel wheels wheeler wheeled wheeling

    2.1.2. Word families

    1. hope hoped hopeful hopeless hopelessly hopelessness hopes hoping hopefully

    2. mother mother-in-law mothers mom moms motherhood mum mummy mums

    3. present presence presented presenting presently presents presenter presenters presentation presentations

    4. record recorded recorder recording recordings records

    5. taste tasted tastes tasteless tasting tasty

    6. sweet sweeten sweetness sweetly sweets

    7. understand understanding understands understood understandable misunderstand misunderstanding misunderstandings misunderstood

    8. view viewed viewing viewer viewers

    9. week weekday weekdays weekend weekends weekly weeks

    2.1.3. Plural and singular forms

    1. keep keeper keepers keeping keeps kept

    2. patient patients patiently patience

    3. strength strengthen strengthened strengthening strengthens strengths
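    Gaps of this kind can be found systematically rather than by eye. The sketch below is one hypothetical way of doing so and is not the procedure used to produce the examples above: the function names are invented, the suffix rule is deliberately naive (it would also rewrite words such as 'rise'), and the small word list is a made-up stand-in, not an extract from the GSL.

```python
import re

# Matches a final -ise/-ize stem followed by a common inflectional ending.
_SUFFIX = re.compile(r"(is|iz)(e|es|ed|ing|ation|ations)$")


def spelling_variant(word):
    """Return the -ise/-ize counterpart of `word`, or None if it has none.

    Naive by design: it does not know that 'rise' or 'promise' have no
    '-ize' spelling, so results need checking by hand.
    """
    match = _SUFFIX.search(word)
    if match is None:
        return None
    swapped = "iz" if match.group(1) == "is" else "is"
    return word[: match.start()] + swapped + match.group(2)


def missing_members(family, word_list):
    """Return the members of `family` that are absent from `word_list`."""
    return [word for word in family if word not in word_list]


def unpaired_spellings(word_list):
    """Return spelling variants whose counterpart is in the list but which are not."""
    variants = (spelling_variant(word) for word in word_list)
    return sorted({v for v in variants if v is not None and v not in word_list})


# Hypothetical word list for illustration only (not taken from the GSL).
sample_list = {"special", "specially", "specialize", "specialized"}
family = ["special", "specialise", "specialize", "specialised", "specialized"]

print(missing_members(family, sample_list))
print(unpaired_spellings(sample_list))
```

    Both calls report the British spellings 'specialise' and 'specialised' as absent from the hypothetical list.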

    2.2. Based on the English of the 1920s and 1930s

    2.2.1. The emergence and ascendancy of commonly used words

    Many new words enter common use, but most have a limited life dictated by fashion or trends. However, some words related to major changes in technology or more permanent shifts in politics and lifestyle do become candidates for addition to a list of most commonly used words. Here are some examples of words that are not in the GSL (or AWL), but have gained currency and frequency of use since the pre-World War II period of primary data collection for the GSL.


    AIRCRAFT      COPE           OK
    AIRLINES      COPED          OKAY
    AIRPORT       COPES          PLASTIC
    AIRPORTS      COPING         PLASTICS
    AIRWAYS       DATABASE       PROTEST
    AWARD         DATABASES      PROTESTED
    AWARDED       DRUG           PROTESTING
    AWARDING      DRUGGED        PROTESTOR
    AWARDS        DRUGGING       PROTESTORS
    BANG          DRUGS          PROTESTS
    BANGED        E-MAIL         TELEVISION
    BANGING       E-MAILS        TELEVISIONS
    BANGS         FUEL           TELLY
    BATTERY       FUELLED        TV
    BATTERIES     FUELLING       TVS
    BUDGET        FUELS          URBAN
    BUDGETARY     INTERVIEW      URBANISATION
    BUDGETED      INTERVIEWED    URBANISED
    BUDGETING     INTERVIEWER    URBANIZATION
    BUDGETS       INTERVIEWERS   URBANIZED
    CAMPAIGN      INTERVIEWING   VICTIM
    CAMPAIGNED    INTERVIEWS     VICTIMISATION
    CAMPAIGNER    JOURNALISM     VICTIMISE
    CAMPAIGNERS   JOURNALIST     VICTIMISED
    CAMPAIGNING   JOURNALISTS    VICTIMISING
    CAMPAIGNS     LAUNCH         VICTIMIZATION
    CAREER        LAUNCHED       VICTIMIZE
    CAREERS       LAUNCHES       VICTIMIZED
    CASH          LAUNCHING      VICTIMIZING
    CASHED        MAGAZINE       VICTIMS
    CASHES        MAGAZINES      VIDEO
    CASHIER       MESS           VIDEOED
    CASHING       MESSED         VIDEOING
    CELL          MESSES         VIDEOS
    CELLS         MESSINESS      VIEWER
    CELLULAR      MESSY          VIEWERS
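    How the words above were identified is not described at this point in the article. One plausible approach, sketched here purely as an assumption, is to count the tokens of a contemporary corpus that fall outside the existing lists and to inspect the most frequent ones. The function name, the threshold and the tiny sample text are all hypothetical, and the sketch makes no attempt to filter out proper nouns, acronyms or abbreviations.

```python
import re
from collections import Counter


def off_list_candidates(text, known_words, min_count=2):
    """Return (word, count) pairs for frequent tokens not in `known_words`.

    Results are ordered from most to least frequent; words occurring
    fewer than `min_count` times are dropped.
    """
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter(token for token in tokens if token not in known_words)
    return [(word, n) for word, n in counts.most_common() if n >= min_count]


# Hypothetical stand-ins for a corpus and for the combined GSL/AWL word forms.
sample_text = (
    "The airport video showed the airport campaign; "
    "the video was shown on television."
)
known = {"the", "showed", "was", "shown", "on"}

print(off_list_candidates(sample_text, known))
```

    In practice the frequency threshold would need to be set relative to corpus size, and the surviving candidates grouped into word families before being considered for inclusion.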

    2.2.2. The decline and fall of some commonly used words

    The compilation of the GSL by West in 1953 was a remarkable achievement, considering that by and large the word list is still as valid today as it was over fifty years ago. Our analysis indicated a few words that seem dubious in terms of currency, e.g. cultivator, shilling, oar, sow, beak, madden, scold, hurrah. In addition, there are a few odd terms that must have crept in as mistakes, e.g. advantaging, wheats, and wiseness.

    2.3. Adding on the AWL

    Coxhead (2001) produced a word list consisting of 570 headwords, based on a comprehensive study of frequency patterns in a wide range of academic texts. Prior to her study, a much larger University Word List (UWL) had been compiled, but Coxhead's word list has proved to be the more useful and popular. In her compilation of the AWL, Coxhead excluded any words that were in the GSL. As a corollary, any word not in the GSL was fair game as a candidate for the AWL. In reality, however, the words that Coxhead subsequently identified included not only words which were definitely 'academic' in nature within the context of her corpus of academic texts, but also words that had simply become common in English since the compilation of the GSL, and words that are common well beyond an 'academic' genre.

    Coxhead acknowledged that there was considerable 'range' in the AWL according to frequency of use, dividing the 570 head words into 10 sublists and identifying the most frequently occurring member of each word family. The lists were packaged in lots of 60 words each, except for the final list, which ended up with only 30. Since word frequencies tend not to occur naturally in such even and convenient blocks, the relative differences in frequency between the ten sublists could be quite significant. Indeed, as seen in Figure 1, the resulting 570 head words, when added on as the third band of a vocabulary profile of academic texts, actually provide almost double the text coverage of the preceding vocabulary band, K2, even though K2 has almost double the head words. This means that a good percentage of the AWL words occur more frequently than those in the preceding vocabulary band. Rather than adding the AWL on as a third profile band, it would make more sense to filter and evaluate these words within the context of a broader critical lexical mass for English for General and Academic Purposes.
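    The packaging into sublists amounts to cutting a frequency-ranked list into fixed-size blocks, as the small sketch below shows. The function name and the placeholder head words are invented for illustration; only the figures 570, 60 and 30 come from the text.

```python
def split_into_sublists(ranked_headwords, size=60):
    """Cut a frequency-ranked list into consecutive blocks of `size` items."""
    return [
        ranked_headwords[start:start + size]
        for start in range(0, len(ranked_headwords), size)
    ]


# Placeholder head words standing in for the 570 AWL families, ranked by frequency.
headwords = [f"headword_{rank}" for rank in range(1, 571)]

sublists = split_into_sublists(headwords)
sizes = [len(sublist) for sublist in sublists]
print(len(sublists), sizes)
```

    Because the cut points depend only on rank position, two neighbouring head words with almost identical frequencies can land in different sublists, while a single sublist can span a wide frequency gap.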

    A common misconception among English teachers who have only a superficial knowledge of the lexical approach is that K1 words are 'beginner' words, K2 words are 'intermediate' words and AWL words are 'advanced' words. This sort of division can easily be 'institutionalized', as in the case of preparatory schools at universities in non-English speaking countries. The consequences of such a simplistic interpretation can be dire indeed, leading to the 'packaging' of vocabulary according to convenient 'sets' and the tendency not to recycle or revisit words introduced at lower levels. As a result, students can easily lose focus on the critical lexical mass: very common words are often treated as 'beginner' or 'elementary' words and not recycled, whereas in fact these are the words that really need to be explored in depth, as they often have collocations, multiple meanings, and special meanings within an academic context. Eldridge (personal correspondence) suggests that "the point about academic lexis, is it becomes academic through collocation and context, not because of any particular inherent properties of individual words, e.g. 'The research field can be divided into three distinct areas' is unmistakeably academic, but there is no genre bound item contained in it." A further study by Hancioğlu and Eldridge (forthcoming), on how teachers perceive readability and the function of lexis in this regard, discusses the theoretical basis and practical issues with regard to productive skills, especially writing.

    3. On the road to a critical lexical mass

    The central question in this article relates to the efficacy of the GSL as a 'general' word list. As discussed above, there are deficiencies in the GSL. These have in turn affected the validity of the AWL, which was created by examining only those words not in the GSL within the target corpora of academic texts chosen by Coxhead (2001). Consequently, adding the AWL as a third profile band beyond K1 and K2 leads to rather pear-shaped vocabulary profiles. On what basis, though, could these deficiencies and discrepancies be detected and rectified?

