In putting our heads above the parapet with the first general textbook on corpus linguistics we knew that we would collect the odd brickbat as well as the odd bouquet. Both have indeed been received. However, to take the heat out of any ensuing debate we had adopted the aforementioned policy of openness. This policy has failed in the case of this review; consequently, we must write in response to the elements of criticism we feel are unfair. As for the fair comment, the review says nothing which has not been said, in an altogether friendlier and more collegial fashion, by other corpus linguists and by interested linguists of the generative camp.
Stubbs complains that not all of the words that are in bold in the book are in the glossary. Bold is used for emphasis as well as denoting glossary terms. For this reason also, not all bold terms are in the index. There is no hidden agenda here.
Stubbs complains that we give no concordance program listings. While we do not list concordances, we do provide appendices which detail available concordancers and corpora for use with them. We must also note here that it was never our intention to cover the practical "how to do it" side of corpus analysis in our book. When we conceived the series, we envisaged a "foundation course" of three volumes. Ours was to be an overview and in particular an introduction to the background and achievements of the corpus-based approach; Geoff Barnbrook's volume was seen as covering the how-to-do-it of actually working with text on a computer (including concordancing); and Michael Oakes's recently published volume was intended as a detailed introduction to the statistics of corpus linguistics. Geoff Barnbrook has made an excellent job of discussing the practicalities of concordancing and it would have been pointless to repeat this information within the same series.
Stubbs criticises the student exercises in chapter 2 by claiming that "A quite unrealistic study question, asks students to 'compare two or more' corpus annotation systems (p. 60)". This is a very misleading representation of the question. The question asks the student to look at ONE scheme, and gives a reference to a book in which one may be found if the student does not have easy access to such a scheme. Additionally, at the book's web site (http://www.ling.lancs.ac.uk/staff/andrew/data.htm) we have two annotation schemes ready for downloading. Even if this were not the case, considering the increasing availability of annotation schemes this question hardly seems infeasible.
At the end of the question, we say "If possible, try to compare two or more systems". We do not see how the question we set marries up to that which Stubbs claims we set. Even so, we do not see that the question that Stubbs claims we set is any less realistic than the one we did set, especially since he does not bother to explain why he considers it unrealistic.
Stubbs complains that chapter three is too brief. This is an introductory textbook. We give references to the literature for those students interested in depth. We are trying to provide a fair degree of breadth within limited space.
Stubbs complains that the review of statistics for corpus linguistics given in the book is too brief. Page 67 of our book clearly states that we do not intend to produce a 'how to' guide to statistics in corpus linguistics. We note that this is the issue of a forthcoming book in the series (see above on our conception of the series). Rather our aim for this chapter is that students should be aware of what statistics are used and what people try to claim on the basis of them. "Our recommendation is that the student reads what we have to say here as a brief introduction to how the techniques can be used and then progress to the more detailed treatments in other texts for further explanation". We do not think that this is an unreasonable approach to take in an introductory overview, especially when references are given and a detailed treatment is forthcoming in the same series.
Furthermore, many of the techniques described can now be performed automatically in programs such as SPSS for Windows. Our view is that corpus linguists can and should make use of statistical tests but that they do not need to know the ins and outs of the mathematics, so long as they understand (a) when it is appropriate to use the tests, (b) what data are required, and (c) how to interpret the results. To take an analogy, a medical doctor is typically not trained how to perform the many laboratory analyses that he requests in the course of his work. But he is taught what to test for, what samples he needs to take, and how to interpret the results. This approach is no less valid with corpus statistics, especially since it will empower that not insignificant number of linguists who are inclined to glaze over at even the first paragraph of even the most non-technical guide to statistics.
Stubbs criticises the study question on page 85 by saying "A study question (p.85) asks students to choose an appropriate statistical test for different problems: the chapter does not give enough information to do this". Again the reviewer indulges in a rewording of the question which misleads - we actually say "Which of the statistical tests in this chapter would you use in the following situation". We are not asking the students to do anything other than consider the description of the statistical tests which we have used in the chapter. Given that the situations we then list have clear parallels in the studies covered in the chapter, we have set a simple comprehension test. We also list what we think (on the basis of the chapter) the answers should be and why.
Stubbs criticises the depth of the review of corpus-based linguistics given in chapter four. The coverage here is, of necessity, brief, since we were aiming for breadth rather than depth. We state this quite clearly at the beginning of the chapter and refer the student to various collections of papers for further examples. Although we are conscious of not being exhaustive, it seems to us that the main limitations of the chapter are not those which the reviewer suggests. We are much more keenly aware of having omitted the mass of work which is becoming available in the area of learner corpora and corpus-based contrastive linguistics. In all honesty, we can say that when, as is planned, we produce a second edition of the book, we will devote space to a brief review of these as a matter of priority rather than anything suggested by the reviewer.
While in the grammar section we could have devoted some space to Cobuild, we thought it more important to devote space to describing the work of the Nijmegen team, whose work (marrying rationalist and empirical approaches to linguistics) is very much in keeping with the spirit of the book and, we think, relatively under-reported. We are well aware of, and value, the work of Burrows - and other scholars (e.g., Fortier 1989, 1991) - in literary computing and stylistics, but we chose to omit that work here because it deals mainly with individual literary texts or oeuvres rather than with corpora in the sense that we use the term within our book (e.g., pp. 87 and 101). Also, literary detective work is a main theme of Statistics for Corpus Linguistics within our series.
Stubbs complains of an absence of references to Cobuild and the work of Birmingham corpus linguists. The work of the Birmingham corpus linguists is touched upon within the book. We appreciate their work and value it - indeed the second book in our series was by a Birmingham corpus linguist, and Lancaster has co-operated fruitfully with corpus linguists in Birmingham in the past. However, we were writing the book from the point of view of a different tradition - the Lancastrian tradition of corpus linguistics. Other reviewers have noted the bias and viewed it as understandable. We did not omit the work of Birmingham on grounds of spite or wilful neglect. The Birmingham team has its own story to tell, and we felt that we should rightly leave them to tell it.
Stubbs criticises the off-putting tone of our introduction to chapter five, where we warn readers that our introduction to corpus-based computational linguistics is brief and preliminary. It is curious that the reviewer picks up on the statements which we made explaining to readers how to approach chapter five when similar comments in chapter three were overlooked.
Stubbs makes a series of criticisms of chapter six:
Stubbs then proceeds to complain that we do not cover the empiricist/empirical distinction raised by Chomsky. We do not acknowledge or use this distinction, and merely use empiricist as the nominal form of empirical. As our book focuses on the empirical side of Chomsky's distinction, and we saw no reason to dwell on Chomsky's leading role in the defeat of behaviourism (which we do not view as being synonymous with early corpus linguistics) we did not introduce his definitions of empiricist and empirical.
Stubbs then argues that we miscast Chomsky as a realist. Regarding the question of whether Chomsky has adopted a realist or an instrumentalist position, it is clearly possible to argue either - as Chomsky has in fact appeared to do. Chomsky states clearly (1975) that he was always a realist, and that even when he made statements which appeared instrumentalist he was at heart a realist. Rather than confuse the narrative by exploring Chomsky's inconsistent position on this question, we decided to take him at his word: in his works "the realist position is taken for granted" (Chomsky, 1975:37) and "linguists must describe reality" (Chomsky, 1975:81). If we base our views of Chomsky's attitudes to realism/instrumentalism on a reading of 1965 only, then it is possible for a reviewer to come to the conclusion, as Stubbs wrongly does, that Chomsky could never be described as a realist.
Stubbs wonders why we do not attack the validity of the competence/performance distinction. With respect to this point, we can see that for those who follow the ideas of Firth (who as we recall saw the distinction to be a false one), such work is essential. We are not followers of Firth, but we do argue that the gulf between I- and E-language has been exaggerated - we do not dodge the issue.
Stubbs goes on to suggest we should have examined Chomsky's three-tier system of adequacy for grammars. In our initial draft of the book we did aim to cover this topic, but dropped it for two reasons. One reason that we were interested in the question was that it enabled us to examine the concept of grammaticality. We decided to do this, however, by a reference to, and brief review of, Aarts (1991), who covers grammar induction from corpora and concepts such as the grammaticality and acceptability of corpus sentences. The second major reason we wanted to avoid the Chomskyan threefold adequacy argument was that it is "confused and unrealistic" (Seuren, 1998:256). To have brought it in openly would have required a lengthy discussion which, given that we had the reference to Aarts, was largely unnecessary for our purpose.
Stubbs finishes with a complaint that we describe corpus linguistics as a methodology rather than giving it some theoretical status. This comment alone shows the clear difference in orientation between the reviewer and the authors. We maintain that corpus linguistics is a methodology. While the fact that it clearly has an impact upon linguistic theory and has been buffeted by theoretical debate is beyond doubt, we do not view it as being ineluctably bound to a particular theory of language, though some will rely on it more or less than others.
Chomsky, N. (1975). The Logical Structure of Linguistic Theory, Plenum Press, New York & London.
Church, K., Gale, W., Hanks, P. and Hindle, D. (1991). Using statistics in lexical analysis. In: U. Zernik (ed.) Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon. New Jersey: Lawrence Erlbaum Associates, pp. 115-164.
Daille, B. (1995). Combined Approach for Terminology Extraction: Lexical Statistics and Linguistic Filtering, UCREL Technical Papers No. 5, UCREL, Lancaster University.
Fortier, P.A. (1989). Some statistics of themes in the French novel. Computers and the Humanities 23: 293-99.
Fortier, P.A. (1991). Theory, methods and applications: some examples in French literature. Literary and Linguistic Computing 6(3): 192-96.
Seuren, P. (1998). A Brief History of Western Linguistics, Blackwell, London.