CRG Timetable - Term 1: 11th October-13th December 2010
All meetings for Term 1 are in Meeting Room 1, IAS Building at 3 pm, unless otherwise stated.
wk 1. (11 October) Paul Rayson and Andrew Hardie (Lancaster University) Overview of Corpus Resources at Lancaster
wk 2. (18 October) Geoff Leech (Lancaster University) Decline and disappearance: the Negative Side of Recent Change in English
wk 3. (25 October) Paul Baker (Lancaster University) Who Benefits? Muslims on benefits and the British press
wk 4. (1 November) LVLT & CRS Joint Event, Faraday: Andrew Hardie (Lancaster University) Extending a Corpus Analysis Tool to Support the Analysis of Field Data: CQPweb and Minority Languages of South Asia
wk 5. (8 November) Alistair Baron (Computing Department, Lancaster University) 'I didn't spel that wrong did I. Oops': Analysis and Standardisation of SMS Spelling variation
wk 6. (15 November) READING WEEK
wk 7. (22 November) Stephen Pumfrey (Department of History, Lancaster University) Discourses of Liberty in Early Modern English: What happens when a historian discovers CQPweb and EEBO?
wk 8. (29 November) Faraday Building A036 Paul Rayson (School of Computing and Communications, Lancaster University) Wmatrix and Log-likelihood Workshop
wk 9. (6 December) Tony McEnery (Lancaster University) Trust the Text - A Statement of the Obvious or a True Insight?
wk 10. (13 December) Rajab Alzaharani (Lancaster University) Argumentative Indicators and Argumentation Schemes in Modern Saudi Religious Discourse (MSRD): A Corpus-Based Study
wk 1. Monday 11 October 2010
Overview of Corpus Resources at Lancaster
Paul Rayson and Andrew Hardie
(Lancaster University)
Andrew Hardie and Paul Rayson will give a brief introduction to the corpus resources available at Lancaster University.
wk 2. Monday 18 October 2010
Decline and disappearance: the Negative Side of Recent Change in English
Geoffrey Leech
(Lancaster University)
In studying the history of a language, linguists tend to focus on innovation and increase of frequency as the positive side of change. For example, the process of grammaticalization is associated with increase of frequency - e.g. the increasing use of the progressive aspect in English. However, there is also the neglected, negative side of change: linguistic forms become less frequent, and may eventually disappear. In this talk, I am going to focus on the declining frequency of a number of different features of grammar and lexis in recent English - e.g. the preposition 'upon', the conjunction 'for', the passive voice, and wh-relative clauses - and ask: why does decline happen?
I will refer to corpus linguistic findings from the Brown family of corpora - particularly the British members of the family (BLOB-1931, LOB, FLOB, Paul Baker's BE06, and the unfinished BLOB-1901 corpus).
wk 3. Monday 25 October 2010
Who Benefits? Muslims on benefits and the British press
Paul Baker
(Lancaster University)
This talk uses a 170-million-word corpus of newspaper articles from the British press, written between 1998 and 2009, to focus on one aspect of the representation of Muslims. Karim (2006: 119-20) points to a primary stereotype of Muslims as having fabulous but undeserved wealth, while a pilot study found that a common type of story about Muslims involved cases of "extremists" living in Britain and claiming benefits from the state. In the talk I use corpus methods to examine which newspapers have helped to "set the agenda" regarding such stories, and examine how the focus of the stories has widened over time. I argue that although these stories represent only a tiny minority of Muslims, their strong emotional saliency, coupled with their high frequency in some newspapers, can affect readers' understandings of what it means to be a Muslim.
wk 4. Monday 1 November 2010 - LVLT & CRS Joint Event
Extending a Corpus Analysis Tool to Support the Analysis of Field Data: CQPweb and Minority Languages of South Asia
Andrew Hardie
(Lancaster University)
The sub-disciplines of corpus linguistics on the one hand, and field linguistics (and typology) on the other, share a number of core concerns. Most notably, both are data-centric approaches to the study of language: collecting, annotating and analysing examples of language use is at the core of the methodology in both cases. But to date there has been relatively little interaction between the two fields, although arguably several of the technologies developed within corpus linguistics could, with relatively minor modification, be usefully applied to the storage, dissemination and exploitation of field data.
In this presentation, I will use samples of data from a number of languages of north-east India and Nepal – including Bodo, Limbu, and Tamang – to give a practical illustration of how a single corpus analysis tool (a) can be extended and enhanced to handle field data optimally and (b) can be used to facilitate certain forms of analysis and to simplify some aspects of the process of data dissemination. The corpus tool in question is CQPweb, a graphical front-end to the Open Corpus Workbench (CWB). CQPweb was originally developed to support research and teaching in corpus linguistics at Lancaster University; however, recent work on the system has extended its capabilities with the aim of making it a useful tool for linguists with a range of interests.
I will explain how certain aspects of the CQPweb system – in particular, its visualisation of search results – have been amended to support work with field data, most notably to allow the rendering of the traditional three-line-example format within a concordance. I will also demonstrate the compatibility of CQPweb’s underlying data model with annotated field data – although the issue of whether the word or the morpheme is to be the base unit for analysis remains somewhat vexatious – and finally illustrate some steps towards automating the process of importing field data to the system.
wk 5. Monday 8 November 2010
'I didn't spel that wrong did I. Oops': Analysis and Standardisation of SMS Spelling variation
Alistair Baron
(Computing Department, Lancaster University)
Spelling variation, although present in all varieties of English, is particularly prevalent in SMS text messaging as well as, for example, historical varieties, child language, non-native learner language and Computer-Mediated Communication (e.g. instant messaging). Researchers argue that the choices made regarding spelling variants in SMS are functional, principled and meaningful. They are principled in the sense that they follow orthographic principles of English and as such reflect and extend existing patterns of variation across historical and contemporary texts (Shortis 2007); and meaningful because they contribute to the performance of social identities (Tagg 2009). As yet, however, little attempt has been made to empirically validate SMS spelling patterns and verify the extent to which they mirror those in other texts.
Here we report on the use of VARD 2 (Baron and Rayson 2009) to analyse and standardise the spelling variation in CorTxt (Tagg 2009), a corpus of over 11,000 SMS messages collected in the UK between 2004 and 2007. A second tool, DICER, was also used to extract letter replacement rules that transform SMS spellings to their standard equivalents and to build a detailed database of these rules and their frequencies. By categorising the spelling variants and examining DICER’s analysis, we build a picture of spelling trends in SMS for comparison with other text types.
Our aim, however, is not only to better understand the nature of SMS spelling, but also to build a set of spelling rules which can be used to automatically standardise spelling in larger SMS corpora. Whilst VARD was developed to deal with spelling variation in Early Modern English, here we show that with rules from DICER and additional training, the latest version, VARD 2.3, can be used to accurately standardise SMS corpora.
References
Baron, A. and Rayson, P. (2009) Automatic standardisation of texts containing spelling variation: How much training data do you need? In Proceedings of Corpus Linguistics 2009, University of Liverpool, UK, July 2009.
Shortis, T. (2007) ‘Revoicing Txt: spelling, vernacular orthography and unregimented writing’ in Posteguillo, S., M. J. Esteve and M. L. Gea (eds) The Texture of Internet: netlinguistics. Cambridge, Cambridge Scholar Press.
Tagg, C. (2009) A Corpus Linguistics study of SMS text messaging. PhD thesis, University of Birmingham.
wk 6. Monday 15 November 2010
READING WEEK
wk 7. Monday 22 November 2010
Discourses of Liberty in Early Modern English: What happens when a historian discovers CQPweb and EEBO?
Stephen Pumfrey
(Department of History, Lancaster University)
Lancaster University possesses a unique version of Early English Books Online (EEBO). Of EEBO’s 125,000 works, spanning the years 1473 to 1700, 25,000 exist as searchable text and these have been converted into a rudimentary corpus of over 1,000,000,000 words. The CQPweb interface allows non-expert linguists like me to pose and answer new questions of great interest to historians. My talk will illustrate the striking conclusions made possible by simple analyses, taking as case studies the words/concepts of “liberty” and (if time permits) “experiment”. It is hoped that the session will generate further interaction between Lancaster’s historians and corpus linguists, with the historian developing research questions to be asked and the corpus linguist showing how they can be answered with rigour.
wk 8. Monday 29 November 2010
Wmatrix and Log-likelihood Workshop
Paul Rayson
(School of Computing and Communications, Lancaster University)
Today’s seminar will include an introduction to the log-likelihood statistic used in corpus linguistics for keyword analysis. There will be a hands-on, basic-level introduction to the Wmatrix software from 3 to 4 pm. For those who can stay after 4 pm, there will be a second hands-on tutorial covering more advanced features of Wmatrix, such as extending the dictionaries, collocations and semantic collocations.
Sample data will be provided, but you can bring along your own data to analyse if you wish.
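As a rough illustration of the statistic introduced in this session, the sketch below computes a log-likelihood keyness score for one word from its frequency in two corpora, using the two-term form commonly used in keyword analysis. The function name, parameter names and example figures are assumptions made for illustration; this is not Wmatrix code and may differ in detail from what Wmatrix computes.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood keyness score for one word in two corpora.

    freq_a, freq_b: observed frequency of the word in corpus A and corpus B.
    size_a, size_b: total number of words in corpus A and corpus B.
    (All names are illustrative assumptions, not taken from any tool.)
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected frequencies if the word were equally common in both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        # A zero observed frequency contributes nothing to the sum.
        if observed > 0:
            score += observed * math.log(observed / expected)
    return 2 * score


# Hypothetical figures: a word occurring 100 times in one 1,000-word
# corpus and 10 times in another 1,000-word corpus.
print(log_likelihood(100, 1000, 10, 1000))
```

A score above 3.84 corresponds to p < 0.05 on a chi-square distribution with one degree of freedom; stricter cut-offs are often preferred in practice because keyword analysis tests many words at once.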
wk 9. Monday 6 December 2010
Trust the Text - A Statement of the Obvious or a True Insight?
Tony McEnery
(Lancaster University)
In this talk I will present some extracts from a new book I have co-authored with Andrew Hardie, Corpus Linguistics: Method, Theory and Practice. I will focus upon a major re-evaluation of the neo-Firthian school of linguistics presented in the book, with a particular accent upon the work of John Sinclair. In doing so, I will seek to assess the contribution of Sinclair’s ideas to linguistics, as well as considering to what extent the ideas he puts forward, most notably collocation, are operationalisable.
wk 10. Monday 13 December 2010
Argumentative Indicators and Argumentation Schemes in Modern Saudi Religious Discourse (MSRD): A Corpus-Based Study
Rajab Alzaharani
(Lancaster University)
I suggest a framework, along with some examples, to study argumentation schemes in a specialized topic-related corpus for modern Saudi religious discourse (CMSRD) utilizing some argumentative indicators. I combine corpus linguistics with argumentation analysis in order to quantitatively identify argumentative indicators and qualitatively analyse arguments that appear therein.
This paper responds to the following over-arching research questions: How is the corpus linguistic approach able to inform argumentation analysis? Can argumentation schemes be identified by corpus tools? It also addresses the following sub-question: What are the argumentation schemes that can be indicated by the most frequent keywords in CMSRD?