Keywords
Corpus linguistic methodology, Corpus linguistics, Corpus tools, Digital humanities, Early modern English, English, English grammar, English language, European languages, Grammar, Grammatical theory and description, Historical and diachronic corpora, Historical GIS, Humanities computing, India, Language, Linguistics, Metaphor, Multilingual corpora, Quantitative linguistics, Semantics, South Asia, Statistics, Syntax
Research Areas
English Language and Linguistics

Dr Andrew Hardie
Lecturer
Affiliations
UCREL - University Centre for Computer Corpus Research on Language
My major specialism is corpus linguistics: specifically, the methodology of corpus linguistics, and how it can be applied to different areas of study in linguistics and beyond. I am currently working on applications of corpus methods in the social sciences and humanities. I am also very interested in the use of corpus-based methods to study languages other than English, particularly the languages of Asia, with a special focus on issues in descriptive and theoretical grammar.
PhD Supervision Interests
I am willing to consider PhD applications in areas coherent with my research interests. I am especially eager to supervise students in the following two areas:
- The development of new corpus-based methods, or the extension of existing methodologies;
- The application of these methods in different areas of the humanities and social sciences.
I am also interested in supervising projects that extend established corpus methods to "new" languages (non-European languages and minority languages in particular), especially with regard to topics in descriptive or theoretical grammar.
My departmental web-page contains an indicative list of topics studied by my current and previous PhD supervisees.
Professional Role
As well as holding the position of Lecturer in Linguistics, I am Deputy Director of the ESRC Centre for Corpus Approaches to Social Science, a major research project running for five years from April 2013.
I am also currently the Chair of UCREL, the corpus research centre which brings together researchers from the Linguistics and Computing departments. (From 2005 to 2012 I was Project Development Officer.)
I am on the editorial board of Corpora, and was formerly on the boards of Glottometrics and the Journal of Quantitative Linguistics.
Research Interests
My primary research specialism is corpus-based methodology and its applications. In particular, I am interested in corpus design and construction, in corpus analysis methods and software tools, and in how these may be applied to my own subject area (broadly: the grammar of English and other languages), to other fields of linguistics such as discourse analysis or language teaching, and to other disciplines in the humanities and social sciences.
Most of my current research work is focused on a series of projects in which corpus methods are adapted to the needs of social scientists and humanists, in a range of subject areas including Psychology, Geography, History and English Literature.
Areas that I have worked on (and published in) earlier in my career include:
- quantitative and collocational approaches to grammar in English and other languages;
- historical text-mining, with particular regard to the journalism of the Early Modern English period;
- part-of-speech tagging and the theory of morphosyntactic categories;
- keyness and frequency phenomena in texts;
- the languages and writing systems of South Asia;
- text and corpus encoding and processing (with particular reference to Unicode).
Languages that I have worked on or am currently interested in include:
- Arabic
- Assamese
- English
- Hindi-Urdu
- Nepali (see my Nepali Grammar Project)
- Russian
- Tibetan
A major part of my work involves software development to support the corpus methodologies listed above. I am one of the lead developers of Corpus Workbench, a powerful, open-source system for corpus indexing and querying. Furthermore, I created (and continue to develop) the CQPweb system as a user-friendly front-end to the Corpus Workbench.
As part of my work on the EMILLE corpus of South Asian languages, I created the Unicodify software. While working on part-of-speech tagging for South Asian languages including Urdu and Nepali, I developed the Unitag framework.
A list of my research publications is available on this website.
Current Teaching
I am currently attached full-time to a research project and therefore am not active in our general undergraduate and postgraduate teaching. I still supervise research students and teach on our postgraduate Summer School programmes.
I previously taught corpus linguistics, English grammar, grammatical theory, typology, language acquisition, psycholinguistics, and other topics at undergraduate and postgraduate level.
PhD Supervisions Completed
Here is a list of the topics that my current and former PhD students have worked on:
- The processing of learner speakers' collocational errors
- Using cluster analysis to study text typology in the British National Corpus
- Statistical analysis of closure in sublanguages
- Structural and ideological aspects of collocation in Modern Standard Arabic
- Valency-changing constructions in Javanese
- Sociolinguistics of swearing in Arabic
In Press
The history of corpus linguistics
McEnery, T. & Hardie, A. 2013 The Oxford Handbook of the History of Linguistics. Allan, K. (ed.). Oxford: Oxford University Press
Research output: Contribution in Book/Report/Proceedings › Chapter
2012
CQPweb - combining power, flexibility and usability in a corpus analysis tool
Hardie, A. 2012 In: International Journal of Corpus Linguistics. 17, 3, p. 380–409, 30 p.
Research output: Contribution to journal › Journal article
Prerequisites to a corpus-based analysis of EEBO-TCP
Baron, A. & Hardie, A. 09/2012
Research output: Contribution to conference › Abstract
Which 'Lancaster' do you mean? Disambiguation challenges in extracting place names for Spatial Humanities
Rayson, P., Baron, A. & Hardie, A. 09/2012
Research output: Contribution to conference › Abstract
Corpus linguistics: method, theory and practice.
McEnery, T. & Hardie, A. 2012 Cambridge: Cambridge University Press. 294 p.
Research output: Book/Report/Proceedings › Book
2011
Twenty-first century Corpus Workbench: Updating a query architecture for the new millennium
Evert, S. & Hardie, A. 2011 Proceedings of the Corpus Linguistics 2011 conference. Birmingham: University of Birmingham
Research output: Contribution in Book/Report/Proceedings › Paper
Extending corpus annotation of Nepali: advances in tokenisation and lemmatisation
Hardie, A., Lohani, R. & Yadava, Y. 2011 In: Himalayan Linguistics. 10, 1, p. 151–165, 14 p.
Research output: Contribution to journal › Journal article
Visual GISting: bringing together corpus linguistics and Geographical Information Systems
Gregory, I. & Hardie, A. 2011 In: Literary and Linguistic Computing. 26, 3, p. 297-314, 18 p.
Research output: Contribution to journal › Journal article
2010
On two traditions in corpus linguistics, and what they have in common
Hardie, A. & McEnery, T. 2010 In: International Journal of Corpus Linguistics. 15, 3, p. 384-394, 11 p.
Research output: Contribution to journal › Journal article
Historical Text Mining and Corpus-Based Approaches to the Newsbooks of the Commonwealth
Hardie, A., McEnery, T. & Piao, S. 2010 The Dissemination of News and the Emergence of Contemporaneity in Early Modern Europe. Dooley, B. (ed.). Farnham: Ashgate Publishing Ltd, p. 251-286 36 p.
Research output: Contribution in Book/Report/Proceedings › Chapter
2009
A morphosyntactic categorisation scheme for the automated analysis of Nepali
Hardie, A., Lohani, R. R., Regmi, B. N. & Yadava, Y. P. 2009 Annual Review of South Asian Languages and Linguistics 2009. Singh, R. (ed.). Berlin: Mouton de Gruyter, p. 171-196 26 p. (Trends in linguistics. Studies and monographs).
Research output: Contribution in Book/Report/Proceedings › Chapter (peer-reviewed)
Corpus linguistics and the languages of South Asia: some current research directions
Hardie, A. 2009 Contemporary corpus linguistics. Baker, P. (ed.). London: Continuum, p. 262-288 27 p.
Research output: Contribution in Book/Report/Proceedings › Chapter
Corpus linguistics and historical contexts: text reuse and the expression of bias in early modern English journalism
Hardie, A. & McEnery, T. 2009 Corpora and discourse – and stuff: papers in honour of Karin Aijmer. Bowen, R., Mobärg, M. & Ohlander, S. (eds.). Göteborg: Acta Universitatis Gothoburgensis, p. 59-92 34 p. (Gothenburg Studies in English).
Research output: Contribution in Book/Report/Proceedings › Chapter
First language acquisition
Hardie, A. 2009 English language: description, variation and context. Culpeper, J., Katamba, F., Kerswill, P., Wodak, R. & McEnery, T. (eds.). Basingstoke: Palgrave, p. 609-624 16 p.
Research output: Contribution in Book/Report/Proceedings › Chapter
Collocational patterning in cross-linguistic perspective: adpositions in English, Nepali, and Russian
Hardie, A. & Mudraya, O. 2009 In: Arena Romanistica. 4, p. 138-149, 11 p.
Research output: Contribution to journal › Journal article
Empowerment and disempowerment in the Glencairn Uprising: A corpus-based critical analysis of Early Modern English news discourse.
Prentice, S. & Hardie, A. 2009 In: Journal of Historical Pragmatics. 10, 1, p. 23-55, 33 p.
Research output: Contribution to journal › Journal article
Freeing up digital content with text mining: New research means new licenses.
Dunning, A., Gregory, I. & Hardie, A. 07/2009 In: Serials. 22, 2, p. 166-173, 8 p.
Research output: Contribution to journal › Journal article
2008
The Child Language Survey
Pooley, N., Hardie, A., Rayson, P., Hoffmann, S., Alcock, K. & Cain, K. 01/2008
Research output: Contribution to conference › Poster
Construction and annotation of a corpus of contemporary Nepali
Yadava, Y., Hardie, A., Lohani, R., Regmi, B. N., Gurung, S., Gurung, A., McEnery, T., Allwood, J. & Hall, P. 2008 In: Corpora. 3, 2, p. 213-225, 13 p.
Research output: Contribution to journal › Journal article
A Collocation-based approach to Nepali postpositions
Hardie, A. 2008 In: Corpus linguistics and linguistic theory. 4, 1, p. 19-62, 44 p.
Research output: Contribution to journal › Journal article
Using a semantic annotation tool for the analysis of metaphor in discourse.
Koller, V., Hardie, A., Rayson, P. & Semino, E. 2008 In: metaphorik.de. 15, p. 141-160, 20 p.
Research output: Contribution to journal › Journal article
2007
Collocational properties of adpositions in Nepali and English
Hardie, A. 2007 Proceedings of the Corpus Linguistics 2007 conference. Davies, M., Rayson, P., Hunston, S. & Danielsson, P. (eds.).
Research output: Contribution in Book/Report/Proceedings › Paper
Exploiting a Semantic Annotation Tool for Metaphor Analysis
Hardie, A., Koller, V., Rayson, P. & Semino, E. 2007
Research output: Contribution to conference › Conference paper
From legacy encodings to Unicode: the graphical and logical principles in the scripts of South Asia.
Hardie, A. 2007 In: Language Resources and Evaluation. 41, 1, p. 1-25, 25 p.
Research output: Contribution to journal › Journal article
Part-of-speech ratios in English corpora.
Hardie, A. 2007 In: International Journal of Corpus Linguistics. 12, 1, p. 55-81, 27 p.
Research output: Contribution to journal › Journal article
2006
Statistics.
Hardie, A. & McEnery, T. 2006 Encyclopaedia of Language and Linguistics. Brown, K. (ed.). Oxford: Elsevier, Vol. 12, p. 138-146 9 p.
Research output: Contribution in Book/Report/Proceedings › Chapter
Corpus-building for South Asian languages
Hardie, A., Baker, P., McEnery, T. & Jayaram, B. D. 2006 Lesser-known languages in South Asia: Status and Policies, Case Studies and Applications of Information Technology. Saxena, A. & Borin, L. (eds.). Mouton de Gruyter, p. 211-242 32 p.
Research output: Contribution in Book/Report/Proceedings › Chapter
A glossary of corpus linguistics.
Baker, P., Hardie, A. & McEnery, T. 2006 Edinburgh: Edinburgh University Press. 192 p.
Research output: Book/Report/Proceedings › Book
2005
Review of: Lars Borin (ed). 2002. Parallel corpora, parallel worlds. Selected papers from a symposium on parallel and comparable corpora at Uppsala University, Sweden, 22–23 April, 1999. Amsterdam: Rodopi
Hardie, A. 2005 In: Languages in Contrast. 5, 2, p. 291-296, 6 p.
Research output: Contribution to journal › Book/Film/Article review
A computer-assisted approach to the analysis of metaphor variation across genres.
Semino, E., Hardie, A., Koller, V. & Rayson, P. 2005 Corpus-based Approaches to Figurative Language. Barnden, J., Lee, M., Littlemore, J., Moon, R., Philip, G. & Wallington, A. (eds.). Birmingham: University of Birmingham School of Computer Science, p. 145-153 9 p.
Research output: Contribution in Book/Report/Proceedings › Paper
Analiza morfologiczno-składniowa korpusów (‘Part-of-speech tagging’).
Hardie, A., Levin, E. & Pezik, P. 2005 Podstawy językoznawstwa korpusowego (‘Foundations of corpus linguistics’). Lewandowska-Tomaszczyk, B. (ed.). Łódź, Poland: Wydawnictwo Uniwersytetu Łódzkiego, p. 75-94 20 p.
Research output: Contribution in Book/Report/Proceedings › Chapter
Automated part-of-speech analysis of Urdu: conceptual and technical issues.
Hardie, A. 2005 Contemporary issues in Nepalese linguistics. Yadava, Y., Bhattarai, G., Lohani, R. R., Prasain, B. & Parajuli, K. (eds.). Kathmandu: Linguistic Society of Nepal, p. 49-72 24 p.
Research output: Contribution in Book/Report/Proceedings › Chapter
2004
Developing Asian language corpora: standards and practice.
Xiao, R. Z., McEnery, A. M., Baker, J. P. & Hardie, A. 25/03/2004 8 p.
Research output: Contribution to conference › Conference paper
Corpus linguistics and South Asian languages: corpus creation and tool development.
Baker, P., Hardie, A., McEnery, T., Xiao, R. Z., Bontcheva, K., Cunningham, H., Gaizauskas, R., Hamza, O., Maynard, D., Tablan, V., Ursu, C., Jayaram, B. D. & Leisher, M. 11/2004 In: Literary and Linguistic Computing. 19, 4, p. 509-524, 16 p.
Research output: Contribution to journal › Journal article
2003
Corpus data for South Asian language processing.
Baker, P., Hardie, A., McEnery, T. & Jayaram, B. D. 2003 Proceedings of the EACL workshop on South Asian languages. Budapest, p. 1-8 8 p.
Research output: Contribution in Book/Report/Proceedings › Chapter
Developing an automated semantic analysis system for early modern English.
Archer, D., McEnery, T., Rayson, P. & Hardie, A. 2003 Proceedings of the corpus linguistics 2003 conference. Archer, D., Rayson, P., Wilson, A. & McEnery, A. M. (eds.). Lancaster: Centre for Computer Corpus Research on Language Technical Papers, University of Lancaster, Vol. 16, p. 22-31 10 p.
Research output: Contribution in Book/Report/Proceedings › Chapter
A corpus of seventeenth-century English news reportage: construction, encoding and applications.
Archer, D., Hardie, A., McEnery, T. & Piao, S. 2003 Proceedings of the Corpus Linguistics 2003 conference. Archer, D., Rayson, P., Wilson, A. & McEnery, T. (eds.). Lancaster University: Department of Linguistics, Vol. 16 (UCREL Technical Papers).
Research output: Contribution in Book/Report/Proceedings › Chapter
The ‘were’ subjunctive in British rural dialects: marrying corpus and questionnaire data.
Hardie, A. & McEnery, T. 05/2003 In: Computers and the Humanities. 37, 2, p. 205-228, 24 p.
Research output: Contribution to journal › Journal article
Constructing corpora of South Asian languages.
Baker, P., Hardie, A., McEnery, T. & Jayaram, B. D. 2003
Research output: Contribution to conference › Conference paper
Developing a tagset for automated part-of-speech tagging in Urdu.
Hardie, A. 2003
Research output: Contribution to conference › Conference paper
2002
EMILLE: a 67-million word corpus of Indic languages: data collection, mark-up and harmonization.
Baker, P., Hardie, A., McEnery, T., Cunningham, H. & Gaizauskas, R. 2002 Proceedings of LREC 2002. Lancaster: Lancaster University, p. 819-827 9 p.
Research output: Contribution in Book/Report/Proceedings › Chapter
2000
Swearing and abuse in modern British English
McEnery, T., Baker, P. & Hardie, A. 2000 PALC ’99: Practical Applications in Language Corpora: Papers from the International Conference at the University of Łódź, 15-18 April 1999. Lewandowska-Tomaszczyk, B. & Melia, P. (eds.). Frankfurt am Main: Peter Lang, p. 37-48 12 p.
Research output: Contribution in Book/Report/Proceedings › Paper
Assessing claims about language use with corpus data – swearing and abuse.
McEnery, A. M., Baker, J. P. & Hardie, A. 2000 Corpora galore. Kirk, J. M. (ed.). Amsterdam: Rodopi, p. 45-55 11 p. (Language and computers: studies in practical linguistics).
Research output: Contribution in Book/Report/Proceedings › Chapter
Proceedings of the 3rd discourse anaphora and reference resolution colloquium.
Baker, P., Hardie, A., McEnery, T. & Siewierska, A. 2000 Lancaster: University Centre for Computer Corpus Research on Language Technical Papers, 237 p.
Research output: Working paper
Metaphor in End of Life Care
01/09/2012 →
The primary aim of this project is to investigate the use of metaphor in the experience of end-of-life care in the UK. We will study the metaphors used by members of different stakeholder groups (pati ...
FP7: Spatial Humanities
01/10/2011 →
CQPweb
01/08/2008 →
CQPweb is a web-based corpus analysis system which provides a user-friendly interface to the Corpus Workbench (CWB) system. This interface is compatible with any corpus, but is especially useful for l ...
Corpus-based grammar in contrast: the cross-linguistic distributional analysis of Nepali grammatical categories
01/10/2007 → 30/09/2009
Variability in child language
01/04/2007 →
This project is a feasibility and pilot study on the exploitation of the Child Language Survey. It is led by a cross- ...
A digital resource for the study of spoken Nepali language: the Bandhu collection
16/01/2006 →
Using a semantic annotation tool for research on metaphor in discourse
01/12/2005 →
The project, which has gone through several stages since late 2005, represents an investigation into the computer-assisted analysis of metaphoric patterns across discourses and genres. Our overall aim ...
Corpus Linguistics 2013 @ Lancaster - Call for Papers
03/10/2012
The seventh international Corpus Linguistics conference (CL2013) will be held at Lancaster University from Tuesday 23rd July 2013 to Friday 26th July 2013. The main conference will be preceded by a wo ...
