Dr Paul Baker

Reader in Corpus Based Discourse Studies

Degrees: BSc (Preston), MSc (Lancaster), PhD (Lancaster)

Associated research centres and groups: Gender and Language Research Group, University Centre for Computer Corpus Research on Language (UCREL)


Current Teaching

I currently teach various modules in Corpus Linguistics at MA level (on four different schemes), and supervise several PhD students as well as third-year undergraduate dissertations.

Research Interests

My research interests include corpus linguistics, language and gender/sexual identities, and critical discourse analysis (CDA). I am particularly interested in how corpus-based techniques can be used to carry out CDA.

Publications

Books

Baker, P., Gabrielatos, C. and McEnery, A. (2012 forthcoming) Discourse Analysis and Media Bias: The representation of Islam in the British Press. Cambridge: Cambridge University Press.

Baker, P. and Ellece, S. (2011) Key Terms in Discourse Analysis. London: Continuum.

Baker, P. (2010) Sociolinguistics and Corpus Linguistics. Edinburgh: Edinburgh University Press.

Baker, P. (ed.) (2009) Contemporary Corpus Linguistics. London: Continuum.

Baker, P. (2008) Sexed Texts: Language, Gender and Sexuality. London: Equinox.

Baker, P. (2006) Using Corpora in Discourse Analysis. London: Continuum.

Baker, P., Hardie, A. & McEnery, A. (2006) A Glossary of Corpus Linguistics. Edinburgh: Edinburgh University Press.

Baker, P. (2005) Public Discourses of Gay Men. London: Routledge.

Baker, P. & Stanley, J. (2003) Hello Sailor! Seafaring life for gay men: 1945-1990. London: Pearson.

Baker, P. (2002) Fantabulosa: A Dictionary of Polari and Gay Slang. London: Continuum.

Baker, P. (2002) Polari: The Lost Language of Gay Men. London: Routledge.

Journals

I am commissioning editor of the journal Corpora, published by Edinburgh University Press.

I am on the editorial board for the Journal of English Linguistics and the Journal of Language and Sexuality.

Journal Articles

Baker, P. (2011) 'Times may change but we'll always have money: a corpus driven examination of vocabulary change in four diachronic corpora.' Journal of English Linguistics 39: 65-88.

Baker, P. (2010) 'Will Ms ever be as frequent as Mr? A corpus-based comparison of gendered terms across four diachronic corpora of British English.' Gender and Language 4.1: 125-129.

Chen, Y. and Baker, P. (2010) 'Lexical Bundles in L1 and L2 Academic Writing.' Language Learning and Technology. 14: 2 30-49.

Baker, P. (2010) 'Representations of Islam in British broadsheet and tabloid newspapers 1999-2005.' Journal of Language and Politics. 9:2 310-338.

Baker, P. (2009) 'The BE06 Corpus of British English and recent language change.' International Journal of Corpus Linguistics. 14:3 312-337.

Baker, P., Gabrielatos, C., Khosravinik, M., Krzyzanowski, M., McEnery, T. and Wodak, R. (2008) 'A useful methodological synergy? Combining critical discourse analysis and corpus linguistics to examine discourses of refugees and asylum seekers in the UK press.' Discourse and Society 19(3): 273-306.

Gabrielatos, C. and Baker, P. (2008) 'Fleeing, sneaking, flooding: a corpus analysis of discursive constructions of refugees and asylum seekers in the UK Press 1996-2005.' Journal of English Linguistics 36:1 pp. 5-38.

Baker, P. and McEnery, A. (2005) 'A corpus-based approach to discourses of refugees and asylum seekers in UN and newspaper texts.' Journal of Language and Politics 4:2 pp. 197-226.

Baker, P., Hardie, A., McEnery, A., Xiao, R., Bontcheva, K., Cunningham, H., Gaizauskas, R., Hamza, O., Maynard, D., Tablan, V., Ursu, C., Jayaram, B.D., and Leisher, M. (2004) 'Corpus linguistics and South Asian languages: Corpus creation and tool development', Literary and Linguistic Computing, Volume 19, Issue 4, pp 509-524.

Baker, P. (2004) 'Querying keywords: questions of difference, frequency and sense in keywords analysis.' Journal of English Linguistics. 32: 4 pp 346-359.

Baker, P. (2004) '"Unnatural acts": Discourses of homosexuality within the House of Lords debates on gay male law reform.' Journal of Sociolinguistics 8:1 88-106.

Baker, P. (2002) 'Construction of Gay Identity via Polari in the Julian and Sandy Radio Sketches', Lesbian and Gay Review, 3:3 pp 75-84.

Baker, P. (2001) 'Moral Panic and Alternative Identity Construction in Usenet'. Journal of Computer Mediated Communication 7:1.

Baker, P., Lie, M., McEnery, A. and Sebba, M. (2000) 'Building a Corpus of Spoken Sylheti', Literary and Linguistic Computing, Volume 15, Issue 4, pp 419-431.

McEnery, A., Wilson, A. and Baker, P. (2000) 'Language teaching: corpus based help for teaching grammar', Journada de Corpus Linguistics, Volume 6, pp 65-77.

McEnery, A., Baker, P., Gaizauskas, R. & Cunningham, H. (2000) 'EMILLE: towards a corpus of South Asian languages', British Computing Society Machine Translation Specialist Group, London, pp 11-1 - 11-9.

McEnery, A., Wilson, A. and Baker, P. (1997) 'Teaching Grammar Again after Twenty Years: Corpus based help for grammar teaching.' New Approaches to Grammar Teaching, RECALL Journal, Volume 9, Number 2, pp 8-17.

Baker, P., McEnery, A. and Wilson, A. (1995) 'A brief report on a statistical analysis of corpus-based versus traditional human-teaching methods of part-of-speech analysis', Language Testing Update, Issue 18, pp 59-62.

McEnery, A., Baker, P. and Wilson, A. (1995) 'A Statistical Analysis of Corpus Based Computer vs Traditional Human Teaching Methods of Part of Speech Analysis', Computer Assisted Language Learning, Volume 8, Number 2-3, pp 259-274.

Baker, P. (1994) 'Lithium Discontinuation - A meta-analysis.' Lithium.

Book Chapters

Baker, P. (2011) 'Discourse and Gender'. In K. Hyland and B. Paltridge (eds) The Continuum Companion to Discourse Analysis. London: Continuum, pp. 199-212.

Baker, P. (2010) 'Corpus Linguistics'. In L. Litosseleti (ed) Research Methods in Linguistics. London: Continuum, pp. 93-113.

Baker, P. (2009) 'Issues in teaching corpus-based discourse analysis' In L. Lombardo (ed). Using Corpora to Learn about Language and Discourse. Peter Lang, pp. 73-98.

Baker, P. (2009) 'Introduction'. In P. Baker (ed) Contemporary Corpus Linguistics. London: Continuum, pp. 1-8.

Baker, P. (2009) 'Language and Sexuality'. In J. Culpeper, F. Katamba, P. Kerswill, R. Wodak and T. McEnery (eds) English Language and Linguistics. London: Palgrave, pp. 550-563.

Baker, P. (2008) '"Eligible" bachelors and "frustrated" spinsters: corpus linguistics, gender and language.' In J. Sunderland, K. Harrington and H. Sauntson (eds) Gender and Language Research Methodologies. London: Palgrave.

McEnery, T. and Baker, P. (2003) 'Corpora, translation and multilingual computing' in F. Zanettin (ed.) Corpora in Translator Education, St. Jerome Press, Manchester.

Baker, P. (2002) 'No Fats, Femmes or Flamers: Changing Constructions of Identity and the Object of Desire in Gay Men's Magazines.' In B. Benwell (ed.) Masculinity and Men's Lifestyle Magazines. Sociological Review.

McEnery, A., Baker, P. and Cheepen, C. (2001) 'Lexis, Indirectness and Politeness in Operator Calls.' In C. Meyer & P. Leistyna. (eds.) Corpus Analysis: Language Structure and Language Use. Rodopi: Amsterdam.

Singh, S., McEnery, A. and Baker, P. (2000) 'Building a Parallel Corpus of English/Punjabi', in J. Veronis (ed) Parallel Text Processing. Kluwer: Dordrecht, pp 335-347.

McEnery, A.M., Baker, P. and Hardie, A. (2000) 'Swearing and Abuse in Modern British English', in B. Lewandowska-Tomaszczyk and P.J. Melia (eds.) Practical Applications of Language Corpora, Peter Lang: Hamburg, pp 37-48.

McEnery, A. and Baker, P. (2000) 'Minority Language Engineering', in B. Lewandowska-Tomaszczyk and P.J. Melia (eds.) Practical Applications of Language Corpora, Peter Lang: Hamburg, pp 411-428.

McEnery, A.M., Baker, P. and Hardie, A. (2000) 'Assessing Claims about Language Use with Corpus Data - Swearing and Abuse', in J. Kirk (ed) Corpora Galore, Rodopi: Amsterdam, pp 45-55.

Baker, P. (1997) 'Consistency and Accuracy in Correcting Automatically Tagged Data.' In Corpus Annotation. R. Garside, G. Leech & A. McEnery (eds.) Longman Addison-Wesley, pp 243-250.

McEnery, A.M., Baker, P. and Hutchinson, J.E. (1997) 'A Corpus Based Grammar Tutor'. In R.G. Garside, G.N. Leech & A.M. McEnery (eds.) Corpus Annotation, Longman Addison-Wesley, pp 209-219.

Conference Proceedings

Xiao, Z, McEnery, A, Baker, P and Hardie, A (2004) 'Developing Asian language corpora: standards and practice'. In: Proceedings of the 4th Workshop on Asian Language Resources, Sanya, China.

Baker, P, Hardie, A, McEnery, T and Jayaram, BD (2003) 'Constructing corpora of South Asian languages'. In: Archer, D, Rayson, P, Wilson, A, and McEnery, T (eds.) (2003) Proceedings of the Corpus Linguistics 2003 conference. UCREL Technical Papers Volume 16. Department of Linguistics, Lancaster University.

Baker, P, Hardie, A, McEnery, AM and Jayaram, BD (2003) 'Corpus data for South Asian language processing'. In: Proceedings of the EACL Workshop on South Asian Languages, Budapest.

Tablan, V., Ursu, C., Bontcheva, K., Cunningham, H., Maynard, D., Hamza, O., McEnery, T., Baker, P. & Leisher, M. (2002) 'A Unicode-based Environment for Creation and Use of Language Resources,' in LREC 2002 Proceedings, pp 66-71.

Baker, P, Hardie, A, McEnery, A, Cunningham, H and Gaizauskas, R (2002) 'EMILLE, a 67-million word corpus of Indic languages: data collection, markup and harmonisation'. In: Proceedings of LREC 2002.

Baker, P, Hardie, A, McEnery, A and Siewierska, A (eds.) (2000) Proceedings of the Third Discourse Anaphora and Reference Resolution Colloquium (2000). UCREL Technical Papers Volume 12 Special Issue. Department of Linguistics, Lancaster University.

McEnery, T., Baker, P., and Burnard, L. (2000) 'Corpus Resources and Minority Language Engineering', in M. Gavrilidou, G. Carayannis, S. Markantonatou, S. Piperidis and G. Stainhaouer (eds) Proceedings of the Second International Conference on Language Resources and Evaluation, Athens, Greece, pp. 801-806.

McEnery, A. and Baker, P. (1998) 'Integrating the Intranet into the teaching of linguistics.' The Future of the Humanities in the Digital Age. International Conference Bergen, Norway. 138-140.

Grants held

I have secured £168,000 in grants since my appointment in 2002.

2009 Principal Investigator. ESRC Grant of £100,000 to examine representation of Muslims in the UK press.

2005 Principal Investigator. ESRC grant of £44,867 to examine discourse of refugees and asylum seekers in the UK press, 1996-2006.

2005 Principal Investigator. Nuffield Foundation. Grant of £6,800 to examine the representation of Muslims in the UK and US press post 9/11.

2004 Principal Investigator. Small Grants Committee grant of £8,400 to develop the Xara multilingual concordancing package. Collaboration with Lou Burnard at the University of Oxford.

2004 Principal Investigator. Faculty grant (Research Initiatives Fund) of £1,629 contributed towards the coordination ellipsis project (see below).

2004 Principal Investigator. British Academy grant of £4,680 to carry out a study of coordination ellipsis in the International Corpus of English. Collaboration with Prof Charles Meyer at the University of Massachusetts, Boston.

2002 Principal Investigator. Faculty Research Funds grant of £1,590 to carry out a pilot project to test the feasibility of time-aligning the spoken files of the British National Corpus to the corresponding transcribed text files.

Potential Doctoral Proposals

PhD Students

I will be on sabbatical from August 2012 to August 2013 and will not be able to take any new students until I return.

Most of my PhD students are involved in corpus linguistics, (critical) discourse analysis, language and identities or a combination of these. My current PhD students are working on the following topics:

  • the construction of gender identity in Iranian bloggers
  • a corpus study examining how The Guardian reports on the topic of journalism
  • a corpus-based comparison of two academic books about Wahhabi Islam, focussing on the use of collocation to create ideology
  • a corpus study examining construction of in and out groups in American newspapers

Recent PhDs I supervised to completion:

  • a corpus-based examination of the concept of political correctness in British broadsheet newspapers
  • the language of marriage rituals in Botswana
  • combining corpus approaches and CDA to examine discourses of terrorism in the British and Chinese popular press
  • combining corpus approaches and CDA to examine discourses of homophobia in a right-wing political organisation
  • a corpus study to compare lexical bundle use of Chinese learners of English with that of native speakers of English
  • a corpus study of keywords to examine gender identity in British and Malaysian children's writing

The BE06 and AmE06 Corpora

The BE06 and AmE06 Corpora are one-million-word corpora of published general written British English and American English respectively. They have the same sampling frame as the Brown and LOB corpora: 500 files, each a sample of approximately 2,000 words, taken from 15 genres of writing. The majority of the texts were published between 2005 and 2007. All of the text samples were obtained from online archives, although they had originally been published in paper form.

Link to a PDF of the PowerPoint slides for a talk on the BE06 I gave at the Lancaster Corpus Research Group (June 16th 2008).

Using the corpora

For research and teaching purposes, the corpora can be downloaded as zipped files here.

AmE06 zipped files

AmE06 metadata

BE06

Additionally, the following links give frequency lists of the BE06 in various formats (right click on the link and then save it).

BE06 Wordlist in .txt only format

BE06 Wordlist in WordSmith 5 format

BE06 Wordlist in WordSmith 4 format

BE06 Wordlist in WordSmith 3 format

Powerpoint slides

Slides of my plenary talk at the Corpus Linguistics 2011 conference at Birmingham University, July 21st.

Slides of my plenary talk at the American Association of Corpus Linguistics 2011 conference at Georgia State University, Atlanta, October 7th.

Eprints Publications Repository and Bibliographic Database

Paul Baker has 40 selected publication records listed on this webpage. For all ePrints records go to http://eprints.lancs.ac.uk

Baker, Paul and Gabrielatos, Costas and Khosravinik, Majid and Krzyzanowski, Michal and McEnery, Anthony M. and Wodak, Ruth (2008) A useful methodological synergy? Combining critical discourse analysis and corpus linguistics to examine discourses of refugees and asylum seekers in the UK press. Discourse and Society, 19 (3). pp. 273-306. ISSN 0957-9265

Gabrielatos, Costas and Baker, Paul (2008) Fleeing, sneaking, flooding: A corpus analysis of discursive constructions of refugees and asylum seekers in the UK Press 1996-2005. Journal of English Linguistics, 36 (1). pp. 5-38. ISSN 0075-4242 (Print) 1552-5457 (Online)

Baker, Paul and Hardie, Andrew and McEnery, Tony (2006) A glossary of corpus linguistics. Edinburgh University Press, Edinburgh. ISBN 978 0 7486 2018 0

Baker, Paul (2006) Using corpora in discourse analysis. Continuum, London. ISBN 0826477240

Baker, Paul and McEnery, Tony (2005) A corpus-based approach to discourses of refugees and asylum seekers in UN and newspaper texts. Journal of Language and Politics, 4 (2). pp. 197-226. ISSN 1569-2159

Baker, Paul (2005) Public Discourses of Gay Men. Routledge, London. ISBN 0415349737

Baker, Paul (2004) 'Unnatural acts' Discourses of homosexuality within the House of Lords debates on gay male law reform. Journal of Sociolinguistics, 8 (1). pp. 88-106. ISSN 1360-6441

Baker, Paul (2004) Querying keywords: questions of difference, frequency and sense in keywords analysis. Journal of English Linguistics, 32 (4). pp. 346-359. ISSN 0075-4242 (Print) 1552-5457 (Online)

Baker, Paul and Stanley, J. (2003) Hello sailor!: The hidden history of gay life at sea. Longman, London. ISBN 9780582772144

Baker, J. P. (2002) Polari: the lost language of gay men. Routledge. ISBN 0-415-26180-5


Associated Keywords: Broadcast talk and media discourse, Computer-mediated communication, Corpus linguistics, Critical discourse analysis, Discourse analysis, Language, Language and sexual identities, Language, gender and discourse, Language variation and change, Linguistics, sexuality

 
