English Language and Linguistics
My research interests include corpus linguistics, language and gender/sexual identities and critical discourse analysis. Books include: Sociolinguistics and Corpus Linguistics (2010), Sexed Texts: Language, Gender and Sexuality (2008), Using Corpora in Discourse Analysis (2006), Public Discourses of Gay Men (2005) and Polari: The Lost Language of Gay Men (2002). I am the commissioning editor for the journal Corpora.
PhD Supervision Interests
Most of my PhD students are involved in corpus linguistics, (critical) discourse analysis, language and identities or a combination of these. I am most interested in supervising research which highlights or challenges inequalities and/or can be demonstrated as making a positive impact on society.
My current PhD students are working on the following topics:
A corpus study examining how The Guardian reports on the topic of journalism
Discourses of disease and disaster in American newspapers
Construction of Islam in the BBC sitcom Citizen Khan
Metrosexuality in Malaysia
Discourses of infertility in blogs, news and clinic websites
Ideology and argumentation in Salafi Discourse
Representation of dialect in fiction
Recent PhDs I supervised to completion:
A corpus-based examination of the concept of political correctness in British broadsheet newspapers
The language of marriage rituals in Botswana
Combining corpus approaches and CDA to examine discourses of terrorism in the British and Chinese popular press
Combining corpus approaches and CDA to examine discourses of homophobia in a right-wing political organisation
A corpus study to compare lexical bundle use of Chinese learners of English with native speakers of English
A corpus study of keywords to examine gender identity in British and Malaysian children's writing
The construction of gender identity in Iranian bloggers
A corpus-based comparison of two academic books about Wahhabi Islam
I currently teach various modules in Corpus Linguistics at MA level (on four different schemes), supervise several PhD students and supervise third-year undergraduate dissertations.
The BE06 Corpus
The BE06 Corpus is a one million word corpus of published general written British English. It has the same sampling frame as the LOB and FLOB corpora. This consists of 500 files of 2000 word samples taken from 15 genres of writing.
Eighty-two per cent of the texts were published between 2005 and 2007, while the remainder were published in 2003-4 and early 2008. The median sampling point is 2006, hence the title BE06 (British English 2006). A PDF of the PowerPoint slides from a talk I gave on the corpus at the Lancaster Corpus Research Group (June 16th 2008) is also available.
Using the corpus
Due to copyright issues, there are no plans to make the corpus files fully available. However, the corpus has been placed on the CQP (Corpus Query Processor) system at Lancaster University, where users can carry out concordances and get distribution information; access to collocation information is planned. Contact Andrew Hardie in order to obtain a username and password.
Additionally, the following links give frequency lists of the BE06 in various formats (right click on the link and then save it).
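As a rough illustration of what a frequency list of this kind contains, the following Python sketch counts word forms in a text and reports each one's raw count and its frequency per million words. It is not the procedure used to build the BE06 lists: the function name frequency_list, the simple tokenisation rule and the sample sentence are all assumptions made for the example.

```python
from collections import Counter
import re


def frequency_list(text):
    """Return (word, raw count, frequency per million words) tuples,
    ordered from most to least frequent."""
    # Assumed tokenisation: lowercase runs of letters and apostrophes.
    # Real corpus tools usually apply more careful rules.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    counts = Counter(tokens)
    # Normalise per million words so that lists from corpora of
    # different sizes can be compared.
    return [(word, n, n * 1_000_000 / total)
            for word, n in counts.most_common()]


# Illustrative sample only; not taken from the corpus.
sample = "The cat sat on the mat. The mat was flat."
for word, n, per_million in frequency_list(sample):
    print(f"{word}\t{n}\t{per_million:.0f}")
```

In a corpus of exactly one million words the raw count and the per-million figure coincide, which is one practical convenience of the one-million-word design.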
The AmE06 Corpus
The AmE06 Corpus is a one million word corpus of published general written American English, also using the same sampling frame as the LOB and FLOB corpora. This consists of 500 files of 2000 word samples taken from 15 genres of writing. The vast majority of the texts were published in 2006. The corpus is also available via CQPweb, and the wordlist is available below.
Baker, P. (2014) Using Corpora to Analyse Gender. London: Bloomsbury.
Baker, P., Gabrielatos, C. and McEnery, T. (2013) Discourse Analysis and Media Attitudes: The Representation of Islam in the British Press. Cambridge: Cambridge University Press.
Baker, P. and Ellece, S. (2011) Key Terms in Discourse Analysis. London: Continuum.
Baker, P. (2010) Sociolinguistics and Corpus Linguistics. Edinburgh: Edinburgh University Press.
Baker, P. (ed.) (2009) Contemporary Corpus Linguistics. London: Continuum.
Baker, P. (2008) Sexed Texts: Language, Gender and Sexuality. London: Equinox.
Baker, P. (2006) Using Corpora in Discourse Analysis. London: Continuum.
Baker, P., Hardie, A. & McEnery, A. (2006) A Glossary of Corpus Linguistics. Edinburgh: Edinburgh University Press.
Baker, P. (2005) Public Discourses of Gay Men. London: Routledge.
Baker, P. & Stanley, J. (2003) Hello Sailor! Seafaring life for gay men: 1945-1990. London: Pearson.
Baker, P. (2002) Fantabulosa: A Dictionary of Polari and Gay Slang. London: Continuum.
Baker, P. (2002) Polari: The Lost Language of Gay Men. London: Routledge.
I am commissioning editor of the journal Corpora published by Edinburgh University Press.
I am on the editorial board for the Journal of English Linguistics, the Journal of Language and Sexuality, Gender and Language, Applied Linguistics, Journalism and Discourse Studies, Text and Talk and Discourse Coherence, Cognition and Creativity.
Baker, P., Gabrielatos, C. and McEnery, T. (2013) 'Sketching Muslims: A corpus-driven analysis of representations around the word "Muslim" in the British press 1998-2009.' Applied Linguistics 34:3.
Baker, P. and Potts, A. (2013) '"Why do white people have thin lips?": Google and the perpetuation of stereotypes via auto-complete search forms.' Critical Discourse Studies 10:2 187-204.
Baker, P. (2012) 'From gay language to normative discourse: a diachronic corpus analysis of Lavender Linguistics conference abstracts 1994-201.' Journal of Language and Sexuality 2:2 179-205.
Potts, A. and Baker, P. (2012) 'Does semantic tagging identify cultural change in British and American English?' International Journal of Corpus Linguistics 17:3 295-324.
Baker, P. (2012) 'Acceptable bias?: Using corpus linguistics methods with critical discourse analysis.' Critical Discourse Studies 9:3 247-256.
Gabrielatos, C., McEnery, T., Diggle, P., Baker, P. and ESRC (funder). (2012) 'The peaks and troughs of corpus-based contextual analysis.' International Journal of Corpus Linguistics 17:2 151-175.
Baker, P. (2011) 'Times may change but we'll always have money: a corpus driven examination of vocabulary change in four diachronic corpora.' Journal of English Linguistics 39: 65-88.
Baker, P. (2010) 'Will Ms ever be as frequent as Mr? A corpus-based comparison of gendered terms across four diachronic corpora of British English.' Gender and Language 4.1: 125-129.
Chen, Y. and Baker, P. (2010) 'Lexical Bundles in L1 and L2 Academic Writing.' Language Learning and Technology. 14: 2 30-49.
Baker, P. (2010) 'Representations of Islam in British broadsheet and tabloid newspapers 1999-2005.' Journal of Language and Politics 9:2 310-338.
Baker, P. (2009) 'The BE06 Corpus of British English and recent language change.' International Journal of Corpus Linguistics. 14:3 312-337.
Baker, P., Gabrielatos, C., Khosravinik, M., Krzyzanowski, M., McEnery, T. and Wodak, R. (2008) 'A useful methodological synergy? Combining critical discourse analysis and corpus linguistics to examine discourses of refugees and asylum seekers in the UK press.' Discourse and Society 19(3): 273-306.
Gabrielatos, C. and Baker, P. (2008) 'Fleeing, sneaking, flooding: a corpus analysis of discursive constructions of refugees and asylum seekers in the UK Press 1996-2005.' Journal of English Linguistics 36:1 pp. 5-38.
Baker, P. and McEnery, A. (2005) 'A corpus-based approach to discourses of refugees and asylum seekers in UN and newspaper texts.' Journal of Language and Politics 4:2 pp. 197-226.
Baker, P., Hardie, A., McEnery, A., Xiao, R., Bontcheva, K., Cunningham, H., Gaizauskas, R., Hamza, O., Maynard, D., Tablan, V., Ursu, C., Jayaram, B.D., Leisher, M. (2004) 'Corpus linguistics and South Asian languages: Corpus creation and tool development', Literary and Linguistic Computing, Volume 19, Issue 4, pp 509-524.
Baker, P. (2004) 'Querying keywords: questions of difference, frequency and sense in keywords analysis.' Journal of English Linguistics. 32: 4 pp 346-359.
Baker, P. (2004) '"Unnatural acts": Discourses of homosexuality within the House of Lords debates on gay male law reform.' Journal of Sociolinguistics 8:1 88-106.
Baker, P. (2002) 'Construction of Gay Identity via Polari in the Julian and Sandy Radio Sketches,' Lesbian and Gay Review, 3:3, pp 75-84.
Baker, P. (2001) 'Moral Panic and Alternative Identity Construction in Usenet'. Journal of Computer Mediated Communication 7:1.
Baker, P., Lie, M., McEnery, A. and Sebba, M. (2000) 'Building a Corpus of Spoken Sylheti', Literary and Linguistic Computing, Volume 15, Issue 4, pp 419-431.
McEnery, A., Wilson, A. and Baker, P. (2000) 'Language teaching: corpus based help for teaching grammar', Journada de Corpus Linguistics, Volume 6, pp 65-77.
McEnery, A., Baker, P., Gaizauskas, R. & Cunningham, H. (2000) 'EMILLE: towards a corpus of South Asian languages', British Computing Society Machine Translation Specialist Group, London, pp 11-1 - 11-9.
McEnery, A., Wilson, A. and Baker, P. (1997) 'Teaching Grammar Again after Twenty Years: Corpus based help for grammar teaching.' New Approaches to Grammar Teaching, RECALL Journal, Volume 9, Number 2, pp 8-17.
Baker, P., McEnery, A. and Wilson, A. (1995) 'A brief report on a statistical analysis of corpus-based versus traditional human-teaching methods of part-of-speech analysis', Language Testing Update, Issue 18, pp 59-62.
McEnery, A., Baker, P. and Wilson, A. (1995) 'A Statistical Analysis of Corpus Based Computer vs Traditional Human Teaching Methods of Part of Speech Analysis', Computer Assisted Language Learning, Volume 8, Number 2-3, pp 259-274.
Baker, P. (1994) 'Lithium Discontinuation - A meta-analysis.' Lithium.
Baker, P. and McEnery, A. (2014) '"FIND THE DOCTORS OF DEATH": The UK Press and the Issue of Foreign Doctors Working in the NHS, a Corpus-Based Approach.' In A. Jaworski and N. Coupland (eds) The Discourse Reader. London: Routledge.
Baker, P. (2014) '"Bad wigs and screaming mimis": Using corpus-assisted techniques to carry out critical discourse analysis of the representation of trans people in the British press.' In C. Hart and P. Cap (eds) Contemporary Critical Discourse Studies. London: Bloomsbury, pp. 211-236.
Baker, P. (2013) 'Discourse and Gender.' In K. Hyland and B. Paltridge (eds) Continuum Companion to Discourse Analysis. London: Continuum.
Baker, P. (2013) 'Corpus Linguistics and Sociolinguistics.' In J. Holmes (ed.) Research Methods in Sociolinguistics: A Practical Guide. Wiley Blackwell.
Baker, P. (2012) 'Corpora and Gender studies' In K. Hyland, C. M. Huat and M. Handford (eds) Corpus Applications in Applied Linguistics. London: Continuum, pp. 100-116.
Baker, P. (2012) ‘Diachronic lexical change in American English (1961-2006).' In J. Zhang (ed). A Morphologically-based Study of the Lexical Collocation Heterogeneity in EST Texts. Shanghai Jiaotong University.
Baker, P. (2011) 'Social involvement in Corpus Studies.' In V. Viana, S. Zyngier, and G. Barnbrook (eds) Perspectives on Corpus Linguistics. Amsterdam: John Benjamins, pp. 17-28.
Baker, P. (2010) 'Corpus Linguistics.' In L. Litosseleti (ed.) Research Methods in Linguistics. London: Continuum, pp. 93-113.
Baker, P. (2009) 'Issues in teaching corpus-based discourse analysis' In L. Lombardo (ed). Using Corpora to Learn about Language and Discourse. Peter Lang, pp. 73-98.
Baker, P. (2009) 'Introduction.' In P. Baker (ed.) Contemporary Corpus Linguistics. London: Continuum, pp. 1-8.
Baker, P. (2009) 'Language and Sexuality'. In J. Culpeper, F. Katamba, P. Kerswill, R. Wodak and T. McEnery (eds) English Language and Linguistics. London: Palgrave, pp. 550-563.
Baker, P. (2008) '"Eligible" bachelors and "frustrated" spinsters: corpus linguistics, gender and language.' In J. Sunderland, K. Harrington and H. Sauntson (eds) Gender and Language Research Methodologies. London: Palgrave.
McEnery, T. and Baker, P. (2003) 'Corpora, translation and multilingual computing.' In F. Zanettin (ed.) Corpora in Translator Education, St. Jerome Press, Manchester.
Baker, P. (2002) 'No Fats, Femmes or Flamers: Changing Constructions of Identity and the Object of Desire in Gay Men's Magazines.' In B. Benwell (ed.) Masculinity and Men's Lifestyle Magazines. Sociological Review.
McEnery, A., Baker, P. and Cheepen, C. (2001) 'Lexis, Indirectness and Politeness in Operator Calls.' In C. Meyer & P. Leistyna. (eds.) Corpus Analysis: Language Structure and Language Use. Rodopi: Amsterdam.
Singh, S., McEnery, A. and Baker, P. (2000) 'Building a Parallel Corpus of English/Punjabi', in J. Veronis (ed) Parallel Text Processing. Kluwer: Dordrecht, pp 335-347.
McEnery, A.M., Baker, P. and Hardie, A. (2000) 'Swearing and Abuse in Modern British English', in B. Lewandowska-Tomaszczyk and P.J. Melia (eds.) Practical Applications of Language Corpora, Peter Lang: Hamburg, pp 37-48.
McEnery, A. and Baker, P. (2000) 'Minority Language Engineering', in B. Lewandowska-Tomaszczyk and P.J. Melia (eds.) Practical Applications of Language Corpora, Peter Lang: Hamburg, pp 411-428.
McEnery, A.M., Baker, P. and Hardie, A. (2000) 'Assessing Claims about Language Use with Corpus Data - Swearing and Abuse', in J. Kirk (ed) Corpora Galore, Rodopi: Amsterdam, pp 45-55.
Baker, P. (1997) 'Consistency and Accuracy in Correcting Automatically Tagged Data.' In Corpus Annotation. R. Garside, G. Leech & A. McEnery (eds.) Longman Addison-Wesley, pp 243-250.
McEnery, A.M., Baker, P. & Hutchinson, J.E. (1997) 'A Corpus Based Grammar Tutor'. In R.G. Garside, G.N. Leech & A.M. McEnery (eds.) Corpus Annotation, Longman Addison-Wesley, pp 209-219.
Xiao, Z, McEnery, A, Baker, P and Hardie, A (2004) 'Developing Asian language corpora: standards and practice'. In: Proceedings of the 4th Workshop on Asian Language Resources, Sanya, China.
Baker, P, Hardie, A, McEnery, T and Jayaram, BD (2003) 'Constructing corpora of South Asian languages'. In: Archer, D, Rayson, P, Wilson, A, and McEnery, T (eds.) (2003) Proceedings of the Corpus Linguistics 2003 conference. UCREL Technical Papers Volume 16. Department of Linguistics, Lancaster University.
Baker, P, Hardie, A, McEnery, AM and Jayaram, BD (2003) 'Corpus data for South Asian language processing'. In: Proceedings of the EACL Workshop on South Asian Languages, Budapest.
Tablan, V., Ursu, C., Bontcheva, K., Cunningham, H., Maynard, D., Hamza, O., McEnery, T., Baker, P. & Leisher, M. (2002) 'A Unicode-based Environment for Creation and Use of Language Resources,' in LREC 2002 Proceedings, pp 66-71.
Baker, P, Hardie, A, McEnery, A, Cunningham, H and Gaizauskas, R (2002) 'EMILLE, a 67-million word corpus of Indic languages: data collection, markup and harmonisation'. In: Proceedings of LREC 2002.
Baker, P, Hardie, A, McEnery, A and Siewierska, A (eds.) (2000) Proceedings of the Third Discourse Anaphora and Reference Resolution Colloquium (2000). UCREL Technical Papers Volume 12 Special Issue. Department of Linguistics, Lancaster University.
McEnery, T., Baker, P., and Burnard, L. (2000) 'Corpus Resources and Minority Language Engineering', in M. Gavrilidou, G. Carayannis, S. Markantontou, S. Piperidis and G. Stainhauoer (eds) Proceedings of the Second International Conference on Language Resources and Evaluation, Athens, Greece, pp. 801-806.
McEnery, A. and Baker, P. (1998) 'Integrating the Intranet into the teaching of linguistics.' The Future of the Humanities in the Digital Age. International Conference Bergen, Norway. 138-140.
CASS is a Centre designed to bring a new method in the study of language – the corpus approach – to a range of social sciences. In doing so it provides an insight into the use and manipulation of lan ...
CASS is delighted to announce a successful ESRC application for funding on a project entitled "Twitter rape threats and the discourse of online misogyny" (ES/L008874/1). The award of £191,245.25 was ...
Daily Mail: How Google's autocomplete reveals racist, sexist and homophobic searches: Researchers claim search function 'perpetuates prejudices' 17/05/2013
Paul Baker gave a talk to the English department at the University of Chester entitled: Do men all just want the same thing? A corpus linguistics approach to the analysis of gender and desire in perso ...
The seventh international Corpus Linguistics conference (CL2013) will be held at Lancaster University from Tuesday 23rd July 2013 to Friday 26th July 2013. The main conference will be preceded by a wo ...
The Polari Mission is a multi-disciplinary collaboration between artists and specialists in the fields of linguistics and computer science including Professor Paul Baker (Lancaster University) & T ...
Keywords: signposts to objectivity? Keywords offer discourse analysts a corpus-driven method of identifying salient lexical items in corpora, thus directing researchers towards interesting discursive ...
Paul is invited as keynote speaker to the PG Conference, Lancaster 2012. For details see http://www.lancs.ac.uk/fass/events/laelpgconference/programme.htm
Paul Baker speaking on Radio New Zealand about Polari, the secret language of gay men. http://www.radionz.co.nz/national/programmes/saturday/audio/2522111/paul-baker-polari
Paul Baker gave a talk at the University of Brighton on the construction of Muslim women who wear veils in the British press.
Paul Baker gave a talk on Polari at the Lancashire Archives, Preston, as part of the LGBT history month. News website: http://www.lancashire.gov.uk/corporate/web/?siteid=4528&pageid=38721&e=e
Paul Baker gave an invited talk on his research on the representation of Islam in the British press at Aston University.
A press release summarising research on the representation of Islam and Muslims in the UK press by Paul Baker, Costas Gabrielatos and Tony McEnery has been cited by ENGAGE, a group which aims to promo ...
Paul Baker gave the opening plenary talk at the American Association of Corpus Linguistics conference in Atlanta, Georgia. The talk was called "Has American English gotten any different? Using the Bro ...
Paul Baker gave one of the plenary talks at Corpus Linguistics 2011, held at Birmingham University, July 20-22nd. The powerpoint presentation for the talk is here. News website: http://cl2011.org.uk/
Paul Baker gave a talk on the ESRC-funded refugees project at the annual IntelliText workshop at Leeds University.
Paul Baker gave a talk to COMPAS (Centre on Migration, Policy and Society) at Oxford University entitled "When is an asylum seeker not an asylum seeker? The representation of immigration in the UK pres ...
As part of the British Library's Evolving English exhibition Paul Baker gave a talk entitled: Fantabulosa: Gay languages from Polari to the bear code.
Paul Baker gave a talk on the history and origins of Polari at the Imperial War Museum North.