 

Lancaster-based corpus resources

Web-based interfaces | Corpus tools | Corpora held at Lancaster | Journals | Mailing lists

 
The following is an overview of corpus-based resources available at Lancaster University. For further information about corpus-based research, ongoing projects, publications and events, see the UCREL website. For non-Lancaster-based corpus resources, see David Lee's meta-site.
 
Web-based interfaces
 

BNCweb

A web-based client program for searching and retrieving lexical, grammatical and textual data from the British National Corpus (BNC).

The URL for signing up is http://bncweb.lancs.ac.uk/bncwebSignup/user/login.php

 

CQPweb

A corpus analysis tool similar to the BNCweb system, which can be used to analyse any corpus.

CQP URL: http://cqpweb.lancs.ac.uk

 

Wmatrix

A software tool for corpus analysis and comparison. It provides a web interface to the USAS and CLAWS corpus annotation tools, and standard corpus linguistic methodologies such as frequency lists and concordances. It also extends the keywords method to key grammatical categories and key semantic domains.

Wmatrix URL: http://ucrel.lancs.ac.uk/wmatrix/

 
Corpus tools
 

AntConc

An easy-to-use freeware concordance program. For information on how to access AntConc click here. Also available for free download at this URL: http://www.antlab.sci.waseda.ac.jp/antconc_index.html
 

BNC Web Index

This is the web front end to David Lee's BNC Index spreadsheet.
URL: http://ucrel.lancs.ac.uk/bncindex/

 

CLAWS

Part of speech tagging software for English.

 

LL Calculator

This web-based tool calculates Log-Likelihood values from a 2x2 contingency table.
URL: http://ucrel.lancs.ac.uk/llwizard.html
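
As a rough illustration of the kind of calculation involved, the sketch below computes the standard log-likelihood (G2) statistic for a 2x2 contingency table. It is a minimal sketch and not the code behind the LL Calculator, which may define its table or its formula differently. The function name and the cell layout (rows: target word vs. all other words; columns: corpus 1 vs. corpus 2) are assumptions made for this example.

```python
import math


def log_likelihood_2x2(a, b, c, d):
    """Return the G2 statistic for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration only):
        a = frequency of the word in corpus 1
        b = frequency of the word in corpus 2
        c = frequency of all other words in corpus 1
        d = frequency of all other words in corpus 2
    """
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table must contain at least one observation")

    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            # Expected count under independence of rows and columns.
            e = row_totals[i] * col_totals[j] / n
            # A cell with zero observations contributes nothing (0 * ln 0 = 0).
            if o > 0:
                total += o * math.log(o / e)
    return 2.0 * total
```

For example, `log_likelihood_2x2(20, 10, 10, 20)` compares a word that occurs 20 times in a 30-word corpus with the same word occurring 10 times in another 30-word corpus. A table whose cells all match their expected values gives a score of 0.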

 

ICECUP

Corpus exploration program designed for parsed corpora such as ICE-GB and DCPSE.
 

USAS Semantic tagger

Semantic tagger developed for English and extended to Finnish and Russian.
 

WordSmith

Windows-based concordance program. For information on how to access WordSmith click here.

 
Corpora held at Lancaster
 

The following are some of the English language corpora held at Lancaster University. For more information about some of these corpora see the UCREL site or Richard Xiao's survey of well-known and influential corpora. For information on how to access these corpora click here.

 

Contemporary English Corpora

  • British National Corpus (BNC)
  • American National Corpus (1st release data)
  • English Gigaword
  • ICE-Ireland
  • ICE-EA
  • ICE-GB
  • North American News Text Corpus

Spoken corpora

  • Corpus of London Teenage Language (COLT)
  • Diachronic Corpus of Present Day Spoken English
  • Lancaster/IBM Spoken English Corpus (SEC)
  • London Lund Corpus
  • Longman Spoken American Corpus
  • MICASE
  • Survey of English Dialects (SED)
  • Wellington Spoken Corpus (NZ English)

Brown family of corpora

  • Brown Corpus
  • Frown Corpus
  • LOB Corpus
  • Pre-LOB Corpus
  • FLOB Corpus (UK English)
  • ACE Corpus (Australian English)
  • Kolhapur Corpus (Indian English)
  • WWC Corpus (NZ English)
  • British English 2006

Parsed corpora

  • Parsed Corpora
  • Anaphoric Treebank
  • Associated Press Treebank
  • Canadian Hansard Treebank
  • IBM Manuals Treebank
  • ICE-GB
  • Lancaster Parsed Corpus (LOB)
  • Lancaster-Leeds Treebank
  • SUSANNE corpus

Historical corpora

  • Archer Corpus
  • Corpus of 19th Century American Fiction
  • Corpus of Early English Correspondence, sampler
  • Corpus of English Dialogues
  • Corpus of Shakespeare Texts
  • Early English Books Online
  • Helsinki Corpus of English Texts
  • Helsinki Corpus of Older Scots
  • ICAMET
  • Lampeter Corpus of Early Modern English Tracts
  • Lancaster Newsbooks Corpus
  • Newdigate Newsletters

Developmental corpora

  • International Corpus of Learner English (ICLE)
  • Longman Learner Corpus
  • Polytechnic of Wales Corpus

Linguistic Data Consortium

Membership years: 2003, 2004, 2007, 2008

URL: http://www.ldc.upenn.edu/

 
Journals (available through the library)
 

Corpora : corpus-based language learning, language processing and linguistics

 

Corpus Linguistics and Linguistic Theory

 

International Journal of Corpus Linguistics

 
Mailing lists
 

UCREL mailing list

Sign-up URL: http://mail.comp.lancs.ac.uk/mailman/listinfo/ucrel
 

UCRS mailing list

If you would like to receive a weekly email announcement of UCRS meetings, email the CRG coordinators:
Costas Gabrielatos (Ext. 92271; c.gabrielatos[...]lancaster.ac.uk);
Mazura Mastura Muhammad (m.muhammad-...]lancaster.ac.uk);
Ghada Mohammed (mohammedg[...]lancaster.ac.uk)

 

Psycholinguistics mailing list

The purpose of this list is to act as a link between the departments of Linguistics and Psychology, allowing announcements of research presentations (and other talks and events of relevance to psycholinguistics) to be distributed to interested people in both departments. All staff and students with an interest in psycholinguistics, or language and the mind in general, are invited to join.

To join the list send an email to "majordomo[at]lists.lancs.ac.uk" containing the text “subscribe psycholing my.address@lancaster.ac.uk”.

UCREL Corpus Research Seminar, Lancaster University