CORGRAM: Corpus-based grammar in contrast
Summary: The CORGRAM project is a quantitative investigation into the distributional properties of grammatical categories associated with nouns and verbs in three Indo-European languages. In this project, we will explore the application of novel corpus-based methods to a set of issues in grammatical analysis, in the context of a language, Nepali, for which corpus linguistics is in its infancy. We will also extend this analysis to a cross-linguistic comparison bringing in English and Russian.
Key Facts
Website: http://www.lancs.ac.uk/staff/hardiea/nepali/corgram.php
Funder: AHRC
Type of Activity: Academic Research - Externally Funded
Principal Investigator: Andrew Hardie
Research Associates: Ram Lohani, Olga Mudraya
Dept/Research Groups: University Centre for Computer Corpus Research on Language (UCREL), Linguistics and English Language
Keywords: Corpus linguistic methodology, Corpus linguistics, Corpus tools, Grammar, Grammatical theory and description, Linguistics, Linguistic typology, South Asia, Language, Multilingual corpora, Quantitative linguistics, European languages
Project Description
Previous work in the field of Nepali grammar has catalogued combinations of grammatical and lexical elements which can possibly occur. For example, Acharya (1991:78, 153, 157) lists 13 combinations of nouns and case-marking postpositions, and 360 different inflected forms of the Nepali lexical verb. Schmidt et al. (1993:xxi-xxvi) give similar catalogues of possibilities. However, to date little or no work on this topic, or Nepali grammar in general, has been based on the large-scale analysis of grammar in usage that corpus-based methods afford.
The grammatical categories of case (on nouns) and tense, aspect and mood (on verbs) are realised in Nepali as partially-bound elements which typically occur in close proximity to the nouns and verbs they relate to. Case, as well as the plural-collective marker, is indicated by post-nominal elements described variously as suffixes, clitics and postpositions. Tense, aspect and mood are largely marked by compounded auxiliary verbs, which can, however, also occur independently.
The semi-independence of these grammatical markers implies a degree of variety in their possible positions in the sentence structure. This raises the possibility of studying these markers, and the grammatical patterns in whose formation they participate, via quantitative analysis of their co-occurrence patterns in textual data. As outlined below, this may be accomplished by searching a corpus for statistically valid collocations. Collocation-based methods have been applied to the grammar of English, but not widely in a cross-linguistic context.
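As a rough illustration of what a collocation search of this kind can involve, the following Python sketch scores adjacent token pairs with the log-likelihood statistic, an association measure widely used in corpus linguistics. It is a minimal sketch under stated assumptions: the project description does not say which statistic or window size is used, and the function names and the toy token sequence are invented for illustration only.

```python
import math
from collections import Counter


def log_likelihood(k11, k12, k21, k22):
    """Log-likelihood ratio (G2) for a 2x2 contingency table of counts.

    k11: pair (w1, w2) observed together
    k12: w1 followed by some other token
    k21: w2 preceded by some other token
    k22: all remaining adjacent pairs
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = (
        (k11, rows[0], cols[0]),
        (k12, rows[0], cols[1]),
        (k21, rows[1], cols[0]),
        (k22, rows[1], cols[1]),
    )
    g2 = 0.0
    for observed, row_sum, col_sum in cells:
        if observed > 0:  # empty cells contribute nothing to the sum
            expected = row_sum * col_sum / total
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def bigram_collocations(tokens):
    """Return a dict mapping each adjacent token pair to its G2 score."""
    pairs = Counter(zip(tokens, tokens[1:]))
    n_pairs = sum(pairs.values())
    first = Counter(tokens[:-1])   # how often each token opens a pair
    second = Counter(tokens[1:])   # how often each token closes a pair
    scores = {}
    for (w1, w2), k11 in pairs.items():
        k12 = first[w1] - k11
        k21 = second[w2] - k11
        k22 = n_pairs - k11 - k12 - k21
        scores[(w1, w2)] = log_likelihood(k11, k12, k21, k22)
    return scores


# Hypothetical toy input: placeholder labels, not real corpus data.
toy_tokens = "noun1 MARK_A verb1 noun2 MARK_A verb2 noun1 MARK_B verb1".split()
ranked = sorted(bigram_collocations(toy_tokens).items(),
                key=lambda item: item[1], reverse=True)
for pair, score in ranked:
    print(pair, round(score, 2))
```

A high score only indicates that a pair's frequency departs from what independence would predict; it does not by itself distinguish attraction from avoidance, and a real study would work with a wider co-occurrence window and far larger samples.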
The questions to be addressed are in summary:
- What is the behaviour of Nepali grammatical categories, seen from a corpus-based quantitative perspective using analysis of co-occurrence and collocation, in real, naturally produced text?
- How does this add to (or amend) our knowledge of Nepali grammar based on earlier, non-corpus-based analyses?
- What cross-linguistic correspondences exist between the patterns of co-occurrence behaviour of these elements, and those of equivalent elements in two other languages, English and Russian?
References
Acharya, J (1991) A descriptive grammar of Nepali. Washington, D.C.: Georgetown University Press.
Schmidt, RL, Dahal, BM, Pradham, KB and Vajracharya, G (eds.) (1993) A practical dictionary of Modern Nepali. Delhi: Ratna Sagar.
Purpose of Research
Academic Research - Externally Funded
Project Funder
AHRC - £117,886
Associated Events
Workshop: Introduction to CQPweb
Date: 18 September 2008 Time: 14.00-16.00
Linguistics and English Language workshop: Introduction to CQPweb. CQPweb is a new corpus analysis tool. It is designed as a clone of the ...

