
1. What is the Corpus of Written British Creole?
2. Principles of selecting materials for the Corpus
3. Annotating the Corpus of Written British Creole
5. Using the Corpus of Written British Creole
6. Obtaining a copy of the Corpus
Appendix 1: List of texts contained in the Corpus of Written British Creole (1998)
Appendix 2: Headers in the CWBC Corpus
Appendix 3: Language tags in the CWBC Corpus
Appendix 4: Further notes on the grammar tag codes in the Corpus of Written British Creole
A new written language is taking shape in Britain. After centuries of being mainly a spoken language (or rather a group of similar, spoken languages) which only occasionally was written down, Caribbean Creole has begun to appear regularly in print in Britain.
Until very recently, most published Creole took the form of written versions of songs or poems originally spoken to music ("Dub poetry"). These often appear on the album covers/inserts of poets like Linton Kwesi Johnson and Jean Binta Breeze. However, some poetry is written in Creole and published without being performed first, and in the last few years, a number of novels have appeared which use Creole extensively in dialogue and even for first person narrative. There is also an unknown amount of personal writing - letters for example - in Creole, which is never intended to be published.
Thus we have the emergence in Britain of a written variety, in the absence of any clear or authoritative norms or "standards". This presents an unusual opportunity to several different groups of researchers. Firstly, to those interested in Caribbean Creole and its development, especially its development outside the Caribbean; secondly, to corpus linguists, to set up a corpus of a non-standard written language variety (a task which has to date scarcely been undertaken); thirdly, to historical sociolinguists, who have a rare chance to see an unstandardised language developing its written form - a stage which English reached at least five centuries ago.
The Corpus of Written British Creole was compiled at Lancaster University with financial support from the British Academy (Small Personal Research Grant no. 05-012-4670, grantholder: Mark Sebba). Most of the searching for texts, permission clearance and inputting work was carried out by Sally Kedge in 1995. In 1998, additional work, including additional tagging and checking for errors, was done by Susan Dray.
At the time the original proposal was formulated, we believed that due to the relatively small volume of texts being produced in Creole in Britain, it was theoretically possible to collect every text of that description and still have a corpus of a manageable size. We knew that in practice that would not be possible, and in the event things were even more difficult than we had expected. This was partly because of the volume of informal writing (unpublished, and never intended to be published) which was simply not accessible; for example, personal letters written from one Creole speaker to another. Partly it was also for technical reasons of copyright and permission which meant that even where we had physical possession of a text, it could not be included in the corpus in machine-readable form.
In fact we were able to collect many more texts than have been placed in the Corpus. (Here we are using "Corpus" to mean the collection of machine-readable and annotated texts, rather than the whole collection of books, pamphlets and other pieces of writing which we have built up.) These texts exist on paper but are not available in machine-readable format. Limited time and resources have been among the main factors preventing the expansion of the Corpus. Inputting the texts (transferring them from paper to machine-readable form) is in itself a time-consuming task. Tagging the texts with spelling, grammatical and discourse information has required even more time and effort. But inputting and tagging cannot even begin until the authors or publishers of printed material have been contacted and have given permission to include their work in the Corpus. Some authors are difficult or impossible to contact, while a small number of others denied permission to use their work.
The Corpus of Written British Creole is very small in corpus linguistic terms (around 12,000 words - even the early "small" computer corpora contained one million words). This raises the question of how "representative" the Corpus of Written British Creole is. Corpus linguists typically concern themselves with the question of "representativeness" of a corpus; in other words, how well the sample of texts in the corpus reflects the language found in a particular "universe" of texts (i.e. the totality of texts in that language). The representativeness of a corpus is not easy to determine, partly because, as Rieger (1979:66) points out,
a sample [...] can only be characterised as representative when so much is known about the universe from which it comes that the construction of this sample is no longer necessary.
In the case of the Creole corpus, there are additional complicating factors. Exactly what should count as the "universe" is not very clear, as Creole does not exist in a separate world from Standard English. Mixing of the two is very common. Therefore, should the "universe" include all texts which contain both Standard English and Creole, as well as all texts purely in Creole?
A less demanding requirement which can be made of a corpus is that it be exemplary. According to Bungarten (1979:42-43), "a corpus is exemplary, when its representativeness is not proven, but less formal arguments, like evident cohesiveness, linguistic judgments of a competent researcher, professional consensus, textual and pragmatic indicators, argue that the corpus may reasonably function as representative".
Because of the limitations mentioned above, the best that we could hope to establish is an exemplary corpus of written British Creole, which according to the "linguistic judgments of a competent researcher" is sufficiently wide-ranging and yet sufficiently cohesive that it could be considered representative of the language in its current state. Yet even that is not an easy goal. There are substantial difficulties entailed in determining what may reasonably be taken to "represent" an unstandardised language where the boundaries between it and its related standard - Standard English in this case - are so fuzzy. It is not clear that any researcher is really competent to make this linguistic judgement at the present time.
This is not to say that the whole exercise of setting up a Corpus of Written British Creole is pointless. On the contrary, we hope it may be a significant step to understanding better the nature of written British Creole. What is important is for researchers to understand the limitations of the Corpus. It should not be taken as representative of "Creole" or "written Creole in Britain" or any other such designation, without first reflecting on exactly what this might mean and the complexity of the underlying issues. It should not be assumed that what holds good for the language of the Corpus is necessarily true of Creole as written elsewhere or as spoken in the home or in the street. It would be wrong to assume that the Corpus represents one homogeneous variety. To help avoid this assumption, we have done our best to provide labels for each extract which will help researchers to locate it in terms of its time of writing, the author, the character whose speech is quoted, and its regional origin. Users of the corpus would do well to bear these factors in mind.
The Corpus, seen in this light, is a diverse collection of texts, which have in common that they contain an element of Creole. It is not a perfect sample, partly because that is an unrealistic goal, and partly due to lack of resources. It is simply the best which we could do with what we had at our disposal. The Corpus is available as a resource for researchers who want to study British Creole for whatever reason, and our hope is that it may provide the basis for fruitful research.
We had also to make decisions about the types of texts which would be included in the corpus. It was decided that in principle no genre would be excluded. However, it was very obvious that it would be impossible to obtain a balance of genres, as in more conventional types of corpus building.
The Corpus currently contains texts of the following types:
The decision to limit the Corpus to including writers who "either are known to be British born or to have spent most of their formative years in Britain" was taken on two grounds. One was the desire to be as representative as possible of one particular emergent variety of Creole. The other reason for limiting the Corpus to writers with a strong British connection was the feeling that a distinctive British tradition of writing Creole might be emerging. This may well have turned out to be wrong, as it involves the implicit assumption of a break in the tradition of writing Creole between the Caribbean and Britain. In fact, since many published authors have lived in both places, and to some extent texts published in the Caribbean are also available in Britain and vice versa, it is more reasonable to think of an unbroken tradition of Creole writing which unites the Caribbean with Britain. On the other hand, it may well be that differences will emerge over time. It is hoped that the Corpus will grow over time as new texts are added to it. Each extract in the Corpus contains a tag <year=xxxx> containing the year of publication or writing, so that it will be possible to track spelling, lexical or grammatical changes by comparing the dates associated with the different usages.
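The kind of date-based comparison described above could be sketched roughly as follows. This is only an illustration, not part of the Corpus distribution: the function name `variants_by_year` is hypothetical, the code assumes that every extract carries a four-digit <year=xxxx> tag and that the <sp=...> tag follows the variant spelling directly, and two of the three sample strings are made up for the purpose.

```python
import re
from collections import defaultdict

YEAR = re.compile(r"<year=(\d{4})>")             # assumes a four-digit year
SPELLING = re.compile(r"([^\s<>]+)<sp=([^<>]*)>")  # variant word + its standard spelling


def variants_by_year(extracts, standard):
    """Map year -> sorted variant spellings of `standard` seen in that year."""
    found = defaultdict(set)
    for extract in extracts:
        year = YEAR.search(extract)
        if year is None:
            continue  # an undated extract cannot be placed on the timeline
        for variant, target in SPELLING.findall(extract):
            if target.lower() == standard.lower():
                found[int(year.group(1))].add(variant.lower())
    return {y: sorted(v) for y, v in sorted(found.items())}


extracts = [
    # abridged from extract 20 of text 01, quoted in this manual
    "'So is weh<sp=where> de<sp=the> load deh<gr=de-cop>, sah<sp=sir>?'<bookid=01><year=1992>",
    # year value assumed for illustration only
    "worry 'bout<sp=about> notten<sp=nothing>, man!<bookid=022><year=1995>",
    # invented extract, not in the Corpus
    "Ain't got nuttin<sp=nothing> to live for<bookid=000><year=1998>",
]

print(variants_by_year(extracts, "nothing"))
```

Grouping by year rather than by text keeps the comparison independent of how many extracts each text contributes, though with a corpus this small any trend would need to be read cautiously.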
The choice of which elements to tag and what tags to use is one of the most difficult issues in constructing a corpus like this one. It is not simply a practical issue; it involves theoretical assumptions about the nature of the data. For example, should it be assumed that there is just one language present in the data, or two, or several - and if there is more than one, should all be treated in the same way for tagging purposes, or should they be marked in different ways?
Tagging is a developing area of corpus linguistics. Given that in a well-studied standard language like written Standard English, the grammatical classes of words are not too controversial in themselves, much of the current research in this area is devoted to developing automatic methods of tagging. For the Corpus of Written British Creole the problem is a much more basic one, of simply deciding what should be tagged and how.
In a typical "monolingual" tagged corpus, each individual word in the corpus would be tagged with one or more tags relating to the grammatical, semantic or other properties of the word. For example, in the LOB corpus, a sentence like
would be tagged as follows:
We felt that tagging like this was not practical for the Corpus of Written British Creole. Firstly, as all tagging would have to be manual it was simply beyond our resources. Secondly, there was not yet a sufficiently well developed descriptive grammar for the Creole to make the assignment of tags straightforward; too many arbitrary and problematic tags would have to be applied. Thirdly, it was not obvious that this type of tagging would be of much use to researchers. In fact, it was more likely that a detailed descriptive grammar would develop out of the Corpus, using it as a resource, rather than the other way around.
We decided to use a set of contrastive tags which would mark differences in spelling, lexis, and discoursal and grammatical structure between Standard English and the language of the Corpus texts. In other words, tags have mainly been used only where the word or structure encountered would not be expected in a text which was in Standard English. This greatly simplified the work of tagging the corpus, though at a cost: the tagging appears to focus on the language of the Corpus as a variety of English, rather than a language in its own right. It would be very unfortunate if anyone took this to imply either that Creole is in some way inferior to English, or that the purpose of the Corpus is to draw attention to "mistakes" or "deviant" grammar. That is absolutely not the intention, and would go against the drift of both Creole Studies and Linguistics in general over the last half century.
In adding contrastive tags to the data, we hope we have done the form of tagging which will be of use to most researchers. The general rule we have applied has been:
Where a form (word, structure etc.) is identical to an acceptable Standard English form, no tag has been added. Where the form is different from that expected in Standard English, a tag has been added which flags the nature of that difference.
So, for example, extract 20 from text 1 in the corpus is tagged as follows:
'I have one<gr=art> room in yah<lex=yah> specially<sp=especially> for you,
man<disc=man>.' Joseph switched to business. 'So is weh<sp=where> de<sp=the>
load deh<gr=de-cop><gr=cleft><gr=queststr>, sah<sp=sir>?'<bookid=01><speakerid=joseph><year=1992><extractid=20><pageno=9>
In this extract, we can see that an entire sentence, Joseph switched to business, has no tags at all. This is because it is indistinguishable from written Standard English. Several words in the other sentences of the extract, for example, I, have, room, load, so, are also untagged, for the same reason. However, most of the other words are tagged, to signal spelling differences (weh<sp=where>), grammar differences (deh<gr=de-cop>), lexical features (yah<lex=yah>) or discoursal features (man<disc=man>) which characterise Creole in opposition to Standard English.
The detailed structure of the tagset and examples of how all the main tags are used will be found in Appendix 3: Language Tags in the CWBC Corpus.
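For readers who want to process the contrastive tags programmatically, the word-plus-tags format shown above can be split with two regular expressions. The following is a minimal sketch, assuming that tags always have the shape <type=value> and attach to the word immediately before them; the name `tokenise` is hypothetical and not part of any Corpus tool.

```python
import re

# A word (run of characters other than whitespace and "<"), followed by
# zero or more <...> tags attached directly to it.
TOKEN = re.compile(r"([^\s<]+)((?:<[^<>]*>)*)")
# One tag of the form <type=value>.
TAG = re.compile(r"<([^=<>]+)=([^<>]*)>")


def tokenise(text):
    """Return a list of (word, [(tag_type, value), ...]) pairs."""
    # Some extracts have a space between a word and its tag; close the gap
    # so the tag is attached to the preceding word.
    text = re.sub(r"\s+<", "<", text)
    return [(word, TAG.findall(tags)) for word, tags in TOKEN.findall(text)]


sample = ("'So is weh<sp=where> de<sp=the> load "
          "deh<gr=de-cop><gr=cleft><gr=queststr>, sah<sp=sir>?'")

for word, tags in tokenise(sample):
    if tags:  # untagged words are indistinguishable from Standard English
        print(word, tags)
```

Punctuation is kept as part of a word or as a separate token here; a real analysis would probably want to strip it, but doing so risks damaging forms such as 'bout in which the apostrophe belongs to the spelling.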
http://www.comp.lancs.ac.uk/computing/research/ucrel
Longman Mini-Concordancer is an easy-to-use concordancing package which has sometimes been used to search the Corpus. Unfortunately, it can only process a limited amount of text, and the Corpus is already too large to be loaded into it in full. A more advanced corpus tool like WordSmith is therefore recommended. If you cannot obtain any concordancing software, it is also possible to do searches using an ordinary word processor.
A concordancer will enable you to find every instance of a particular word or tag which occurs within the Corpus. For example, if you want to find every example of the tense/aspect marking particle a in the corpus, you would make your concordancer do a concordance on the tag <gr=a-tense>. The output could look something like this:
celebrate. Big t'ings<sp=things>
ah<gr=a-tense> gwan<sp=going on> 'bout
ress> we<gr=APPGE> programme,
we ah<gr=a-tense> go shoot first, ask qu
er. "Work?! Where? Here? Joke
you a<gr=a-tense> joke<gr=?predicate-cle
e<gr=PPIS1> hear<gr=tense>
people a<gr=a-tense> talk bout<sp=about> whe
-to> bout<sp=about> him<gr=PPHS1>
a<gr=a-tense> go stay up dat<sp=that>
p=about> 12 o'clock him<gr=PPHS1>
a<gr=a-tense> go up a<gr=a-prep> Hel
dat<sp=that> him<gr=PPHS1>
could a<gr=a-tense> see through dem<gr=PPHO
actid=72><pageno=5> Joe Samuel
a<gr=a-tense> daed<typo=dead> wid<sp=
night was darkan<sp=and>
no moon a<gr=a-tense>shine.<bookid=23><year=
Notice that in the above selection, two spelling variants of the a particle have been retrieved. Your concordancer will also allow you to view more of the context of each sentence; for example, if the second line of the above selection seemed to merit further investigation, you would be able to view the whole extract:
"To all informer man<sem=man>
who waan<sp=want to> distress<sem=distress>
we<gr=APPGE> programme, we ah<gr=a-tense> go shoot first, ask question<gr=?no plural marker>
later, seen<sem=seen>!" Easy-Love,
his funki<sp=funky><lex=funky>
dreds<sp=dreads><lex=dreads> bouncing,
announced to nobody in particular or whoever was
listening<bookid=020><speakerid=EasyLove><year=1994><extractid=13><pageno=13>.
Different concordancers will do this in slightly different ways. Note that each extract contains information at the end about what piece of writing it was taken from, the identity of the speaker/narrator, the date of publication and the page location where it was found. Thus by finding the end of any extract, it is possible to find these details.
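If no concordancer is to hand, a keyword-in-context search on a tag can be approximated with a short script. The sketch below is not a substitute for a real concordancing package; the function name `concordance` and the default context width of 30 characters are assumptions made for illustration.

```python
import re


def concordance(text, tag, width=30):
    """Return one keyword-in-context line per occurrence of `tag` in `text`."""
    # Collapse line breaks so that context (and tags split across lines)
    # reads as continuous text.
    flat = " ".join(text.split())
    # The keyword is the tagged word, any tags before the one searched for,
    # and the tag itself.
    keyword = re.compile(r"[^\s<>]*(?:<[^<>]*>)*?" + re.escape(tag))
    lines = []
    for match in keyword.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        lines.append(f"{left:>{width}}[{match.group(0)}]{right}")
    return lines


sample = """
"To all informer man<sem=man>
who waan<sp=want to> distress<sem=distress>
we<gr=APPGE> programme, we ah<gr=a-tense> go shoot first, ask question<gr=?no
plural marker>
later, seen<sem=seen>!" Easy-Love,
"""

for line in concordance(sample, "<gr=a-tense>"):
    print(line)
```

Widening `width` far enough to reach the end of the extract would bring the trailing <bookid=...> and <year=...> tags into view, which is how the source of a hit can be identified.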
If on the other hand, you wanted to find all the variant spellings of the Standard English word nothing, you would do a concordance on <sp=nothing> with a result like the following:
night; you didn't say notten<sp=nothing><gr=double-negative>
about com
. Man, you didn't say notten<sp=nothing><gr=double-negative>
in that
r business. Ain't got notten<sp=nothing>
to live for if I ain't got yo
in't got you to love! Notten<sp=nothing>
at all. <bookid=022><speakeri
worry 'bout<sp=about> notten<sp=nothing>,
man!<bookid=022><speakerid=U
live. Rastas don't see notin<sp=nothing><gr=double-negative>
wrong with
Dem<gr=PPHS2> can't do notin<sp=nothing><gr=double-negative>.<bookid=
r=?neg> haf<sp=have> nutting<sp=nothing>
fi<lex=fi> wurry<sp=worry> bo
im<PPHO1> seh<sp=say>
nuttin<sp=nothing> nuh<sp=no><gr=negative> bus<s
negative> sey<sp=say> nuttin<sp=nothing><gr=double-negative>.
<bookid=
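The same search can be turned around to build an index from each Standard English spelling to the variant spellings found for it. The following sketch assumes the tag format shown in the concordance lines above; `spelling_index` is a hypothetical name, and the sample text is abridged from those lines (plus one example from Appendix 3) rather than being a continuous passage from the Corpus.

```python
import re
from collections import Counter, defaultdict

# A word, optionally some other tags, then the spelling tag <sp=standard>.
SPELLED = re.compile(r"([^\s<>]+)(?:<(?!sp=)[^<>]*>)*<sp=([^<>]*)>")


def spelling_index(text):
    """Map each standard spelling to a Counter of its variant spellings."""
    index = defaultdict(Counter)
    for variant, standard in SPELLED.findall(text):
        index[standard.lower()][variant.lower()] += 1
    return index


sample = (
    "you didn't say notten<sp=nothing><gr=double-negative> about it. "
    "Rastas don't see notin<sp=nothing><gr=double-negative> wrong. "
    "Ain't got notten<sp=nothing> to live for. "
    "Bwai<disc=excl><sp=boy>, sey<sp=say> nuttin<sp=nothing>."
)

index = spelling_index(sample)
for variant, count in index["nothing"].most_common():
    print(variant, count)
```

Variants are lower-cased before counting, so sentence-initial Notten and notten are treated as the same spelling; whether that is desirable depends on the research question.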
6. Obtaining a copy of the Corpus
Dr. Mark Sebba (Lecturer in Linguistics, Lancaster University)
Department of Linguistics and Modern English Language,
Lancaster University,
Lancaster LA1 4YT
Great Britain
Tel: 01524 592453 (from outside Britain: +44 1524 592453) E-mail: M.Sebba@lancs.ac.uk
A request to users of the Corpus
We are interested in knowing about the uses to which the Corpus is being put and the type of research you are doing. We are continually collecting further material to add to the Corpus and would be grateful for any contributions or information about possible texts.
| Text type | First name | Last name | Title | Date | File name (marked-up) | File name (plain text) | Size (words) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| poetry | Jean | Breeze | Tracks (part) | 1989 | marku47 | plain47. | 2080 |
| poetry | Linton Kwesi | Johnson | Tings and Times (part) | 1991 | marku53.txt | plain53.txt | 383 |
| poetry | Sandra | Mundle | Ole Woman | 1995 | marku 50.txt | plain50.txt | 156 |
| poetry | Benjamin | Zephaniah | City Psalms (part) | 1992 | marku 13.txt | | 785 |
| poetry | Benjamin | Zephaniah | City Psalms (part) | 1992 | | plain13.txt | 8185 |
| novel | Victor | Headley | Yardie | 1992 | marku1.txt | plain1.txt | 1927 |
| novel | Karline | Smith | Moss Side Massive | 1995 | marku 20.txt | plain20.txt | 6396 |
| play | Randhi | McWilliams | God, Man and Sister Geraldine | 1995 | marku 22.txt | plain22.txt | 3113 |
| student writing (school compilation) | G.M. | Richards | A Fe We Ting | ca. 1980 | marku 23.txt | plain23.txt | 3013 |
| advertisement | anon | | Dragon Stout Advertisement | 1995 | marku49.txt | plain49.txt | |
| advertisement | anon | | Desnoes and Geddes advertisement | 1995 | marku48 | plain48 | 5 |
| cartoon | anon | | Nuff Agonies | 1995 | marku54.txt | plain54.txt | 32 |
| educational materials | Mike | Read | Where do I belong inna Inglan | 1984/8 | marku24 | plain.24 | 1618 |
| newspaper article | anon | | Weekly Gleaner extract - Eating Out | 1994 | marku45.txt | plain45.txt | 573 |
| graffiti | anon | | Denise and. Cheryl | 1984 | marku46.txt | plain46.txt | 5 |
Each corpus entry begins with a header which contains certain information about the text which follows.
Each item of information has the form of <x = y> and usually is self-explanatory, for example:
<Bookid=013>
<booktype=poems>
<title=City Psalms>
<date=1992>
<authorname=Benjamin Zephaniah>
<authorcountry=Britain>
<authordob=1958>
This means that the particular piece of writing (book, poem, play etc.) has, for the purposes of the Corpus, been given the unique "identity number" 013 (<Bookid=013>). Note that <Bookid= is used for poems, plays etc., not just books. This particular corpus entry is a book of poems called City Psalms, by Benjamin Zephaniah, published in 1992; as far as we have been able to determine, the author is (mainly) British and was born in 1958.
Sometimes this header is further augmented by information about individual characters in a book or play, for example:
Passenger<speakerage=><speakercountry=Jamaica><speakerresidence=Jamaica>
This means that the character called "Passenger" is stated or implied somewhere in the text to be a Jamaican who lives in Jamaica. His age is not given. This information is given where available, to help interpret the speech characteristics of individuals represented in the texts.
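Because every header item has the form <x=y>, a header can be read into a simple lookup table. The sketch below is illustrative only: `parse_header` is a hypothetical helper, and it assumes that no header value itself contains angle brackets.

```python
import re

# One header item: <name=value>, allowing stray spaces around the "=".
FIELD = re.compile(r"<\s*([^=<>\s]+)\s*=\s*([^<>]*?)\s*>")


def parse_header(header):
    """Collect <x=y> items into a dict; names are lower-cased and
    empty values (information not available) become None."""
    return {name.lower(): (value or None) for name, value in FIELD.findall(header)}


header = """<Bookid=013>
<booktype=poems>
<title=City Psalms>
<date=1992>
<authorname=Benjamin Zephaniah>
<authorcountry=Britain>
<authordob=1958>"""

info = parse_header(header)
print(info["bookid"], info["title"], info["date"])

speaker = "Passenger<speakerage=><speakercountry=Jamaica><speakerresidence=Jamaica>"
print(parse_header(speaker))
```

Values are kept as strings so that identity numbers such as 013 keep their leading zero.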
Each individual extract from a longer text within the corpus ends with a string of tags which repeat, in part, the information given in the header. For example:
'So is holiday<gr=no-art> you come<gr=no-aux> for<gr=cleft><gr=queststr>, or you plan to settle on<x3=on> yah<lex=yah><gr=queststr>?' she enquired.<bookid=01><speakerid=donna><year=1992><extractid=46><pageno=20>
The tags at the end of this extract indicate that this is an extract from corpus entry 01, which was published in 1992. They also give a unique number referring to this extract (<extractid=46>) and its page location (<pageno=20>), as well as the name of the character who speaks the words in quotation marks (<speakerid=donna>). So even when the extract is separated from its header (e.g. in a list which results from a search using a concordancer) it is still possible to trace the source of the extract and to see some of the relevant details without checking the header.
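Separating these source tags from the text of an extract is straightforward to automate. The following is a minimal sketch, assuming that the five tag names shown above are the only source tags used; `split_extract` is a hypothetical helper, not part of the Corpus.

```python
import re

# Source tags that close an extract, as described above.
META_KEYS = ("bookid", "speakerid", "year", "extractid", "pageno")
META = re.compile(r"<(" + "|".join(META_KEYS) + r")\s*=\s*([^<>]*)>", re.IGNORECASE)


def split_extract(extract):
    """Return (text with language tags only, dict of source details)."""
    meta = {name.lower(): value.strip() for name, value in META.findall(extract)}
    body = META.sub("", extract).strip()
    return body, meta


extract = (
    "'So is holiday<gr=no-art> you come<gr=no-aux> for<gr=cleft><gr=queststr>, "
    "or you plan to settle on<x3=on> yah<lex=yah><gr=queststr>?' she enquired."
    "<bookid=01><speakerid=donna><year=1992><extractid=46><pageno=20>"
)

body, meta = split_extract(extract)
print(meta)
print(body)
```

The pattern tolerates white space around the equal sign, so a source tag that has been split across two lines in a concordance listing is still recognised.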
Grammar/syntax: <gr=
Lexicon: <lex=
Semantic: <sem=
Discourse: <disc=
2. The codes
The string of characters following the equal sign is a code which indicates more precisely the nature of the feature which is of interest.
2.1 Spelling: the term after the equal sign is the standard spelling of the same word, e.g.
bwai<sp=boy>
nuff<sp=enough>
This allows the researcher to trace all variant spellings of a word by searching on the standard spelling.
2.2 Lexicon: this is used to tag lexical items which are not expected in Standard British English. The term after the equal sign is:
pickney<lex=pickney> (DJE headword form)
baddap<lex=bad-up> (Standardised spelling)
skanking<lex=skanking> (The word itself)
What follows the equal sign is not intended as a gloss for the item in question. The use of a standardised form following the equal sign is intended to help in searching, in the event that there are several different spellings or forms of the same word.
2.3 Semantic: the term after the equal sign is an indication that the meaning differs from the Standard English meaning of the same item, or, in the case of specific items which serve as grammatical functors, the headword form of that word. In the example below, the lexical item Babylon is not specific to Creole (hence not <lex= >) but does not have the same meaning as in Standard English. Here it means "the police". However, the tag itself does not provide information about the meaning; it only draws attention to the fact of a specifically Creole meaning.
Dat de babylon<sem=babylon> hol’ t’ree breders<sem=brothers>
Warning: the tag <sem= has not been used in the corpus with complete consistency. You may come across mistakes.
2.4 Discourse: This is a small, closed set of specific discourse markers. The term following the equal sign describes the nature of the spoken expression being tagged, e.g.
"Boo-yah<disc=excl>! What happen<disc=greeting>, Blood<disc=address>?
The set of discourse marking tags used in the Corpus is as follows:
<disc=greeting> for ritualised greetings: Hail<disc=greeting>, Landlady,{...}
<disc=excl> for exclamations: "Bwai<disc=excl><sp=boy>, him must be a millionaire," Sister Jones stated
<disc=address> for terms of address used to another participant in the conversation. The commonest term in this category is man:
We ah go look after you, man<disc=man><disc=address> .
<disc=questtag> for question tags which are characteristic of non-standard English, especially innit (which is associated with London speech) and tags of JC origin like seen? : You got every last detail planned, innit <disc=questtag> Cliff?"
Me name Joseph, seen<disc=questtag> ?
<disc=expletive> is used for words which represent swearing. Some of these are obscene and/or highly offensive to some people.
Raas<lex=raas>claat<sp=cloth><lex=raas-claat><disc=expletive>!
<disc=man> for the word man used in a discourse function rather than as a common noun. As man appears to have a number of distinct discourse functions, a second <disc= tag may be used to indicate the function of this particular instance of man:
You must have been on Mars or sump'n man<disc=man><disc=address>.
Man<disc=man><disc=excl>, Sidney Higgins, you turn<sem=turn><gr=no-aux> comic<gr=no-art> now<gr=queststr>?
2.5 Grammar: the grammar codes are more complex than the other types.
Some items have been coded using codes which only indicate in general terms the category of the difference between the item and the Standard English equivalent. These are in capital letters, for example:
<gr=TENSE>
This indicates that the tense marking, or the interpretation of the tense of the tagged verb, is different from the equivalent in Standard English.
<gr=STRUCTURE>
This indicates that the structure of the sentence preceding the tag is in some way unexpected or noteworthy, but gives no further information.
We have tried to use these vague tags sparingly, but feel they may be useful to the analyst in picking out interesting words or strings.
The more usual form of the code following <gr= is an indicator of the exact nature of the grammatical difference between the tagged item/string and its Standard English equivalent. For example:
"You heard<gr=no-aux> about Fluxy?"
Big t'ings ah<gr=a-tense> gwan 'bout yah!
In some cases, CLAWS 2 grammatical tags have been used. These are used to mark the part-of-speech category of the tagged word, but only when it is different in form from the one expected in Standard English. For example,
Me<gr=PPIS1> have 'nuff woman...
The tag <gr=PPIS1> on the word "me" indicates that it represents the first person singular subject pronoun. However, in the corpus the corresponding pronoun in the sentence
I have 'nuff woman..
would not receive any tag, because it has the Standard English form.
A full list of the CLAWS pronoun tags is given below in Appendix 4, Section 1. However, it will usually be obvious what the meaning of the tag is and why the particular word has been tagged.
Contents of this section:
1. Pronouns
2. Plural Marking
3. Articles and Demonstratives
4. Possessives
5. Prepositions
6. Tense and Aspect Marking
7. Use of the copula
8. Negation
9. Infinitive marking
10. Question structure
11. Cleft and predicate cleft
12. Other verbal constructions
1. Pronouns
| Person | Singular | Plural |
| --- | --- | --- |
| 1 | mi + | wi ++ |
| 2 | yu | unu +++ |
| 3 | im (s/he) +; i (it) | dem + |
The forms marked + differ from the Standard English subject forms. Those marked ++ differ from the Standard English object forms. The second person plural form unu (marked +++) is not found in Standard English at all, and all the forms above differ from the forms used in Standard English for the possessive.
Where a pronoun form used in the Corpus is different from the form which would be expected in Standard English, it has been tagged using the appropriate CLAWS tag which indicates person and number, or in the case of possessive pronouns, possession.
e.g.
wi<gr=PPIS2>
CLAWS tags for pronouns
I PPIS1
me PPIO1
my APPGE
you PPY (singular)
you PPY2 (plural) (NB - as this is not a distinction made in SE, this is not a CLAWS 2 tag)
your APPGE
he/she PPHS1
his/her APPGE
him/her PPHO1
we PPIS2
our PPIO2
mine PPGE
yours PPGE
them PPHO2
they PPHS2
Myself PPX1
Yourself PPX1
Him/her/itself PPX1
Yourselves PPX2
Ourselves PPX2
Theirselves PPX2
2. Plural Marking
JC does not mark the plural of nouns, except in the case of (usually) animate nouns, which may be followed by the affix -dem. Such items are tagged by <gr=plmkr>, e.g.
hearken to de people dem<gr=plmkr> voice
Where a plural form is expected in Standard English, but is absent in JC, the tag <gr=no-plmkr> is used in the Corpus, e.g.
Me have 'nuff<sp=enough> woman<gr=no-plmkr>
3. Articles and Demonstratives
The use of these is different in Jamaican Creole and Standard English. Where an article expected in SE is lacking, this is tagged as <gr=no-art>, e.g.
washin worn out clothes dung<sp=down> a<gr=a-prep> ribba<sp=river><gr=no-art>
Conversely, use of an article which is not expected, or different from the one expected in Standard English, is marked by <gr=art>, e.g.
so him light one<gr=art> cigarette.
Demonstratives in non-Standard English forms or usages are tagged <gr=demon>. These may be features of nonstandard British English as in the example below:
I can't concentrate on this with them<gr=demon> girls watching
4. Possessives
In "classical" JC, possession may be shown simply by juxtaposition, with the possessor preceding the possessed. The effect is that the ordering of nouns is as in Standard English, but there is no possessive marker (‘)s: di bwai niem, "the boy’s name".
This structure applies to common nouns but also to pronouns, so we find mi buk "my book", unu kyaa "your car" etc.
Nouns with possessive function are tagged <gr=possessive>, while possessive pronouns have the tag <gr=APPGE>. Examples of both types are found in the Corpus, often together, as in the following examples:
you nuh recognise yuh<gr=APPGE> husband<gr=possessive> sister!
"You don’t recognise your husband’s sister!"
an de neighbour a cuss bout we<gr=APPGE> bedspring noise<gr=possessive>
"And the neighbours cursing about our bedsprings’ noise"
Another strategy for indicating possession in JC is to use the preposition /fi/ (possibly derived from English for, and certainly overlapping in some of its uses). Where the possessor (rarely) is a full noun phrase, the order is possessed - /fi/ - possessor as in Standard English: di buk fi di tiicha, "the book of the teacher". However, where the possessor is a pronoun, the usual construction is /fi/ - pronoun - possessed. There are many examples of this in the Corpus, e.g.
dis a fi<lex=fi> we<gr=APPGE> lan
That a fe<lex=fi> yuh<sp=you><gr=APPGE> business.
A version of this strategy which characterises styles closer to Standard English is to use the preposition of, often reduced to /a/. This never seems to occur with pronouns.
all he had ina him<gr=APPGE> pocket was a box a<gr=a-prep> matches an a pack of cigarette.
Here a is marked simply as a preposition.
5. Prepositions
JC often uses the preposition a where English would use in, at or to. In such cases the word a is tagged with <gr=a-prep>.
Me go a<gr=a-prep> de airport
me lef' Jamaica an' come ah<gr=a-prep> England!
Other JC prepositions may have an archaic flavour in modern SE, for example pan (from upon) which translates some instances of on. In this case the preposition is just identified as spelt in a variant way (i.e. it is not tagged as a grammatical/lexical feature).
so mi start a posse pon<sp=upon> mi likkle corner
6. Tense and Aspect Marking
"Classical" Jamaican Creole uses different tense forms from Standard English, as below.
| | Jamaican Creole | Standard English |
| --- | --- | --- |
| (a) | Mi ron | I run (habitually); I ran. |
| (b) | Mi a ron | I am running |
| (c) | Mi ena (en+a) ron | I was running |
| (d) | Mi en ron | I have run; I had run |
The en form also has the common variant did, e.g. mi dida / did ron.
The codes for these indicate the form of the tense/aspect marker, e.g.
de day did<gr=did> start out bad
people dem a<gr=a-tense> sing
The tag used here for a is <gr=a-tense> to distinguish it from other uses of a, e.g. as a preposition, <gr=a-prep>.
Where the difference from Standard English is in terms of lack of a tense marker, the tag used is <gr=tense>:
one big foreign chevrolet drive<gr=tense> up an tek <gr=tense> im een
This tag is also used to draw attention to other differences from Standard English in terms of tense or aspect.
Due to the nature of the tense/aspect marking system in JC, there is often nothing to correspond to the Standard English auxiliary verb. The tag <gr=no-aux> is used to indicate this, e.g.
Jeeze, look how long I been calling<gr=no-aux> you.
While JC has no morphologically marked past tense forms corresponding to English (cf. looked, went, drove), in some cases the base form of the JC verb derives historically from an English past tense. Examples are brok (break/broke), lef (leave/left). These are not specifically past forms in JC. The tag used to draw attention to this is <gr=pastform>.
"And you...! Yuh bettah pack yuh bags an' lef'<sp=left><gr=pastform>."
7. Use of the copula
There are several ways in Jamaican Creole to translate the English copula to be.
a. Where the copula functions as an auxiliary verb, the tense/aspect marker a may be used in JC. This would receive the tag <gr=a-tense>, e.g.
people dem a<gr=a-tense> sing
b. The JC equative verb is also a which "regularly connects two nominals" (Bailey 1966 p.32). In this case the word will be tagged <gr=a-copula>, e.g.
him know sey dat dem a<gr=a-copula> duppy.
This form is rare in the Corpus, as forms from Standard English to be are more often found in this function.
c. JC has a separate locative verb de (often spelt deh); this is tagged <gr=de-cop>, e.g.
So is weh de load deh<gr=de-cop>, sah?
d. With true adjectives in JC, no copula is required. This absence is marked by the tag <gr=no-cop> following the adjective, e.g.
Children different<gr=no-cop> now.
Note that this tag is also used elsewhere when a copula expected in Standard English is missing, e.g.
I here<gr=no-cop>, you know, Ethel!
When I at <gr=no-cop> ome
8. Negation
There are a variety of negators in Jamaican Creole. All instances of negation which are different from that expected in Standard English are marked with <gr=neg>. This includes variant spellings of negators like no and not.
Dis poetry nar<gr=neg> put yu to sleep
it does nu<sp=no><gr=negative> good for me
"Nuh<sp=no><gr=negative> tell me seh you nuh<sp=no><gr=negative> recognise yuh husband sister!"
Where double negatives occur, the tag <gr=dblneg> is used, e.g.
Hey bwoy, don’t come cause no<gr=dblneg><gr=negative> fuss y’hear
Where ain’t and its cognate forms appear, they are tagged with <gr=aint>, e.g.
They insisted you ain’t <gr=aint> got a face they can sell
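A simple tally of the negation tags might be sketched as follows. Note that the examples above show both <gr=neg> and <gr=negative>; the sketch treats these as one tag, which is an assumption made here for illustration only and is not confirmed by the tag definitions. The names tally and ALIASES are likewise hypothetical.

```python
import re
from collections import Counter

# A single tag of the form <name=value>.
TAG = re.compile(r"<(\w+)=([^<>]*)>")

# Assumption: "negative" is a variant of "neg", judging only from the
# examples in this section.
ALIASES = {"negative": "neg"}


def tally(lines):
    """Count the values of gr tags in the given lines, merging aliases."""
    counts = Counter()
    for line in lines:
        for name, value in TAG.findall(line):
            if name == "gr":
                counts[ALIASES.get(value, value)] += 1
    return counts


examples = [
    "Dis poetry nar<gr=neg> put yu to sleep",
    "it does nu<sp=no><gr=negative> good for me",
    "Hey bwoy, don’t come cause no<gr=dblneg><gr=negative> fuss y’hear",
    "They insisted you ain’t <gr=aint> got a face they can sell",
]
for tag, n in sorted(tally(examples).items()):
    print(tag, n)
```

If the two spellings of the tag turn out to be distinct, the entry in ALIASES can simply be removed.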
9. Infinitive marking
The English infinitive marker to is in some cases to be translated by fi in JC, but in some cases it is optional in Creole where it is obligatory in English. The tag <gr=no-infmkr> is placed on the verb which in Standard English would be preceded by to, e.g.
an mi nose start run<gr=no-infmkr> wid misery
This case is to be distinguished where possible from verb chaining (see below), where the combination of verbs would not be expected in Standard English.
The word fi is treated as an independent JC lexical item, marked by <lex=fi>. It functions as an infinitive marker in the examples below:
so it did hard fi<lex=fi> understand
heng dem out fi<lex=fi> dry
It can also mark possession in pronouns or nouns:
a fi<lex=fi> mi people pon de crass
fa<sp=for> dis a fi<lex=fi> we fightin style
In some cases it is used as if it were a variant spelling of the English preposition for:
get de children ready fi<sp=for><lex=fi> school
10. Question structure
In formal and written SE yes/no questions, the main verb or auxiliary verb is inverted around the subject of the sentence, e.g. it was nice - was it nice? The same applies to questions introduced by words such as who or what, unless these words are themselves the subject: who has he asked, what did she want, etc. In JC this process is totally absent, so the word order of a question is the same as the order of the corresponding statement.
In the Corpus, the tag <gr=queststr> is used to draw attention to this, e.g.
So how Ethel’s been doing? <gr=queststr>
Frequently, there is nothing to correspond to the Standard English auxiliary verb. The tag <gr=no-aux> is used to indicate this, e.g.
"You heard<gr=no-aux> about Fluxy? <gr=queststr>"
11. Cleft and predicate cleft
Cleft structures in English are sentences introduced by it is or it was and enable one of the noun phrases to be moved out of the main clause, e.g. the cleft of John saw Mary is:
It was Mary that John saw.
Cleft structures are especially common with questions in JC and have been marked with the tag <gr=cleft>. In the Corpus they occur both with the SE form is and the JC form a as copula.
ah<gr=a-cop> we run t'ings <gr=cleft>
‘So is dat you ina now <gr=cleft>’, she said slowly.
Predicate cleft is a JC construction not found in English, which involves fronting and repeating the main verb, e.g.
a<gr=a-cop> no<neg=no> play we a<gr=a-tense> play<gr=predicate-cleft>
Joke you a joke<gr=predicate-cleft>, man!
12.1 say constructions
13.1 Verb chaining.
Hey, bwoy, don't come cause<gr=vbchn> no fuss y'hear!
Structures such as relative clauses which might be expected to include a clause marking element (e.g. conjunction, complementiser or relative pronoun) in Standard English, are marked <gr=no-clausemkr> when such an item does not appear, e.g.
Appendix 5: A Short Bibliography of Works on Creole in Britain
(Only works dealing with English-lexicon creoles are included here. The emphasis is on books, and many shorter but valuable papers and articles have been omitted.)
Edwards, V. (1979): The West Indian language issue in British schools: challenges and responses. London, Routledge and Kegan Paul.
Gilroy, P. (1987): 'There Ain't No Black in the Union Jack'. London, Hutchinson.
Hewitt, R. (1986): White Talk, Black Talk. Cambridge, Cambridge University Press.
Rampton, B. (1995): Crossing: Language and Ethnicity among adolescents. London, Longman.
Sebba, M. (1993): London Jamaican: language systems in interaction. London, Longman.
Sutcliffe, D. (1982a): British Black English. Oxford, Blackwell.
Sutcliffe, D. and Wong A. (editors) (1986): The language of the Black experience. Oxford, Blackwell.
Sutcliffe, D. with John Figueroa (1992): System in Black Language. Clevedon, Avon: Multilingual Matters.
Wells, J. C. (1973): Jamaican pronunciation in London. Oxford, Blackwell.
Bungarten, Theo (1979): Das Korpus als empirische Grundlage in der Linguistik und Literaturwissenschaft. In Bergenholtz and Schaeder (eds), 28-51.
Cassidy, F.G. and Le Page, R.B. (1967/1980): Dictionary of Jamaican English. Cambridge, Cambridge University Press.
Rieger, Burghard (1979): Repräsentativität: von der Unangemessenheit eines Begriffs zur Kennzeichnung eines Problems linguistischer Korpusbildung. In Bergenholtz, Henning and Burkhard Schaeder (eds.) (1979): Empirische Textwissenschaft, pp. 52-70. Scriptor.
Sebba, Mark (1989): The Adequacy Of Corpora. Unpublished M.Sc. dissertation, Centre for Computational Linguistics, University of Manchester Institute of Science and Technology.
