The Corpus of Written British Creole:
a User's Guide
by Mark Sebba, Sally Kedge and Susan Dray
1999
 
 

 
Contents

1. What is the Corpus of Written British Creole?

2. Principles of selecting materials for the Corpus

3. Annotating the Corpus of Written British Creole

4. Analysing the Corpus

5. Using the Corpus of Written British Creole

6. Obtaining a copy of the Corpus

Appendix 1: List of texts contained in the Corpus of Written British Creole (1998)

Appendix 2: Headers in the CWBC Corpus

Appendix 3: Language tags in the CWBC Corpus

Appendix 4: Further notes on the grammar tag codes in the Corpus of Written British Creole

Appendix 5: A Short Bibliography of Works on Creole in Britain

References


1. What is the Corpus of Written British Creole?
    A new written language is taking shape in Britain. After centuries of being mainly a spoken language (or rather a group of similar, spoken languages) which only occasionally was written down, Caribbean Creole has begun to appear regularly in print in Britain.

    Until very recently, most published Creole took the form of written versions of songs or poems originally spoken to music ("Dub poetry"). These often appear on the album covers/inserts of poets like Linton Kwesi Johnson and Jean Binta Breeze. However, some poetry is written in Creole and published without being performed first, and in the last few years, a number of novels have appeared which use Creole extensively in dialogue and even for first person narrative. There is also an unknown amount of personal writing - letters for example - in Creole, which is never intended to be published.

    Thus we have the emergence in Britain of a written variety, in the absence of any clear or authoritative norms or "standards". This presents an unusual opportunity to several different groups of researchers. Firstly, to those interested in Caribbean Creole and its development, especially its development outside the Caribbean; secondly, to corpus linguists, to set up a corpus of a non-standard written language variety (a task which has to date scarcely been undertaken); thirdly, to historical sociolinguists, who have a rare chance to see an unstandardised language developing its written form - a stage which English reached at least five centuries ago.

    The Corpus of Written British Creole was compiled at Lancaster University with financial support from the British Academy (Small Personal Research Grant no. 05-012-4670, grantholder: Mark Sebba). Most of the searching for texts, permission clearance and inputting work was carried out by Sally Kedge in 1995. In 1998, additional work, including additional tagging and checking for errors, was done by Susan Dray.

    At the time the original proposal was formulated, we believed that due to the relatively small volume of texts being produced in Creole in Britain, it was theoretically possible to collect every text of that description and still have a corpus of a manageable size. We knew that in practice that would not be possible, and in the event things were even more difficult than we had expected. This was partly because of the volume of informal writing (unpublished, and never intended to be published) which was simply not accessible; for example, personal letters written from one Creole speaker to another. Partly it was also for technical reasons of copyright and permission which meant that even where we had physical possession of a text, it could not be included in the corpus in machine-readable form.

    In fact we were able to collect many more texts than have been placed in the Corpus (here we are using "Corpus" to mean the collection of machine-readable and annotated texts, rather than the whole collection of books, pamphlets and other pieces of writing which we have built up). These texts exist on paper but are not available in machine-readable format. Limited time and resources have been among the main factors preventing the expansion of the Corpus. Inputting the texts (transferring them from paper to machine-readable form) is in itself a time-consuming task. Tagging the texts with spelling, grammatical and discourse information has required even more time and effort. But inputting and tagging cannot even begin until the authors or publishers of printed material have been contacted and have given permission to include their work in the Corpus. Some authors are difficult or impossible to contact, while a small number of others denied permission to use their work.

    The Corpus of Written British Creole is very small in corpus linguistic terms (around 12,000 words - even the early "small" computer corpora contained one million words). This raises the question of how "representative" the Corpus of Written British Creole is. Corpus linguists typically concern themselves with the question of "representativeness" of a corpus; in other words, how well the sample of texts in the corpus reflects the language found in a particular "universe" of texts (i.e. the totality of texts in that language). The representativeness of a corpus is not easy to determine, partly because, as Rieger (1979:66) points out,

    a sample [...] can only be characterised as representative when so much is known about the universe from which it comes that the construction of this sample is no longer necessary.

    In the case of the Creole corpus, there are additional complicating factors. Exactly what should count as the "universe" is not very clear, as Creole does not exist in a separate world from Standard English. Mixing of the two is very common. Therefore, should the "universe" include all texts which contain both Standard English and Creole, as well as all texts purely in Creole?

    A less demanding requirement which can be made of a corpus is that it be exemplary. According to Bungarten (1979:42-43), "a corpus is exemplary, when its representativeness is not proven, but less formal arguments, like evident cohesiveness, linguistic judgments of a competent researcher, professional consensus, textual and pragmatic indicators, argue that the corpus may reasonably function as representative".

    Because of the limitations mentioned above, the best that we could hope to establish is an exemplary corpus of written British Creole, which according to the "linguistic judgments of a competent researcher" is sufficiently wide-ranging and yet sufficiently cohesive that it could be considered representative of the language in its current state. Yet even that is not an easy goal. There are substantial difficulties entailed in determining what may reasonably be taken to "represent" an unstandardised language where the boundaries between it and its related standard - Standard English in this case - are so fuzzy. It is not clear that any researcher is really competent to make this linguistic judgement at the present time.

    This is not to say that the whole exercise of setting up a Corpus of Written British Creole is pointless. On the contrary, we hope it may be a significant step to understanding better the nature of written British Creole. What is important is for researchers to understand the limitations of the Corpus. It should not be taken as representative of "Creole" or "written Creole in Britain" or any other such designation, without first reflecting on exactly what this might mean and the complexity of the underlying issues. It should not be assumed that what holds good for the language of the Corpus is necessarily true of Creole as written elsewhere or as spoken in the home or in the street. It would be wrong to assume that the Corpus represents one homogeneous variety. To help avoid this assumption, we have done our best to provide labels for each extract which will help researchers to locate it in terms of its time of writing, the author, the character whose speech is quoted, and its regional origin. Users of the corpus would do well to bear these factors in mind.

    The Corpus, seen in this light, is a diverse collection of texts, which have in common that they contain an element of Creole. It is not a perfect sample, partly because that is an unrealistic goal, and partly due to lack of resources. It is simply the best which we could do with what we had at our disposal. The Corpus is available as a resource for researchers who want to study British Creole for whatever reason, and our hope is that it may provide the basis for fruitful research.

     

2. Principles of selecting materials for the Corpus
    Texts were selected for inclusion in the Corpus on the basis of two criteria:
    1. that the texts have been written and / or published in Britain
    2. that all or part of the text is a written representation of English-lexicon Caribbean Creole (Patois / Patwa / Nation Language)
    Identifying Creole was not particularly problematic as there are several reliable descriptions available and it was generally clear which parts of a text contained Creole and which did not. The requirement that texts be written and / or published in Britain was in practice much more problematic. Where a text was published in Britain this was usually easy to determine, but some of the writing we obtained was unpublished. Also, some texts were published in Britain by writers known to have spent much of their lives in the Caribbean. Though the text would technically be "British" its language might well reflect Caribbean rather than British usage. Should these be included? And what of the cases where we had no biographical information at all about the writer? In a small corpus, like this, the potential for producing a skewed sample is high. In the end we were sometimes forced to make arbitrary judgements. We concentrated on including writers who are known either to be British born or to have spent most of their formative years in Britain. Even then, we have not been able to exclude with certainty texts which do not relate to Britain, e.g. where the speech styles portrayed are not intended to reflect British usage.

    We had also to make decisions about the types of texts which would be included in the corpus. It was decided that in principle no genre would be excluded. However, it was very obvious that it would be impossible to obtain a balance of genres, as in more conventional types of corpus building.

    The Corpus currently contains texts of the following types:

    1. Poems
    2. Extracts from novels and other fiction
    3. Plays
    4. Miscellaneous, including advertisements and graffiti
    Many of the works from which sections appear in the Corpus are actually long pieces of prose, e.g. novels. In accordance with the agreement between the researchers and the authors, the Corpus contains only extracts of these works. Normally each extract will be at least partly in Creole; in other words, parts of the work which are wholly in Standard English have not been included in the Corpus. This means that the Corpus is of no use, for example, for studying what proportion of a novel is in Standard English or in Creole. However, since many of the extracts include both Standard English and Creole, the Corpus can be used to study how these two interact with each other, for example at the sentence level.

    The decision to limit the Corpus to including writers who "either are known to be British born or to have spent most of their formative years in Britain" was taken on two grounds. One was the desire to be as representative as possible of one particular emergent variety of Creole. The other reason for limiting the Corpus to writers with a strong British connection was the feeling that a distinctive British tradition of writing Creole might be emerging. This may well have turned out to be wrong, as it involves the implicit assumption of a break in the tradition of writing Creole between the Caribbean and Britain. In fact, since many published authors have lived in both places, and to some extent texts published in the Caribbean are also available in Britain and vice versa, it is more reasonable to think of an unbroken tradition of Creole writing which unites the Caribbean with Britain. On the other hand, it may well be that differences will emerge over time. It is hoped that the Corpus will grow over time as new texts are added to it. Each extract in the Corpus contains a tag <year=xxxx> containing the year of publication or writing, so that it will be possible to track spelling, lexical or grammatical changes by comparing the dates associated with the different usages.
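    The comparison by date described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name spellings_by_year is hypothetical, the two input strings are invented placeholders (the word forms are attested elsewhere in this guide, but the years attached to them here are made up), and the pattern assumes a plain four-digit <year=xxxx> value. Appendix 1 lists some dates such as "ca. 1980" and "1984/8", so real data may need a looser pattern.

```python
import re
from collections import Counter, defaultdict

# Assumption: the year tag holds exactly four digits, e.g. <year=1992>.
YEAR_TAG = re.compile(r"<year=(\d{4})>")


def spellings_by_year(extracts, standard):
    """Group the variant spellings of one standard word by year of the extract."""
    # A word, optionally followed by other tags, then the <sp=...> tag we want.
    variant = re.compile(
        r"([^\s<>]+)(?:<[^<>]*>)*<sp=" + re.escape(standard) + r">"
    )
    by_year = defaultdict(Counter)
    for extract in extracts:
        year_match = YEAR_TAG.search(extract)
        if year_match is None:
            continue  # no usable date, so the extract cannot be placed in time
        year = int(year_match.group(1))
        for match in variant.finditer(extract):
            by_year[year][match.group(1).lower()] += 1
    return dict(by_year)


# Invented placeholder input, NOT taken from the Corpus.
toy_extracts = [
    "notten<sp=nothing> <year=1985>",
    "nuttin<sp=nothing> nuttin<sp=nothing> notten<sp=nothing> <year=1995>",
]

for year, counts in sorted(spellings_by_year(toy_extracts, "nothing").items()):
    print(year, dict(counts))
```

    Lower-casing the variants is a design choice that merges sentence-initial and sentence-internal forms; drop it if capitalisation is itself of interest.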

     

3. Annotating the Corpus of Written British Creole
    The Corpus has been compiled on the understanding that it is a resource for different researchers who have different methodologies, theoretical orientations and research questions in mind. In order to increase the usefulness of the Corpus, certain words and sentences within it have been annotated or tagged with codes which are enclosed within angle brackets, thus: <tag>

    The choice of which elements to tag and what tags to use is one of the most difficult issues in constructing a corpus like this one. It is not simply a practical issue; it involves theoretical assumptions about the nature of the data. For example, should it be assumed that there is just one language present in the data, or two, or several - and if there is more than one, should all be treated in the same way for tagging purposes, or should they be marked in different ways?

    Tagging is a developing area of corpus linguistics. Given that in a well-studied standard language like written Standard English, the grammatical classes of words are not too controversial in themselves, much of the current research in this area is devoted to developing automatic methods of tagging. For the Corpus of Written British Creole the problem is a much more basic one, of simply deciding what should be tagged and how.

    In a typical "monolingual" tagged corpus, each individual word in the corpus would be tagged with one or more tags relating to the grammatical, semantic or other properties of the word. For example, in the LOB corpus, a sentence like

    would be tagged as follows:

    We felt that tagging like this was not practical for the Corpus of Written British Creole. Firstly, as all tagging would have to be manual it was simply beyond our resources. Secondly, there was not yet a sufficiently well developed descriptive grammar for the Creole to make the assignment of tags straightforward; too many arbitrary and problematic tags would have to be applied. Thirdly, it was not obvious that this type of tagging would be of much use to researchers. In fact, it was more likely that a detailed descriptive grammar would develop out of the Corpus, using it as a resource, rather than the other way around.

    We decided to use a set of contrastive tags which would mark differences in spelling, lexis, and discoursal and grammatical structure between Standard English and the language of the Corpus texts. In other words, tags have mainly been used only where the word or structure encountered would not be expected in a text which was in Standard English. This greatly simplified the work of tagging the corpus, though at a cost: the tagging appears to focus on the language of the Corpus as a variety of English, rather than a language in its own right. It would be very unfortunate if anyone took this to imply either that Creole is in some way inferior to English, or that the purpose of the Corpus is to draw attention to "mistakes" or "deviant" grammar. That is absolutely not the intention, and would go against the drift of both Creole Studies and Linguistics in general over the last half century.

    In adding contrastive tags to the data, we hope we have done the form of tagging which will be of use to most researchers. The general rule we have applied has been:

    Where a form (word, structure etc.) is identical to an acceptable Standard English form, no tag has been added. Where the form is different from that expected in Standard English, a tag has been added which flags the nature of that difference.

    So, for example, extract 20 from text 1 in the corpus is tagged as follows:

    'I have one<gr=art> room in yah<lex=yah> specially<sp=especially> for you, man<disc=man>.' Joseph switched to business. 'So is weh<sp=where> de<sp=the> load deh<gr=de-cop><gr=cleft><gr=queststr>, sah<sp=sir>?'<bookid=01><speakerid=joseph><year=1992><extractid=20><pageno=9>

    In this extract, we can see that an entire sentence, "Joseph switched to business", has no tags at all. This is because it is indistinguishable from written Standard English. Several words in the other sentences of the extract, for example, I, have, room, load, so, are also untagged, for the same reason. However, most of the other words are tagged, to signal spelling differences (weh<sp=where>), grammatical differences (deh<gr=de-cop>), lexical differences (yah<lex=yah>) or discoursal features (man<disc=man>) which characterise Creole in opposition to Standard English.
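    For readers who want to process the tags programmatically, the following is a minimal sketch of how the tagged words in such an extract might be pulled apart. It is not part of the Corpus distribution: the function name tagged_words and the regular expressions are assumptions made for illustration, and the set of levels is limited to the five described in this guide (other codes, such as <typo=, also occur in the data).

```python
import re

# Assumption: the five analysis levels documented in this guide.
LEVELS = {"sp", "gr", "lex", "sem", "disc"}

# A word immediately followed by one or more <level=code> tags.
TAGGED_WORD = re.compile(r"([^\s<>]+)((?:<[^<>=]+=[^<>]*>)+)")
SINGLE_TAG = re.compile(r"<([^<>=]+)=([^<>]*)>")


def tagged_words(text):
    """Return (word, [(level, code), ...]) for each word carrying a language tag."""
    found = []
    for match in TAGGED_WORD.finditer(text):
        tags = [
            (level, code)
            for level, code in SINGLE_TAG.findall(match.group(2))
            if level in LEVELS
        ]
        if tags:  # skip runs made up only of bookid/year/... metadata tags
            found.append((match.group(1), tags))
    return found


extract = (
    "'I have one<gr=art> room in yah<lex=yah> specially<sp=especially> for you, "
    "man<disc=man>.' Joseph switched to business. 'So is weh<sp=where> de<sp=the> "
    "load deh<gr=de-cop><gr=cleft><gr=queststr>, sah<sp=sir>?'"
    "<bookid=01><speakerid=joseph><year=1992><extractid=20><pageno=9>"
)

for word, tags in tagged_words(extract):
    print(word, tags)
```

    Untagged words are simply not returned, which mirrors the contrastive principle: anything absent from the result is, as far as the tagging is concerned, indistinguishable from Standard English.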

    The detailed structure of the tagset and examples of how all the main tags are used will be found in Appendix 3: Language Tags in the CWBC Corpus.


4. Analysing the Corpus
    Whether or not you use the tags already included in the Corpus, in order to exploit the Corpus to the full you will need to use corpus tools in the form of a browser or concordancer. Several such tools are now commercially available. Some information about corpus analysis tools can be found on the World Wide Web at:

    http://www.comp.lancs.ac.uk/computing/research/ucrel

    Longman Mini-Concordancer is an easy-to-use concordancing package which has sometimes been used to search the Corpus, but unfortunately it can only process a limited amount of text and the Corpus is already too large to be loaded into it in full. A more advanced corpus tool like WordSmith is therefore recommended. If you cannot obtain any concordancing software, it is also possible to do searches using an ordinary word processor.

    A concordancer will enable you to find every instance of a particular word or tag which occurs within the Corpus. For example, if you want to find every example of the tense/aspect marking particle a in the corpus, you would make your concordancer do a concordance on the tag <gr=a-tense>. The output could look something like this:

    celebrate. Big t'ings<sp=things> ah<gr=a-tense> gwan<sp=going on> 'bout
    ress> we<gr=APPGE> programme, we ah<gr=a-tense> go shoot first, ask qu
    er. "Work?! Where? Here? Joke you a<gr=a-tense> joke<gr=?predicate-cle
    e<gr=PPIS1> hear<gr=tense> people a<gr=a-tense> talk bout<sp=about> whe
    -to> bout<sp=about> him<gr=PPHS1> a<gr=a-tense> go stay up dat<sp=that>
    p=about> 12 o'clock him<gr=PPHS1> a<gr=a-tense> go up a<gr=a-prep> Hel
    dat<sp=that> him<gr=PPHS1> could a<gr=a-tense> see through dem<gr=PPHO
    actid=72><pageno=5> Joe Samuel a<gr=a-tense> daed<typo=dead> wid<sp=
    night was darkan<sp=and> no moon a<gr=a-tense>shine.<bookid=23><year=

    Notice that in the above selection, two spelling variants of the a particle have been retrieved. Your concordancer will also allow you to view more of the context of each sentence; for example, if the second line of the above selection seemed to merit further investigation, you would be able to view the whole extract:

    "To all informer man<sem=man> who waan<sp=want to> distress<sem=distress> we<gr=APPGE> programme, we ah<gr=a-tense> go shoot first, ask question<gr=?no plural marker>
    later, seen<sem=seen>!" Easy-Love, his funki<sp=funky><lex=funky> dreds<sp=dreads><lex=dreads> bouncing, announced to nobody in particular or whoever was
    listening<bookid=020><speakerid=EasyLove><year=1994><extractid=13><pageno=13>.

    Different concordancers will do this in slightly different ways. Note that each extract contains information at the end about what piece of writing it was taken from, the identity of the speaker/narrator, the date of publication and the page location where it was found. Thus by finding the end of any extract, it is possible to find these details.
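    If no concordancing package is to hand, a keyword-in-context search on a tag can be approximated in a few lines of code. The sketch below is an assumption-laden illustration and not a description of how any particular concordancer works: the function name concordance and its width parameter are hypothetical, and the sample text is assembled from fragments quoted in this guide.

```python
def concordance(text, target, width=30):
    """Return one keyword-in-context line per occurrence of target in text."""
    flat = " ".join(text.split())  # collapse line breaks so context is contiguous
    lines = []
    start = flat.find(target)
    while start != -1:
        end = start + len(target)
        left = flat[max(0, start - width):start]
        right = flat[end:end + width]
        # Right-align the left context so every hit starts in the same column.
        lines.append(f"{left:>{width}}{target}{right}")
        start = flat.find(target, end)
    return lines


# Fragments quoted in this guide, joined only for the purpose of the demonstration.
sample = "\n".join([
    "Big t'ings<sp=things> ah<gr=a-tense> gwan<sp=going on> 'bout yah!",
    "we<gr=APPGE> programme, we ah<gr=a-tense> go shoot first, ask question",
    "Joe Samuel a<gr=a-tense> daed<typo=dead>",
])

for line in concordance(sample, "<gr=a-tense>", width=25):
    print(line)
```

    Because the search is on the tag rather than on the word, both spellings of the particle (a and ah) are retrieved by the same query.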

    If on the other hand, you wanted to find all the variant spellings of the Standard English word nothing, you would do a concordance on <sp=nothing> with a result like the following:

    night; you didn't say notten<sp=nothing><gr=double-negative> about com
    . Man, you didn't say notten<sp=nothing><gr=double-negative> in that
    r business. Ain't got notten<sp=nothing> to live for if I ain't got yo
    in't got you to love! Notten<sp=nothing> at all. <bookid=022><speakeri
    worry 'bout<sp=about> notten<sp=nothing>, man!<bookid=022><speakerid=U
    live. Rastas don't see notin<sp=nothing><gr=double-negative> wrong with
    Dem<gr=PPHS2> can't do notin<sp=nothing><gr=double-negative>.<bookid=
    r=?neg> haf<sp=have> nutting<sp=nothing> fi<lex=fi> wurry<sp=worry> bo
    im<PPHO1> seh<sp=say> nuttin<sp=nothing> nuh<sp=no><gr=negative> bus<s
    negative> sey<sp=say> nuttin<sp=nothing><gr=double-negative>. <bookid=
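    The same kind of search can be turned into a frequency count of variant spellings. The following sketch assumes only the tag format described in this guide; the function name spelling_variants is hypothetical, and the sample fragments are taken from the concordance lines above.

```python
import re
from collections import Counter


def spelling_variants(text, standard):
    """Count the written forms tagged with <sp=standard> in text."""
    # A word, optionally followed by other tags, then the <sp=...> tag we want.
    pattern = re.compile(
        r"([^\s<>]+)(?:<[^<>]*>)*<sp=" + re.escape(standard) + r">"
    )
    return Counter(match.group(1).lower() for match in pattern.finditer(text))


fragments = " ".join([
    "you didn't say notten<sp=nothing><gr=double-negative> about",
    "Notten<sp=nothing> at all.",
    "Rastas don't see notin<sp=nothing><gr=double-negative> wrong with",
    "haf<sp=have> nutting<sp=nothing> fi<lex=fi> wurry<sp=worry>",
    "seh<sp=say> nuttin<sp=nothing> nuh<sp=no><gr=negative>",
])

for form, count in spelling_variants(fragments, "nothing").most_common():
    print(form, count)
```

    The pattern allows other tags to stand between the word and its <sp= tag, since the order of tags on a word is not fixed in the examples shown in this guide.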

     

5. Using the Corpus of Written British Creole
    The texts in the Corpus are made available on the following conditions:
    1. that they are used only for non-profit educational purposes by researchers
    2. that they are not used in such a way as to infringe the author's/publisher's copyright
    3. that the original author and publisher of the work receive full acknowledgement if any extract from it is used in a publication
    Subject to these conditions researchers may use the material in the Corpus for the purposes of private research.

6. Obtaining a copy of the Corpus

    You can obtain a copy of the corpus for research purposes by contacting the address below. Please send us your comments, queries, or texts for inclusion.

    Dr. Mark Sebba (Lecturer in Linguistics, Lancaster University)
    Department of Linguistics and English Language,
    Lancaster University,
    Lancaster LA1 4YT
    Great Britain

    Tel: 01524 592453 (from outside Britain: +44 1524 592453) E-mail: M.Sebba@lancs.ac.uk

    A request to users of the Corpus

    We are interested in knowing about the uses to which the Corpus is being put and the type of research you are doing. We are continually collecting further material to add to the Corpus and would be grateful for any contributions or information about possible texts.

     



    Appendix 1: List of texts contained in the Corpus of Written British Creole (1998)
     
    Text type | First name | Last name | Title | Date | File name (marked-up) | File name (plain text) | Size (words)
    poetry | Jean | Breeze | Tracks (part) | 1989 | marku47 | plain47. | 2080
    poetry | Linton Kwesi | Johnson | Tings and Times (part) | 1991 | marku53.txt | plain53.txt | 383
    poetry | Sandra | Mundle | Ole Woman | 1995 | marku 50.txt | plain50.txt | 156
    poetry | Benjamin | Zephaniah | City Psalms (part) | 1992 | marku 13.txt | | 785
    poetry | Benjamin | Zephaniah | City Psalms (part) | 1992 | | plain13.txt | 8185
    novel | Victor | Headley | Yardie | 1992 | marku1.txt | plain1.txt | 1927
    novel | Karline | Smith | Moss Side Massive | 1995 | marku 20.txt | plain20.txt | 6396
    play | Randhi | McWilliams | God, Man and Sister Geraldine | 1995 | marku 22.txt | plain22.txt | 3113
    student writing (school compilation) | G.M. | Richards | A Fe We Ting | ca. 1980 | marku 23.txt | plain23.txt | 3013
    advertisement | anon | | Dragon Stout Advertisement | 1995 | marku49.txt | plain49.txt |
    advertisement | anon | | Desnoes and Geddes advertisement | 1995 | marku48 | plain48 | 5
    cartoon | anon | | Nuff Agonies | 1995 | marku54.txt | plain54.txt | 32
    educational materials | Mike | Read | Where do I belong inna Inglan | 1984/8 | marku24 | plain.24 | 1618
    newspaper article | anon | | Weekly Gleaner extract - Eating Out | 1994 | marku45.txt | plain45.txt | 573
    graffiti | anon | | Denise and. Cheryl | 1984 | marku46.txt | plain46.txt | 5
     




    Appendix 2: Headers in the CWBC Corpus

    Each corpus entry begins with a header which contains certain information about the text which follows.

    Each item of information has the form of <x = y> and usually is self-explanatory, for example:

    <Bookid=013>
    <booktype=poems>
    <title=City Psalms>
    <date=1992>
    <authorname=Benjamin Zephaniah>
    <authorcountry=Britain>
    <authordob=1958>

    This means that the particular piece of writing (book, poem, play etc.) has, for the purposes of the Corpus, been given the unique "identity number" 013 (<Bookid=013>) (Note that <Bookid= is used for poems, plays etc., not just books). This particular corpus entry is a book of poems called City Psalms, published in 1992 by Benjamin Zephaniah, who, as far as we have been able to determine, is (mainly) British and was born in 1958.
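    A header of this form is straightforward to read into a lookup table. The sketch below is illustrative only: the function name parse_header is hypothetical, and field names are lower-cased on the assumption that <Bookid= in the header and <bookid= at the end of an extract refer to the same field.

```python
import re

HEADER_FIELD = re.compile(r"<([^<>=]+)=([^<>]*)>")


def parse_header(header_text):
    """Map each header field name (lower-cased) to its value."""
    return {
        name.strip().lower(): value.strip()
        for name, value in HEADER_FIELD.findall(header_text)
    }


header = """<Bookid=013>
<booktype=poems>
<title=City Psalms>
<date=1992>
<authorname=Benjamin Zephaniah>
<authorcountry=Britain>
<authordob=1958>"""

info = parse_header(header)
print(info["bookid"], info["title"], info["authorname"])
```

    Values are kept as strings, so an identity number such as 013 retains its leading zero.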

    Sometimes this header is further augmented by information about individual characters in a book or play, for example:

    Passenger<speakerage=><speakercountry=Jamaica><speakerresidence=Jamaica>

    This means that the character called "Passenger" is stated or implied somewhere in the text to be a Jamaican who lives in Jamaica. His age is not given. This information is given where available, to help interpret the speech characteristics of individuals represented in the texts.

    Each individual extract from a longer text within the corpus ends with a string of tags which repeat, in part, the information given in the header. For example:

    'So is holiday<gr=no-art> you come<gr=no-aux> for<gr=cleft><gr=queststr>, or you plan to settle on<x3=on> yah<lex=yah><gr=queststr>?' she enquired.<bookid=01><speakerid=donna><year=1992><extractid=46><pageno=20>

    The tags at the end of this extract indicate that this is an extract from corpus entry 01, which was published in 1992. They also give a unique number referring to this extract (<extractid=46>) and its page location (<pageno=20>), as well as the name of the character who speaks the words in quotation marks (<speakerid=donna>). So even when the extract is separated from its header (e.g. in a list which results from a search using a concordancer) it is still possible to trace the source of the extract and to see some of the relevant details without checking the header.
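    Tracing an extract back to its source in this way can be sketched in code as follows. The function name extract_metadata is hypothetical, and the list of tag names is limited to the five that appear at the end of the example above; it is an assumption that no other tag names are used in that position.

```python
import re

# Assumption: these are the tag names that close an extract.
METADATA_KEYS = ("bookid", "speakerid", "year", "extractid", "pageno")
METADATA_TAG = re.compile(r"<(" + "|".join(METADATA_KEYS) + r")=([^<>]*)>")


def extract_metadata(extract):
    """Return the source details recorded in the tags that close an extract."""
    return dict(METADATA_TAG.findall(extract))


extract = (
    "'So is holiday<gr=no-art> you come<gr=no-aux> for<gr=cleft><gr=queststr>, "
    "or you plan to settle on<x3=on> yah<lex=yah><gr=queststr>?' she enquired."
    "<bookid=01><speakerid=donna><year=1992><extractid=46><pageno=20>"
)

details = extract_metadata(extract)
print(details["bookid"], details["speakerid"], details["year"])
```

    The identity numbers quoted in this guide are not padded consistently (01, 020, 23), so when matching an extract to a header it may be safer to compare them as integers than as strings.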
     

     



     Appendix 3: Language tags in the CWBC Corpus
     1. Form of the tags
     
    1. All tags are contained in angle brackets: <>
    2. All tags are introduced by a code which indicates the level of analysis, followed by an equal sign =. The codes are:

       Spelling/orthography: <sp=
       Grammar/syntax: <gr=
       Lexicon: <lex=
       Semantic: <sem=
       Discourse: <disc=

    Generally, <sem= has been reserved for closed class items or grammatical functors which are different semantically from their Standard English counterparts (if any), while <lex= is used for lexical items which are either not found in British or North American standard varieties, or are used in a different sense.

    2. The codes

    The string of characters following the equal sign is a code which indicates more precisely the nature of the feature which is of interest.

    2.1 Spelling: the term after the equal sign is the standard spelling of the same word, e.g.

    bwai<sp=boy>

    nuff<sp=enough>

    This allows the researcher to trace all variant spellings of a word by searching on the standard spelling.

    2.2 Lexicon: this is used to tag lexical items which are not expected in Standard British English. The term after the equal sign is:

    1. the Dictionary of Jamaican English headword of the word in question, if there is one; or
    2. the Standard English spelling of the singular form if there is one; or
    3. simply the word as it stands in the text,
    e.g.

    pickney<lex=pickney> (DJE headword form)

    baddap<lex=bad-up> (Standardised spelling)

    skanking<lex=skanking> (The word itself)

    What follows the equal sign is not intended as a gloss for the item in question. The use of a standardised form following the equal sign is intended to help in searching, in the event that there are several different spellings or forms of the same word.

    2.3 Semantic: the term after the equal sign is an indication that the meaning differs from the Standard English meaning of the same item, or, in the case of specific items which serve as grammatical functors, the headword form of that word. In the example below, the lexical item Babylon is not specific to Creole (hence not <lex=) but does not have the same meaning as in Standard English. Here it means "the police". However, the tag itself does not provide information about the meaning; it only draws attention to the fact of a specifically Creole meaning.

    Dat de babylon<sem=babylon> hol' t'ree breders<sem=brothers>

    Warning: the tag <sem= has not been used in the corpus with complete consistency. You may come across mistakes.

    2.4 Discourse: This is a small, closed set of specific discourse markers. The term following the equal sign describes the nature of the spoken expression being tagged, e.g.

    "Boo-yah<disc=excl>! What happen<disc=greeting>, Blood<disc=address>?

    The set of discourse marking tags used in the Corpus is as follows:

    <disc=greeting> for ritualised greetings: Hail<disc=greeting>, Landlady,{...}

    <disc=excl> for exclamations: "Bwai<disc=excl><sp=boy>, him must be a millionaire," Sister Jones stated

    <disc=address> for terms of address used to another participant in the conversation. The commonest term in this category is man:

    We ah go look after you, man<disc=man><disc=address> .

    <disc=questtag> for question tags which are characteristic of non-standard English, especially innit (which is associated with London speech) and tags of JC origin like seen? : You got every last detail planned, innit <disc=questtag> Cliff?"

    Me name Joseph, seen<disc=questtag> ?

    <disc=expletive> is used for words which represent swearing. Some of these are obscene and/or highly offensive to some people.

    Raas<lex=raas>claat<sp=cloth><lex=raas-claat><disc=expletive>!

    <disc=man> for the word man used in a discourse function rather than as a common noun. As man appears to have a number of distinct discourse functions, a second <disc= tag may be used to indicate the function of this particular instance of man:

    You must have been on Mars or sump'n man<disc=man><disc=address>.

    Man<disc=man><disc=excl>, Sidney Higgins, you turn<sem=turn><gr=no-aux> comic<gr=no-art> now<gr=queststr>?

     
    2.5 Grammar: the grammar codes are more complex than the other types.

    Some items have been coded using codes which only indicate in general terms the category of the difference between the item and the Standard English equivalent. These are in capital letters, for example:

    <gr=TENSE>

    This indicates that the tense marking, or the interpretation of the tense of the tagged verb, is different from the equivalent in Standard English.

    <gr=STRUCTURE>

    This indicates that the structure of the sentence preceding the tag is in some way unexpected or noteworthy, but gives no further information.

    We have tried to use these vague tags sparingly, but feel they may be useful to the analyst in picking out interesting words or strings.

    The more usual form of the code following <gr= is an indicator of the exact nature of the grammatical difference between the tagged item/string and its Standard English equivalent. For example:

    "You heard<gr=no-aux> about Fluxy?"

    Big t'ings ah<gr=a-tense> gwan 'bout yah!

    In some cases, CLAWS 2 grammatical tags have been used. These are used to mark the part-of-speech category of the tagged word, but only when it is different in form from the one expected in Standard English. For example,

    Me<gr=PPIS1> have 'nuff woman...

    The tag <gr=PPIS1> on the word "me" indicates that it represents the first person singular subject pronoun. However, in the corpus the corresponding pronoun in the sentence

    I have 'nuff woman..

    would not receive any tag, because it has the Standard English form.

    A full list of the CLAWS pronoun tags is given below in Appendix 4, Section 1. However, it will usually be obvious what the meaning of the tag is and why the particular word has been tagged.
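
    The three kinds of grammar code described in this section (general codes in capital letters, specific codes, and CLAWS pronoun tags) can be told apart mechanically. The sketch below is illustrative only: the function name classify_gr_code is hypothetical, and the set of CLAWS codes is simply the pronoun codes listed in Appendix 4, Section 1. A plain upper-case test is not enough on its own, since CLAWS codes such as APPGE are also written in capitals.

```python
# Pronoun codes listed in Appendix 4, Section 1 of this guide.
# PPY2 is the guide's own addition rather than a CLAWS 2 tag.
CLAWS_PRONOUN_CODES = {
    "PPIS1", "PPIO1", "PPIS2", "PPIO2",
    "PPHS1", "PPHO1", "PPHS2", "PPHO2",
    "PPY", "PPY2", "APPGE", "PPGE", "PPX1", "PPX2",
}


def classify_gr_code(code):
    """Classify the value of a <gr=...> tag as 'claws', 'general' or 'specific'."""
    if code in CLAWS_PRONOUN_CODES:
        return "claws"      # part-of-speech code for a pronoun
    if code.isupper():
        return "general"    # e.g. TENSE, STRUCTURE: category only
    return "specific"       # e.g. no-aux, a-tense: exact nature of the difference


if __name__ == "__main__":
    for code in ("TENSE", "no-aux", "PPIS1"):
        print(code, classify_gr_code(code))
```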

Appendix 4: Further notes on the grammar tag codes in the Corpus of Written British Creole

    1. Pronouns

 

    The forms given below are the personal pronoun forms of "Classical" Jamaican Creole, which do not show distinctions of gender or case. They are used for subject, object and possessive. There are separate singular and plural forms in the second person (cf. Bailey 1966: 22-24).

         singular         plural
    1    mi +             wi ++
    2    yu               unu +++
    3    im (s/he) +      dem +
         i (it)

    The forms marked + differ from the Standard English subject forms. Those marked ++ differ from the Standard English object forms. The second person plural form unu (marked +++) is not found in Standard English at all, and all the forms above differ from the forms used in Standard English for the possessive.

    Where a pronoun form used in the Corpus is different from the form which would be expected in Standard English, it has been tagged using the appropriate CLAWS tag which indicates person and number, or in the case of possessive pronouns, possession.

    e.g.

    wi<gr=PPIS2>

    CLAWS tags for pronouns

    I                  PPIS1
    me                 PPIO1
    my                 APPGE
    you                PPY (singular)
    you                PPY2 (plural)
                       NB - as this is not a distinction made in SE, PPY2 is not a CLAWS 2 tag
    your               APPGE
    he/she             PPHS1
    his/her            APPGE
    him/her            PPHO1
    we                 PPIS2
    us                 PPIO2
    our                APPGE
    mine               PPGE
    yours              PPGE
    them               PPHO2
    they               PPHS2
    myself             PPX1
    yourself           PPX1
    him/her/itself     PPX1
    yourselves         PPX2
    ourselves          PPX2
    themselves         PPX2
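
    The rule stated above (a pronoun is tagged only where its form differs from the one expected in Standard English) can be sketched as a simple lookup. This is an illustration rather than the procedure actually used to annotate the Corpus. The function name tag_pronoun is hypothetical, and the Standard English forms in SE_FORMS are filled in on the assumption that they follow the CLAWS 2 tagset (with PPY2 as this guide's own extension); the annotator still has to decide which function the pronoun has in context.

```python
# Standard English forms expected for each code (an assumption based on the
# CLAWS 2 tagset; PPY2 is this guide's extension for plural "you").
SE_FORMS = {
    "PPIS1": {"i"},
    "PPIO1": {"me"},
    "PPIS2": {"we"},
    "PPIO2": {"us"},
    "PPHS1": {"he", "she"},
    "PPHO1": {"him", "her"},
    "PPHS2": {"they"},
    "PPHO2": {"them"},
    "PPY": {"you"},
    "PPY2": {"you"},
    "APPGE": {"my", "your", "his", "her", "our", "their"},
}


def tag_pronoun(form, code):
    """Append <gr=code> to form unless form is the Standard English form."""
    if form.lower() in SE_FORMS[code]:
        return form                  # standard form: left untagged
    return f"{form}<gr={code}>"      # non-standard form: tagged with its function


if __name__ == "__main__":
    print(tag_pronoun("Me", "PPIS1"))   # subject pronoun in "Me have 'nuff woman"
    print(tag_pronoun("I", "PPIS1"))    # standard subject form
    print(tag_pronoun("we", "APPGE"))   # possessive use, as in "fi we lan"
```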

     2. Plural Marking

    JC does not mark the plural of nouns, except in the case of (usually) animate nouns, which may be followed by the affix -dem. Such items are tagged by <gr=plmkr>, e.g.

    hearken to de people dem<gr=plmkr> voice

    Where a plural form is expected in Standard English, but is absent in JC, the tag <gr=no-plmkr> is used in the Corpus, e.g.

    Me have 'nuff<sp=enough> woman<gr=no-plmkr>

    3. Articles and Demonstratives

    The use of these is different in Jamaican Creole and Standard English. Where an article expected in SE is lacking, this is tagged as <gr=no-art>, e.g.

    washin worn out clothes dung<sp=down> a<gr=a-prep>

    ribba<sp=river><gr=no-art>

    Conversely, the use of an article which is not expected in Standard English, or which differs from the one expected, is marked by <gr=art>, e.g.

    so him light one<gr=art> cigarette.

    Demonstratives in non-Standard English forms or usages are tagged <gr=demon>. These may be features of nonstandard British English as in the example below:

    I can't concentrate on this with them<gr=demon> girls watching

     

    4. Possessives

    In "classical" JC, possession may be shown simply by juxtaposition, with the possessor preceding the possessed. The effect is that the ordering of nouns is as in Standard English, but there is no possessive marker (')s: di bwai niem, "the boy's name".

    This structure applies to common nouns but also to pronouns, so we find mi buk "my book", unu kyaa "your car" etc.

    Nouns with possessive function are tagged <gr=possessive>, while possessive pronouns have the tag <gr=APPGE>. Examples of both types are found in the Corpus, often together, as in the following examples:

    you nuh recognise yuh<gr=APPGE> husband<gr=possessive> sister!

    "You don't recognise your husband's sister!"

    an de neighbour a cuss bout we<gr=APPGE> bedspring noise<gr=possessive>

    "And the neighbours are cursing about our bedsprings' noise"

    Another strategy for indicating possession in JC is to use the preposition /fi/ (possibly derived from English for, and certainly overlapping with it in some of its uses). Where the possessor is a full noun phrase (which is rare), the order is possessed - /fi/ - possessor, as in Standard English: di buk fi di tiicha, "the book of the teacher". However, where the possessor is a pronoun, the usual construction is /fi/ - pronoun - possessed. There are many examples of this in the Corpus, e.g.

    dis a fi<lex=fi> we<gr=APPGE> lan

    That a fe<lex=fi> yuh<sp=you><gr=APPGE> business.

    A version of this strategy which characterises styles closer to Standard English is to use the preposition of, often reduced to /a/. This does not seem to occur with pronouns. For example:

    all he had ina him<gr=APPGE> pocket was a box a<gr=a-prep> matches an a pack of cigarette.

    Here a is marked simply as a preposition.

     5. Prepositions

    JC often uses the preposition a where English would use in, at or to. In such cases the word a is tagged with <gr=a-prep>.

    Me go a<gr=a-prep> de airport

    me lef' Jamaica an' come ah<gr=a-prep> England!

    Other JC prepositions may have an archaic flavour in modern SE, for example pan (often spelt pon, from upon), which translates some instances of on. In this case the preposition is simply marked as a variant spelling (i.e. it is not tagged as a grammatical or lexical feature).

    so mi start a posse pon<sp=upon> mi likkle corner

    6. Tense and Aspect Marking

    "Classical" Jamaican Creole uses different tense forms from Standard English, as below.

    (a) Mi ron               I run (habitually); I ran
    (b) Mi a ron             I am running
    (c) Mi ena (en+a) ron    I was running
    (d) Mi en ron            I have run; I had run

    The en form also has the common variant did, e.g. mi dida / did ron.

    The codes for these indicate the form of the tense/aspect marker, e.g.

    de day did<gr=did> start out bad

    people dem a<gr=a-tense> sing

    The tag used here for a is <gr=a-tense> to distinguish it from other uses of a, e.g. as a preposition, <gr=a-prep>.
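
    Since the same written form a (also spelt ah) carries a different tag for each of its functions, the functions can be tallied directly from the tags. The sketch below assumes that every such tag has the shape <gr=a-...> and sits immediately after the word; the function name count_a_functions is hypothetical, and the sample lines are taken from the examples in this guide.

```python
import re
from collections import Counter

# The word "a" or "ah" immediately followed by a grammar tag of the form <gr=a-...>.
A_TAG = re.compile(r"\b(?:a|ah)<gr=(a-[a-z]+)>", re.IGNORECASE)


def count_a_functions(lines):
    """Count how often a/ah is tagged with each <gr=a-...> code."""
    counts = Counter()
    for line in lines:
        counts.update(A_TAG.findall(line))
    return counts


if __name__ == "__main__":
    sample = [
        "people dem a<gr=a-tense> sing",
        "Big t'ings ah<gr=a-tense> gwan 'bout yah!",
        "Me go a<gr=a-prep> de airport",
        "me lef' Jamaica an' come ah<gr=a-prep> England!",
    ]
    for code, n in sorted(count_a_functions(sample).items()):
        print(code, n)
```

    Matching on the tag rather than on the bare word is the point of the design: an untagged a (for example the English indefinite article) is never counted.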

    Where the difference from Standard English lies in the lack of a tense marker, the tag used is <gr=tense>:

    one big foreign chevrolet drive<gr=tense> up an tek <gr=tense> im een

    This tag is also used to draw attention to other differences from Standard English in terms of tense or aspect.

    Due to the nature of the tense/aspect marking system in JC, there is often nothing to correspond to the Standard English auxiliary verb. The tag <gr=no-aux> is used to indicate this, e.g.

    Jeeze, look how long I been calling<gr=no-aux> you.

    While JC has no morphologically marked past tense forms corresponding to English (cf. looked, went, drove), in some cases the base form of the JC verb derives historically from an English past tense. Examples are brok (break/broke), lef (leave/left). These are not specifically past forms in JC. The tag used to draw attention to this is <gr=pastform>.

    "And you...! Yuh bettah pack yuh bags an' lef'<sp=left><gr=pastform>."

    7. Use of the copula

    There are several ways in Jamaican Creole to translate the English copula to be.

    a. Where the copula functions as an auxiliary verb, the tense/aspect marker a may be used in JC. This would receive the tag <gr=a-tense>, e.g.

    people dem a<gr=a-tense> sing

    b. The JC equative verb is also a, which "regularly connects two nominals" (Bailey 1966: 32). In this case the word is tagged <gr=a-copula>, e.g.

    him know sey dat dem a<gr=a-copula> duppy.

    This form is rare in the Corpus, as forms from Standard English to be are more often found in this function.

    c. JC has a separate locative verb de (often spelt deh); this is tagged <gr=de-cop>, e.g.

    So is weh de load deh<gr=de-cop>, sah?

    d. With true adjectives in JC, no copula is required. This absence is marked by the tag <gr=no-cop> following the adjective, e.g.

    Children different<gr=no-cop> now.

    Note that this tag is also used elsewhere when a copula expected in Standard English is missing, e.g.

    I here<gr=no-cop>, you know, Ethel!

    When I at <gr=no-cop> ome


    8. Negation

    There is a variety of negators in Jamaican Creole. All instances of negation which differ from what would be expected in Standard English are marked with <gr=neg>. This includes variant spellings of negators like no and not.

    Dis poetry nar<gr=neg> put yu to sleep

    it does nu<sp=no><gr=negative> good for me

    "Nuh<sp=no><gr=negative> tell me seh you nuh<sp=no><gr=negative> recognise yuh husband sister!"

    Where double negatives occur, the tag <gr=dblneg> is used, e.g.

    Hey bwoy, don't come cause no<gr=dblneg><gr=negative> fuss y'hear

    Where ain't and its cognate forms appear, they are tagged with <gr=aint>, e.g.

    They insisted you ain't <gr=aint> got a face they can sell

    9. Infinitive marking

    The English infinitive marker to is in some cases translated by fi in JC, but in other cases the marker is optional in Creole where it is obligatory in English. The tag <gr=no-infmkr> is placed on the verb which in Standard English would be preceded by to, e.g.

    an mi nose start run<gr=no-infmkr> wid misery

    This case is to be distinguished where possible from verb chaining (see Section 13.1), where the combination of verbs would not be expected in Standard English.

    The word fi is treated as an independent JC lexical item, marked by <lex=fi>. It functions as an infinitive marker in the examples below:

    so it did hard fi<lex=fi> understand

    heng dem out fi<lex=fi> dry

    It can also mark possession in pronouns or nouns:

    a fi<lex=fi> mi people pon de crass

    fa<sp=for> dis a fi<lex=fi> we fightin style

    In some cases it is used as if it were a variant spelling of the English preposition for:

    get de children ready fi<sp=for><lex=fi> school

    10. Question structure

    In formal and written SE yes/no questions, the main verb or auxiliary verb is inverted around the subject of the sentence, e.g. it was nice - was it nice? The same applies to questions introduced by words such as who or what, unless these words are themselves the subject: who has he asked, what did she want, etc. In JC this process is totally absent, so the word order of a question is the same as that of the corresponding statement.

    In the Corpus, the tag <gr=queststr> is used to draw attention to this, e.g.

    So how Ethel's been doing? <gr=queststr>

    Frequently, there is nothing to correspond to the Standard English auxiliary verb. The tag <gr=no-aux> is used to indicate this, e.g.

    "You heard<gr=no-aux> about Fluxy? <gr=queststr>"

    11. Cleft and predicate cleft

    Cleft structures in English are sentences introduced by it is or it was, which enable one of the noun phrases to be moved out of the main clause. For example, one cleft of John saw Mary is:

    It was Mary that John saw.

    Cleft structures are especially common with questions in JC and have been marked with the tag <gr=cleft>. In the Corpus they occur both with the SE form is and the JC form a as copula.

    ah<gr=a-cop> we run t'ings <gr=cleft>

    'So is dat you ina now <gr=cleft>', she said slowly.

    Predicate cleft is a JC construction not found in English, which involves fronting and repeating the main verb, e.g.

    a<gr=a-cop> no<neg=no> play we a<gr=a-tense> play<gr=predicate-cleft>

    Joke you a joke<gr=predicate-cleft>, man!

    12. Other verbal constructions

    12.1 say constructions

       
    The form say (often spelt seh) may be used to introduce a clause after a verb of saying, thinking, knowing etc., functioning like SE that. The item say is unlikely to be a verb in this context. The word say or its cognate has been tagged <gr=say>. Where the spelling is not SE say, this has been tagged as well.
      Nuh tell me seh<sp=say><gr=say>, you nuh recognise yuh husband sister!
       
    12.2 make constructions
       
    Make may be used in the SE sense of let, introducing an embedded clause, e.g.
     
      If my son is a rapist, mek<sp=make> <gr=make> lock him up!

    13. Clause structure

    13.1 Verb chaining.

       
    Verbs in JC may be combined in ways which are not possible in English. One set of possibilities involves the motion verbs go and come immediately followed by another verb, e.g.
      Prettyboy, go bring<gr=vbchn> you gran'uncle something to drink.

      Hey, bwoy, don't come cause<gr=vbchn> no fuss y'hear!
       

    A second possibility is to find the motion verb following a main verb with lexical content, e.g.
      weh you ah rush go<gr=vbchn> so?
       
    A third possibility involves a complex of one or more verbs followed by tell, e.g.
      For them to send back tell<gr=vbchn> me they ain't got no time [...]
       
    All of these constructions, and any others which may involve verbs in an apparent series, are tagged as <gr=vbchn>.
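
    Because all three patterns share the one tag, they can be retrieved together and then sorted by hand. The following is a rough sketch of a concordance over tagged lines, under the assumption that the tag is written immediately after the final verb of the chain with no intervening space. The function name concordance and its width parameter are hypothetical; the sketch simply shows each tagged verb with a few words of left context so that the preceding verb can be inspected.

```python
import re

# Any inline tag of the form <type=value>.
TAG = re.compile(r"<\w+=[^<>\s]+>")


def concordance(lines, code, width=2):
    """Return (left_context, word) pairs for each word carrying <gr=code>."""
    target = "<gr=" + code + ">"
    hits = []
    for line in lines:
        tokens = line.split()
        for i, token in enumerate(tokens):
            if target in token:
                # Strip tags from the context words and from the hit itself.
                left = [TAG.sub("", t) for t in tokens[max(0, i - width):i]]
                hits.append((" ".join(left), TAG.sub("", token)))
    return hits


if __name__ == "__main__":
    sample = [
        "Prettyboy, go bring<gr=vbchn> you gran'uncle something to drink.",
        "weh you ah rush go<gr=vbchn> so?",
    ]
    for left, word in concordance(sample, "vbchn"):
        print(f"{left:>20} | {word}")
```

    A tag separated from its word by a space would appear as a token of its own and yield an empty word here, so lines of that kind would need to be normalised first.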
       
    13.2 Lack of clause marking

    Structures such as relative clauses which might be expected to include a clause marking element (e.g. conjunction, complementiser or relative pronoun) in Standard English, are marked <gr=no-clausemkr> when such an item does not appear, e.g.
     

      "Any bwai <gr=no-clausemkr> try test we - dead!"

Appendix 5: A Short Bibliography of Works on Creole in Britain

(Only works dealing with English-lexicon creoles are included here. The emphasis is on books, and many shorter but valuable papers and articles have been omitted.)
 

 

REFERENCES
