| 1a Name: |
|
| 1b Place of work: |
|
| 1c Email address: |
|
| 1d Nationality (optional):
| |
|
2a From the list below please indicate those languages for which you would like to see
(more) corpus resources available: |
| Cornish
| Scottish Gaelic
| Irish
| Manx
| Scots
| Ulster Scots
| Welsh |
|
2b For which other languages would you like to see corpus resources
available?
| |
| 3a Which of the following corpus types would you be most
interested in building for these languages?
| Monolingual
| Bilingual
| Multilingual
| All
| Any |
| 3bi If you answered "Bilingual",
"Multilingual", "Any" or "All"
to the above question, which language(s) would you like to have in your
corpora? e.g. English/Welsh.
| |
| 3bii ...and which of the following would you prefer a
bilingual or multilingual corpus to contain? |
| Word-aligned
translations of the same texts in each language |
| Sentence-aligned
translations of the same texts in each language |
| Unaligned
translations of the same texts in each language |
| Different texts
from equivalent genres in each language |
| Different texts from
different genres
|
| 4 Would you prefer to see mostly written or mostly spoken data
collected for these languages? |
| Both about equally
| Written
| Spoken
| Both but emphasis on written
| Both but emphasis
on spoken |
|
5a Would you prefer balanced corpora of these languages,
or corpora which focused on specific genres?
| Balanced
| Focused
| Either
|
| 5b What genres would you like to see represented in such
a corpus? (Mark as many as apply)
|
| News
| Legal
| Health
| Fiction
| Letters/Diaries
| Leisure |
| Commerce
| Government
| Scientific/Academic
| Historical
| Children
| Manuals
|
| 5c Any other type of genre?
|
|
| 6a How would you like the data to be
linguistically annotated? (Mark as many as apply) |
| Part-of-speech
| Parsed
| Phonemic
| Prosodic
| Semantic
| Just plain text
|
| 6b Any other type of linguistic annotation?
|
| 7 By which media would you prefer to receive corpus data?
(Mark as many as apply) |
| Diskette
| CD
| FTP
| DAT tape
| Internet
|
|
|
8 Would you be interested in seeing any of the following textual
mark-up in the corpus? |
| TEI-Lite
| TEI
| SGML
| XML
| HTML
| CHILDES/LIDES
|
| 9 For each feature below, please mark how important it
would be for your corpus. To save you time, the default option is set to
"No Opinion". |
| Feature
| Example
| Preferred status |
| Header elements |
| Creator(s) of corpus |
creator= |
Essential |
If Possible |
No Opinion |
Not wanted |
| Date Created/updated |
date.created= |
Essential |
If Possible |
No Opinion |
Not wanted |
| Author |
<author> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Extent: words/bytes |
<wordCount> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Source of data |
<sourceDesc> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Project description |
<projectDesc> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Sampling description |
<samplingDecl> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Editorial description |
<editorialDecl> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Revision description |
<revisionDesc> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Language usage |
<langUsage> |
Essential |
If Possible |
No Opinion |
Not wanted |
|
Primary data
|
| Top-level structure |
<cesCorpus> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Text body |
<body> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Text divisions |
<div> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Head elements |
<opener> <head> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Closer elements |
<closer> <byline> |
Essential |
If Possible |
No Opinion |
Not wanted |
|
Paragraph-level elements
|
| Paragraph |
<p> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Quote |
<quote> <q> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Poem |
<poem> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Figure |
<figure> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Note (footnote) |
<note> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Table |
<table> |
Essential |
If Possible |
No Opinion |
Not wanted |
| List |
<list> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Foreign |
<foreign lang=> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Sentence unit |
<s> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Punctuation |
<punc type=colon> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Rendition information e.g. bold/italics |
rend=BO |
Essential |
If Possible |
No Opinion |
Not wanted |
|
Spoken data
|
| Overlapping speech |
<anchor> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Non-lexical vocalisations |
<vocal coughs> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Stress |
<shift feature=loud> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Pauses |
<pause dur=2> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Unclear speech |
<unclear> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Actions/gestures/events |
<event> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Setting |
<settingDesc> |
Essential |
If Possible |
No Opinion |
Not wanted |
| Participants |
<particDesc> |
Essential |
If Possible |
No Opinion |
Not wanted |
| 10a (For language engineers) Imagine you have a CD of
corpus data for a range of indigenous minority languages in
both written and spoken formats. What applications would you want to use
this data to build? |
|
|
| 10b (For linguists) Imagine you have a CD of corpus data
for a range of indigenous minority languages in both written
and spoken formats. What sort of questions would you want to explore with
such a corpus? |
|
|
| 11 What type of support tools would you like to use with
this imaginary corpus data? |
|
|
| 12 How likely are you to be working with such indigenous minority languages in the future? |
| Very likely
| Possibly
| Unsure
| Probably not
| Very unlikely
|
|
| Finally, it would be helpful if you could forward the URL of
this page to anyone else who you think might be interested in completing
it. Thank you for taking the time to complete this questionnaire.
Dr. Andrew Wilson, Celia Worth, LER-BIML Project, Lancaster
University. May 2002
|