Language Engineering Resources Questionnaire: Indigenous Minority Languages of the British Isles and Ireland

This questionnaire has been taken off-line.

The Department of Linguistics at Lancaster University is currently undertaking an EPSRC-sponsored project (LER-BIML) investigating the needs of the language engineering community with regard to corpus building in the indigenous minority languages of the British Isles and Ireland, namely Cornish, Scottish Gaelic, Irish, Manx, Scots, Ulster Scots (Ullans) and Welsh. This web questionnaire has been designed to assess those needs. Answers will be anonymised and will contribute to a report detailing the needs of the language engineering community. If you would like a copy of this report, please mark this box.

Even if you are not working directly with these languages at present, we'd like you to imagine that you may do so at some point in the future, and it would be helpful if you could complete the questionnaire with this in mind.

Relevant links:
Corpus Encoding Standard: contents | Text Encoding Initiative guidelines | Unicode Consortium Website

1a Name:
1b Place of work:
1c Email address:
1d Nationality (optional):

2a From the list below please indicate those languages for which you would like to see (more) corpus resources available:

Cornish
Scottish Gaelic
Irish
Manx
Scots
Ulster Scots
Welsh

2b For which other languages would you like to see corpus resources available?

3a Which of the following corpus types would you be most interested in building for these languages?

Monolingual
Bilingual
Multilingual
All
Any

3bi If you answered "Bilingual", "Multilingual", "Any" or "All" to the above question, which language(s) would you like to have in your corpora? e.g. English/Welsh.

3bii ...and which of the following would you prefer a multi/bi-lingual corpus to contain?
Word-aligned translations of the same texts in each language
Sentence-aligned translations of the same texts in each language
Un-aligned translations of the same texts in each language
Different texts from equivalent genres in each language
Different texts from different genres

4 Would you prefer mostly to see written or spoken data built for these languages?
Both about equally
Written
Spoken
Both but emphasis on written
Both but emphasis on spoken
 

5a Would you prefer balanced corpora of these languages, or corpora which focused on specific genres?

Balanced

Focused

Either

5b What genres would you like to see represented in such a corpus? (Mark as many as apply)

News

Legal

Health

Fiction

Letters/Diaries

Leisure

Commerce

Government

Scientific/Academic

Historical

Children

Manuals

5c Any other type of genre?

6a How would you like the data to be linguistically annotated? (Mark as many as apply)
Part-of-speech

Parsed

Phonemic

Prosodic

Semantic

Just plain text

6b Any other type of linguistic annotation?

7 By which media would you prefer to receive corpus data? (Mark as many as apply)
Diskette

CD

FTP

DAT tape

Internet

8 Would you be interested in seeing any of the following textual mark-up in the corpus?

TEI-Lite

TEI

SGML

XML

HTML

CHILDES/LIDES

9 For the features below please mark how important they would be for your corpus. We have specified the default option as "no opinion" to save you time.
Feature Example Preferred status
Header elements
Creator(s) of corpus creator= Essential If Possible No Opinion Not wanted
Date Created/updated date.created= Essential If Possible No Opinion Not wanted
Author <author> Essential If Possible No Opinion Not wanted
Extent: words/bytes <wordCount> Essential If Possible No Opinion Not wanted
Source of data <sourceDesc> Essential If Possible No Opinion Not wanted
Project description <projectDesc> Essential If Possible No Opinion Not wanted
Sampling description <samplingDecl> Essential If Possible No Opinion Not wanted
Editorial description <editorialDecl> Essential If Possible No Opinion Not wanted
Revision description <revisionDesc> Essential If Possible No Opinion Not wanted
Language usage <langUsage> Essential If Possible No Opinion Not wanted
 

Primary data

Top-level structure <cesCorpus> Essential If Possible No Opinion Not wanted
Text body <body> Essential If Possible No Opinion Not wanted
Text divisions <div> Essential If Possible No Opinion Not wanted
Head elements <opener> <head> Essential If Possible No Opinion Not wanted
Closer elements <closer> <byline> Essential If Possible No Opinion Not wanted
 

Paragraph-level elements

Paragraph <p> Essential If Possible No Opinion Not wanted
Quote <quote> <q> Essential If Possible No Opinion Not wanted
Poem <poem> Essential If Possible No Opinion Not wanted
Figure <figure> Essential If Possible No Opinion Not wanted
Note (footnote) <note> Essential If Possible No Opinion Not wanted
Table <table> Essential If Possible No Opinion Not wanted
List <list> Essential If Possible No Opinion Not wanted
Foreign <foreign lang=> Essential If Possible No Opinion Not wanted
Sentence unit <s> Essential If Possible No Opinion Not wanted
Punctuation <punc type=colon> Essential If Possible No Opinion Not wanted
Rendition information e.g. bold/italics rend=BO Essential If Possible No Opinion Not wanted
 

Spoken data

Overlapping speech <anchor> Essential If Possible No Opinion Not wanted
Non-lexical vocalisations <vocal coughs> Essential If Possible No Opinion Not wanted
Stress <shift feature=loud> Essential If Possible No Opinion Not wanted
Pauses <pause dur=2> Essential If Possible No Opinion Not wanted
Unclear speech <unclear> Essential If Possible No Opinion Not wanted
Actions/gestures/events <event> Essential If Possible No Opinion Not wanted
Setting <settingDesc> Essential If Possible No Opinion Not wanted
Participants <particDesc> Essential If Possible No Opinion Not wanted

10a (For language engineers) Imagine you have a CD of corpus data for a range of indigenous minority languages in both written and spoken formats. What applications would you want to use this data to build?

10b (For linguists) Imagine you have a CD of corpus data for a range of indigenous minority languages in both written and spoken formats. What sort of questions would you want to explore with such a corpus?

11 What type of support tools would you like to use with this imaginary corpus data?

12 How likely are you to be working with such indigenous minority languages in the future?
Very likely

Possibly

Unsure

Probably not

Very unlikely

Finally, it would be helpful if you could forward the URL of this page to anyone else you think might be interested in completing it.

Thank you for taking the time to complete this questionnaire.

Dr. Andrew Wilson, Celia Worth, LER-BIML Project, Lancaster University. May 2002