Persistent Identifier
|
doi:10.18710/7LNWJX |
Publication Date
|
2024-11-25 |
Title
| Background data for: Some obstacles to replication in corpus linguistics |
Author
| Sönning, Lukas (University of Bamberg) - ORCID: 0000-0002-2705-395X |
Point of Contact
|
Sönning, Lukas (University of Bamberg) |
Description
| This dataset contains tabular files recording occurrences and frequencies of modal verbs in the Brown family corpora; nine modal verbs (can, could, may, might, must, shall, should, will, would) and six corpora are considered (Brown, LOB, Frown, FLOB, BE06, AmE06). Tokens were retrieved using the CQPweb interface provided by the University of Lancaster, and the tables include information on several text-level variables (text length, broad genre, text category, corpus, time period, variety). The data are provided in two formats: (i) in case form, where each token (77,872 in total) is listed separately, including information on the context of occurrence (10 words to the left and 10 to the right); and (ii) in frequency form, which aggregates occurrences by providing information on how often each modal verb appears in every text, thus including one row per text-modal combination (27,000 in total: 6 corpora x 500 texts x 9 modals). (2023-11-09) |
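The relationship between the two formats can be illustrated with a minimal Python sketch. The column names (`text_id`, `modal`, `n_tokens`), the function name `to_frequency_form`, and the toy rows are assumptions made for illustration; they are not taken from the data files. The sketch assumes that the frequency form lists every text-modal combination, including those with zero occurrences, which is what the stated total of 27,000 rows (6 corpora x 500 texts x 9 modals) implies.

```python
from collections import Counter
from itertools import product

# The nine modal verbs named in the description.
MODALS = ["can", "could", "may", "might", "must",
          "shall", "should", "will", "would"]


def to_frequency_form(case_rows, text_ids, modals=MODALS):
    """Aggregate case-form rows (one per token) into frequency form
    (one row per text-modal combination, zero counts included)."""
    # Count tokens per (text, modal) pair; missing pairs count as 0.
    counts = Counter((row["text_id"], row["modal"]) for row in case_rows)
    return [
        {"text_id": text_id, "modal": modal,
         "n_tokens": counts[(text_id, modal)]}
        for text_id, modal in product(text_ids, modals)
    ]


# Toy case-form data (hypothetical, not drawn from the dataset).
case_rows = [
    {"text_id": "A01", "modal": "can"},
    {"text_id": "A01", "modal": "can"},
    {"text_id": "A01", "modal": "would"},
    {"text_id": "A02", "modal": "must"},
]

freq_rows = to_frequency_form(case_rows, text_ids=["A01", "A02"])
print(len(freq_rows))  # 2 texts x 9 modals = 18 rows

# The same arithmetic for the full dataset described above.
expected_rows = 6 * 500 * 9
print(expected_rows)  # 27000
```

Listing the full list of text identifiers separately, rather than deriving it from the case-form rows, matters here: a text that contains none of the nine modals would otherwise be missing from the frequency table.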
Subject
| Arts and Humanities |
Keyword
| corpus linguistics
modals
modal verbs
English
British English
American English
Brown family corpora
Brown Corpus
Frown Corpus
The Freiburg-Brown corpus of American English
LOB Corpus
The Lancaster-Oslo/Bergen Corpus
FLOB Corpus
The Freiburg–LOB Corpus of British English
BE06 Corpus
British English 2006 Corpus
AmE06 Corpus
American English 2006 Corpus
language change
frequency
dispersion
replication
statistical modeling
methodology
data structure
statistical inference
replication crisis
corpus design
observational data
regression modeling |
Related Publication
| Sönning, Lukas. [Forthcoming blogpost]. Some obstacles to replication in corpus linguistics. |
Language
| English |
Producer
| University of Bamberg https://www.uni-bamberg.de/eng-ling/ |
Production Date
| 2023-11-06 |
Production Location
| Bamberg, Germany |
Distributor
| The Tromsø Repository of Language and Linguistics (TROLLing) https://trolling.uit.no/ |
Depositor
| Sönning, Lukas |
Deposit Date
| 2023-11-09 |
Time Period
| Start Date: 1961-01-01 ; End Date: 2006-12-31 |
Date of Collection
| Start Date: 2023-11-04 ; End Date: 2023-11-06 |
Data Type
| corpus data; textual linguistic data; observational data |
Software
| CQPweb, Version: 3.3.18
R, Version: 4.2.1 |
Data Source
| Brown family (extended). Distributed by the CQPweb interface: https://cqpweb.lancs.ac.uk/
Data from six corpora that are included in the Brown family (extended) collection are used in this dataset:
- Brown Corpus
- Francis, W. N. & H. Kučera. 1979. A Standard Corpus of Present-Day Edited American English, for Use with Digital Computers (Brown). Providence, RI: Brown University.
- Kučera, H. & W. N. Francis. 1967. Computational analysis of present-day American English. Dartmouth Publishing Group.
- LOB Corpus (Lancaster-Oslo/Bergen Corpus)
- Leech, G., S. Johansson & K. Hofland. 1970–1978. The LOB Corpus (original version). Lancaster University, University of Oslo, University of Bergen.
- Leech, G., S. Johansson, R. Garside & K. Hofland. 1981–1986. The LOB Corpus (POS-tagged version). Lancaster University, University of Oslo & University of Bergen.
- Frown Corpus (Freiburg-Brown corpus of American English)
- Mair, C. 1999. The Freiburg-Brown Corpus (‘Frown’). Original edition. Freiburg: Albert-Ludwigs-Universität.
- Mair, C. & G. Leech. 2007. The Freiburg-Brown Corpus (‘Frown’) (POS-tagged version). Freiburg and Lancaster: Albert-Ludwigs-Universität.
- FLOB Corpus (Freiburg–LOB Corpus of British English)
- Mair, C. 1999. The Freiburg-LOB Corpus (‘F-LOB’) (original version). Freiburg: Albert-Ludwigs-Universität.
- Mair, C. & G. Leech. 2007. The Freiburg-LOB Corpus (‘F-LOB’) (POS-tagged version). Albert-Ludwigs-Universität Freiburg & Lancaster University.
- BE06 Corpus (British English 2006)
- Baker, P. 2008. The British English 2006 corpus (BE06). Lancaster University.
- Baker, P. 2009. The BE06 corpus of British English and recent language change. International Journal of Corpus Linguistics 14(3). 312–337.
- AmE06 Corpus
- Potts, A. & P. Baker. 2012. Does semantic tagging identify cultural change in British and American English? International Journal of Corpus Linguistics 17(3). 295–324.
The text fragments included in the data files of this dataset represent only insubstantial portions of the corpora listed above and do not form coherent larger texts. Reuse of such excerpts is permitted under exceptions in IPR and database protection regulations, such as fair use (cf. US Copyright Act), the EU Database Directive (cf. Art. 8, Rights and obligations of lawful users), and the Norwegian Copyright Act (cf. § 24 Eneretten til databaser). |