Principles of corpus linguistics and their application to translation studies research
Gabriela Saldanha
Centre for English Language Studies, University of Birmingham

1. Introduction
Corpora have been put to many different uses in fields as varied as natural language processing, critical discourse analysis and applied linguistics, to mention just a few. As is to be expected, within each of those areas corpora fulfil different roles, from providing data to build statistical machine translation systems to revealing ideological stance in politically-sensitive texts. ‘Corpus linguistics’ is understood here in a more restricted sense, linked to British traditions of text analysis that see linguistics as a social science and language as a means of social interaction where meaning is inextricably linked to the cultural and historical context in which it is produced. This article focuses specifically on the principles of corpus linguistics as a research methodology, and looks at the implications of this specific approach to the study of language in translation studies.

2. A corpus defined in corpus linguistics terms
Because there is no unanimous agreement on the necessary and sufficient conditions for a collection of texts to be a corpus, the term ‘corpus’ can be seen in the literature referring sometimes to a couple of short stories stored in electronic form and sometimes to the whole world wide web. In order to discuss the fundamental principles of corpus linguistics, it is important to first establish certain limits around what can and cannot be considered a ‘corpus-based’ study of translation.

Different definitions of corpus emphasise different aspects of this resource. The definition offered by McEnery and Wilson (1996: 87), for example, emphasises representativeness: “a body of text which is carefully sampled to be maximally representative of a language or language variety”.
The problem with making representativeness the defining characteristic of a corpus is that it is very difficult to evaluate and it will always depend on what the corpus is used for. A way around this problem is found in the definition offered by Bowker and Pearson (2002: 9): “a large collection of authentic texts that have been gathered in electronic form according to a specific set of criteria”. Bowker and Pearson’s definition is more flexible than McEnery and Wilson’s, even if the assumption is still that the corpus is intended to be “used as a representative sample of a particular language or subset of that language” (Bowker and Pearson, 2002: 9). However, in making selection criteria and not representativeness the defining characteristic, Bowker and Pearson allow for a certain flexibility that reflects more accurately the fact that corpus representativeness is always dependent on the purpose for which the corpus is used and on the specific linguistic features under study. For example, a corpus that represents accurately the distribution of a common feature – say, pronouns – in a certain language subset may not represent accurately a rarer feature, such as the use of reported speech, in the same subset. Generally, corpora are intended to be long-term resources and to be used for a variety of studies, so representativeness cannot be ensured at the design stage.
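The point about common and rare features can be illustrated with a rough calculation. The sketch below is not part of the original article: it assumes, as a simplification, that each token is an independent draw (simple binomial sampling, which real running text does not satisfy), and the two feature rates are hypothetical values chosen only for illustration. Under those assumptions, the relative standard error of an observed rate p in a corpus of n tokens is sqrt((1 - p) / (n * p)), so the same corpus yields a far less stable estimate for the rarer feature.

```python
import math


def relative_standard_error(rate: float, n_tokens: int) -> float:
    """Relative standard error of an observed rate.

    Assumes simple binomial sampling (independent tokens), which is a
    simplification: tokens in real texts are clustered, not independent.
    """
    return math.sqrt((1 - rate) / (rate * n_tokens))


# Hypothetical corpus size and feature rates, for illustration only.
corpus_size = 100_000
common_rate = 0.05    # a frequent feature, e.g. pronouns (assumed rate)
rare_rate = 0.0005    # a rarer feature, e.g. reported speech (assumed rate)

common_rse = relative_standard_error(common_rate, corpus_size)
rare_rse = relative_standard_error(rare_rate, corpus_size)

print(f"common feature: relative error about {common_rse:.1%}")
print(f"rare feature:   relative error about {rare_rse:.1%}")
```

With these assumed figures the rarer feature's estimate is roughly ten times less precise than the common one's, which is one way of seeing why a single design cannot guarantee representativeness for every feature a later study might examine.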
In the first of a two-part blog entry, Prof. Ronald Carter of the University of Nottingham provides a brief introduction to corpora and corpus linguistics, exploring ways in which corpora are currently being used to inform language teaching and the development of teaching materials.
What is a corpus?
corpus noun (plural corpuses or corpora) the collection of a single writer’s work or of writing about a particular subject, or a large amount of written and sometimes spoken material collected to show the state of a language
Cambridge Advanced Learner’s Dictionary Third Edition (2008) Cambridge: Cambridge University Press...
I often pass trucks like the one pictured below in my travels to and from northern Michigan (this one happened to be stopped at a gas station where I was filling up which allowed me to snap the picture).
I think it is wonderful that the company assembles 100% of the toys in their product line in the United States and that all of the plastic used in the toys is purchased in the USA (see here).
Unfortunately, every time I see these trucks I can't help but think "another load of crap."
The results from a Google Ngram Viewer search comparing the use of the phrases "load of crap" and "load of toys" in American English help to explain why.
In the second of this two-part blog entry, Prof. Ronald Carter of the University of Nottingham looks in more detail at the kind of information corpora can reveal about the use of language and why this is so important for the development of language teaching materials....