Scooped by Jersus Colmenares
onto Applied Corpus Linguistics to Education

IELTS Vocabulary

Exercises and practice to improve your IELTS vocabulary for the exam. Learn about the academic word list and topic related vocabulary.
Applied Corpus Linguistics to Education
Corpus linguistics resources for language teaching and learning
Scooped by Jersus Colmenares

Common prescriptive mistakes part 2– Made up rules / Split infinitives - YouTube

Scooped by Jersus Colmenares

Longman Dictionaries :: Pearson Longman

Jersus Colmenares's insight:

How Do We Build Our Dictionaries?
The Longman Corpus

Tania Saiz-Sousa, Dictionaries Marketing Manager

Scooped by Jersus Colmenares

Pearson Longman's February ESL Newsletter

Jersus Colmenares's insight:

Who's Afraid of the AWL? (page 1)
John Brezinsky, Higher Education Marketing Manager

Scooped by Jersus Colmenares

Pearson Longman's May 2010 ESL Newsletter

Jersus Colmenares's insight:

Great reading: "Corpus Linguistics and Grammar Teaching" by Douglas Biber and Susan Conrad

Scooped by Jersus Colmenares

IV International Conference on Corpus Use and Learning to Translate (CULT). University of Alicante

Fourth International Conference on Corpus Use and Learning to Translate (CULT).
Rescooped by Jersus Colmenares from Applied linguistics and knowledge engineering

ELF corpora in the mainstream: notes from the ICAME 35 ...

Mikko Laitinen, Magnus Levin and Alexander Lakaw presented a poster entitled “Ongoing grammatical change and the new Englishes: Towards a set of corpora of English use in the expanding circle” (link to pdf, or click on ...

Via Pascual Pérez-Paredes
Scooped by Jersus Colmenares

LIWC: Linguistic Inquiry and Word Count

Scooped by Jersus Colmenares

Needles in a haystack: questioning the "fluidity" of ELF

As I've earlier argued on this blog, sometimes the claims of "fluidity", "diversity", and "innovation" found in English as a lingua franca (ELF) research are overstated. It's so diverse that even o...
Rescooped by Jersus Colmenares from Metaglossia: The Translation World

What's New in Papyrology: Talk: (Uni-Leipzig) Amir Zeldes, Corpus Linguistics Tools for Sahidic Coptic

Corpus Linguistics Tools for Sahidic Coptic 
Amir Zeldes (1) & Caroline T. Schroeder (2)
(1) Humboldt-Universität zu Berlin, (2) University of the Pacific
Coptic, the language of Christian Egypt in the Hellenistic era of the first millennium, offers both a chance and a challenge for digital humanities research in the 21st century. On the one hand, there are comparatively few digital resources available: no publicly available automatic tokenization, part-of-speech tagging, or corpus search software, nor any guidelines on how to undertake these tasks (we are aware of only one incomplete and unreleased effort to tag Coptic, in Orlandi 2004; our work is based partly on Orlandi’s lexical resources, kindly made available to us). On the other hand, an explosion of work in digital humanities (standards like TEI/EpiDoc for manuscript digitization, cf. Cayless et al. 2009, or digital infrastructure like Perseus, cf. Crane et al. 2009, to name just two) has led to a wide range of resources one can draw on in bringing Coptic to the level of technology now enjoyed by, e.g., Greek and Latin.
To seize these opportunities, we have endeavored to develop comprehensive, freely available tools for the automatic linguistic processing of Coptic manuscripts that can be corrected manually and made available online. We present the first publicly available tokenizer (lexicon- and rule-based) for the main Sahidic dialect of Coptic, as well as two corresponding part-of-speech tagging schemes and training models, fine- and coarse-grained. Tokenization for Coptic is a non-trivial task, since manuscripts are written in scriptio continua (without spaces), but Coptic word forms are linguistically segmented at two levels: both into minimal morphemes, and into larger word forms, corresponding to nominal or verbal complexes, including related prepositions and articles (nouns) and multiple concatenated conjugation bases with subject/object pronouns and allomorphy (verbs). Our tokenizer currently addresses only the first task, and assumes that a human annotator has separated the scriptio continua into the coarse word forms. Example (1) shows morpheme borders added by the tokenizer, represented by pipe symbols. In some cases, letters can stand for two sounds that belong to different morphemes. In such cases the tokenizer saves the original diplomatic form and also outputs an alternative orthography which allows morphemes to be represented separately. This is shown in (2) for the letter ⲑ (theta), which stands for a /t/ followed by /h/ coming from different morphemes (individual letters are transliterated in angle brackets). In words of Greek origin, theta, phi and chi should be retained, while coincidental combinations of multiple morphemes leading to these letters must be disentangled.
The rest is available in the full abstract.
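The behaviour described in the abstract can be illustrated with a minimal sketch. This is not the authors' tokenizer: the lexicon, the loanword list, and the Latin transliterations below are invented placeholders (with "θ" standing in for theta), and the segmentation strategy (longest match with backtracking) is an assumption chosen only to show the idea of pipe-delimited morphemes plus a retained diplomatic form.

```python
# Hypothetical toy data: transliterated placeholder morphemes, not real Coptic resources.
LEXICON = {"a", "f", "sotm", "t", "he"}
GREEK_LOANS = {"θeos"}            # loanwords whose theta must be retained
TWO_SOUND_LETTERS = {"θ": "th"}   # one letter standing for two sounds


def segment(word, lexicon):
    """Return a list of lexicon morphemes covering `word`, or None if impossible."""
    if not word:
        return []
    for end in range(len(word), 0, -1):  # try the longest prefix first
        if word[:end] in lexicon:
            rest = segment(word[end:], lexicon)
            if rest is not None:
                return [word[:end]] + rest
    return None


def tokenize(word, lexicon=LEXICON, loans=GREEK_LOANS):
    """Return (diplomatic form, pipe-segmented form) for one coarse word form."""
    if word in loans:
        return word, word  # rule: keep theta intact in Greek-origin words
    # Alternative orthography: expand letters that may span a morpheme border.
    expanded = "".join(TWO_SOUND_LETTERS.get(ch, ch) for ch in word)
    morphs = segment(expanded, lexicon)
    segmented = "|".join(morphs) if morphs else word  # unknown words stay unsegmented
    return word, segmented


print(tokenize("afsotm"))  # ('afsotm', 'a|f|sotm')
print(tokenize("θe"))      # ('θe', 't|he')
print(tokenize("θeos"))    # ('θeos', 'θeos')
```

As in the abstract, the input is assumed to be already split into coarse word forms by a human annotator; the sketch only adds morpheme borders inside each form.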


Via Charles Tiayon
Charles Tiayon's curator insight, December 17, 2013 1:19 AM: repeats the abstract above.

Scooped by Jersus Colmenares

Why Nonadditive Entropy Is Important for Big Data Corpora ...

In a separate White Paper (link to be provided), I identify – for the first time – how statistical mechanics, using a special and highly relevant form of entropy, can be applied to data corpora analysis.
Rescooped by Jersus Colmenares from ProsoDis

Wmatrix corpus analysis and comparison tool


Via ProsoDis
ProsoDis's curator insight, August 29, 5:38 AM

"Wmatrix is a software tool for corpus analysis and comparison. It provides a web interface to the USAS and CLAWS corpus annotation tools, and standard corpus linguistic methodologies such as frequency lists and concordances. It also extends the keywords method to key grammatical categories and key semantic domains.

Wmatrix allows the user to run these tools via a web browser such as Chrome, Firefox or Internet Explorer, and so will run on any computer (Mac, Windows, Linux, Unix) with a web browser and a network connection. Wmatrix was initially developed by Paul Rayson in the REVERE project, extended and applied to corpus linguistics during PhD work and is still being updated regularly. Earlier versions were available for Unix via terminal-based command line access (tmatrix) and Unix via Xwindows (Xmatrix), but these only offer retrieval of text pre-annotated with USAS and CLAWS."
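For readers unfamiliar with the keywords method mentioned in the quote, here is a minimal sketch of one common way to compute keyness from two frequency lists. It uses the log-likelihood statistic, which is an assumption about the general technique and not a description of Wmatrix's internals; the function names and the toy token lists are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Keyness of one item.

    a, b: frequency of the item in the study and reference corpus.
    c, d: total size of the study and reference corpus (both must be > 0).
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study_tokens, reference_tokens, top=10):
    """Rank items that are relatively more frequent in the study corpus."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for item, a in study.items():
        b = ref[item]
        if a / c > b / d:  # keep only items overused in the study corpus
            scored.append((item, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


study = ["corpus"] * 5 + ["the"] * 5
reference = ["the"] * 10
print(keywords(study, reference))
```

Because the functions only compare counts, the same sketch would apply to sequences of part-of-speech tags or semantic tags instead of words, which is the sense in which the keywords method can be extended to grammatical categories and semantic domains.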

Scooped by Jersus Colmenares

COW – Corpora from the web - German Grammar - FU Berlin

Scooped by Jersus Colmenares

IELTS Academic Word List (English Edition) @ Amazon Shop | JHedzWorlD

IELTS Academic Word List (English Edition) Do you need to: - Improve your English vocabulary quickly and effectively?
Scooped by Jersus Colmenares

What do we mean by good grammar? - YouTube

Scooped by Jersus Colmenares

Pearson Longman's October 2009 ESL Newsletter

Jersus Colmenares's insight:

Transforming Your Students' Vocabulary
Tania Saiz-Sousa, Marketing Manager

Scooped by Jersus Colmenares

Pearson ELT April 2012 Newsletter

Jersus Colmenares's insight:

Focusing on Vocabulary in Writing Classes
by Joyce Cain

Scooped by Jersus Colmenares

Real_Grammar_Units_2_and_29.pdf

Jersus Colmenares's insight:

This is a great example of how to frame grammar units informed by/based on corpora within a more familiar format for students in regular language classes.

Scooped by Jersus Colmenares

Vol.8, No.1 - Corpora - Edinburgh University Press

Jersus Colmenares's insight:

Free issue!!!

Scooped by Jersus Colmenares

Our Use Of Little Words Can, Uh, Reveal Hidden Interests

When we talk, we focus on the "content" words — the ones that convey information. But the tiny words that tie our sentences together have a lot to say about power and relationships.
Scooped by Jersus Colmenares

Timeline Photos - American English at State | Facebook

Did you know there is a certain order when we use multiple adjectives to describe one noun? For example: The big dirty old brown dog was sleeping. Check...
Scooped by Jersus Colmenares

Preparing the corpus question - French, Première - Les Bons Profs

A teacher's overview of the method for preparing the corpus question. More French videos at http://www.lesbonsprofs.com/premiere#!f...
Rescooped by Jersus Colmenares from Metaglossia: The Translation World

UAM CorpusTool: Download

Version 3 of UAMCT offers substantial improvements over version 2.8, particularly in terms of automatic syntactic annotation (mainly for English and French) and part-of-speech tagging (using TreeTagger or the Stanford tagger, 20 languages handled).

Version 3.0 was in beta for a year, but never reached a main release.

Version 3.1 is now in beta, and substantially improves over 3.0:

  • POS tagging provided for around 20 languages.
  • Syntactic tagging for French (via Stanford Parser)
  • New Search interface using CQL (Corpus Query Language)

If you have any issues with the release, please email me (micko@wagsoft.com).

This is the FIRST beta release of 3.1 (18th August, 2014). Documentation is almost entirely lacking, but see the Release Notes.


Via Charles Tiayon
Charles Tiayon's curator insight, August 21, 10:29 PM: repeats the release notes above.

Scooped by Jersus Colmenares

[1406.6312] Scalable Topical Phrase Mining from Text Corpora

Abstract: While most topic modeling algorithms model text corpora with unigrams, human interpretation often relies on inherent grouping of terms into phrases. As such, we consider the problem of discovering topical phrases ...
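To make the unigram-versus-phrase contrast concrete, here is a deliberately naive sketch that collects contiguous word sequences occurring at least a minimum number of times as candidate phrases. It is not the algorithm of the paper, whose abstract is truncated here; the function name `frequent_phrases`, its parameters, and the sample documents are hypothetical.

```python
from collections import Counter


def frequent_phrases(docs, max_len=3, min_support=2):
    """Count contiguous n-grams (2..max_len words) and keep the frequent ones."""
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()  # whitespace tokenization, for illustration only
        for n in range(2, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return {phrase: c for phrase, c in counts.items() if c >= min_support}


docs = [
    "support vector machine training",
    "a support vector machine",
    "vector space",
]
print(frequent_phrases(docs))
```

On these three toy documents, only "support vector", "vector machine" and "support vector machine" reach the support threshold, whereas a unigram model would treat "support", "vector" and "machine" as unrelated terms.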
Scooped by Jersus Colmenares

O Corpora, O Mores! | Tolnai Translations

I have always considered corpora essential for the translation of various specialized texts. I actually believe that one cannot possibly begin to translate without consulting similar texts in the target language (the so-called ...