Applied Corpus Linguistics to Education
Corpus linguistics resources for language teaching and learning
Scooped by Jersus Colmenares
onto Applied Corpus Linguistics to Education

Learn Spanish | How to use the CREA corpus | Recursos d_ELE

Curiosities and news about Spain and the Spanish language. Curiosities about the Spanish language (ELE, Spanish as a foreign language). Bits and pieces about Spanish, curiosities of Spanish, Spanish idioms. Free graded reading to learn Spanish.
Jersus Colmenares's insight:

I apologize in advance if this has already been posted. Still, I hope you find it very helpful.

Scooped by Jersus Colmenares

Our Use Of Little Words Can, Uh, Reveal Hidden Interests

When we talk, we focus on the "content" words — the ones that convey information. But the tiny words that tie our sentences together have a lot to say about power and relationships.
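Claims about what function words reveal are usually tested by counting them against a category dictionary, as LIWC-style tools do. A minimal sketch, assuming a tiny hypothetical category list (not LIWC's actual dictionary), that reports each category as a percentage of all tokens:

```python
import re
from collections import Counter
from typing import Dict

# Hypothetical, deliberately tiny category dictionary; real tools such as
# LIWC use much larger, validated word lists.
CATEGORIES = {
    "pronoun_i": {"i", "me", "my", "mine", "myself"},
    "pronoun_we": {"we", "us", "our", "ours", "ourselves"},
    "article": {"a", "an", "the"},
}

def category_rates(text: str) -> Dict[str, float]:
    """Return each category's share of all tokens, as a percentage."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {name: 0.0 for name in CATEGORIES}
    counts = Counter(tokens)
    return {
        name: 100.0 * sum(counts[w] for w in words) / len(tokens)
        for name, words in CATEGORIES.items()
    }

print(category_rates("I think we should meet, but I am not sure the plan works."))
```

Percentages rather than raw counts keep texts of different lengths comparable; a real study would also need a proper tokenizer and a validated dictionary.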
Scooped by Jersus Colmenares

Timeline Photos - American English at State | Facebook

Did you know there is a certain order when we use multiple adjectives to describe one noun? For example: The big dirty old brown dog was sleeping. Check...
Scooped by Jersus Colmenares

ELF corpora in the mainstream: notes from the ICAME 35 ...

Mikko Laitinen, Magnus Levin and Alexander Lakaw presented a poster entitled “Ongoing grammatical change and the new Englishes: Towards a set of corpora of English use in the expanding circle” (link to pdf, or click on ...
Scooped by Jersus Colmenares

Preparing the Corpus Question - French, Première - Les Bons Profs

A methodology lesson by a teacher on how to prepare the corpus question. More French videos at http://www.lesbonsprofs.com/premiere#!f...
Rescooped by Jersus Colmenares from Metaglossia: The Translation World

UAM CorpusTool: Download

Version 3 of UAMCT offers substantial improvements over version 2.8, particularly in automatic syntactic annotation (mainly for English and French) and part-of-speech tagging (using TreeTagger or the Stanford tagger; 20 languages handled).

Version 3.0 was in beta for a year, but never reached a main release.

Version 3.1 is now in beta, and substantially improves over 3.0:

  • POS tagging provided for around 20 languages.
  • Syntactic tagging for French (via Stanford Parser)
  • New search interface using CQL (Corpus Query Language)

If you have any issues with the release, please email me (micko@wagsoft.com).

This is the FIRST beta release of 3.1 (18th August, 2014). Documentation is almost entirely lacking, but see the Release Notes.


Via Charles Tiayon
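The CQL search interface matches sequences of tokens by their annotations, for example a query of the form [pos="JJ"] [pos="NN"] for an adjective followed by a noun. The sketch below illustrates that idea over a hand-tagged toy sentence. It is not UAM CorpusTool's implementation, the Penn Treebank-style tags are assumptions, and the exact query syntax UAM CorpusTool 3.1 accepts has not been checked here.

```python
from typing import Dict, List, Tuple

Token = Dict[str, str]  # e.g. {"word": "dog", "pos": "NN"}

def match_pattern(tokens: List[Token], pattern: List[Dict[str, str]]) -> List[Tuple[int, int]]:
    """Return (start, end) spans where consecutive tokens satisfy each
    attribute constraint in order, roughly like [pos="JJ"] [pos="NN"]."""
    n = len(pattern)
    spans = []
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        # Every token in the window must match every attribute of its constraint.
        if all(all(tok.get(k) == v for k, v in cond.items())
               for tok, cond in zip(window, pattern)):
            spans.append((start, start + n))
    return spans

# Hand-tagged toy sentence (tags assumed for illustration).
sentence = [
    {"word": "The", "pos": "DT"},
    {"word": "old", "pos": "JJ"},
    {"word": "dog", "pos": "NN"},
    {"word": "chased", "pos": "VBD"},
    {"word": "a", "pos": "DT"},
    {"word": "brown", "pos": "JJ"},
    {"word": "cat", "pos": "NN"},
]
query = [{"pos": "JJ"}, {"pos": "NN"}]
for s, e in match_pattern(sentence, query):
    print(" ".join(t["word"] for t in sentence[s:e]))
```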

Scooped by Jersus Colmenares

[1406.6312] Scalable Topical Phrase Mining from Text Corpora

Abstract: While most topic modeling algorithms model text corpora with unigrams, human interpretation often relies on inherent grouping of terms into phrases. As such, we consider the problem of discovering topical phrases ...
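Since the abstract's point is that people interpret phrases rather than unigrams, a natural first step is to count contiguous multi-word sequences and keep those above a minimum support. The following is only a simplified illustration of frequent contiguous phrase counting under that assumption; it is not the algorithm of arXiv:1406.6312, which goes further by segmenting documents into phrases and fitting a topic model.

```python
from collections import Counter
from typing import Dict, List, Tuple

def frequent_phrases(docs: List[List[str]], min_support: int,
                     max_len: int = 3) -> Dict[Tuple[str, ...], int]:
    """Count contiguous phrases of 2..max_len tokens across all documents
    and keep those occurring at least min_support times."""
    counts: Counter = Counter()
    for doc in docs:
        for n in range(2, max_len + 1):
            for i in range(len(doc) - n + 1):
                counts[tuple(doc[i:i + n])] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

docs = [
    "support vector machine for text classification".split(),
    "a support vector machine baseline".split(),
    "text classification with a support vector machine".split(),
]
for phrase, count in sorted(frequent_phrases(docs, min_support=2).items()):
    print(" ".join(phrase), count)
```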
Scooped by Jersus Colmenares

O Corpora, O Mores! | Tolnai Translations

I have always considered corpora essential for the translation of various specialized texts. I actually believe that one cannot possibly begin to translate without consulting similar texts in the target language (the so-called ...
Scooped by Jersus Colmenares

fowler.corpora 0.1 : Python Package Index

fowler.corpora is software for creating vector space models for distributional semantics experiments. A vector space can be instantiated from the British National Corpus or the Google Books N-gram corpus.
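A count-based vector space of the kind used in distributional semantics can be sketched in a few lines: each word is represented by counts of its neighbours within a fixed window, and words are compared by cosine similarity. This is a generic illustration, not fowler.corpora's API; the toy corpus and the window size of 2 are arbitrary choices.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

def cooccurrence_vectors(sentences: List[List[str]], window: int = 2) -> Dict[str, Counter]:
    """Map each word to counts of the words appearing within `window` tokens of it."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the dog".split(),
    "stocks fell sharply today".split(),
]
vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))     # shared contexts: high
print(round(cosine(vecs["cat"], vecs["stocks"]), 3))  # no shared contexts: 0.0
```

Real models usually reweight raw counts (for example with PMI) and reduce dimensionality; raw counts are kept here for readability.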
Scooped by Jersus Colmenares

AWL Information - School of Linguistics and Applied Language Studies - Victoria University of Wellington

Coxhead's Academic Word List: http://t.co/W3RHGgX6zf
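A common classroom use of the Academic Word List is to measure what share of a text's running words belong to it. A minimal sketch, assuming a hypothetical hand-made excerpt of four word families with a few of their forms (the full list has 570 families and must be taken from the source above):

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical excerpt: headword -> some family members. Not the full AWL.
FAMILIES: Dict[str, Set[str]] = {
    "analyse": {"analyse", "analyses", "analysed", "analysis", "analytical", "analyze", "analyzed"},
    "data": {"data"},
    "research": {"research", "researched", "researcher", "researchers"},
    "method": {"method", "methods", "methodology", "methodological"},
}
FORM_TO_HEADWORD = {form: head for head, forms in FAMILIES.items() for form in forms}

def awl_coverage(text: str) -> Tuple[float, List[str]]:
    """Return (% of tokens that are list forms, sorted headwords found)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = [FORM_TO_HEADWORD[t] for t in tokens if t in FORM_TO_HEADWORD]
    coverage = 100.0 * len(hits) / len(tokens) if tokens else 0.0
    return coverage, sorted(set(hits))

print(awl_coverage("The researchers analysed the data using a new method."))
```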
Scooped by Jersus Colmenares

10 words that are dying out - but 'cheerio' is alive and well in Plymouth ... - Plymouth Herald

"The rise of 'awesome' seems to provide evidence of American English's influence on British speakers."
Scooped by Jersus Colmenares

AACL 2014

Jersus Colmenares's insight:

Kind reminder...

Scooped by Jersus Colmenares

How to help learners of English understand prepositions | British Council Voices

Why are words like ‘on’, ‘at’, ‘for’ and ‘about’ so tricky for learners of English and how can teachers help? Adam Simpson, winner of the British Council’s Teaching English blog award for his infographic on prepositions, explains.
Scooped by Jersus Colmenares

In memory of Geoffrey Leech

Jersus Colmenares's insight:

On the passing of Geoffrey Leech. "He is perhaps best known now as one of the founders of the field of corpus linguistics"

Scooped by Jersus Colmenares

LIWC: Linguistic Inquiry and Word Count

Scooped by Jersus Colmenares

Needles in a haystack: questioning the "fluidity" of ELF

As I've earlier argued on this blog, sometimes the claims of "fluidity", "diversity", and "innovation" found in English as a lingua franca (ELF) research are overstated. It's so diverse that even o...
Rescooped by Jersus Colmenares from Metaglossia: The Translation World

What's New in Papyrology: Talk: (Uni-Leipzig) Amir Zeldes, Corpus Linguistics Tools for Sahidic Coptic

Corpus Linguistics Tools for Sahidic Coptic
Amir Zeldes (Humboldt-Universität zu Berlin) & Caroline T. Schroeder (University of the Pacific)
Coptic, the language of Christian Egypt in the Hellenistic era of the first millennium, offers both a chance and a challenge for digital humanities research in the 21st century. On the one hand, there are comparatively few digital resources available: no publicly available automatic tokenization, part-of-speech tagging, or corpus search software, nor any guidelines on how to undertake these tasks (we are aware of only one incomplete and unreleased effort to tag Coptic, Orlandi 2004; our work is partly based on Orlandi's lexical resources, kindly made available to us). On the other hand, an explosion of work in digital humanities (standards like TEI/EpiDoc for manuscript digitization, cf. Cayless et al. 2009, or digital infrastructure like Perseus, cf. Crane et al. 2009, to name just two) has led to a wide range of resources one can draw on in bringing Coptic to the level of technology now enjoyed by, e.g., Greek and Latin.
To seize these opportunities, we have endeavored to develop comprehensive, freely available tools for the automatic linguistic processing of Coptic manuscripts that can be corrected manually and made available online. We present the first publicly available tokenizer (lexicon- and rule-based) for the main Sahidic dialect of Coptic, as well as two corresponding part-of-speech tagging schemes and training models, fine- and coarse-grained. Tokenization for Coptic is a non-trivial task, since manuscripts are written in scriptio continua (without spaces), but Coptic word forms are linguistically segmented at two levels: into minimal morphemes, and into larger word forms corresponding to nominal or verbal complexes, including related prepositions and articles (nouns) and multiple concatenated conjugation bases with subject/object pronouns and allomorphy (verbs). Our tokenizer currently addresses only the first task, and assumes that a human annotator has separated the scriptio continua into the coarse word forms. Example (1) shows morpheme borders added by the tokenizer, represented by pipe symbols. In some cases, a single letter can stand for two sounds that belong to different morphemes. In such cases the tokenizer saves the original diplomatic form and also outputs an alternative orthography which allows the morphemes to be represented separately. This is shown in (2) for the letter theta, which stands for a /t/ followed by an /h/ coming from different morphemes (individual letters are transliterated in angle brackets). In words of Greek origin, theta, phi and chi should be retained, while coincidental combinations of multiple morphemes leading to these letters must be disentangled.
Continued in the full abstract.


Via Charles Tiayon
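The tokenizer described above is lexicon- and rule-based and marks morpheme borders with pipe symbols. As a hedged illustration of the lexicon half only, the sketch below splits an already-separated word form into morphemes from a small lexicon, preferring the fewest pieces, and joins them with pipes. The ASCII transliterations and lexicon entries (articles p, t, n; nouns rome 'man' and pe 'heaven', glosses assumed) are illustrative, and the rule component and orthographic normalization described in the abstract are not modelled.

```python
from typing import List, Optional

# Hypothetical toy lexicon in ASCII transliteration (illustration only).
LEXICON = {"p", "t", "n", "rome", "pe"}

def segment(word: str, lexicon=LEXICON) -> Optional[List[str]]:
    """Split `word` into lexicon entries, preferring the fewest pieces.
    Returns None if no complete segmentation exists."""
    best: List[Optional[List[str]]] = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            prefix = best[start]
            piece = word[start:end]
            if prefix is not None and piece in lexicon:
                candidate = prefix + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(word)]

def tokenize(word: str) -> str:
    """Mark morpheme borders with pipes; leave unknown words unchanged."""
    parts = segment(word)
    return "|".join(parts) if parts else word

for w in ["prome", "nrome", "tpe", "xyz"]:
    print(w, "->", tokenize(w))
```

Dynamic programming avoids the dead ends a greedy longest-prefix match can run into when a short prefix happens to be a lexicon entry.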

Scooped by Jersus Colmenares

Why Nonadditive Entropy Is Important for Big Data Corpora ...

In a separate White Paper (link to be provided), I identify – for the first time – how statistical mechanics, using a special and highly relevant form of entropy, can be applied to data corpora analysis.
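The post does not say which nonadditive entropy the white paper uses. The best-known one is the Tsallis entropy, S_q = (1 - Σ p_i^q) / (q - 1), which tends to the Shannon entropy as q -> 1. Assuming that form, the sketch below computes it from word frequencies and checks its pseudo-additivity for two independent distributions, S_q(A,B) = S_q(A) + S_q(B) + (1 - q) S_q(A) S_q(B), against the plain additivity of Shannon entropy.

```python
import math
from collections import Counter
from itertools import product
from typing import Iterable, List

def probabilities(tokens: Iterable[str]) -> List[float]:
    """Relative frequencies of the distinct tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return [c / total for c in counts.values()]

def shannon(p: List[float]) -> float:
    return -sum(x * math.log(x) for x in p if x > 0)

def tsallis(p: List[float], q: float) -> float:
    """Tsallis entropy with the constant k set to 1; equals shannon(p) at q == 1."""
    if abs(q - 1.0) < 1e-12:
        return shannon(p)
    return (1.0 - sum(x ** q for x in p if x > 0)) / (q - 1.0)

a = probabilities("the cat sat on the mat".split())
b = probabilities("a a b c".split())
joint = [x * y for x, y in product(a, b)]  # two independent toy "corpora"
q = 2.0
lhs = tsallis(joint, q)
rhs = tsallis(a, q) + tsallis(b, q) + (1 - q) * tsallis(a, q) * tsallis(b, q)
print(round(lhs, 6), round(rhs, 6))  # should agree up to floating-point rounding
print(round(shannon(joint), 6), round(shannon(a) + shannon(b), 6))  # Shannon is additive
```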
Rescooped by Jersus Colmenares from ProsoDis

Wmatrix corpus analysis and comparison tool


Via ProsoDis
ProsoDis's curator insight, August 29, 5:38 AM

"Wmatrix is a software tool for corpus analysis and comparison. It provides a web interface to the USAS and CLAWS corpus annotation tools, and standard corpus linguistic methodologies such as frequency lists and concordances. It also extends the keywords method to key grammatical categories and key semantic domains.

Wmatrix allows the user to run these tools via a web browser such as Chrome, Firefox or Internet Explorer, and so will run on any computer (Mac, Windows, Linux, Unix) with a web browser and a network connection. Wmatrix was initially developed by Paul Rayson in the REVERE project, extended and applied to corpus linguistics during PhD work and is still being updated regularly. Earlier versions were available for Unix via terminal-based command line access (tmatrix) and Unix via Xwindows (Xmatrix), but these only offer retrieval of text pre-annotated with USAS and CLAWS."
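The keywords method compares word frequencies in a study corpus with a reference corpus. A widely used keyness statistic in this tradition is log-likelihood (G2); assuming that statistic, a minimal sketch looks like this, with a and b the word's frequencies and c and d the two corpus sizes. Wmatrix's own implementation, its cut-off thresholds and its extension to key semantic domains are not reproduced here.

```python
import math
from collections import Counter
from typing import List, Tuple

def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 for a word seen a times in corpus 1 (c tokens) and b times in corpus 2 (d tokens)."""
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll

def keywords(study: List[str], reference: List[str], top: int = 5) -> List[Tuple[str, float]]:
    """Rank words over-used in `study` relative to `reference` by log-likelihood."""
    f1, f2 = Counter(study), Counter(reference)
    c, d = len(study), len(reference)
    scored = []
    for word in f1:
        a, b = f1[word], f2.get(word, 0)
        if a / c > b / d:  # keep only words relatively more frequent in the study corpus
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top]

study = "corpus corpus annotation tagger corpus the of the".split()
reference = "the of the and a the of cat dog the".split()
for word, score in keywords(study, reference):
    print(f"{word}\t{score:.2f}")
```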

Scooped by Jersus Colmenares

COW – Corpora from the web - German Grammar - FU Berlin

Scooped by Jersus Colmenares

IELTS Academic Word List (English Edition) @ Amazon Shop | JHedzWorlD

IELTS Academic Word List (English Edition) Do you need to: - Improve your English vocabulary quickly and effectively?
Scooped by Jersus Colmenares

ReCALL Special Issue on Researching New Uses of Corpora for ...

In particular, corpus linguistics has revolutionised different fields of language study by bringing in data (aka corpora) to language description. Although the 1980s introduced large corpora designed to be representative of the ...
Scooped by Jersus Colmenares

Research Blog: Teaching machines to read between the lines (and a new corpus with entity salience annotations)

Google's ongoing corpus linguistics work (via @historying): http://t.co/rp24vo72OY
Scooped by Jersus Colmenares

Vol. 53 (2014)

Scientific journal in the field of Natural Language Processing
Scooped by Jersus Colmenares

The University of Auckland - Lancaster University Corpus Linguistics Workshop 2014 -- Abstract submission system

Scooped by Jersus Colmenares

Ten Common Misused English Words Among English Language Learners

For today, I've made a list of the ten words most commonly misused by English language learners. I came up with this list based on the English language learners I've tutored who are unawar...