
Monday, February 13, 2012

One Corpus, Two Corpora

The Corpus of Contemporary American English (COCA), developed by Mark Davies at Brigham Young University, is a useful tool for exploring collocations, usage, synonyms, and much more. Essentially, it is a searchable database of contemporary English (1990-2011), with samples of both spoken and written language drawn from popular and academic sources. It is free to use, and although it takes a little practice to learn all of its applications, there are many “help” icons to guide you along the way. A search for the word "chart" in COCA generates a result that looks like this.

A new feature of COCA allows users to input an entire document, rather than individual words or phrases, and have its language analyzed for word frequency, “academic” words, and detailed information about every word in the text that also appears in the corpus. As Mark Davies explains:

You can click on any word in your text to get detailed information about the word (all on one screen) -- its overall frequency in COCA, its frequency in each genre (spoken, fiction, magazine, newspaper, and academic), the 20-30 most frequent collocates (nearby words), up to 200 sample concordance lines, synonyms, and related words from WordNet. There's no need to go consult other dictionaries or thesauruses or online-resources -- it's all right there, with just one click for each and every word in your text.
Check out this feature at 

If you’ve never used a corpus, I highly recommend getting familiar with one. Corpora are valuable tools for language learners and linguists alike. Another completely free corpus that I find very useful is the Michigan Corpus of Academic Spoken English (MICASE). As the name implies, it focuses specifically on academic spoken English, and it includes samples from both native and non-native speakers. You can also hear audio samples (with transcripts!) of a variety of English accents here.
