This page is a bit of a repository for links to odds and ends, including R packages, Shiny apps, code snippets, and other ongoing projects.

R packages

  • corpuslingr – A library of functions enabling complex corpus search in context, search aggregation, bag-of-words/KWIC building, and keyphrase extraction.
  • corpusdatr – A collection of linguistic resources, including an abridged version of the Slate Magazine corpus (ca 1996-2000, 1K texts, ~1m words), derived from data made available via the OANC. The corpus has been annotated using spacyr, includes named-entity tags, and is small enough for demo and pedagogical purposes.
  • sotuAnn – An annotated version of the State of the Union corpus (ca 1790-2016, ~2m words).
  • lexvarsdatr – A collection of behavioral data resources (via supplemental materials), including concreteness ratings, age-of-acquisition ratings, response times in lexical decision, some CELEX measures, and word association data, along with some search functionality.
  • quicknews – A simple library of functions for building/scraping multilingual corpora based on Google News’ RSS feed.

All packages are presently in development.

Shiny apps

  • Corpus Search – An application for interactive corpus search, built on the search functionality of corpuslingr. At present, the demo app searches the Slate Magazine corpus from corpusdatr. Swapping in a personal corpus would be straightforward.
  • Word Association Networks – A simple application for investigating word association data made available via the South Florida Word Association Norms.
  • School Locator for Albuquerque Public Schools – Retrieve school locations, driving times, and school characteristics based on a user-specified address in Bernalillo County.