cokelly/collocateR

CollocateR

CollocateR is a package for the statistical programming language R. It increasingly, if still imperfectly, relies on functions and workflows from the tidyverse and tidytext packages.

Purpose

CollocateR serves a simple purpose. It finds the collocates of keywords in context in text files and calculates their statistical significance, based on the tests set out in Barnbrook et al.'s Collocation: Applications and Implications (Palgrave, 2013) and the formulae explained on the British National Corpus home page.
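To make "collocates for keywords in context" concrete, here is a minimal base-R sketch of collecting the words that fall within a fixed window around a node word. The function name `window_collocates`, its arguments, and the tokenisation rule are assumptions made for illustration; they are not CollocateR's API, and the package may tokenise and store locations differently.

```r
# Hypothetical helper, not part of CollocateR: collect the tokens that fall
# within `span` positions to the left and right of each occurrence of `node`.
window_collocates <- function(text, node, span = 2) {
  # Lowercase, then split on anything that is not a letter or an apostrophe
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  tokens <- tokens[tokens != ""]
  hits <- which(tokens == node)
  # For every hit, take the positions inside the window, excluding the node
  positions <- unlist(lapply(hits, function(i) {
    setdiff(seq(max(1, i - span), min(length(tokens), i + span)), i)
  }))
  list(tokens = tokens, collocates = tokens[positions])
}

example <- window_collocates(
  "The strong tea was served with strong coffee and weak tea",
  node = "strong", span = 1
)
table(example$collocates)
```

With a span of 1, each occurrence of the node contributes at most one word on either side. Windows that overlap count the shared word once per window, which is a simplification in this sketch.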

Functions

  • save_collocates: returns a list containing a tokenised version of the original document, a record of the node in original and hashed format, lists of left and right collocate locations, and the document's word_length.
  • get_freqs: a frequency count for collocates, both in context and in the document as a whole.
  • pmi: a 'pointwise mutual information' significance test that compares the probability of a node and a collocate occurring together with the probability of their occurring independently.
  • npmi: as above, but normalised so that all results fall between 1 (perfect collocation) and -1 (the terms never collocate).
  • z-score: a test that compares the probability of a collocate occurring near the node with the probability of its occurring anywhere in the text.
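The three significance tests can be sketched from raw counts as below. This is an illustration under stated assumptions, not the package's implementation: the function names and signatures are hypothetical, probabilities are estimated simply as counts divided by document length, and the z-score uses one common window-based formulation. CollocateR's own functions may differ, for example in how they adjust for window size.

```r
# Illustrative formulas only; names and signatures are assumptions,
# not CollocateR's API.
# f_nc: co-occurrences of node and collocate within the window
# f_n:  frequency of the node; f_c: frequency of the collocate
# n:    number of tokens in the document
# span: total window size (left + right positions)

pmi_sketch <- function(f_nc, f_n, f_c, n) {
  # log2 of the joint probability over the product of the marginals
  log2((f_nc / n) / ((f_n / n) * (f_c / n)))
}

npmi_sketch <- function(f_nc, f_n, f_c, n) {
  # Terms that never co-occur take the limiting value of -1
  if (f_nc == 0) return(-1)
  pmi_sketch(f_nc, f_n, f_c, n) / -log2(f_nc / n)
}

z_score_sketch <- function(f_nc, f_n, f_c, n, span) {
  # Probability of the collocate appearing at any position outside the node
  p <- f_c / (n - f_n)
  # Expected co-occurrences if the collocate were spread evenly
  expected <- p * f_n * span
  (f_nc - expected) / sqrt(expected * (1 - p))
}

pmi_sketch(f_nc = 10, f_n = 10, f_c = 10, n = 1000)
npmi_sketch(f_nc = 10, f_n = 10, f_c = 10, n = 1000)
z_score_sketch(f_nc = 6, f_n = 10, f_c = 50, n = 1010, span = 4)
```

In the first two calls the node and collocate always appear together, so the normalised score reaches its upper bound of 1. A collocate that occurs near the node exactly as often as chance predicts gives a PMI and a z-score of 0.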

TODO

  • save_collocates
  • pmi
  • npmi
  • z-score
  • MI Cubed
  • log_log
  • log_likelihood
  • Import other elements

Acknowledgement

README generated with readme2tex.