4 Week 3: Dictionary-based techniques

So-called “dictionary-based” techniques are an extension of the word frequency analyses we covered last week. In their most basic form, these analyses use an index of target terms and classify the texts in the corpus of interest according to whether those terms are present or absent. The technical dimensions of this type of analysis are covered in the chapter section by Krippendorff (2004), and some of the issues attending them in the article by Loughran and McDonald (2011). The article by Brooke (2021) provides an outstanding illustration of the use of text analysis techniques to make inferences about larger questions of bias.
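To make the basic idea concrete, here is a minimal sketch of a presence/absence dictionary classifier. The dictionary, its category names, and the example sentences are all hypothetical and chosen purely for illustration; they are not drawn from any of the readings, and real dictionaries are far larger and usually handle stemming, negation, and multi-word terms.

```python
import re
from collections import Counter

# Hypothetical toy dictionary (illustrative only): category -> target terms.
DICTIONARY = {
    "positive": {"good", "great", "happy"},
    "negative": {"bad", "poor", "sad"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_matches(text, dictionary=DICTIONARY):
    """Count how many tokens in the text fall under each dictionary category."""
    token_counts = Counter(tokenize(text))
    return {
        category: sum(token_counts[term] for term in terms)
        for category, terms in dictionary.items()
    }


def classify(text, dictionary=DICTIONARY):
    """Flag each category as present (True) or absent (False) in the text."""
    return {category: n > 0 for category, n in count_matches(text, dictionary).items()}


if __name__ == "__main__":
    docs = [
        "The results were good, even great.",
        "A sad and poor showing.",
        "Nothing to report.",
    ]
    for doc in docs:
        print(doc, count_matches(doc), classify(doc))
```

Note that the third example sentence matches no category at all: a text that contains none of the target terms is simply invisible to the dictionary, which is one reason the coverage and domain-specificity of the term list matter.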

We will also be reading two applications of these techniques, by Martins and Baumard (2020) and Young and Soroka (2012). Here, we will be discussing how successful the authors are in measuring the phenomenon of interest (“prosociality” and “tone”, respectively). Questions about sampling and representativeness will again be relevant here, and will naturally inform our assessments of this work.

Questions:

  1. Are general dictionaries possible, or do they have to be domain-specific?
  2. How do we know if our dictionary is accurate?
  3. How could we enhance/supplement dictionary-based techniques?

Required reading:

  • Martins and Baumard (2020)
  • Voigt et al. (2017)
  • Brooke (2021)

Further reading:

  • Tausczik and Pennebaker (2010)
  • Krippendorff (2004) (pp. 283-289)
  • Brier and Hopp (2011)
  • Bonikowski and Gidron (2015)
  • Barberá et al. (2021)
  • Young and Soroka (2012)

Slides: