16 Week 10
16.1 Validation
This week we’ll be thinking about how to validate the techniques we’ve used in the preceding weeks. Validation is a necessary part of any text analysis: without it, we cannot know whether a measure captures the concept it is meant to capture.
Often we speak of validation in the context of machine labelling of large text data. But validation need not, and should not, be restricted to automated classification tasks. The articles by Ying, Montgomery, and Stewart (2021) and Rodriguez, Spirling, and Stewart (2021) describe ways to approach validation in unsupervised contexts. Finally, the article by Peterson and Spirling (2018) shows how classification accuracy can itself serve as a measure of substantive significance.
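To make the machine-labelling case concrete, here is a minimal sketch of how machine-assigned labels might be checked against a hand-coded validation set. The function name `validation_metrics`, the label values, and the toy data are all hypothetical and do not come from any of the readings. The sketch assumes a binary task with one class treated as the "positive" class.

```python
from collections import Counter


def validation_metrics(human, machine, positive):
    """Compare machine labels against human (gold-standard) labels.

    `human` and `machine` are equal-length sequences of labels and
    `positive` is the label treated as the positive class.
    All names here are illustrative assumptions.
    """
    if len(human) != len(machine):
        raise ValueError("human and machine labels must have the same length")
    if len(human) == 0:
        raise ValueError("at least one labelled document is required")

    # Tally the four cells of the confusion matrix.
    counts = Counter()
    for h, m in zip(human, machine):
        if m == positive and h == positive:
            counts["tp"] += 1
        elif m == positive:
            counts["fp"] += 1
        elif h == positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1

    tp, fp, fn, tn = counts["tp"], counts["fp"], counts["fn"], counts["tn"]

    accuracy = (tp + tn) / len(human)
    # Guard against empty denominators when a class is never predicted or observed.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy validation set: six documents coded by hand and by a classifier.
human = ["pos", "neg", "pos", "neg", "pos", "neg"]
machine = ["pos", "neg", "neg", "neg", "pos", "pos"]

metrics = validation_metrics(human, machine, positive="pos")
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy alone can be misleading when one class is rare, which is why precision and recall are reported alongside it. In practice the hand-coded set should be a random sample of the corpus that was not used to train the classifier.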
Required reading:
- Ying, Montgomery, and Stewart (2021)
- Rodriguez, Spirling, and Stewart (2021)
- Peterson and Spirling (2018)
- Manning, Raghavan, and Schütze (2007, ch.2: https://nlp.stanford.edu/IR-book/information-retrieval-book.html)
Further reading:
- Krippendorff (2004)
- Denny and Spirling (2018)
- Grimmer and Stewart (2013b)
- Barberá et al. (2021)
- Schiller, Daxenberger, and Gurevych (2021)
Slides:
- Week 10 Slides