14 Week 8: Sampling text information

This week we’ll be thinking about how best to sample text information. We’ll consider the different biases that might inhere in the data-generating process, as well as the representativeness and generalizability of any text corpus we construct.

The reading by Barberá and Rivero (2015) investigates the representativeness of Twitter data, and should give us pause when thinking about using digital trace data as a general barometer of public opinion.

The reading by Michalopoulos and Xue (2021) takes an entirely different tack, but illustrates how we can think systematically about sources of text information that are more broadly representative of societies as a whole.

Required reading:

Barberá and Rivero (2015)
Michalopoulos and Xue (2021)
Krippendorff (2004, chs. 5 and 6)

Further reading:

Martins and Baumard (2020)
Baumard et al. (2022)

Slides:

Week 8 Slides
