12 Week 7: Unsupervised learning (word embedding)
This week we discuss a second form of “unsupervised” learning: word embeddings. Where previous weeks allowed us to characterize the complexity of text, or to cluster texts by potential topical focus, word embeddings permit a more expansive form of measurement. In essence, we produce a matrix representation of an entire corpus, in which each word is represented by a vector of numbers.
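To make the idea of a matrix representation concrete, here is a minimal sketch in Python. It is not the method used in the readings: it assumes a simple count-based approach (a word co-occurrence matrix compressed with a truncated SVD) rather than a neural model such as word2vec or GloVe. The toy corpus, the function names `build_embeddings` and `cosine`, and the `window` and `dim` settings are all hypothetical choices made for illustration.

```python
import numpy as np

# Toy corpus (hypothetical sentences, for illustration only).
corpus = [
    "the king rules the country",
    "the queen rules the country",
    "the farmer grows the wheat",
    "the farmer grows the corn",
]

def build_embeddings(sentences, window=2, dim=2):
    """Return (vocab, matrix); row i of matrix is the vector for vocab[i]."""
    tokens = [s.split() for s in sentences]
    vocab = sorted({w for sent in tokens for w in sent})
    index = {w: i for i, w in enumerate(vocab)}

    # Count how often each pair of words co-occurs within the window.
    counts = np.zeros((len(vocab), len(vocab)))
    for sent in tokens:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[sent[j]]] += 1

    # Compress the log-scaled counts into `dim` dimensions (truncated SVD).
    u, s, _ = np.linalg.svd(np.log1p(counts))
    return vocab, u[:, :dim] * s[:dim]

def cosine(a, b):
    """Cosine similarity between two word vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vocab, emb = build_embeddings(corpus)
vec = {w: emb[i] for i, w in enumerate(vocab)}

# One row per vocabulary word, one column per embedding dimension.
print(emb.shape)
# Words that appear in similar contexts should receive similar vectors.
print(cosine(vec["king"], vec["queen"]), cosine(vec["king"], vec["wheat"]))
```

In this toy corpus "king" and "queen" occur in identical contexts, so their vectors coincide. With real corpora the vocabulary runs to many thousands of words and the vectors typically have a few hundred dimensions, but the logic is the same: words used in similar contexts end up close together in the embedding space.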
The reading by Pedro L. Rodriguez and Spirling (2022) provides an effective overview of the technical dimensions of this technique. Garg et al. (2018) and Kozlowski, Taddy, and Evans (2019) are two substantive applications that use word embeddings to provide insights into prejudice and bias as manifested in language over time.
Required reading:
Further reading:
- P. Rodriguez and Spirling (2021)
- Pedro L. Rodriguez and Spirling (2022)
- Osnabrügge, Hobolt, and Rodon (2021)
- Rheault and Cochrane (2020)
- Jurafsky and Martin (2021, ch. 6): https://web.stanford.edu/~jurafsky/slp3/
Slides:
- Week 7 Slides