“Computational Text Analysis” (PGSP11584)
This is the dedicated webpage for the course “Computational Text Analysis” (PGSP11584) at the University of Edinburgh, taught by Christopher Barrie. See the Course Overview and Introduction tabs for an overview of the course and an introduction to R.
We will be using this online book throughout the course. Each week has a set of essential and recommended readings. The essential readings must be consulted in full prior to the Lecture and Seminar for that week. In addition, you will find online Exercises and examples written in R. This is a “live” book and will be amended and updated during the course itself.
0.1 Structure
The course is structured as alternating weeks of substantive and technical instruction.
Week | Focus | Coding assignment(s) | Class activity |
---|---|---|---|
1 | Retrieving and analyzing text information | Introductory exercises + RTC Workshop by Ugur Ozdemir | Seminar discussion |
2 | Tokenization and word frequencies | Demo | Seminar discussion |
3 | Dictionary-based techniques | Demo + Exercise 2 | Flash talk + Exercise 1 group work |
4 | Natural language, complexity, and similarity | Demo | Coding demo of Exercise 2 + Seminar discussion |
5 | Scaling techniques | Demo + Exercise 4 | Flash talk + Exercise 3 group work |
6 | Unsupervised learning (topic models) | Demo | Coding demo of Exercise 4 + Seminar discussion |
7 | Unsupervised learning (word embedding) | Demo + Exercise 6 | Flash talk + Exercise 5 group work |
8 | Sampling text information | Demo | Coding demo of Exercise 6 + Seminar discussion |
9 | Supervised learning | Demo + Exercise 8 | Flash talk + Exercise 7 group work |
10 | Validation | Demo + Exercise 9 | |
Acknowledgments
When compiling this course, I benefited from syllabus materials shared online by Bradley Boehmke, Margaret Roberts, Alexandra Siegel, and Arthur Spirling. Thanks also to Justin Grimmer, Margaret Roberts, and Brandon Stewart for providing early view access to their forthcoming Text as Data book.