“Computational Text Analysis” (PGSP11584)

This is the dedicated webpage for the course “Computational Text Analysis” (PGSP11584) at the University of Edinburgh, taught by Christopher Barrie. Go to the Course Overview and Introduction tabs for a course overview and an introduction to R.

We will be using this online book throughout the course. Each week has a set of essential and recommended readings. The essential readings must be read in full before the lecture and seminar for that week. In addition, you will find online exercises and examples written in R. This is a “live” book and will be amended and updated during the course itself.

0.1 Structure

The course alternates between weeks of substantive and technical instruction.

| Week | Focus | Coding assignment(s) | Class activity |
|------|-------|----------------------|----------------|
| 1 | Retrieving and analyzing text information | Introductory exercises + RTC Workshop by Ugur Ozdemir | Seminar discussion |
| 2 | Tokenization and word frequencies | Demo | Seminar discussion |
| 3 | Dictionary-based techniques | Demo + Exercise 2 | Flash talk + Exercise 1 group work |
| 4 | Natural language, complexity, and similarity | Demo | Coding demo of Exercise 2 + Seminar discussion |
| 5 | Scaling techniques | Demo + Exercise 4 | Flash talk + Exercise 3 group work |
| 6 | Unsupervised learning (topic models) | Demo | Coding demo of Exercise 4 + Seminar discussion |
| 7 | Unsupervised learning (word embedding) | Demo + Exercise 6 | Flash talk + Exercise 5 group work |
| 8 | Sampling text information | Demo | Coding demo of Exercise 6 + Seminar discussion |
| 9 | Supervised learning | Demo + Exercise 8 | Flash talk + Exercise 7 group work |
| 10 | Validation | Demo + Exercise 9 | Coding demo of Exercise 8 + Seminar discussion |

Acknowledgments

When compiling this course, I benefited from syllabus materials shared online by Bradley Boehmke, Margaret Roberts, Alexandra Siegel, and Arthur Spirling. Thanks also to Justin Grimmer, Margaret Roberts, and Brandon Stewart for providing early view access to their forthcoming Text as Data book.