27 References
Aiden, Erez Lieberman, Joseph P. Pickett, and Jean-Baptiste Michel. 2011. “Culturomics—Response.” Science, April. https://doi.org/10.1126/science.332.6025.36-a.
Alshaabi, Thayer, Jane L. Adams, Michael V. Arnold, Joshua R. Minot, David R. Dewhurst, Andrew J. Reagan, Christopher M. Danforth, and Peter Sheridan Dodds. 2021. “Storywrangler: A Massive Exploratorium for Sociolinguistic, Cultural, Socioeconomic, and Political Timelines Using Twitter.” Science Advances 7 (29): eabe6534. https://doi.org/10.1126/sciadv.abe6534.
Bail, Christopher A. 2012. “The Fringe Effect: Civil Society Organizations and the Evolution of Media Discourse about Islam Since the September 11th Attacks.” American Sociological Review 77 (6): 855–79. https://doi.org/10.1177/0003122412465743.
Barberá, Pablo, Amber E. Boydstun, Suzanna Linn, Ryan McMahon, and Jonathan Nagler. 2021. “Automated Text Classification of News Articles: A Practical Guide.” Political Analysis 29 (1): 19–42. https://doi.org/10.1017/pan.2020.8.
Barberá, Pablo, and Gonzalo Rivero. 2015. “Understanding the Political Representativeness of Twitter Users.” Social Science Computer Review 33 (6): 712–29. https://doi.org/10.1177/0894439314558836.
Barrie, Christopher. 2023. “Did the Musk Takeover Boost Contentious Actors on Twitter?” Harvard Kennedy School Misinformation Review.
Baumard, Nicolas, Elise Huillery, Alexandre Hyafil, and Lou Safra. 2022. “The Cultural Evolution of Love in Literary History.” Nature Human Behaviour, March. https://doi.org/10.1038/s41562-022-01292-z.
Benoit, Kenneth, Drew Conway, Benjamin E. Lauderdale, Michael Laver, and Slava Mikhaylov. 2016. “Crowd-Sourced Text Analysis: Reproducible and Agile Production of Political Data.” American Political Science Review 110 (2): 278–95. https://doi.org/10.1017/S0003055416000058.
Benoit, Kenneth, Kevin Munger, and Arthur Spirling. 2019. “Measuring and Explaining Political Sophistication Through Textual Complexity.” American Journal of Political Science 63 (2): 491–508. https://doi.org/10.1111/ajps.12423.
Biber, Douglas. 1993. “Using Register-Diversified Corpora for General Language Studies.” Computational Linguistics 19 (2): 219–41.
Bollen, Johan, Marijn ten Thij, Fritz Breithaupt, Alexander T. J. Barron, Lauren A. Rutter, Lorenzo Lorenzo-Luaces, and Marten Scheffer. 2021a. “Historical Language Records Reveal a Surge of Cognitive Distortions in Recent Decades.” Proceedings of the National Academy of Sciences 118 (30): e2102061118. https://doi.org/10.1073/pnas.2102061118.
———. 2021b. “Reply to Schmidt et al.: A Robust Surge of Cognitive Distortions in Historical Language.” Proceedings of the National Academy of Sciences 118 (45): e2115842118. https://doi.org/10.1073/pnas.2115842118.
Bonikowski, Bart, and Noam Gidron. 2015. “The Populist Style in American Politics: Presidential Campaign Discourse, 1952–1996.” Social Forces 94 (4): 1593–1621. https://doi.org/10.1093/sf/sov120.
Boyd, Ryan L., Alexander Spangher, Adam Fourney, Besmira Nushi, Gireeja Ranade, James Pennebaker, and Eric Horvitz. 2018. “Characterizing the Internet Research Agency’s Social Media Operations During the 2016 U.S. Presidential Election Using Linguistic Analyses.” Preprint. PsyArXiv. https://doi.org/10.31234/osf.io/ajh2q.
Brier, Alan, and Bruno Hopp. 2011. “Computer Assisted Text Analysis in the Social Sciences.” Quality & Quantity 45 (1): 103–28. https://doi.org/10.1007/s11135-010-9350-8.
Brooke, S. J. 2021. “Trouble in Programmer’s Paradise: Gender-Biases in Sharing and Recognising Technical Knowledge on Stack Overflow.” Information, Communication & Society 24 (14): 2091–2112.
Bunea, Adriana, and Raimondas Ibenskas. 2015. “Quantitative Text Analysis and the Study of EU Lobbying and Interest Groups.” European Union Politics 16 (3): 429–55. https://doi.org/10.1177/1465116515577821.
Campos, Ricardo, Gaël Dias, Alípio M. Jorge, and Adam Jatowt. 2015. “Survey of Temporal Information Retrieval and Related Applications.” ACM Computing Surveys 47 (2): 1–41. https://doi.org/10.1145/2619088.
Chang, Jonathan, Jordan Boyd-Graber, Sean Gerrish, Chong Wang, and David M. Blei. 2009. “Reading Tea Leaves: How Humans Interpret Topic Models.” In Proceedings of the 22nd International Conference on Neural Information Processing Systems, 288–96. NIPS’09. Red Hook, NY, USA: Curran Associates Inc.
Denny, Matthew J., and Arthur Spirling. 2018. “Text Preprocessing for Unsupervised Learning: Why It Matters, When It Misleads, and What to Do About It.” Political Analysis 26 (2): 168–89. https://doi.org/10.1017/pan.2017.44.
Garg, Nikhil, Londa Schiebinger, Dan Jurafsky, and James Zou. 2018. “Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes.” Proceedings of the National Academy of Sciences 115 (16): E3635–44. https://doi.org/10.1073/pnas.1720347115.
Goldenstein, Jan, and Philipp Poschmann. 2019a. “A Quest for Transparent and Reproducible Text-Mining Methodologies in Computational Social Science.” Sociological Methodology 49 (1): 144–51. https://doi.org/10.1177/0081175019867855.
———. 2019b. “Analyzing Meaning in Big Data: Performing a Map Analysis Using Grammatical Parsing and Topic Modeling.” Sociological Methodology 49 (1): 83–131. https://doi.org/10.1177/0081175019852762.
Gomaa, Wael, and Aly Fahmy. 2013. “A Survey of Text Similarity Approaches.” International Journal of Computer Applications 68 (13): 13–18. https://doi.org/10.5120/11638-7118.
Greenfield, Patricia M. 2013. “The Changing Psychology of Culture From 1800 Through 2000.” Psychological Science 24 (9): 1722–31. https://doi.org/10.1177/0956797613479387.
Grimmer, Justin, and Gary King. 2011. “General Purpose Computer-Assisted Clustering and Conceptualization.” Proceedings of the National Academy of Sciences 108 (7): 2643–50. https://doi.org/10.1073/pnas.1018067108.
Grimmer, Justin, Margaret E. Roberts, and Brandon M. Stewart. 2021. “Machine Learning for Social Science: An Agnostic Approach.” Annual Review of Political Science 24 (1): 395–419. https://doi.org/10.1146/annurev-polisci-053119-015921.
Grimmer, Justin, and Brandon M. Stewart. 2013. “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political Analysis 21 (3): 267–97. https://doi.org/10.1093/pan/mps028.
Haroon, Muhammad, Anshuman Chhabra, Xin Liu, Prasant Mohapatra, Zubair Shafiq, and Magdalena Wojcieszak. 2022. “YouTube, the Great Radicalizer? Auditing and Mitigating Ideological Biases in YouTube Recommendations.” https://doi.org/10.48550/ARXIV.2203.10666.
Hopkins, Daniel J., and Gary King. 2010. “A Method of Automated Nonparametric Content Analysis for Social Science.” American Journal of Political Science 54 (1): 229–47. https://doi.org/10.1111/j.1540-5907.2009.00428.x.
Jurafsky, Daniel, and James H. Martin. 2021. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 3rd ed. New Jersey: Prentice-Hall, Inc. https://web.stanford.edu/~jurafsky/slp3/.
Kaneko, Tomoki, Taka-aki Asano, and Hirofumi Miwa. 2021. “Estimating Ideal Points of Newspapers from Editorial Texts.” The International Journal of Press/Politics 26 (3): 719–42. https://doi.org/10.1177/1940161220935058.
Kim, Eunji, Yphtach Lelkes, and Joshua McCrain. 2022. “Measuring Dynamic Media Bias.” Proceedings of the National Academy of Sciences 119 (32). https://doi.org/10.1073/pnas.2202197119.
King, Gary, Patrick Lam, and Margaret E. Roberts. 2017. “Computer-Assisted Keyword and Document Set Discovery from Unstructured Text.” American Journal of Political Science 61 (4): 971–88. https://doi.org/10.1111/ajps.12291.
King, Gary, Jennifer Pan, and Margaret E. Roberts. 2017. “How the Chinese Government Fabricates Social Media Posts for Strategic Distraction, Not Engaged Argument.” American Political Science Review 111 (3): 484–501. https://doi.org/10.1017/S0003055417000144.
Klüver, Heike. 2009. “Measuring Interest Group Influence Using Quantitative Text Analysis.” European Union Politics 10 (4): 535–49. https://doi.org/10.1177/1465116509346782.
———. 2015. “The Promises of Quantitative Text Analysis in Interest Group Research: A Reply to Bunea and Ibenskas.” European Union Politics 16 (3): 456–66. https://doi.org/10.1177/1465116515581669.
Kozlowski, Austin C., Matt Taddy, and James A. Evans. 2019. “The Geometry of Culture: Analyzing the Meanings of Class Through Word Embeddings.” American Sociological Review 84 (5): 905–49. https://doi.org/10.1177/0003122419877135.
Krippendorff, Klaus. 2004. “Reliability in Content Analysis: Some Common Misconceptions and Recommendations.” Human Communication Research 30 (3): 411–33. https://doi.org/10.1093/hcr/30.3.411.
Krippendorff, Klaus. 2004. Content Analysis: An Introduction to Its Methodology. London: SAGE Publications Ltd.
Laver, Michael, Kenneth Benoit, and John Garry. 2003. “Extracting Policy Positions from Political Texts Using Words as Data.” The American Political Science Review 97 (2): 311–31. http://www.jstor.org/stable/3118211.
Loughran, Tim, and Bill McDonald. 2011. “When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks.” The Journal of Finance 66 (1): 35–65. https://doi.org/10.1111/j.1540-6261.2010.01625.x.
Lowe, Will. 2008. “Understanding Wordscores.” Political Analysis 16 (4): 356–71. https://doi.org/10.1093/pan/mpn004.
Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. 2007. Introduction to Information Retrieval. Cambridge: Cambridge University Press.
Martins, Mauricio de Jesus Dias, and Nicolas Baumard. 2020. “The Rise of Prosociality in Fiction Preceded Democratic Revolutions in Early Modern Europe.” Proceedings of the National Academy of Sciences 117 (46): 28684–91. https://doi.org/10.1073/pnas.2009571117.
McInnes, Leland, John Healy, and James Melville. 2020. “UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction.” arXiv:1802.03426 [cs, stat], September. http://arxiv.org/abs/1802.03426.
Michalopoulos, Stelios, and Melanie Meng Xue. 2021. “Folklore.” The Quarterly Journal of Economics 136 (4): 1993–2046. https://doi.org/10.1093/qje/qjab003.
Michel, Jean-Baptiste, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, et al. 2011. “Quantitative Analysis of Culture Using Millions of Digitized Books.” Science 331 (6014): 176–82. https://doi.org/10.1126/science.1199644.
Morse-Gagné, Elise E. 2011. “Culturomics: Statistical Traps Muddy the Data.” Science, April. https://doi.org/10.1126/science.332.6025.35-b.
Nelson, Laura K. 2019. “To Measure Meaning in Big Data, Don’t Give Me a Map, Give Me Transparency and Reproducibility.” Sociological Methodology 49 (1): 139–43. https://doi.org/10.1177/0081175019863783.
———. 2020. “Computational Grounded Theory: A Methodological Framework.” Sociological Methods & Research 49 (1): 3–42. https://doi.org/10.1177/0049124117729703.
Osnabrügge, Moritz, Sara B. Hobolt, and Toni Rodon. 2021. “Playing to the Gallery: Emotive Rhetoric in Parliaments.” American Political Science Review 115 (3): 885–99. https://doi.org/10.1017/S0003055421000356.
Parthasarathy, Ramya, Vijayendra Rao, and Nethra Palaniswamy. 2019. “Deliberative Democracy in an Unequal World: A Text-as-Data Study of South India’s Village Assemblies.” American Political Science Review 113 (3): 623–40. https://doi.org/10.1017/s0003055419000182.
Pechenick, Eitan Adam, Christopher M. Danforth, and Peter Sheridan Dodds. 2015. “Characterizing the Google Books Corpus: Strong Limits to Inferences of Socio-Cultural and Linguistic Evolution.” Edited by Alain Barrat. PLOS ONE 10 (10): e0137041. https://doi.org/10.1371/journal.pone.0137041.
Peng, Roger D., and Nicolas W. Hengartner. 2002. “Quantitative Analysis of Literary Styles.” The American Statistician 56 (3): 175–85. https://doi.org/10.1198/000313002100.
Peterson, Andrew, and Arthur Spirling. 2018. “Classification Accuracy as a Substantive Quantity of Interest: Measuring Polarization in Westminster Systems.” Political Analysis 26 (1): 120–28. https://doi.org/10.1017/pan.2017.39.
Rheault, Ludovic, and Christopher Cochrane. 2020. “Word Embeddings for the Analysis of Ideological Placement in Parliamentary Corpora.” Political Analysis 28 (1): 112–33. https://doi.org/10.1017/pan.2019.26.
Rodriguez, Pedro L., and Arthur Spirling. 2022. “Word Embeddings: What Works, What Doesn’t, and How to Tell the Difference for Applied Research.” The Journal of Politics 84 (1): 101–15. https://doi.org/10.1086/715162.
Rodriguez, Pedro L., Arthur Spirling, and Brandon M. Stewart. 2021. “Models for Context-Specific Description and Inference in Political Science.” Working Paper, 1–43. https://github.com/prodriguezsosa/EmbeddingRegression.
Rozado, David, Musa Al-Gharbi, and Jamin Halberstadt. 2021. “Prevalence of Prejudice-Denoting Words in News Media Discourse: A Chronological Analysis.” Social Science Computer Review, July, 089443932110314. https://doi.org/10.1177/08944393211031452.
Schiller, Benjamin, Johannes Daxenberger, and Iryna Gurevych. 2021. “Stance Detection Benchmark: How Robust Is Your Stance Detection?” KI - Künstliche Intelligenz, March. https://doi.org/10.1007/s13218-021-00714-w.
Schmidt, Benjamin, Steven T. Piantadosi, and Kyle Mahowald. 2021. “Uncontrolled Corpus Composition Drives an Apparent Surge in Cognitive Distortions.” Proceedings of the National Academy of Sciences 118 (45): e2115010118. https://doi.org/10.1073/pnas.2115010118.
Schoonvelde, Martijn, Anna Brosius, Gijs Schumacher, and Bert N. Bakker. 2019. “Liberals Lecture, Conservatives Communicate: Analyzing Complexity and Ideology in 381,609 Political Speeches.” Edited by Daniel Wisneski. PLOS ONE 14 (2): e0208450. https://doi.org/10.1371/journal.pone.0208450.
Schwartz, Tim. 2011. “Culturomics: Periodicals Gauge Culture’s Pulse.” Science, April. https://doi.org/10.1126/science.332.6025.35-c.
Schwemmer, Carsten, and Oliver Wieczorek. 2020. “The Methodological Divide of Sociology: Evidence from Two Decades of Journal Publications.” Sociology 54 (1): 3–21.
Siegel, Alexandra A., Evgenii Nikitin, Pablo Barberá, Joanna Sterling, Bethany Pullen, Richard Bonneau, Jonathan Nagler, and Joshua A. Tucker. 2021. “Trumping Hate on Twitter? Online Hate Speech in the 2016 U.S. Election Campaign and Its Aftermath.” Quarterly Journal of Political Science 16 (1): 71–104. https://doi.org/10.1561/100.00019045.
Silge, Julia, and David Robinson. 2017. Text Mining with R: A Tidy Approach. London: O’Reilly.
Slapin, Jonathan B., and Sven-Oliver Proksch. 2008. “A Scaling Model for Estimating Time-Series Party Positions from Texts.” American Journal of Political Science 52 (3): 705–22. https://doi.org/10.1111/j.1540-5907.2008.00338.x.
Smith, Steven T., Edward K. Kao, Erika D. Mackin, Danelle C. Shah, Olga Simek, and Donald B. Rubin. 2021. “Automatic Detection of Influential Actors in Disinformation Networks.” Proceedings of the National Academy of Sciences 118 (4): e2011216118. https://doi.org/10.1073/pnas.2011216118.
Tatman, Rachael. 2017. “Gender and Dialect Bias in YouTube’s Automatic Captions.” In Proceedings of the First ACL Workshop on Ethics in Natural Language Processing, 53–59. Valencia, Spain: Association for Computational Linguistics. https://doi.org/10.18653/v1/W17-1606.
Tausczik, Yla R., and James W. Pennebaker. 2010. “The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods.” Journal of Language and Social Psychology 29 (1): 24–54. https://doi.org/10.1177/0261927X09351676.
Urman, Aleksandra, Mykola Makhortykh, and Roberto Ulloa. 2021. “The Matter of Chance: Auditing Web Search Results Related to the 2020 U.S. Presidential Primary Elections Across Six Search Engines.” Social Science Computer Review, April, 089443932110068. https://doi.org/10.1177/08944393211006863.
Voigt, Rob, Nicholas P. Camp, Vinodkumar Prabhakaran, William L. Hamilton, Rebecca C. Hetey, Camilla M. Griffiths, David Jurgens, Dan Jurafsky, and Jennifer L. Eberhardt. 2017. “Language from Police Body Camera Footage Shows Racial Disparities in Officer Respect.” Proceedings of the National Academy of Sciences 114 (25): 6521–26. https://doi.org/10.1073/pnas.1702413114.
Waller, Isaac, and Ashton Anderson. 2021. “Quantifying Social Organization and Political Polarization in Online Platforms.” Nature 600 (7888): 264–68. https://doi.org/10.1038/s41586-021-04167-x.
Wickham, Hadley, and Garrett Grolemund. 2017. R for Data Science. London: O’Reilly Media.
Ying, Luwei, Jacob M. Montgomery, and Brandon M. Stewart. 2021. “Topics, Concepts, and Measurement: A Crowdsourced Procedure for Validating Topics as Measures.” Political Analysis, September, 1–20. https://doi.org/10.1017/pan.2021.33.
Young, Lori, and Stuart Soroka. 2012. “Affective News: The Automated Coding of Sentiment in Political Texts.” Political Communication 29 (2): 205–31. https://doi.org/10.1080/10584609.2012.671234.
Yu, Bei, Stefan Kaufmann, and Daniel Diermeier. 2008. “Classifying Party Affiliation from Political Speech.” Journal of Information Technology & Politics 5 (1): 33–48. https://doi.org/10.1080/19331680802149608.
Ziblatt, Daniel, Hanno Hilbig, and Daniel Bischof. 2020. “Wealth of Tongues: Why Peripheral Regions Vote for the Radical Right in Germany.” SocArXiv. https://doi.org/10.31235/osf.io/syr84.