Current Research

Google Scholar Page here

Thomas Däubler and Kenneth Benoit. February 13, 2017. “Estimating Better Left-Right Positions Through Statistical Scaling of Manual Content Analysis.”

Borrowing from automated “text as data” approaches, we show how statistical scaling models can be applied to hand-coded content analysis to improve estimates of political parties’ left-right policy positions. We apply a Bayesian item-response theory (IRT) model to category counts from coded party manifestos, treating the categories as “items” and policy positions as a latent variable. The model also produces direct estimates of how each policy category relates to left-right ideology, without having to fix these relationships in advance through out-of-sample fitting, political theory, assertion, or guesswork. This not only avoids the misspecification endemic to fixed-index approaches, but also works well even with items that were not specifically designed to measure ideological positioning.
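
The underlying idea can be sketched numerically. Below is a minimal, hypothetical Python illustration, not the paper’s actual Bayesian estimator: a one-dimensional IRT-style Poisson scaling fit to simulated category counts, where each manifesto i has a latent position theta_i, each coding category j has a loading beta_j, and counts are Poisson with log-rate alpha_i + psi_j + theta_i * beta_j. A weak normal penalty stands in, very loosely, for Bayesian priors; all names and data here are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)

# --- Simulate hand-coded category counts (all values invented) ---
n_docs, n_cats = 20, 12
theta_true = rng.normal(size=n_docs)             # latent left-right positions
beta_true = rng.normal(size=n_cats)              # category loadings on left-right
alpha_true = rng.normal(0.0, 0.5, size=n_docs)   # document "verbosity" effects
psi_true = rng.normal(1.5, 0.5, size=n_cats)     # baseline category frequencies
lam = np.exp(alpha_true[:, None] + psi_true[None, :]
             + theta_true[:, None] * beta_true[None, :])
Y = rng.poisson(lam)                             # docs x categories count matrix

def unpack(params):
    a = params[:n_docs]
    p = params[n_docs:n_docs + n_cats]
    t = params[n_docs + n_cats:2 * n_docs + n_cats]
    b = params[2 * n_docs + n_cats:]
    return a, p, t, b

def neg_log_post(params):
    # Penalised Poisson log-likelihood: a rough MAP stand-in for full Bayes.
    a, p, t, b = unpack(params)
    log_lam = a[:, None] + p[None, :] + t[:, None] * b[None, :]
    ll = np.sum(Y * log_lam - np.exp(log_lam))   # Poisson kernel
    penalty = 0.5 * np.sum(params ** 2)          # weak N(0,1) "prior", identifies scale
    return -(ll - penalty)

x0 = rng.normal(scale=0.1, size=2 * (n_docs + n_cats))
fit = minimize(neg_log_post, x0, method="L-BFGS-B")
_, _, theta_hat, beta_hat = unpack(fit.x)

# Positions are identified only up to sign and scale; standardise before comparing.
theta_hat = (theta_hat - theta_hat.mean()) / theta_hat.std()
print("recovery |corr|:", abs(np.corrcoef(theta_hat, theta_true)[0, 1]).round(2))
```

Note that nothing in the fit constrains which categories load left or right: the estimated beta_hat is itself the data-driven estimate of each category’s relationship to the latent dimension, which is the point the abstract emphasises.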

Kenneth Benoit and Paul Nulty. April 8, 2013. “Classification Methods for Scaling Latent Political Traits.” Paper prepared for presentation at the Annual Meeting of the Midwest Political Science Association, April 11–14, 2013, Chicago.

Quantitative methods for scaling latent political traits have much in common with supervised machine learning methods commonly applied to tasks such as email spam detection and product recommender systems. Despite these commonalities, however, the research goals and philosophical underpinnings are quite different: machine learning is usually concerned with predicting a knowable or known class, most often with a practical application in mind. Estimating political traits through text, by contrast, involves measuring latent quantities that are inherently unobservable through direct means, and where human “verification” is unreliable, prohibitively costly, or otherwise unavailable. In this paper we show not only that the Naive Bayes classifier, one of the most widely used machine learning classification methods, can be successfully adapted to measuring latent traits, but also that it is equivalent in general form to the “Wordscores” algorithm of Laver, Benoit, and Garry (2003) for measuring policy positions. We revisit several prominent applications of Wordscores reformulated as Naive Bayes, demonstrating the equivalence but also revealing areas where the original Wordscores algorithm can be substantially improved using standard techniques from machine learning. From this we draw some concrete recommendations for future applications of supervised machine learning to scaling latent political traits.
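
To make the connection concrete, here is a compact Python sketch of the Laver, Benoit, and Garry (2003) Wordscores scoring rule, with invented toy counts and reference positions. The key step is that P(reference doc | word), computed under a uniform document prior, plays exactly the role of the posterior class probabilities in a multinomial Naive Bayes classifier; Wordscores takes an expectation over classes rather than an argmax.

```python
import numpy as np

# Reference documents: rows = docs, cols = word counts (toy values), with
# known left-right positions A for each reference document.
C_ref = np.array([[10, 2, 0, 5],
                  [1, 8, 6, 2]], dtype=float)
A = np.array([-1.0, 1.0])

F = C_ref / C_ref.sum(axis=1, keepdims=True)   # P(word | ref doc)
P = F / F.sum(axis=0, keepdims=True)           # P(ref doc | word), uniform prior
word_scores = P.T @ A                          # expected position given each word

# Score a "virgin" text: the frequency-weighted mean of its word scores.
c_virgin = np.array([4.0, 3.0, 1.0, 2.0])
f_virgin = c_virgin / c_virgin.sum()
print("virgin text score:", (f_virgin @ word_scores).round(3))
```

Seen this way, standard machine-learning refinements such as smoothing zero counts, feature selection, and proper priors apply directly, which is where the paper finds room to improve on the original algorithm.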

Older papers:

William Lowe and Kenneth Benoit. 2011. “Estimating Uncertainty in Quantitative Text Analysis.” Paper prepared for the 2011 Annual Meeting of the Midwest Political Science Association. Version: 30 March 2011.

Several methods have now become popular in political science for scaling latent traits, usually left-right policy positions, from political texts. Following a great deal of development, application, and replication, we now have a fairly good understanding of the estimates produced by scaling models such as “Wordscores”, “Wordfish”, and other variants (e.g. Monroe and Maeda’s two-dimensional estimates). Less well understood, however, are the appropriate methods for estimating uncertainty around these estimates, which are based on untested assumptions about the stochastic processes that generate text. In this paper we address this gap in our understanding on three fronts. First, we lay out the model assumptions of scaling models and show how to generate uncertainty estimates that would be appropriate if all assumptions were correct. Second, we examine a set of real texts to see where and to what extent these assumptions fail. Finally, we introduce a sequence of bootstrap methods to deal with assumption failure and demonstrate their application using a series of simulated and real political texts.
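
As an illustration of the simplest such scheme, the sketch below (hypothetical values throughout) applies a token-level multinomial bootstrap to a scored text: if tokens really were i.i.d. draws from the text’s word distribution, as the scaling models assume, resampling them and re-scoring each replicate would yield a valid uncertainty interval. The paper’s point is precisely that this assumption often fails, motivating the further bootstrap variants it develops.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy inputs (invented): per-word scores, e.g. from a Wordscores fit, and a
# text's word counts.
word_scores = np.array([-0.8, -0.2, 0.3, 0.9])
counts = np.array([40, 30, 10, 20])
n_tokens = counts.sum()

point = counts @ word_scores / n_tokens        # point estimate of the position

# Token-level multinomial bootstrap: treat the text as n_tokens i.i.d. draws
# from its own word distribution, resample, and re-score each replicate.
B = 2000
boot_counts = rng.multinomial(n_tokens, counts / n_tokens, size=B)
reps = boot_counts @ word_scores / n_tokens
lo, hi = np.percentile(reps, [2.5, 97.5])
print(f"estimate {point:.3f}, 95% bootstrap CI [{lo:.3f}, {hi:.3f}]")
```

Block or sentence-level resampling, which preserves local dependence between tokens, follows the same recipe with a different resampling unit.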