Current Research

Google Scholar Page here

Schwarz, Daniel, Denise Traber, and Kenneth Benoit. “Estimating Intra-Party Preferences: Comparing Speeches to Votes.” Conditional acceptance at Political Science Research and Methods.

Kenneth Benoit and Thomas Däubler. May 8, 2014. “Putting Text in Context: How to Estimate Better Left-Right Positions by Scaling Party Manifesto Data using Item Response Theory.” Paper prepared for the Mapping Policy Preferences from Texts Conference, May 15-16, 2014, WZB Berlin Social Science Center.

For over three decades, party manifestos have formed the largest source of textual data for estimating party policy positions and emphases, resting on the pillars of two key assumptions: that party policy positions can be measured on known dimensions by counting text units in predefined categories, and that more text in a given category indicates stronger emphasis. Here we revisit the inductive approach to estimating policy positions from party manifesto data, demonstrating that there is no single definition of left-right policy that fits well in all contexts, even though meaningful comparisons can be made by locating parties on a single dimension in each context. To estimate party positions, we apply a Bayesian, multi-level, Poisson-IRT measurement model to category counts from coded party manifestos. By treating the categories as “items” and policy positions as a latent variable, we are able to recover not only left-right estimates but also direct estimates of how each policy category relates to this dimension, without having to decide these relationships in advance based on political theory, exploratory analysis, or guesswork. Finally, the flexibility of our framework permits numerous extensions, designed to incorporate models of manifesto authorship, coding effects, and additional explanatory variables (including time and country effects) to improve estimates.
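As a rough illustration of what treating categories as “items” can mean, a generic single-level Poisson-IRT specification is sketched below. This is the textbook form only, not the model from the paper: the multi-level structure, priors, and extensions mentioned in the abstract are not reproduced, and all symbols are assumptions introduced here.

```latex
% y_{ij}: count of text units from manifesto i coded into category j
% \alpha_i: manifesto-level effect (e.g. overall length)
% \psi_j: baseline frequency of category j
% \beta_j: how strongly category j relates to the latent dimension
% \theta_i: latent left-right position of manifesto i
y_{ij} \sim \mathrm{Poisson}(\lambda_{ij}), \qquad
\log \lambda_{ij} = \alpha_i + \psi_j + \beta_j \theta_i
```

In a specification of this kind, the sign and size of each estimated \beta_j indicate how category j relates to the dimension, which is the sort of relationship the abstract says need not be fixed in advance.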


Kenneth Benoit and Paul Nulty. April 8, 2013. “Classification Methods for Scaling Latent Political Traits.” Paper prepared for presentation at the Annual Meeting of the Midwest Political Science Association, April 11–14, 2013, Chicago.

Quantitative methods for scaling latent political traits have much in common with supervised machine learning methods commonly applied to tasks such as email spam detection and product recommender systems. Despite commonalities, however, the research goals and philosophical underpinnings are quite different: machine learning is usually concerned with predicting a knowable or known class, most often with a practical application in mind. Estimating political traits through text, by contrast, involves measuring latent quantities that are inherently unobservable through direct means, and where human “verification” is unreliable, prohibitively costly, or otherwise unavailable. In this paper we show not only that the Naive Bayes classifier, one of the most widely used machine learning classification methods, can be successfully adapted to measuring latent traits, but also that it is equivalent in general form to Laver, Benoit, and Garry’s (2003) “Wordscores” algorithm for measuring policy positions. We revisit several prominent applications of Wordscores reformulated as Naive Bayes, demonstrating the equivalence but also revealing areas where the original Wordscores algorithm can be substantially improved using standard techniques from machine learning. From this we issue some concrete recommendations for future applications of supervised machine learning to scale latent political traits.
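For readers unfamiliar with Wordscores, the following is a minimal sketch of its basic scoring step in plain Python. It is an illustration under stated assumptions, not the code used in the paper: the function names, the toy reference texts, and the positions of -1 and +1 are all hypothetical, and the rescaling and uncertainty steps of the original algorithm are omitted. Each word's weight is the share of its relative frequency that falls in each reference text, which matches a posterior class probability under equal class priors and hints at the link to Naive Bayes.

```python
from collections import Counter


def relative_freqs(tokens):
    """Relative frequency of each word within one text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def word_scores(reference_texts, reference_positions):
    """Score each word as the position-weighted average over reference texts.

    The weight for reference text r is F_wr / sum_r F_wr, where F_wr is the
    relative frequency of word w in r.
    """
    freqs = {r: relative_freqs(toks) for r, toks in reference_texts.items()}
    vocab = set()
    for f in freqs.values():
        vocab.update(f)
    scores = {}
    for w in vocab:
        total = sum(f.get(w, 0.0) for f in freqs.values())
        scores[w] = sum(
            f.get(w, 0.0) / total * reference_positions[r]
            for r, f in freqs.items()
        )
    return scores


def score_text(tokens, scores):
    """Mean word score over the tokens that appear in the reference vocabulary."""
    scored = [scores[w] for w in tokens if w in scores]
    if not scored:
        raise ValueError("no scorable words in text")
    return sum(scored) / len(scored)


# Hypothetical toy data: two reference texts with assumed positions.
reference_texts = {
    "left": ["tax", "welfare", "welfare", "union"],
    "right": ["tax", "market", "market", "freedom"],
}
reference_positions = {"left": -1.0, "right": 1.0}

scores = word_scores(reference_texts, reference_positions)
virgin = ["tax", "market", "welfare", "market", "unknown"]
virgin_score = score_text(virgin, scores)
```

Words that occur equally often in both reference texts (here "tax") receive a score midway between the two positions, while words unique to one reference text take that text's position; words absent from every reference text are ignored when scoring a new text.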


Older papers:

William Lowe and Kenneth Benoit. 2011. “Estimating Uncertainty in Quantitative Text Analysis.” Paper prepared for the 2011 Midwest Political Science Association. Version: 30 March 2011.

Several methods have now become popular in political science for scaling latent traits (usually left-right policy positions) from political texts. Following a great deal of development, application, and replication, we now have a fairly good understanding of the estimates produced by scaling models such as “Wordscores”, “Wordfish”, and other variants (e.g. Monroe and Maeda’s two-dimensional estimates). Less well understood, however, are the appropriate methods for estimating uncertainty around these estimates, which are based on untested assumptions about the stochastic processes that generate text. In this paper we address this gap in our understanding on three fronts. First, we lay out the model assumptions of scaling models and how to generate uncertainty estimates that would be appropriate if all assumptions were correct. Second, we examine a set of real texts to see where and to what extent these assumptions fail. Finally, we introduce a sequence of bootstrap methods to deal with assumption failure and demonstrate their application using a series of simulated and real political texts.
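To make the idea of bootstrapping a text-based estimate concrete, here is a minimal sketch of the simplest variant: a nonparametric bootstrap that resamples individual word tokens with replacement and recomputes a mean word score each time. It is not the sequence of methods developed in the paper, and it relies on the assumption that words are drawn independently, which is the kind of assumption the abstract says may fail in real texts. The word scores, the toy document, and all function names are hypothetical.

```python
import random


def mean_score(tokens, scores):
    """Mean score over tokens that have a score."""
    vals = [scores[w] for w in tokens if w in scores]
    return sum(vals) / len(vals)


def bootstrap_interval(tokens, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval from word-level resampling.

    Assumes tokens are exchangeable (independent draws), which real
    text is unlikely to satisfy exactly.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    scored = [w for w in tokens if w in scores]
    n = len(scored)
    draws = []
    for _ in range(n_boot):
        resample = rng.choices(scored, k=n)  # sample with replacement
        draws.append(mean_score(resample, scores))
    draws.sort()
    lower = draws[int((alpha / 2) * n_boot)]
    upper = draws[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical word scores on a -1 (left) to +1 (right) scale.
toy_scores = {
    "tax": 0.0, "welfare": -1.0, "union": -1.0,
    "market": 1.0, "freedom": 1.0,
}
toy_doc = ["tax", "market", "welfare", "market",
           "union", "freedom", "tax", "welfare"] * 5

point_estimate = mean_score(toy_doc, toy_scores)
interval = bootstrap_interval(toy_doc, toy_scores)
```

Resampling larger units such as sentences instead of single words is one common way to relax the independence assumption, at the cost of fewer resampling units per document.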