Current Research


Daubler, Thomas, Kenneth Benoit, Slava Mikhaylov, and Michael Laver. “Natural Sentences as Valid Units for Coded Political Texts.” Paper prepared for presentation at the “Why and How of Party Manifestos in New and in Established Democracies” workshop of the 2011 ECPR Joint Sessions, 12-17 April, St. Gallen, Switzerland. Version: 8 April 2011.

Despite the recent focus on scaling policy positions by treating political text as quantitative data, major research investments in political science continue to rely on expert-coded content analysis, namely the 30-year Comparative Manifesto Project (CMP) of coded manifestos and the Comparative Policy Agendas Project (CAP). All text analysis methods require the identification of a fundamental unit of analysis. The fundamental unit of analysis in both CMP and CAP is the “quasi-sentence”, which is either a natural sentence or a part of a sentence judged by the coder to have an independent component of meaning. The use of subjective judgment in identifying quasi-sentences, however, means that specification of the fundamental unit of data analysis is endogenous to the content of the text. In addition, it is known that the unitization of political texts into endogenous quasi-sentences by expert coders generates unreliable specifications of the unit of analysis. The justification for using quasi-sentences is a supposed gain in associated validity of the codings. In this paper, we show that this justification is empirically questionable, since using quasi-sentences does not produce valuable additional information in characterizing substantive political content. Defining text units exogenously as natural language sub-units separated by one of a predefined list of punctuation marks, by contrast, generates perfectly reliable unitization, with no measurable cost in terms of the content validity of the resulting estimates.
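The exogenous unitization described in this abstract can be illustrated with a minimal sketch. The delimiter list and the function name `unitize` below are assumptions for illustration only; the paper's actual list of punctuation marks is not given on this page.

```python
import re

# Assumed delimiter list for illustration; not the paper's actual list.
DELIMITERS = ".?!;:"

def unitize(text, delimiters=DELIMITERS):
    """Split text into units that end at any of the given punctuation marks."""
    # Build a character class from the delimiters and split on runs of them.
    pattern = "[" + re.escape(delimiters) + "]+"
    parts = re.split(pattern, text)
    # Drop empty fragments and surrounding whitespace.
    return [p.strip() for p in parts if p.strip()]

units = unitize("We will cut taxes; we will also protect pensions. Schools matter!")
print(units)
```

Because the rule depends only on the punctuation in the text, any two runs over the same document yield identical units, which is the sense in which such unitization is reliable. A rule this simple would also split on decimal points and abbreviations, so a practical version would need additional handling.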

William Lowe and Kenneth Benoit. “Estimating Uncertainty in Quantitative Text Analysis.” Paper prepared for the 2011 Midwest Political Science Association. Version: 30 March 2011.

Several methods have now become popular in political science for scaling latent traits, usually left-right policy positions, from political texts. Following a great deal of development, application, and replication, we now have a fairly good understanding of the estimates produced by scaling models such as “Wordscores”, “Wordfish”, and other variants (i.e. Monroe and Maeda’s two-dimensional estimates). Less well understood, however, are the appropriate methods for estimating uncertainty around these estimates, which are based on untested assumptions about the stochastic processes that generate text. In this paper we address this gap in our understanding on three fronts. First, we lay out the model assumptions of scaling models and how to generate uncertainty estimates that would be appropriate if all assumptions are correct. Second, we examine a set of real texts to see where and to what extent these assumptions fail. Finally, we introduce a sequence of bootstrap methods to deal with assumption failure and demonstrate their application using a series of simulated and real political texts.
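As a rough illustration of the general idea of bootstrapping a text-based estimate, the sketch below resamples word tokens from a document's observed word frequencies and recomputes a simple frequency-weighted score each time. This is a generic word-level bootstrap, not necessarily one of the methods introduced in the paper; the word scores, counts, and function names are hypothetical.

```python
import random

def text_score(counts, word_scores):
    """Frequency-weighted mean of word scores over the scored words."""
    total = sum(n for w, n in counts.items() if w in word_scores)
    weighted = sum(n * word_scores[w] for w, n in counts.items() if w in word_scores)
    return weighted / total

def bootstrap_interval(counts, word_scores, reps=1000, level=0.95, seed=0):
    """Percentile interval from resampling tokens with replacement."""
    rng = random.Random(seed)
    words = [w for w in counts if w in word_scores]
    weights = [counts[w] for w in words]
    n_tokens = sum(weights)
    estimates = []
    for _ in range(reps):
        # Draw a pseudo-document of the same length from observed frequencies.
        draw = rng.choices(words, weights=weights, k=n_tokens)
        resampled = {}
        for w in draw:
            resampled[w] = resampled.get(w, 0) + 1
        estimates.append(text_score(resampled, word_scores))
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * reps)]
    upper = estimates[int((1 + level) / 2 * reps) - 1]
    return lower, upper

# Hypothetical word scores and counts, for illustration only.
word_scores = {"tax": 0.5, "welfare": -1.0, "market": 1.0}
counts = {"tax": 12, "welfare": 30, "market": 18}

point = text_score(counts, word_scores)
lower, upper = bootstrap_interval(counts, word_scores)
print(point, lower, upper)
```

Resampling tokens independently assumes that words are generated independently of one another, which is exactly the kind of assumption the abstract says may fail in real texts.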

Kenneth Benoit and Michael Laver. “The Dimensionality of Political Space: Epistemological and Methodological Considerations.” Version: February 10, 2011.

Spatial characterizations of agents’ preferences lie at the heart of many theories of political competition. These give rise to explicitly dimensional interpretations. Parties define and differentiate themselves in terms of substantive policy issues, and the configuration of such issues that is required for a good description of political competition affects how we think substantively about the underlying political space in which parties compete. For this reason a great deal of activity in political science consists of estimating such configurations in particular real settings. We focus on three main issues in this paper. First, we discuss the nature of political differences and from this construct an interpretation of the dimensionality of the political space needed to describe a given real setting, underscoring the essentially metaphorical and instrumental use of this concept. Second, we contrast ex ante and ex post interpretations of this dimensionality. Third, we illustrate potential hazards arising from the purely inductive estimation of political spaces using a spatial example from the physical world and political competition in the EU Parliament as a political example.

McElroy, Gail and Kenneth Benoit. “Policy Positioning in the European Parliament.” Version: January 21, 2011.

Party politics in the European Parliament consists of competition between transnational party groups, each consisting of multiple national member parties from the EU’s 27 member states. Characterizing the policy space that these parties inhabit and their ideological positions is both practically and conceptually challenging. In this paper we characterize this policy competition by tracking EP political groups from three separate, original expert surveys taken in 2004, 2007, and 2010. We look at the relative positioning of the groups on multiple dimensions of policy, as well as changes in party group policy since 2004. Additionally, we characterize the policy cohesion of party groups by examining the relative positions of each group’s constituent parties, using independent national level expert surveys. The results reinforce previous findings that EP party groups not only occupy the entire range of the left-right spectrum, but also are clearly distinguishable from one another in policy terms. Moreover, their national party makeup consists of parties that are broadly cohesive in terms of their policy locations.

Slava Mikhaylov, Michael Laver, and Kenneth Benoit. “Coder Reliability and Misclassification in the Human Coding of Party Manifestos.” Version: February 19, 2010.

The long time series of estimated party policy positions generated by the Comparative Manifesto Project (CMP) is the only such time series available to the profession and has been extensively used in a wide variety of applications. Recent work (e.g. Benoit, Laver, and Mikhaylov 2009; Klingemann et al. 2006, chs. 4–5) focuses on non-systematic sources of error in these estimates that arise from the text generation process. Our concern here, by contrast, is with error that arises during the text coding process, since nearly all manifestos are coded only once by a single coder. First, we discuss reliability and misclassification in the context of hand-coded content analysis methods. Second, we report results of a coding experiment that used trained human coders to code sample manifestos provided by the CMP, allowing us to estimate the reliability of both coders and coding categories. Third, we compare our test codings to the published CMP “gold standard” codings of the test documents to assess accuracy, and produce empirical estimates of a misclassification matrix for each coding category. Finally, we demonstrate the effect of coding misclassification on the CMP’s most widely used index, its left-right scale. Our findings indicate that misclassification is a serious and systemic problem with the current CMP dataset and coding process.
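A misclassification matrix of the kind mentioned in this abstract can be sketched as follows: for each gold-standard category, it records the share of text units that a coder assigned to each category. The category labels, the toy data, and the function name are hypothetical, and the paper's own estimation procedure may differ.

```python
from collections import Counter, defaultdict

def misclassification_matrix(gold, coded):
    """Return {true_category: {assigned_category: proportion}}."""
    counts = defaultdict(Counter)
    for true_cat, assigned_cat in zip(gold, coded):
        counts[true_cat][assigned_cat] += 1
    matrix = {}
    for true_cat, row in counts.items():
        row_total = sum(row.values())
        # Each row sums to 1: the distribution of assigned codes given the true code.
        matrix[true_cat] = {cat: n / row_total for cat, n in row.items()}
    return matrix

# Hypothetical gold-standard codes and one coder's codes for six text units.
gold = ["left", "left", "left", "right", "right", "none"]
coded = ["left", "left", "none", "right", "left", "none"]

matrix = misclassification_matrix(gold, coded)
print(matrix)
```

The diagonal entries (for example `matrix["left"]["left"]`) give the proportion of units coded correctly in each category, while off-diagonal entries show where a coder's errors went.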