This page contains replication materials and the error dataset providing estimates of random error in the Comparative Manifesto Project’s dataset of coded quasi-sentences from election programmes. These are described in:
Kenneth Benoit, Slava Mikhaylov, and Michael Laver. 2009. “Treating Words as Data with Error: Uncertainty in Text Statements of Policy Positions.” American Journal of Political Science 53(2, April): 495-513.
If you wish to replicate the article, read the section that follows. If you want to use the error dataset for your own purposes, see the section entitled “Error Dataset for End Use” below.
Replication Materials
The simplest method is to download both the error dataset and the replication code (in R), including the datasets from Adams et al (2006) and from Hix et al (2006), as a zipped archive:
All the analysis in the article can be reproduced by running the R script contained in blm_ajps_replication.r. In order to produce standard error estimates for the CMP data you need a CMP dataset. For our analysis we used a concatenation of the datasets published in Mapping Policy Preferences and Mapping Policy Preferences II.
Part of our analysis is the replication of two recent AJPS publications, listed below. In order to replicate the analysis in BLM (2009) you will need the corrected datasets from both of these articles, which are linked below and also included in the replication zip file. Both datasets are made available on this website with the kind permission of the authors.
- Adams, James, M. Clark, L. Ezrow & G. Glasgow. 2006. “Are Niche Parties Fundamentally Different from Mainstream Parties? The Causes and the Electoral Consequences of Western European Parties’ Policy Shifts, 1976–1998.” American Journal of Political Science 50(3):513–529. Corrected dataset here.
- Hix, Simon, Abdul Noury & Gerard Roland. 2006. “Dimensions of Politics in the European Parliament.” American Journal of Political Science 50(2, April):494–511. Corrected dataset here.
The key outcome of our analysis is a set of standard error estimates for the CMP data. Our results are provided in Stata format in the file BLM_CMP_uncertainty.dta (contained in the replication zip file).
Error Dataset for End Use
Our research has advanced a bit since the publication of this article, and we have extended the method to cover several cases from the CMP dataset that were missing from the analysis in the AJPS article. In particular, we have added the following cases that are not in the original article:
- The (post-communist) manifestos that make use of the extended categories from MPP2 are now added to the dataset, although only for the non-extended categories.
- Cases where the peruncoded variable was missing for Sweden and Norway are now included, with peruncoded assumed to be zero. This affected 63 (from 99 total) coded manifestos in Sweden and 78 (from 106 total) in Norway.
- Cases from the 1989 Norway election (7 in total) where the total variable was missing are now included, with total imputed as the midpoint between each party’s total sentences from the adjacent (1985 and 1993) elections.
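The two imputation rules in the list above can be illustrated with a minimal R sketch. It is not the code used to build the extended dataset: the data frame, party codes, and values are invented for illustration, and only the variable names peruncoded and total come from the text.

```r
# Toy data: two hypothetical parties across three elections (all values invented).
toy <- data.frame(
  party      = rep(c(1, 2), each = 3),            # hypothetical party codes
  year       = rep(c(1985, 1989, 1993), times = 2),
  total      = c(200, NA, 320, 90, NA, 150),      # total sentences; missing in 1989
  peruncoded = c(NA, 1.5, 0, 2, NA, NA)           # missing for some manifestos
)

# Rule 1: a missing peruncoded is assumed to be zero.
toy$peruncoded[is.na(toy$peruncoded)] <- 0

# Rule 2: a missing 1989 total is imputed as the midpoint between the
# same party's totals in the adjacent (1985 and 1993) elections.
for (p in unique(toy$party)) {
  rows   <- toy$party == p
  before <- toy$total[rows & toy$year == 1985]
  after  <- toy$total[rows & toy$year == 1993]
  target <- rows & toy$year == 1989 & is.na(toy$total)
  toy$total[target] <- (before + after) / 2
}

toy
```

In this toy example the imputed 1989 totals are 260 for the first party and 120 for the second.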
This revised dataset can be downloaded in Stata format as: BLM_CMP_uncertainty_extended.dta.
These standard error estimates can, and as we showed should, be used in any research that uses the CMP data. Our standard error dataset contains two variables that together uniquely identify each observation: cmp_party (a numeric code identifying the country and party, using the CMP’s numeric party coding scheme) and cmp_edate (a date-formatted variable indicating the election). These two identifiers allow the standard error estimates to be merged with any CMP-based dataset. For example, in Stata this is done simply by typing the following commands:
use BLM_CMP_uncertainty_extended.dta, clear
merge party edate using YOUR_CMP_DATASET.dta, sort
A similar merge can be performed in R using the merge() function:
require(foreign)
CMP.original <- read.dta("mds2005f.dta", convert.factors=FALSE)
CMP.BLMerror <- read.dta("BLM_CMP_uncertainty_extended.dta")
# the date package provides as.date()
require(date)
# create edate in the BLM file from cmp_edate, so that the election date
# variable has the same name in both files
CMP.BLMerror$edate <- as.date(CMP.BLMerror$cmp_edate)
# harmonize name of the party variable
names(CMP.original)[which(names(CMP.original)=="party")] <- "cmp_party"
# merge the files
CMP.merged <- merge(CMP.original, CMP.BLMerror, by = c("cmp_party", "edate"))
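By default, merge() in base R keeps only rows whose key values appear in both data frames, so CMP observations without a matching error estimate are silently dropped. The sketch below shows one way to keep them and count the unmatched rows. It uses small invented data frames rather than the real files; the party codes, dates, and the score and score_se columns are hypothetical, and the dates use base R's Date class rather than the date package.

```r
# Toy stand-ins for the CMP file and the error file (all values invented).
cmp <- data.frame(
  cmp_party = c(11110, 11220, 11320),
  edate     = as.Date(c("1998-09-20", "1998-09-20", "1998-09-20")),
  score     = c(-10.2, 4.5, 20.1)       # hypothetical policy score
)
err <- data.frame(
  cmp_party = c(11110, 11220),
  edate     = as.Date(c("1998-09-20", "1998-09-20")),
  score_se  = c(1.3, 0.9)               # hypothetical standard error
)

# all.x = TRUE keeps every row of the CMP file, filling NA where
# no error estimate matches on (cmp_party, edate).
merged <- merge(cmp, err, by = c("cmp_party", "edate"), all.x = TRUE)

# Count CMP rows that found no match in the error file.
n_unmatched <- sum(is.na(merged$score_se))
n_unmatched
```

A nonzero count is worth inspecting before analysis, since it may indicate that the key variables differ in type or coding between the two files.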
Logit-scaled dataset
If you are looking for left-right scales of policy, be aware that we have a paper that applies a new scaling procedure, based on a logit scale and incorporating our error method, to two dozen new left-right scales constructed from the CMP categories. We recommend using this dataset instead of any scales provided by the CMP. The paper and the current snapshot of the dataset can be found on my current research page.