It has been noted that the field of public administration lags behind other social sciences in the application of advanced research techniques (Groeneveld et al. 2015), for instance when developing and validating measurement scales. Public administration scholars usually do not develop scales as rigorously as scholars in other disciplines (Grimmelikhuijsen et al. 2016). Although many factors affect the quality of surveys and scale development procedures (Lee et al. 2011), a crucial component is the degree to which the measurement scales employed truly measure what they are supposed to measure: “The point is not that adequate measurement is ‘nice’. It is necessary, crucial… Without it we have nothing” (Korman 1974, 194).
Given the finding that the public administration discipline is becoming increasingly quantitative and that the number of articles based on the analysis of quantitative survey data has increased significantly (Groeneveld et al. 2015), the quality of the surveys employed is becoming ever more important. This importance is expected to grow further with the emerging tradition of a behavioral public administration (Grimmelikhuijsen et al. 2016). In line with this ‘emerging tradition’, it seems that the number of high-quality measurement scales is steadily increasing, including for instance the work on public service motivation (Kim et al. 2013), public service ethos (Rayner et al. 2011) and public leadership roles (Tummers and Knies 2015).
To assess, among other things, the correctness of this claim, this article systematically and rigorously reviews the scale, scope and quality of measurement scale development in public administration research. Our main research question is therefore: What is the current state of scale development practices in public administration research, and what suggestions for further improvement can we distill from this?
We start our review in 1995, when the widely used General Red Tape (GRT) and Personnel Red Tape (PRT) scales (Rainey et al. 1995) were published, immediately followed by Perry’s (1996) well-recognized measure of public service motivation. We end in 2016, thereby covering over 20 years of research. During the systematic review, we adhere as much as possible to the widely used ‘Preferred Reporting Items for Systematic Reviews and Meta-Analyses’ (the PRISMA statement, referred to as PRISMA from here on), which ensures transparent and complete reporting (Moher et al. 2009). Furthermore, we compare the sample of articles focusing specifically on scale development to a random sample of articles that do not focus on scale development, but in which the authors develop scales to measure core concepts of their study (as a ‘side product’).
Our analyses show that, overall, the number of measurement scales developed is steadily increasing and that, apparently, so is the quality of these measures. However, the results also indicate that valuable lessons from other disciplines have not yet been incorporated. In sum, our results suggest that public administration scholars aiming to develop scales to measure the perceptions, opinions or experiences of (individual) public sector actors should strengthen their methodological work. Based on our review, we provide a number of suggestions to further strengthen rigorous and valid scale development.
F1b - Behavioral and Experimental Public Administration: Leadership and Decision-Making