Frameworks of State: Assessment Policy in Historical Perspective
Baruch College, CUNY
This paper uses archival and secondary sources to examine the early history of state student assessment in the United States. While it is generally accepted that school evaluation and accountability are the raison d’être of assessment policy making in the United States, between 1865 and 1965 an accountability model for state testing failed to take hold, despite numerous attempts. Instead, state testing programs were used in this period for
purposes of high school admissions and student guidance. In accounting for these policy developments, this paper argues that state assessment policies cannot be reduced to the intentions of policy makers or the response of those
same governments to powerful interests. While both intentions and interests are important, the study focuses on a third level of analysis--ideas. Specifically, the paper stresses the role that institutionalized clusters of normative and causal ideas play in educational policy making. These idea structures, or policy frameworks, define the core
principle or principles that animate state action, the legitimate aims served by intervention and the manner in which these ends are to be achieved. Three frameworks--examination, guidance, and accountability--play a
prominent role in the history of assessment policy making in the United States.
It is commonly accepted that school evaluation and accountability are the raison d’être of
assessment policy making in the United States. Yet between roughly 1865 and 1965, a period that begins with the emergence of a “preliminary” grammar school leaving examination in nineteenth century New York State and closes with a wave of guidance-oriented testing policies during the
early 1960s, an accountability model for state testing failed to take hold, despite many attempts. Although one can find individual instances of accountability-oriented testing in the states, these
disparate efforts never jelled into a coherent and widespread policy model. In short, accountability testing is both an uncommon and relatively recent historical phenomenon.
This paper explores why this is the case. Drawing on a combination of primary and secondary
sources, the paper examines the early history of state student assessment in the United States, focusing on the period between 1865 and 1965. If not evaluation and accountability, what models and conceptions of assessment policy were prominent in the late nineteenth and early twentieth
century and why?
In answering this question, the paper argues that state assessment policies are reducible neither to the intentions of policy makers nor to the response of those same governments to powerful political
interests. While both intentions and interests are important, the focus is on a third level of analysis – ideas. Specifically, the paper stresses the role that institutionalized clusters of normative and causal ideas play in educational policy making. These idea structures, or policy
frameworks, define the core principle or principles that animate state action, the legitimate aims served by intervention and the manner in which these ends are to be achieved.
In the history of state assessment in the United States there have been three, at times overlapping, policy frameworks. The examination framework emerged in the middle of the nineteenth century
and lasted through the 1930s. In a period of expanding elementary school enrollments and still limited opportunities for high school education, a handful of states saw testing student readiness for advanced education as a legitimate state interest. Between 1865 and 1915, twelve states developed and deployed written tests to determine admissions to high school. The “eighth grade
examination” was a state-constructed essay examination, graded by teachers and/or staff in the state department of education with a passing rate set centrally. Though state examinations were employed primarily for promotion to high school, they were also used to allocate state educational resources fairly, shape teaching and learning in elementary schools, and reform – some say control – rural education.
Examinations were eventually displaced in prominence by an alternative policy model, organized
around the core concept of student guidance. State guidance testing first appeared in the 1920s
and lasted through the late 1960s. The core theory of action in guidance testing was that information about student capabilities, interests and achievements would allow local educators to
diagnose student learning problems and “guide” pupils more effectively and efficiently. Newly available standardized achievement and ability tests were used to place these students into what were deemed appropriate academic programs. State assessment policies in the guidance era were framed around static and reductive conceptions of student ability or “capacity,” conceptions that constrained a belief that schools could influence student performance, except at the margins.
The most recent assessment policy framework is animated by the core notion of accountability.
The accountability model of state testing first appeared in the early seventies and persists, in somewhat altered form, to the present day. Initially, the accountability framework was rooted in a
working theory that information, private and internal, would help local educators and state officials
detect educational problems at the school level and, ultimately, use this knowledge to improve academic performance for all students. By the early eighties, states began shifting the focus of their assessment policies from detecting problems to affecting the behavior of individuals and institutions. In these recent endeavors, accountability is defined less as producing information
than as effecting change by motivating students, mobilizing the public and shaping the curriculum. Key policy approaches include competency testing, reporting test scores to the general public and attaching rewards and sanctions to educational performance.
Underlying both the older and newer models of assessment for accountability is a belief that schools play a major role in determining how much students learn. Also implicit in accountability testing is a clear faith that achievement can be measured with precision by standardized tests.
Section two of the paper situates the study within the recent research on assessment policy and politics. Section three defines what a policy framework is and briefly describes the examination,
guidance and accountability frameworks. In section four, I describe the emergence and institutionalization of the student guidance framework. Section five takes the guidance framework to the period after World War II when state assessment begins to reappear on the state
and national policy agenda. In the final section, I offer some tentative thoughts regarding the shift to an accountability model for state testing and discuss the implications of this analysis for future research.
Testing is, of course, not lethal in itself, but it can be used in a most destructive fashion.
When the results of a single test are made the sole criterion for promotion of a student,
granting or withholding state moneys for school support, or the assessment of success or
failure of a teacher or school, it begins to assume a too great importance in the whole
educational picture. A high score on the test is used as an educational objective. (Bird 1961,
Though Rose Bird’s words may sound familiar to the contemporary observer of American education, they were actually written in 1961, an era in many ways distant from our current one. The quotation can be found in an obscure report written by Bird, then a legislative intern, and submitted to the Joint Interim Committee on Public Education of the California state legislature (Bird 1961). During this period, California was considering a set of recommendations from a Citizens Advisory Commission (CAC) empowered in 1958. Among the recommendations of the Commission was a proposal for a “mandatory statewide testing of the public schools” (California Interim Committee on Education 1961).
In its final report, the CAC reasoned that statewide testing was needed to “set a minimum level of
instruction” and to “properly and effectively evaluate the educational program of the public school system in California.” The CAC also suggested that assessment would “stimulate high academic achievement” in the state. Dissenting voices on the commission argued that mandatory statewide testing would “encourage uniformity and an emphasis on educational purposes that can be measured by objective tests.” The ultimate effect of state testing, the dissenters argued, would be to
“inhibit creativity” and limit “the development of an educational program of high quality” (California Interim Committee on Education 1961).
Ultimately, the views of Bird and the dissenters carried the day. While California did pass statewide
testing legislation in 1961, the underlying approach diverged in crucial ways from the majority viewpoint of the CAC. The primary purpose of the state’s new testing program was student guidance and the “identification of talent,” not the goals of school evaluation or standard setting
advocated by the commission. California would not develop and construct its own test but require districts to select one from an “approved list of nationally-normed standardized achievement and
intelligence tests.” The law also contained an explicit provision that “no (test) results which identify the school or school district shall be made public” without the written consent of the school district (California Legislative Bills 1961).
Similar debates, with similar outcomes, took place in other states in the early sixties. Some states, such as Pennsylvania and Massachusetts, passed comparable laws. Others such as Oregon deliberated but did not mandate state testing, while continuing to support a voluntary testing program with a focus on student guidance. Still other states, including Washington and New Jersey,
had little or no discussion of this “new” policy approach called statewide testing. Responding to a query by the Interim Committee on Education, Washington’s State Superintendent of Public
Instruction Lloyd Andrews replied that his state had abolished its testing program in 1935 because it “failed to serve the purpose of establishing a minimum standard for the elementary schools.” Washington instead was ensuring standards through curricular guides and “self-evaluation” by
teachers and administrators (Bird 1961, p. 22).
To some readers, this brief historical snapshot may be surprising. It is commonly accepted that school evaluation and accountability are the raison d’être of assessment policy making in the
United States (see Resnick and Resnick 1985). Yet between roughly 1865 and 1965—a period that
begins with the emergence of a “preliminary” grammar school leaving examination in nineteenth century New York State and closes with a wave of guidance-oriented testing policies during the
early 1960s—an accountability model for state testing failed to take hold, despite many attempts. Although one can find individual instances of accountability-oriented testing in the states, these
disparate efforts never jelled into a coherent and widespread policy model. In short, accountability testing is both an uncommon and relatively recent historical phenomenon.
In this paper, I ask why this is the case. Drawing on a combination of primary and secondary
sources, the paper examines the early history of state testing in the United States, focusing on the period between 1865 and 1965. If not evaluation and accountability, what models and conceptions of assessment policy were prominent in the late nineteenth and early twentieth century and why?
In what follows, I argue that state assessment policies are reducible neither to the intentions of policy makers nor to the response of those same governments to powerful political interests. While
both intentions and interests are important, the focus is on a third level of analysis—ideas.
Specifically, the paper stresses the role that institutionalized clusters of normative and causal ideas play in educational policy making. These idea structures, or policy frameworks, define the core
principle or principles that animate state action, the legitimate aims served by intervention, and the manner in which these ends are to be achieved. Three frameworks (examination, guidance, and
accountability) have played a prominent role in the history of assessment policy making in the United States.
As the reader may have gleaned by this point, I use the terms testing and assessment interchangeably throughout this paper. Though sympathetic to those who would use assessment to
refer to a broader set of evaluation tools (Gipps 1994), this paper follows George Madaus (1994) in arguing that, in practice, the terms mean the same thing. To Madaus, testing and assessment are both methods of measurement:
where we elicit a small sample of behavior from a larger domain of interest . . . to make
inferences about a person’s probable performance relative to the domain. Then on the basis
of those inferences, we classify, describe or make decisions about individuals or institutions.
An additional feature of this definition is that it leaves open the specific purpose or purposes of
assessment, an issue that becomes particularly important when tests are employed in the political realm as policy instruments (Baker 1989; Haertel 1989; Linn 1993; McDonnell 1994a, b). Deborah Stone (1988) defines policy instruments as “strategies for structuring relationships and coordinating behavior to achieve collective purposes” (p. 208). Achievement or other forms of testing
become instruments of public policy when they are (1) initiated by a public authority external to the
school, such as a state legislature or department of education, and (2) used by this external authority to accomplish one or more public purposes.
By assessment policy or state assessment I mean the employment of written tests or examinations
by state or national governments for explicit policy purposes. Note a few things about this
definition. First, state assessment can make use of any number of different test
technologies—achievement or ability testing, essay or multiple choice items, machine scored or teacher graded. Second, assessment policies can either be formally legislated or more informally used by a state administrative agency—they may be mandated or voluntary. Third, states can
design their own test, have one customized by a private company, or purchase an assessment also used by other states or districts.
The advantage of such an open-ended definition is that it allows for crucial choices about
technology, authority, and design, as well as the overall purpose and rationale behind state assessment to vary across time and space. Part of understanding why governments create
assessment policies is understanding how and why these specific choices are made.
In addition to their information capabilities, tests can also be employed more mechanistically, as devices to shape behavior or create effects (Baker 1989; Linn 1993). College entrance examinations, for example, are used to provide information for admissions decision makers and to motivate
students to study and teachers to focus on certain subjects and curricular items. Christopher Hood (1986), a British political scientist, characterizes this split as a general division in the types of
policy instruments governments have at their disposal, distinguishing between a state’s tools for detecting problems and those for effecting change. “Detectors are all the instruments that
government uses for taking in information. Effectors are all the tools that government can use to try
and make an impact on the world outside” (p. 3). Put another way, policy makers can use student assessments in two distinct ways: to monitor schools or to change their behavior.
WHY STUDY THE HISTORY OF STATE ASSESSMENT?
The last twenty years have produced an ongoing and often heated debate among scholars, policy makers, and practitioners over the proper role of testing in education (Taylor 1994; Nettles and Nettles 1995; Baron and Wolf 1996). Though most parties agree that tests provide powerful
leverage for state and local policy makers, they differ over the impact of mandatory testing on teachers, students, and schools (compare McDonnell 1994a, b; Koretz 1996; Broadfoot 1996; BOTA 1999). Recent debates over high stakes testing in states such as Texas, New York, and
Massachusetts have taken place in this contentious political environment.
Somewhat curiously, this debate and controversy has failed to spur any sustained research on the history of assessment policy making at the state level. This lack of attention to the history of
state assessment is especially glaring in light of recent policy developments. Today, in over forty
states, test results are publicly reported and eagerly consumed by parents, the media, and
proponents and opponents of educational reform. In a few states, rewards and sanctions for schools, teachers, and students are tied to test performance, while many others consider attaching stakes to
assessment programs. Assessment policies are theorized to motivate students and teachers to improve performance and to guarantee that important subject matter is taught in schools (Cibulka and Derlin 1995; Koretz 1996). Other policy makers and reformers hope to use the information
produced by assessments to inform parents about school quality as part of a school choice or voucher system. Still others believe that testing at every grade level will end “social promotion” and raise academic standards from kindergarten through graduation (Olson 1998; Quality Counts 1998, 1999).
Though historians and policy researchers will undoubtedly recognize some or all of these hopes for testing in American educational history, the accountability impulse that undergirds all of them has
never been as prominent as in recent years. It is therefore crucial to understand how and why state assessment was transformed into a high stakes, intensely political tool of public accountability. As Daniel Koretz (1992) argues, “the forces that have caused this metamorphosis have not abated” (p.
In the next section, I situate this study within the research on assessment policy and politics. Section three defines what a policy framework is and briefly describes the examination, guidance,
and accountability frameworks. In section four, I describe the emergence and institutionalization of the student guidance framework. Section five takes the guidance framework to the period after World War II when state assessment begins to reappear on the state and national policy agenda. In
the final section, I offer some tentative thoughts regarding the shift to an accountability model for state testing and discuss the implications of this analysis for future research.
INTENTIONS, IDEAS, AND ASSESSMENT POLICIES
To conduct research on the history of state assessment is to ask an implicit, prior question: Why do state and national governments find tests attractive tools in the first place? Traditionally, the literature on state assessment has not focused on this question. Research on student assessment
policy making has its origins with a wave of state and federal testing efforts in the middle and late sixties. The focus of the early research was providing practical information about the uses of
assessment for local educators and policy makers (Carlson 1981; Womer 1981).
More recently, a body of “critical” scholarship on educational testing has emerged in the United States and Great Britain. This research is less focused on solving problems and advising policy
makers than on trying to understand the purpose and intent underlying the use of assessments as policy devices. Out of this scholarly literature came a picture of state testing as an explicitly political phenomenon. Two images of assessment policy, as political symbol and as mechanism of
control, appear with regularity in this work (Karier 1972; Broadfoot 1979; Madaus 1985, 1988; Glass and Ellwein 1986; Airasian 1987, 1988; Corbett and Wilson 1991).
To treat a public policy measure as a political symbol is to suggest that one cannot understand the
policy by examining its plain meaning. Rather, the policy “stands for something else” (Stone 1988, p. 108) and possesses or conveys some other, more important meaning in the political arena. The other
meaning most commonly attributed to symbolic policies is political reassurance—the governmental
entity producing the policy does so to show the public that it is taking action against some pressing social problem (Edelman 1973, 1988; Levin 1978; Glass and Ellwein 1986). As a political symbol,
the main purpose of assessment is ensuring that something is being done about poor educational performance. In this view, testing policies have little impact or import other than to convince citizens that government is on the job.
The imagery of control also suggests that policies have a hidden or less prominent function: to
“limit the range of acceptable behavior” (Weiss and Gruber 1984, p. 225) of some set of individuals or groups in a manner that serves the interests of the state. To do so, states use testing to control the curriculum and orient students and teachers to externally set standards of performance (Wise 1979; McNeil 1988). As mechanisms of control, assessment policies serve to limit the discretion of
teachers and local administrators and shift authority away from schools to higher levels of educational authority and governance (Madaus 1988; Corbett and Wilson 1991).
These perspectives share a focus on the intentions of policy makers, defined primarily as
legislators, governors, and state education officials. While the two images differ on what these actors are trying to accomplish with mandated assessments, each suggests that the perspectives and priorities of policy actors crucially shape the design and implementation of state testing efforts.
These scholars do not deny that interest groups and the general public are important, only that the wishes of these actors are given flesh and substance by policy officials.
The “critical” scholarship provides important insights into why tests are so attractive to policy makers. This paper builds on, and extends, many of its concerns while adapting the approach to the somewhat different task of historical analysis. Two issues in particular stand out.
First, as a historically minded researcher, I am skeptical of attempts to attach an essential meaning to a policy instrument. In their long history as tools of government, assessment policies have served any number of different functions and purposes: setting standards, determining admissions to
advanced education, guiding and placing students, providing information to policy makers, motivating students and teachers, mobilizing concerned parties to improve the educational system, to list a few. The central question instead should be which conceptions and purposes are dominant in particular periods and why?
Second, as Stephen Ball (1994) points out, “the purposes and intentions of political actors are important but do not provide a sufficient basis for the interpretation of policies and policy making”
(p. 108). The critical literature equates intentions with interests; policy makers deploy tests either to serve their own political purposes or to represent interests in the broader society. What is missing from such a formulation is that educational policy making is fraught with uncertainty and conflict over how to achieve desired aims. Given the uncertainty of policy making, the principled beliefs and working theories policy makers employ become a necessary and crucial analytic focus (Cohen and
Garet 1975; Weiss 1990; Kingdon 1993; Wells et al. 1999). Whatever interests policy makers wish to serve, they need ideas in order to do so.
THE STUDY OF POLICY FRAMEWORKS
Ideas, of course, need not enter politics on the backs of public officials to become influential. Ideas can also affect policy, as Marc Eisner (1991) puts it, “once they have an institutional presence and are integrated into the policy process” (p. 18). Along these lines, a diverse body of work has emerged among policy scholars. Influenced particularly by Thomas Kuhn (1961), these authors examine the role that institutionalized clusters of ideas play in public policy making. I use the term “policy
framework” to capture the essence of this new concept in the literature.
To illustrate what a policy framework is, let me use an example. Hugh Davis Graham (1990), in his book The Civil Rights Era, traces the origins of civil rights policy in the United States to three interconnected pieces of legislation in the mid-sixties: the 1964 Civil Rights Act, the 1965 Voting
Rights Act, and the Open Housing Act of 1968. Graham argues that these policies had at their
foundation a handful of principles and strategies rooted in classical liberalism. The foundational principles of early civil rights policy were nondiscrimination and individualism, where “rights inhered in individuals, not in tribes or clans or races” (Graham 1994, p. 14). Related to the primacy of individual rights were the ideas that these rights were universal and timeless. Fundamental civil rights were “everywhere and for every one the same” (p. 14), and once recognized they could not be
From these core principles, three primary policy strategies followed. First, civil rights were to be protected negatively, by prohibiting discrimination. Second, the rights would be procedural ones; they would only guarantee equal treatment, not equal outcomes. Third, rights were best protected through the central efforts of the federal government, given the poor record of states in protecting fundamental rights. Finally, agencies and programs like the Equal Employment Opportunity Commission (EEOC) and the Office of Federal Contract Compliance (OFCC) were created to implement the strategies (Graham 1990, 1994).
In short, a policy framework is an integrated and more or less coherent set of ideas that influences and reflects the way policy makers, professionals, and the public understand and act upon a
particular policy issue. Policy frameworks institutionalize dominant or settled upon conceptions
about problems and solutions in a given area of public concern. A policy framework contains two key
elements: a core concept or animating principle and a set of policy strategies and working theories that flow from the core concept. In the case of education, a policy framework contains both a theory of action and a theory of how students learn (see Fullan 1999).
Though frameworks are not separate from general political discourse (see Rein 1976), the keepers of the frame are specialists organized into “policy communities” (Walker 1981; Smith 1989). A policy framework provides a broad, overarching template for members of the policy community.
Frameworks do not directly shape how policy makers think or act; rather they influence behavior by “providing a limited set of resources out of which individuals and groups construct strategies of action” (Swidler 1986, p. 281). As Peter Hall (1993) eloquently puts it:
the terms of discourse in which that political sphere and the policies appropriate to it are
discussed constrain and enable often in highly specific ways. Even when the leitmotif of
policy is simply an overarching metaphor . . . the metaphor and its attendant elaborations
can structure many aspects of what is to be done. Policy making in virtually all fields takes
place within the context of a particular set of ideas that recognize some social interests as
more legitimate than others and privilege some lines of policy over others. (p. 291)
As Hall’s words suggest, once such a framework is established, interests and expectations converge around it, making change difficult even in the face of shifting social, political, and economic circumstances (Pierson 1993).
STATE ASSESSMENT POLICY FRAMEWORKS
In the history of state assessment in the United States there have been three, at times overlapping, policy frameworks. These are summarized in Tables 1–3.
Table 1. The Examination Policy Framework
Core Concept                   Examination
Basic strategies               Admissions of students to high school
                               Reform rural schools
Theory of action               Tests as effectors (Hood 1988)
                               provides opportunity for
                               shapes teaching and learning in elementary schools
Type of test                   State-constructed achievement exam
                               Essay type, open ended
                               Objective, short answer, multiple choice (1920–1930)
Role of state                  Strong, change agent
                               Enforcing standards, creating virtuous
Target student                 Elite student desiring advanced
                               Student needing motivation
Underlying theory of learning  Problem of student motivation and incentives
The examination framework emerged in the middle of the nineteenth century and lasted through the 1930s. In a period of expanding elementary school enrollments and still limited opportunities for high school education (Krug 1964; Tyack 1974; Labaree 1997), a handful of states saw testing student readiness for advanced education as a legitimate state interest. Between 1865 and 1915, at least twelve states developed and deployed written tests to determine admissions to high school.
The “eighth grade examination” was a state-constructed essay examination, graded by teachers, staff, or both, in the state department of education with a passing rate set centrally. Though state
examinations were employed primarily for promotion to high school, they were also used to allocate state educational resources fairly, shape teaching and learning in elementary schools, and reform—some say control—rural education.
A specific instance of the model can be seen in Kansas around the turn of the century. Beginning in 1905, the Kansas State Department of Education began preparing tests to be administered to those students completing the eighth grade. The tests were essay based and administered and scored by county superintendents. Promotion to high school was made fully dependent upon student performance with a passing rate of 75 percent (Kansas Superintendent’s Report 1908, 1909).
Nominally designed to weed out under-qualified students, examinations were equally seen by Kansas policy makers as a means of providing educational opportunity: tools that could “arouse and sustain the ambition of the pupil in the country school by giving him the same goal as that set up for his city cousins” (Kansas Superintendent Report 1908, p. 75). Unlike comparable urban high school admissions tests, there is no evidence that state exam scores in Kansas, or elsewhere, were published in newspapers.
Table 2. The Guidance Policy Framework
Core Concept                   Guidance
Basic strategies               Student diagnosis and differentiation
                               Identify and nurture talent (1955–1965)
Theory of action               Tests as detectors (Hood 1988)
                               detects educational problems/student
                               provides information for school
                               school personnel will act on information: “correct
Type of test                   Nationally norm-referenced test, purchased from private company
                               Both achievement and aptitude tests used
                               Objective, multiple choice test
Role of state                  Weak, voluntary assessment
                               Defray costs, facilitate decision making at local
Target student                 All students (1925–1955)
                               Gifted and talented (1955–1965)
Underlying theory of learning  Student capacity determines what and how much students should learn
                               School quality and performance constrained by student capacity
Examinations were eventually displaced in prominence by an alternative policy model, organized around the core concept of student guidance. State guidance testing first appeared in the 1920s and
lasted through the late 1960s (Kandel 1936; Atkins 1938; Traxler 1954; Dragositz 1971). The core theory of action in guidance testing was that information about student capabilities, interests, and achievements would allow local educators to diagnose student learning problems and “guide” pupils