Assessing Free Schools: Impact Evaluation for Small Scale Interventions
Using Non-State Provision in Education
Harry Anthony Patrinos and Laura Lewis*
Internationally, there is increasing use of the non-state sector in
school provision. Such interventions can be evaluated even when the
initial intervention is small scale. This note outlines the options open
to policy makers and the relative merits of each.
Governments around the world are looking for innovative ways to improve school quality. International tests such as PISA, TIMSS and PIRLS have led Governments not only to look inwards at performance, but also to compare the quality of their education with that of countries around the world. A recent report by the OECD found that raising PISA performance by 25 points over the next 20 years implies an aggregate gain in GDP of ?68 trillion over the lifetime of the generation born in 2010. In the United Kingdom, this could translate to an increase in GDP of ?3.5 billion.
One innovative approach to system-wide school improvement involves enabling non-state actors to set up and operate schools funded by government finance. The establishment of such autonomous schools is based on an assumption that they will have a positive impact on key education outcomes, including performance in standardised tests. A current example of this approach is the establishment in England of Free Schools, new all-ability schools which will be opened in response to local needs and parental demand. The rationale for these Free Schools has been clearly articulated by the Secretary of State for Education.
The rationing of good schools must end. Our reforms are about creating a
generation of world-class schools, free from meddling and prescription, that
provide more children with the type of education previously reserved for the rich.
Secretary of State for Education Michael Gove
* Visiting Research Fellow, CfBT Education Trust and Lead Education Economist, World Bank; and Senior Consultant, CfBT Education Trust. The views expressed here are those of the authors and should not be attributed to CfBT Education Trust or the World Bank Group.
Reform programs, such as the Free School initiative, are likely to be controversial and to receive more scrutiny than other education programs. In England as elsewhere, sections of the media, and many of the education trade unions, are likely to adopt a position of hostility to reforms of this kind. It is therefore imperative that such reforms be subject to rigorous impact evaluation.
Impact Evaluation Continuum
However, innovative new programmes sometimes start on a small scale. For example, fewer than thirty new Free Schools will open in the first wave in September 2011. Initiatives are also often started without serious consideration of how their impact will be evaluated. Policymakers and providers have a number of options for evaluating that impact. The impact evaluation continuum below outlines those options.
The results from impact evaluations can help inform policymakers on how to allocate scarce resources and can also provide evidence on whether current policies are working or not. Impact evaluations can also be useful for providers to ascertain whether they should scale up the specific practices currently used in non-state provision.
Impact evaluation techniques compare the outcomes of the beneficiaries of a certain policy intervention or project with those of a counterfactual group that has not been exposed to the same intervention or project. The counterfactual gives us the information needed to ascertain what would have happened without the program, which is crucial for inferring causation. The best mechanism for guaranteeing a proper counterfactual and an unbiased evaluation is randomization. This method gives everyone an equal chance of being in the control or treatment group, so that all factors and characteristics are, on average, equal between the two groups. The only difference is the intervention.
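The balancing property of randomization described above can be illustrated with a small simulation. This is a minimal sketch, not part of the original note: the applicant pool, the unobserved `motivation` score and the function name `randomise` are all hypothetical and invented for illustration.

```python
import random
import statistics

def randomise(units, seed=0):
    """Shuffle the units and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical applicants: each has a 'motivation' score that the
# evaluator never observes and therefore cannot control for directly.
rng = random.Random(42)
applicants = [{"id": i, "motivation": rng.gauss(0, 1)} for i in range(2000)]

treatment, control = randomise(applicants, seed=1)

# With random assignment the two groups should have very similar
# average motivation, even though it was never measured.
gap = (statistics.mean(a["motivation"] for a in treatment)
       - statistics.mean(a["motivation"] for a in control))
print(f"Difference in mean motivation: {gap:.3f}")
```

The printed difference should be close to zero. With smaller samples the balance holds only on average, so individual draws can differ more.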
There are two extreme examples: the full national pilot randomized control study, which would provide causal information about the program overall, and the observational study, which might be useful for monitoring but would not provide any information about causation. We ignore these two extremes, as one is not possible in the short term and the other is not useful for assessing causation.
We take each of the impact evaluation continuum options and describe how it could be applied to a new intervention such as Free Schools:
1. Small Scale Randomised Control Trial:
• Definition: Randomly allocates students to a treatment or control group, so each student has an equal chance of being in either the treatment or control group.
• Evaluation comparison: Students are allocated on a random basis, such as through a lottery. The evaluation compares the student outcomes of those who were successful in the lottery and gained a place with the outcomes of those who were not successful in the lottery.
• Benefits: Unbiased estimate of the impact of the individual intervention. Can be used to prove efficacy and provide a basis for the roll out of the program.
• Example: The recently published randomised evaluation of 457 students at a KIPP charter school in Lynn, Massachusetts used this technique.
• Issues: A small scale randomised control trial would not establish that the program works at a national level. A follow-up large scale randomised control trial would be needed.
• Uses: Some Free Schools are using a partial lottery for allocating students. Students entered in the lottery have an equal chance of being allocated a place at the school, thus creating a treatment and control group.
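The lottery comparison described above amounts to a difference in mean outcomes between lottery winners and lottery losers. The following is a minimal sketch under stated assumptions: the test scores are invented, the function name `lottery_estimate` is hypothetical, and the interval uses a simple normal approximation rather than any method specified in the note.

```python
import math
import statistics

def lottery_estimate(winner_scores, loser_scores):
    """Difference in mean outcomes (winners minus losers) with its
    standard error and an approximate 95% confidence interval."""
    diff = statistics.mean(winner_scores) - statistics.mean(loser_scores)
    se = math.sqrt(
        statistics.variance(winner_scores) / len(winner_scores)
        + statistics.variance(loser_scores) / len(loser_scores)
    )
    # 1.96 is the normal critical value; very small samples would
    # call for a t-distribution instead.
    return diff, se, (diff - 1.96 * se, diff + 1.96 * se)

# Hypothetical test scores, invented for illustration only.
winners = [62, 71, 58, 66, 74, 69, 63, 70]
losers = [60, 64, 55, 61, 68, 59, 62, 57]

diff, se, ci = lottery_estimate(winners, losers)
print(f"Estimated impact: {diff:.2f} points (SE {se:.2f})")
print(f"Approximate 95% interval: {ci[0]:.2f} to {ci[1]:.2f}")
```

Because places were allocated by chance, the losers stand in for what would have happened to the winners without a place. The comparison is between those offered and not offered a place, not between attenders and non-attenders.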
2. Encouragement design:
• Definition: Randomly allocates encouragement to join the program, rather than places, so parents and students within the target group have an equal chance of receiving the encouragement/information.
• Evaluation comparison:
a. The comparison is between the student outcomes of all those who were given the information, whether or not they acted on it, and the student outcomes of those in the same target group who did not receive the information (intent to treat).
b. Reviewing the quality of the encouragement provided by looking at the compliance rate, that is, the percentage of those encouraged who actually chose the intervention.
• Benefits: Unbiased estimate of the impact of the encouragement. Can also be used to help target specific beneficiary groups.
• Example: In an employment program, job seekers were given a screening questionnaire. They were then randomly assigned to a treatment or control group. Treatment participants received an invitation and a financial inducement to participate in the United States’ Jobs Program. The control group was still eligible for the Jobs Program but received no encouragement.
• Issues: The impact of the intervention is only valid for the target group that is encouraged, not for all participants in the intervention/school. Therefore, the external validity of an encouragement design is lower than that of a small scale randomised control trial, where inferences can be made for all students in the school.
• Uses: Free Schools may not wish to allocate all places by lottery but may still wish to allocate places for a certain group of underprivileged children in an area. The Free School could then randomly direct encouragement, in the form of information about the Free School and its approach to education, to parents within the targeted geographic area.
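The two comparisons in an encouragement design, the intent-to-treat estimate and the compliance rate, can be computed from simple records. This is a hedged sketch: the records and field names are hypothetical, and the final ratio (often called the Wald estimator) is a standard technique added here for illustration, not something the note itself describes.

```python
import statistics

def encouragement_estimates(records):
    """records: dicts with 'encouraged' (bool), 'enrolled' (bool), 'score'."""
    enc = [r for r in records if r["encouraged"]]
    ctl = [r for r in records if not r["encouraged"]]

    # Intent to treat: everyone encouraged versus everyone not
    # encouraged, regardless of whether they actually enrolled.
    itt = (statistics.mean(r["score"] for r in enc)
           - statistics.mean(r["score"] for r in ctl))

    # Compliance: how much the encouragement raised take-up.
    take_up_gap = (statistics.mean(int(r["enrolled"]) for r in enc)
                   - statistics.mean(int(r["enrolled"]) for r in ctl))

    # Wald ratio (an addition, see lead-in): the implied effect on
    # those who enrolled because they were encouraged.
    effect = itt / take_up_gap if take_up_gap else None
    return {"itt": itt, "take_up_gap": take_up_gap,
            "effect_on_compliers": effect}

# Hypothetical records, invented for illustration only.
records = [
    {"encouraged": True, "enrolled": True, "score": 70},
    {"encouraged": True, "enrolled": True, "score": 74},
    {"encouraged": True, "enrolled": False, "score": 60},
    {"encouraged": True, "enrolled": False, "score": 60},
    {"encouraged": False, "enrolled": False, "score": 60},
    {"encouraged": False, "enrolled": False, "score": 62},
    {"encouraged": False, "enrolled": False, "score": 58},
    {"encouraged": False, "enrolled": False, "score": 60},
]
result = encouragement_estimates(records)
print(result)
```

A low take-up gap signals weak encouragement. The ratio then becomes unstable, which is one reason to review the quality of the encouragement before interpreting results.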
3. Phased-in intervention:
• Definition: Entry into the program happens in different time periods.
• Evaluation comparison: Compares student progress of those in the first wave with student progress of those who will enter the program in the future.
• Benefits: Creates a counterfactual group by exploiting differences in the timing of entry.
• Example: A recent study of English Academies compared the original Academies to a subsequent cohort of Academies.
• Issues: There must be similarity between the student cohorts entering in the first phase and those entering in subsequent phases. This is often difficult because of policy changes which affect the program implementation. The approach also requires comprehensive longitudinal data.
• Uses: The progress of students in the first wave of Free Schools can be compared to the progress of students in other schools who will become students of a Free School in a later wave. Both groups have effectively self-selected into the program (for example, they are likely to have had similar motivations), but the Free Schools of the subsequent wave do not yet exist.
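The phased-in comparison described above compares progress, not raw attainment, between the first wave and a later wave. A minimal sketch, assuming each student has a baseline and a follow-up score from longitudinal data; the scores and function names are hypothetical.

```python
import statistics

def mean_gain(students):
    """Average progress (follow-up minus baseline) over
    (baseline, follow_up) score pairs."""
    return statistics.mean(post - pre for pre, post in students)

def phase_in_estimate(first_wave, later_wave):
    """Extra progress made by first-wave students relative to
    students who will only enter the program in a later wave."""
    return mean_gain(first_wave) - mean_gain(later_wave)

# Hypothetical (baseline, follow-up) scores, invented for illustration.
first_wave = [(50, 58), (60, 66), (55, 64)]   # already in the program
later_wave = [(52, 56), (58, 63), (54, 57)]   # will enter in a later wave

estimate = phase_in_estimate(first_wave, later_wave)
print(f"Additional progress in first wave: {estimate:.2f} points")
```

The estimate is only credible if the two waves are similar at baseline, which is the main issue noted above.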
4. Matching:
• Definition: Uses observable characteristics to create a group similar to the treatment group.
• Evaluation comparison: Compares student outcomes of schools in the intervention with those of schools that are similar in terms of size, geographical area, students served, and so on, but are not in the intervention.
• Benefits: Statistically controls for differences in observable characteristics.
• Example: Colombia Concession Schools.
• Issues: The problem with matching is that there are unobservable factors, for example motivation, which may have a significant impact on student outcomes and which often cannot be controlled for.
• Uses: Free Schools could compare their own outcomes with the outcomes of the other schools that most closely match them.
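One simple way to implement the matching idea above is nearest-neighbour matching on standardised observable characteristics. This is a sketch under assumptions: the schools, the two characteristics (`pupils`, `fsm_pct`) and the outcome `score` are hypothetical, and matching is done with replacement. Other matching methods exist, and the note does not prescribe one.

```python
import math
import statistics

KEYS = ("pupils", "fsm_pct")  # hypothetical observable characteristics

def match_schools(free_schools, comparison_schools, keys=KEYS):
    """Pair each Free School with its nearest comparison school and
    return the pairs plus the average outcome gap across pairs."""
    pooled = free_schools + comparison_schools
    # Standardise each characteristic so that no single one dominates
    # the distance (assumes each characteristic varies across schools).
    centre = {k: statistics.mean([s[k] for s in pooled]) for k in keys}
    spread = {k: statistics.pstdev([s[k] for s in pooled]) for k in keys}

    def distance(a, b):
        return math.sqrt(sum(((a[k] - b[k]) / spread[k]) ** 2 for k in keys))

    pairs = [(f, min(comparison_schools, key=lambda c: distance(f, c)))
             for f in free_schools]
    gap = statistics.mean(f["score"] - c["score"] for f, c in pairs)
    return pairs, gap

# Hypothetical schools, invented for illustration only.
free_schools = [
    {"name": "F1", "pupils": 300, "fsm_pct": 30, "score": 68},
    {"name": "F2", "pupils": 600, "fsm_pct": 10, "score": 75},
]
comparison_schools = [
    {"name": "C1", "pupils": 320, "fsm_pct": 28, "score": 63},
    {"name": "C2", "pupils": 900, "fsm_pct": 12, "score": 70},
    {"name": "C3", "pupils": 580, "fsm_pct": 11, "score": 73},
]

pairs, gap = match_schools(free_schools, comparison_schools)
for f, c in pairs:
    print(f"{f['name']} matched with {c['name']}")
print(f"Average outcome gap: {gap:.1f}")
```

The matched gap still reflects any unobserved differences between the schools, so it should not be read as a causal estimate in the way a randomised comparison can be.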
For the following non-random evaluation tools we present a quick overview:
5. Within-group intervention:
• Definition: Compares subgroups within the intervention. It can focus on outcomes but could also be used to look at processes.
• Evaluation comparison: An example would be comparing the outcomes of the children whose parents were given information to the children whose parents were not given information. The information is not randomly assigned but can provide comparison groups within the school.
6. Process evaluation:
• Definition: Outlines the effective elements in the delivery of the program.
• Evaluation comparison: Process evaluations measure what is done by the program, and for whom these services are provided. Ideally, process evaluations assist in the identification of “active ingredients” of treatment, and assess whether a program is meeting accepted standards.
7. Effectiveness study:
• Definition: Effectiveness studies look at whether interventions work under real-world conditions.
• Evaluation comparison: Compares students' standardised test scores at the start of the intervention with their scores at the start of the next year. This would give a sense of achievement at the school, among those treated, but would be neither representative nor a randomised study.
Until the full national pilot randomised trial can be undertaken, the information from small scale evaluations or other rigorous techniques, or even from efficacy and efficiency studies, can provide a sense of the scale and time needed to see impact. This is crucial for setting up a more elaborate evaluation, because we would then know how long it might take for impact to occur and what sort of effects to expect, for instance by how much test scores will change over time.
The time scale of the evaluation is also important. Geoffrey Borman and colleagues reviewed over 200 studies on whole school reform in the United States and found significant impacts only after five years of implementation, rising to half a standard deviation after eight years. Michael Fullan states that it takes up to eight years to turn around a school, meaning to achieve a significant increase in student achievement. These, of course, are timescales for the turnaround of Government schools; Free Schools are new non-state provided schools. The recent evaluation of the Harlem Children’s Zone’s Charter Schools shows one of the largest effect sizes for non-state provision, at 0.3 to 0.5 of a standard deviation.
The program was evaluated after two years. The program covered a middle school, but longitudinal data allowed comparisons of performance before entering the middle school and after two years at Promise Academy. This approach could also be used by Free Schools, because the longitudinal student database in the United Kingdom includes Key Stage tests at ages 7, 11 and 16.
It is also important to ensure an impact evaluation is in place early in the intervention to capture the effect of the new innovative approach. Retrospective evaluations can be used but are far more difficult to implement, are likely to have more biases, and require more investment to collate the larger amount of data needed to support the evaluation. The possibility of changing political will is another reason for ensuring the impact is captured early, making it more likely that policy decisions are evidence based.
Evaluation techniques are not mutually exclusive. An evaluation can use multiple evaluation techniques. For example, even if a randomised control trial is used to look at outcomes, a process based tool could also be used to document the processes in place which supported those outcomes. An initial evaluation tool could also be used first in order to prove the concept before a more expensive rigorous impact evaluation takes place. Small scale randomised trials can also indicate the effect size/impact of the intervention. This can then be used to determine the sample size needed for a large scale randomised control trial.
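The step from an effect size to a required sample size can be sketched with the standard normal-approximation formula for comparing two group means. This is an illustration only, not a method taken from the note: the significance level (5%), the power (80%), and the optional design-effect adjustment for students clustered within schools are assumptions chosen here.

```python
import math
from statistics import NormalDist

def students_per_arm(effect_size, alpha=0.05, power=0.80,
                     cluster_size=1, icc=0.0):
    """Approximate number of students needed in each of the treatment
    and control groups to detect a standardised effect size
    (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    # Design effect: inflates the sample when students are clustered
    # in schools; icc is the (assumed) intra-cluster correlation.
    design_effect = 1 + (cluster_size - 1) * icc
    return math.ceil(n * design_effect)

# Effect sizes in the range of 0.3 to 0.5 of a standard deviation.
print(students_per_arm(0.3))  # 175
print(students_per_arm(0.5))  # 63
```

Smaller expected effects require much larger samples, since the required number grows with the inverse square of the effect size. Where whole schools rather than individual students are randomised, the clustering adjustment can increase the requirement several times over.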
Communications are important. Creating a culture of evaluation will ensure all stakeholders understand that the evaluation results will be used to demonstrate impact. The use of any rigorous technique helps build acceptance of the concept and provides evidence to help support the innovation. It is therefore important to ensure all stakeholders understand the importance of impact evaluation and their role in helping to support the evaluations. Stakeholders will include educators, administrators, providers, managers, policy makers, parents and the general public. Transparency and consultation should be used to strengthen buy-in to the impact evaluation.
New forms of school provision, such as Free Schools, offer the possibility of improved educational outcomes. However, this promise cannot be realised without systematic evaluation. It is the Government’s responsibility to ensure their impact is effectively measured at a small scale, so that those programs that are demonstrably effective are expanded and those that are ineffective are discontinued. There are several ways to evaluate impact, with randomised control trials (RCTs) being best practice. Other impact evaluation methods can be used in the interim or combined with RCTs to increase knowledge of the processes which make the program effective. It is important to create a culture of evaluation within both Government and school providers, while simultaneously ensuring wider stakeholders also understand the importance of evaluation. The Free School initiative could use impact evaluations to provide evidence on its effectiveness. Since Free Schools are at an early stage of development, an investment in evaluation is now urgently needed. Large scale roll out should follow on from the proven success of the early small scale implementation. Retrospective evaluation is expensive and difficult.