By Brandon Mills,2014-04-20
    In 2005, the Education and Skills White Paper raised concerns over GCSE coursework, and QCA was asked to conduct a thorough review. The review highlighted strengths in coursework, but also found that abuse of the internet, plagiarism, the authentication of students’ work and the manageability of coursework for students and teachers needed to be addressed. It was decided that controlled assessment would replace coursework. Controlled assessment applies different levels of control to three processes: task setting, task taking and task marking. This paper evaluates the validity and reliability of GCSE controlled assessment using Crooks, Kane and Cohen’s (1996) eight linked stages of validity chain model. The evaluation highlights that the validity of controlled assessment can be enhanced or reduced depending on the level of control applied to the three processes.

Key Words

    Controlled assessment; GCSE coursework; Crooks, Kane and Cohen; GCSE assessment

    The English government holds overall responsibility for our education system, including the curriculum, assessments and the development of qualifications. The system is highly regulated, and each successive government has tended to review the curriculum, implement its own changes and intervene in the education system with the aim of improving standards and achievement (Isaacs, 2010).

    Until recently, the majority of General Certificate of Secondary Education (GCSE) subjects had an element of internally assessed coursework. However, the Qualifications and Curriculum Authority (QCA), which maintains, develops and advises the government on the National Curriculum and qualifications, published a report in October 2005 raising concerns over GCSE and General Certificate of Education (GCE) coursework. Consequently, controls for coursework were tightened. As part of the government’s reform of 14-19 learning, GCSEs were again reviewed in 2006, and the report suggested that ‘controlled assessments’ replace coursework in the majority of subjects (QCA, 2007).

    This paper aims to explore whether this new form of GCSE assessment (controlled assessment) fully addresses the concerns surrounding coursework or whether it is yet another opportunity for the government to reinforce its control over assessment. It begins by briefly outlining the three main concerns raised in the 2005 14-19 Education and Skills White Paper, which gave QCA the responsibility of reviewing GCE and GCSE coursework arrangements to be implemented for new specifications taught from September 2009. Controlled assessment is then described, along with its purpose. The validity and reliability of assessments are discussed, and Crooks, Kane and Cohen’s (1996) eight linked stages of validity chain model is then used as a structure to evaluate the validity and reliability of controlled assessment. This validity chain model has been selected because it focuses on issues such as the type of task used, how it is administered and scored, the aggregation of scores, the evaluations and judgements made and the impact on participants. Strengths for validity include achieving assessment objectives and students reaching their full potential when tasks are set by the awarding body. If tasks are set by teachers, reliability and validity may be compromised, as many teachers find it difficult to set tasks against assessment criteria. The paper also evaluates how controlled assessment can be further improved and concludes by determining whether controlled assessment was the ideal way forward.

Coursework Review

    Coursework was defined by QCA (2005:6) “as any type of assessment of candidate performance made by the centre (that is, the school or college) in accordance with the specification (or syllabus) of the course of study that contributes to the final grade awarded for GCE or GCSE qualification”.

    In February 2005, the Education and Skills White Paper (DfES, 2005) raised three main concerns about coursework assessment which QCA was asked to address. The first issue was ensuring a consistent approach to coursework in similar subjects and ensuring that it examined attributes and skills which could not be assessed by final examinations. The second issue related to fairness, and the third concern was that coursework placed a heavy burden on students. Fairness in assessment is used to represent the more technical term equity. Equity is defined in the dictionary as ‘the spirit of justice’, a definition which is supported by Stobart (2005). Equity in assessment is used to imply that the “assessment practice and interpretation of results are fair and just for all groups” (Gipps and Murphy, 1994, p.18). The White Paper also asked QCA’s review to guarantee that coursework examined what it purported to examine and that similar marks were being awarded for similar pieces of work.

    In November 2005, QCA published its review of GCE and GCSE coursework arrangements. The main aim of the review was “to canvass opinions on the effectiveness of coursework in teaching, learning and assessment; to examine issues relating to the authentication, marking and moderation of coursework and to ensure that appropriate risk management procedures are in place to minimise the potential for malpractice” (QCA 2005, p.3). Ken Boston, the then Chief Executive of QCA, sent a letter (see Boston, 2005) to Ruth Kelly, the Secretary of State for Education and Skills at the time, advising that although the report highlighted positive attributes of coursework, some current practices needed to be strengthened immediately. These included helping teachers to ensure that the coursework assessed was the student’s own work. Yet a research study on teachers’ views on GCSE coursework, conducted by MORI in 2006 for QCA, concluded that only 10 per cent of teachers believed that students may have received help and only 7 per cent of teachers could not confirm if students did the coursework themselves, whilst 7 per cent of teachers were unsure if there was an element of cheating and collusion among their students (MORI 2006:19). Another practice which Boston (2005) stated needed urgent attention was the abuse of the internet, a tool he regarded as dramatically increasing plagiarism. In contrast, the MORI (2006, p.23) research study reported that 82 per cent of teachers disagreed with the claim that students used the internet far too much, and 63 per cent opposed the statement that the use of the internet made it difficult for them to authenticate coursework; only 31 per cent of teachers agreed. The report further highlighted that 66 per cent of teachers were opposed to removing coursework and 51 per cent were strongly opposed to its removal. The majority of teachers felt that the main drawback of coursework was the burden of marking it.

    Evidently the MORI (2006) research study contradicted Boston’s report, as it highlighted that the teaching profession was not in support of his findings. So why did QCA proceed with controlled assessment? I believe QCA took the decision to implement controlled assessment because, as regulators, they were under great pressure from Alan Johnson (Ruth Kelly’s replacement as Secretary of State) to maintain public confidence in internal assessments, and this was emphasised throughout Johnson’s correspondence to Boston (see Johnson,


    Following on from its 2005 review, QCA presented a report in 2006 focusing on the arrangements to be implemented for new GCSE specifications which were to be taught from September 2009. Its recommendations on how GCSEs would be assessed were based on three key principles (QCA 2006a, p.17):

    - the intended learning outcomes in a subject are the critical factors determining the appropriate form of assessment to use;

    - the most valid (including reliable) form of assessment for a learning outcome should be determined so that results are fair and robust in any circumstances and maintain public confidence in them;

    - the assessment process should be manageable.

    These three principles were to ensure a higher level of consistency in GCSE assessments for the future; therefore QCA decided that coursework and non-coursework alternatives would no longer be available in a given subject. It was to be replaced by controlled assessments (QCA, 2007).

Controlled Assessments

    The main purpose of controlled assessment is to achieve validity and reliability; sustain good teaching and learning; allow good manageability for both teachers and students; and enable teachers to confidently authenticate students’ work (QCA, 2007). Controlled assessment is a new form of summative (see Wiliam and Black, 1996; Stobart and Gipps, 1997), internal assessment, but it differs from older forms of internal assessment in that control levels are set for the three processes: task setting, task taking and task marking. A range of possible levels of control can be applied to each of the three processes, and the QCA (2007) document is used below to describe the various levels of control.

    For the task setting process, at the highest level of control, the awarding body sets the tasks. At the next level of control, teachers devise their own tasks in line with the awarding body’s guidance and criteria, and the tasks are then approved. A step down from this, teachers set the task in accordance with the awarding body’s guidance; however, the task devised is not approved. At the lowest level of control, teachers devise their own task with limited guidance from the awarding body.

    The task taking process entails the biggest variation. Enhanced levels of control give the public greater confidence in the assessment and give teachers assurance in authenticating students’ work. At the highest level of control, students’ work is completed under teachers’ direct supervision (authenticity control); the awarding body specifies tight deadlines for student feedback (feedback control); the amount of time given to students to complete the work is limited (time control); all the work is completed by students individually (collaboration control); and the awarding body specifies limited access to resources (resource control).

    At a medium level of control, parameters are defined by the awarding body for three or four key controls and the rest are defined by the school or college. One step down, parameters are defined externally by the awarding body for one or two key controls and the rest of the key controls are defined by the school or college. At the lowest level of control, the parameters for all key controls are defined by the school or college.

    For the task marking process, at the highest level of control, the task is marked by the awarding body. Next step down, the task is marked by the teacher using the awarding body’s mark scheme and the marking is externally moderated. A step down from this, the awarding body trains and accredits teachers in marking, thus external moderation does not occur. At the lowest level of control, teachers complete the marking using guidelines, without any external moderation.

    Although high levels of control are likely to secure validity and reliability, it is not possible to select the highest level of control for each process, as the assessment would then resemble an external examination and would no longer be a controlled assessment. Likewise, the lowest levels of control are almost never used since that would undermine the principles supporting the creation of controlled assessment in the first place.

Validity and reliability of an assessment

    The validity of an assessment is the extent to which a test measures what it purports to measure (Gipps, 1994; Borsboom et al., 2004; Stobart and Gipps, 1997). Messick (1989) discusses validity in relation to the inferences derived from assessment outcomes. Five types of validity are discussed in early writings (Stobart and Gipps, 1997): predictive validity, whether a test can correctly predict the outcome of a future performance; concurrent validity, whether a test produces the same outcome as a different test of the same skill (do the two tests correlate with one another?); construct validity, whether the test measures the fundamental skills being assessed; content validity, whether the test appropriately covers the skills to be assessed; and criterion validity, a combination of concurrent and predictive validity. Wiliam (1992) and Wood (1987) discuss other forms of validity. Due to the varying emphasis on validity, test development in the past concentrated on only one or two of these types of validity (Stobart and Gipps, 1997). However, recent literature by Messick (1989) and Cronbach (1988) insists that validity is a unified concept, in which the social impact and values of tests must be considered. Messick (1989) argues that a test is invalid if the results are misused.

    On the other hand, reliability relates to whether an assessment can be repeated to achieve the same outcome, with little error. In other words, can the test be repeated to achieve identical results? The issue of reliability is particularly important to assessment developers (Stobart and Gipps, 1997). Crooks, Kane and Cohen (1996) view reliability as a component of validity. They believe marking reliability is part of validity because markers can influence the confidence in the inferences drawn from assessments (Crisp, 2009).

    Controlled assessment was introduced to improve both the reliability and validity of internal assessments (QCA, 2007), and the risks affecting reliability and validity can be reduced by the levels of control applied to the task setting, task taking and task marking processes.

Crooks, Kane and Cohen’s (1996) eight linked stages of validity chain model

    Crooks, Kane and Cohen’s (1996) eight linked stages of validity chain model will be used as a structure to examine the potential increase in validity and the possible threats associated with controlled assessment. Although other frameworks, such as that of Frederiksen and Collins (1989), could have been used for this discussion, the Crooks, Kane and Cohen (1996) model has been selected as it focuses on the validity that controlled assessment possesses. Frederiksen and Collins’ (1989) framework concentrates on performance assessment and characteristics contributing to or detracting from systemic validity, such as directness of measurement, transparency, reliability and scope. Although these characteristics are important elements of validity, Messick (1994) criticises Frederiksen and Collins’ framework as it can neglect other essential elements of validity, including scoring and interpretation.

