The Appropriate Use of Student Assessments
Today, testing and accountability, instead of curriculum and instruction, have taken center stage in schools and classrooms across the country. As more accountability provisions are piled on schools, staff, and students, attention has shifted away from what kids should be learning and toward test scores and their implications. What seems to have been forgotten, however, is that student test scores are a reflection of what is taught in the classroom.
We know that skilled teachers who assess students using high-quality, relevant, and timely assessment instruments use the results to guide their teaching and ensure that students understand the content being taught. However, too many schools, classrooms, and students are being held hostage to testing policies that discourage this practice and instead promote the misuse of tests and test results. In some states and districts, these policies have led to testing, testing, and more testing without providing any useful information to teachers or students.
The AFT believes it is critical to define and describe appropriate test practices and to advocate on behalf of our members and their students for sound and aligned assessment policies at all levels of the system.
The Current Landscape
Polls and practice have shown us that too many states and districts are using tests inappropriately by:
o Requiring teachers to use large-scale state assessments for diagnostic purposes, even
though such tests do not provide diagnostic information
o Requiring teachers to use large-scale test results to guide instruction, even though
teachers receive these test results too late to be used as a guide for instruction
o Basing high-stakes decisions, such as graduation or grade promotion, on a
single test score, even though assessment experts agree that such decisions should
not be based on a single test score
o Using tests, including large-scale and benchmark assessments, that are not aligned to
the standards and curriculum that teachers are required to teach
o Requiring teachers to use benchmark or interim tests—in some cases as often as every
six weeks—to help predict performance on the state-level assessment, even though such
predictions are volatile at best and only force teachers to narrow their curriculum and
focus on test preparation
o Trying to measure the performance of individual teachers based on student test scores,
even though the tests are not designed to provide such specific information, many are
not aligned to the curriculum, and the methodology involved has been questioned by
education experts across the country
o Requiring teachers to spend valuable instructional time on test preparation, which
has made testing, instead of teaching and learning, the center of attention and has
narrowed the curriculum because testing focuses solely on reading and math
o Failing to provide adequate professional development opportunities for teachers to
understand the appropriate use of tests and to learn to design formative (also known as
classroom-based) assessments that are necessary to guide instruction
o Implementing policies for students with disabilities and English language learners that
are unfair and serve only to discourage both students and teachers
Recommendations for a Fairer Testing Program for Our Students
It is imperative that states and districts use tests and student test results for their intended purposes. To help create a common language and understanding of what is appropriate, we’ve defined the most commonly used terms. Examples for each test type are included in Appendix A.
1. Norm-referenced tests (NRTs) should be used to compare students, schools, districts,
and states with each other. NRTs give us some insight into how students in California,
for example, compare with students in New York. These tests do not tell us how well
any of these students did in relation to a standard. Instead, students are scored based
on how well they did compared to their peers. These results are typically reported as
percentiles and are distributed along a “bell-shaped” curve, so half of students fall below the 50th percentile and half fall above it.
2. Criterion-referenced tests (CRTs) should be used to compare individual student
performance against a specified standard. CRTs give us information about whether
students met the standards. The results are typically reported as performance levels
(basic, proficient, advanced). Student scores are based on how well they knew the
content and could answer the questions, and not on how well their peers performed on
the same questions. Data from CRTs should be used to inform
3. Formative assessments should be used to guide instruction. These assessments occur
during teaching and are embedded in instruction. Results are received instantly, which
allows teachers to adjust their instruction immediately. Most are teacher-developed,
and all should be implemented based on teacher judgment.
4. Summative assessments should be used to provide a snapshot of whether students
have mastered the standards by a particular point in time. These assessments occur at the
end of a unit of instruction and tell us whether students “got it.” Results are typically
received anywhere from two weeks to two months later. As a result, these tests cannot
guide instruction in the short term. However, results can provide some information
for programmatic/instructional decisions and guide the future delivery of
material covered during the unit if, for example, all students failed to comprehend a
specific set of concepts and thus all performed poorly on certain questions.
5. Benchmark/interim assessments should not interrupt classroom instruction and
should reflect the standards/curriculum being taught. Benchmark/interim tests that
are used as a predictor of future success are typically not aligned to the curriculum
currently being taught and interrupt classroom instruction rather than complement it.
Benchmark/interim tests should reflect the content being taught in the classroom and
should serve to supplement and provide another piece of information to teachers about
their instruction and where each student is in relation to the content they are learning.
6. Diagnostic assessments must cover a few concepts in depth. In order for an assessment
to provide educators with diagnostic information about a student it must include
enough questions about a topic and must include easy and difficult questions (called
“outliers”) to make a valid judgment. Most tests, including high-stakes tests and
benchmark/interim tests, cover numerous topics, which means they can include only a few
questions per topic. In addition, these tests are designed to eliminate “outliers,” which
could skew the data. As a result, they should not be used to make diagnostic decisions.
7. Adaptive testing should be used to identify the level at which students are
performing in a particular subject or concept. Adaptive testing is done by computer
and asks students more difficult or less difficult questions based on their answers to
previous questions. Sometimes called “off-grade” testing, this approach allows teachers
to better focus instruction on each child’s strengths and weaknesses by helping to
identify the specific concept or process where their learning has broken down.
8. Value-added assessments should be used only to estimate students’ educational growth
over time. Value-added can assist schools and classroom teachers in making
data-informed decisions regarding the effectiveness of instructional strategies and programs
for individuals and groups of students. Value-added is an estimation tool and, therefore,
should not be used to make high-stakes decisions about students, teachers, or other
school staff. Effective value-added assessments must be of high quality and must be
closely aligned with classroom instruction. In addition, states or districts must create
data systems that include unique student identifiers to track individual students from
year to year. Finally, officials must be able to compare test results from year to year on a
Providing a common language around assessments is a small, albeit important, step in creating a valid and useful assessment system. States and districts must also ensure that:
o The state standards are clear, specific, and content-focused, and are provided for
each grade at the K-8 level and for each course at the high school level
o The state tests are aligned to the standards and curriculum teachers are expected to
deliver in their classrooms. A 2006 study from AFT found that only 11 states
administered tests that reflected the content and skills required in their state standards.
This means most states have significant work to do to ensure that what teachers are
expected to teach in the classroom is aligned with what students must show they have
mastered on the state test
o Quality curriculum and classroom resources are provided for teachers to use to teach to
the standards in all subject areas. This will allow teachers to expose students to content
that is now being squeezed out in many districts while also ensuring that students are
prepared for the state test in the subject areas that are already tested
o Assessment literacy professional development opportunities are offered to all school
staff to help them understand the appropriate use of tests and test results. Topics that
need to be covered include: appropriate uses of testing, developing quality classroom
assessments, assessing special populations, incorporating formative assessment
techniques in instruction, and analyzing student performance data
o High-quality, sustained professional development is provided for teachers to assist
them in implementing a range of strategies to meet all students’ needs, especially those
of students with special needs, and to help ensure their success on the assessments
If we want students to have a deeper understanding of important topics, then we need to ensure that they have opportunities in the classroom to delve deeper into various concepts and skills. This is not possible in the current testing environment, which uses test results for purposes they were not designed for and requires teachers to spend endless hours on test preparation. Now more than ever, content-rich common standards and curriculum, along with assessments that reflect them, are essential.
Appendix A: Assessment Type Examples
Norm-Referenced Test: Stanford 10; Iowa Test of Basic Skills (ITBS)
Criterion-Referenced Test: standards-based tests; state-level tests; Advanced Placement exams; National Assessment of Educational Progress (NAEP)
Formative Assessment: one-on-one and group questioning
Summative Assessment: state tests; Advanced Placement exams
Benchmark/Interim Assessment: district benchmark tests
Diagnostic Assessment: DIBELS; DRA (Developmental Reading Assessment)
Adaptive Assessment: Measures of Academic Progress (MAP); Children’s Progress
Value-Added Assessment: TVAAS (Tennessee Value-Added Assessment System)
OVERALL COMMENT FROM ONE PPC MEMBER:
o Need to include or maybe write the document to address: The Principles of
Assessments and how they help promote instruction
COMMENTS FROM NYSUT STAFFER, MARY ANN AWAD,
PER CHUCK SANTELLI’S REQUEST FOR HER FEEDBACK:
Current Landscape: I suggest that this section provide some concrete examples of polls, articles, and research that support their statements. Although the issues raised are good examples of the inappropriate uses of tests, perhaps the section could also include the negative impact testing is having on children, teachers, and communities.
Recommendations for a Fairer Testing Program for Our Students: The definitions of various types of tests and the appropriate or inappropriate use of a particular type of test are lumped together. Perhaps it would be clearer if there were a definition and a recommended use for each type of test. Maybe they could also use definitions found in the Standards for Educational and Psychological Testing of the AERA or some other professional source. Since they provided a policy brief on interim assessments, perhaps they could even use the definitions used in that document. I liked the definition of formative assessment in that brief. Item #8 on page three (Value-added) needs to be rewritten. There are value-added models and assessments used in these models. They need to define the assessment and the model.
On page 4, the bullets are a good start for what else is needed, but they need to be expanded to include the appropriate assessment of all students and the need to know the clear purpose of each and every assessment: how do these assessments help improve student success?
The UFT article is good and provides an initial rationale for formative assessments.
The Margaret Heritage article is excellent. Her paragraph on student involvement is unique and should be included in any discussion of formative assessment. The section on the knowledge teachers need is also excellent and provides good information on what professional development may be needed. The section on the skills teachers need has a good paragraph on creating the conditions for good formative assessment. Both of these concepts are missing from the third paper, The Role of Interim Assessments in a Comprehensive Assessment System. This paper is very well done, and I agree with most of the points that are made. My major concern is on page 5, middle of the page, where it states “…but for evaluating the effectiveness of program, strategy, or teacher.”
It would be helpful if the AFT would send the materials as Word documents so that I could insert comments where necessary. Please let me know if you have questions. Mary Ann.
Mary Ann Awad