An Analysis of Students’ Mathematical
Thinking to Judge the Effectiveness of an
Assessment Analysis Procedure
Linda Dager Wilson
Richard S. Kitchen
University of New Mexico
Presented at the NCTM Research Presession Orlando, Florida
April 3, 2001
An Analysis of Students’ Mathematical Thinking to Judge the Effectiveness
of an Assessment Analysis Procedure
Linda Dager Wilson, Washington DC
Richard Kitchen, University of New Mexico
In this era of educational reform and accountability, many states, districts, and schools are attempting to develop new curricula and new assessment instruments to determine the extent to which students are meeting new academic standards. The issue of alignment is becoming central to these developments. Policy makers, educators, and journalists have called for better alignment between standards and assessments (Romberg, Wilson, Khaketla & Chavarria, 1992; Romberg & Wilson, 1995; National Council of Teachers of Mathematics, 1995; Quality Counts, 2001). Alignment in the context of assessment is usually taken to mean the degree to which a test (or test item) assesses the same learning goals as a given standard or set of standards (Wilson & Kenney, 2001). Because of the high-stakes nature of assessment as a tool for educational reform, issues of alignment have taken on increased urgency. Calls for alignment have been addressed in varied ways (e.g., Webb, 1999; LaMarca, Redfield, Winter, Bailey & Despriet, 2000). Several projects have been initiated to evaluate the alignment of an entire test with a given set of standards. This study, in contrast, takes a narrower, more in-depth approach: the alignment of a single assessment item with one or more learning standards.
A procedure was developed by researchers at Project 2061 of the American Association for the Advancement of Science (AAAS) to analyze whether K-12 mathematics and science assessment tasks¹ address the specific content of a benchmark or set of benchmarks (Kulm, 1999). The procedure was also designed to describe whether the tasks are likely to reveal something useful about students’ understanding of the specific benchmark ideas. The authors of this study have attempted to use this “alignment procedure” to closely examine one specific assessment task in mathematics. Our goal was to determine whether the conclusions reached by the alignment procedure could be supported by analyzing student work for this task. We used the following research questions to guide our work:
1. How well does the task align with content standards for mathematics, according to the Project 2061 procedure? Do the student data support or refute this conclusion?

2. How good are the technical qualities of the task? What can the student data tell us about the technical qualities of the task, and do they support or refute the conclusions based on the alignment procedure?
We hope that our analysis will provide one evaluation of the Project 2061 alignment procedure. Additionally, it may provide insight into alignment procedures in general and perhaps into the more general issue of what constitutes a quality assessment task in mathematics.
The Alignment Procedure
There are two major components to the Project 2061 alignment procedure. First, the complete task² is analyzed in depth, along with a set or sets of content standards, to determine whether there are individual standards that might possibly align with the task. At this stage both the task and each potential standard are “clarified,” or examined carefully to determine the mathematical and cognitive content of the standard and what is actually required for a satisfactory response to the task. A subset of the standards, those that seem to have the most likely “fit,” is chosen for further analysis. The task and each potential standard are then analyzed for content, which includes multiple criteria on issues of substance, sophistication, the degree to which the task assesses content not in the standard, the setting of the task, and the cognitive demand of both the task and the standard.

The second major part of the alignment procedure examines how well the task is written and presented. This analysis of the technical qualities includes comprehension, engagement, clarity, commonly held ideas (for example, how well the task anticipates common misconceptions), and alternative responses.

The task we chose to analyze is a fourth-grade released item from the 1996 National Assessment of Educational Progress (NAEP). This task is an extended constructed-response item (see Appendix A for the item and Appendix B for the scoring guide). Students are asked to examine two figures, a rectangle and a parallelogram of the same length and width, drawn on a grid, and to describe all the ways the figures are alike and all the ways they are different. This particular task was chosen primarily because of the breadth and depth of access to student performance data that was available. Not only did we have access to the performance results at the national level for this task, but we also had a national random sample of 237 student papers. In addition, we conducted individual cognitive labs with 13 students from two geographically and demographically diverse areas. In the cognitive labs, students completed the task, were asked to verbally explain their solution, and were then asked how they knew when they were finished writing their solution (expectations) and whether they liked the task (engagement). We completed each cognitive lab by asking the students to describe the two figures over the phone to a friend who could not view the figures.

We each first applied the Project 2061 alignment procedure to the task individually, then combined our results to reach consensus. Both authors had participated in training sessions conducted by Project 2061 on the alignment procedure. For standards we chose the Benchmarks for Science Literacy (American Association for the Advancement of Science, 1993) and Principles and Standards for School Mathematics (National Council of Teachers of Mathematics, 2000). After choosing four potential standards (three from the Principles and Standards for School Mathematics and one from the Benchmarks for Science Literacy) and using the alignment procedure on each one, we determined that one standard from the NCTM document was most closely aligned with the task. Although we report on the alignment analysis for all four standards, the second part of the study was based on the NCTM standard that proved to be most closely aligned with the task.

Once the alignment procedure was accomplished, we turned to the student data to determine whether the claims made within the procedure for that standard were supportable by the student data. Specifically, we examined the data from the written work and the cognitive labs to see if the task actually elicited student thinking about the mathematics in the chosen content standard. We then examined the technical qualities of the task to see if the student data supported or refuted the claims made by the procedure.

¹ An assessment task is used here to mean a test item along with accompanying materials (see next footnote).
² According to the procedure, a task is considered to be “complete” if it includes the following: a) a complete statement of the task; b) a statement of the intended satisfactory response; c) student responses from the intended population or grade level; and d) a scoring guide, preferably accompanied by student responses that illustrate each score level.
The first step in the alignment process is to select learning standards or benchmarks that appear to align most closely with the learning objectives of the task. The following four standards were selected because each appeared to contain elements of the learning goals inherent in the fourth-grade task:
1) Standard from Standards 2000 (PSSM 1): Geometry Standard for Grades 3-5 (page 164): Identify, compare, and analyze attributes of two- and three-dimensional shapes and develop vocabulary to describe the attributes.

2) Standard from Standards 2000 (PSSM 2): Geometry Standard for Grades 3-5 (page 164): Classify two- and three-dimensional shapes according to their properties and develop definitions of classes of shapes such as triangles and pyramids.

3) Standard from Standards 2000 (PSSM Comm): Communication Standard for Grades 3-5 (page 194): Use the language of mathematics to express mathematical ideas precisely.

4) Benchmark from Benchmarks for Science Literacy (Benchmark) (page 223): Many objects can be described in terms of simple plane figures and solids. Shapes can be compared in terms of concepts such as parallel and perpendicular, congruence and similarity, and symmetry. Symmetry can be found by reflection, turns, or slides.
We proceeded by clarifying these four standards: how each is similar to and different from the others, and which learning standards connect to each in the strands of academic development extending above and below it. An important part of the initial stages of the alignment procedure is to clarify the intent of the task. In addition, we documented what is known from the research literature about the selected task.
The analysis clearly demonstrated that PSSM 1 is the most closely aligned with the task. In our analysis of the substance of the task, we determined that knowledge of PSSM 1 is necessary to accomplish the task because students must identify, analyze, and compare attributes of the two shapes to determine likenesses and differences. With regard to the other three standards analyzed, we found that knowledge of PSSM 2 is not necessary, because students do not need to classify shapes according to their properties or develop definitions of classes of shapes to correctly respond to the task. Knowledge of PSSM Comm and of the standard from the Benchmarks for Science Literacy (Benchmark) could contribute to a correct response to the task, but is not necessary for one.
Similar ratings were found across “content contributions,” “within the goals,” “relative sophistication,” and “relative misconceptions”: knowledge of PSSM 1 is necessary to meet the criterion in each of these areas. While the other three standards identified, PSSM 2, PSSM Comm, and Benchmark, met the criterion for “alignment” because knowledge of these learning goals could contribute to a successful solution to the task, they rated poorly on the criteria of “content contributions,” “within the goals,” “relative sophistication,” and “relative misconceptions.” Lastly, with regard to “depth of knowledge,” the task matches level 3: “The task is at the ‘applicability’ level. This level incorporates a range of both familiar and novel contexts. The novelty of a context cannot be judged without looking at the curriculum to which each student has been exposed.”
When reviewing the student papers and the 13 students’ work in the cognitive labs, we investigated PSSM 1 in two parts: 1) whether students identified, compared, and analyzed the attributes of the shapes in the task; and 2) whether students used geometric vocabulary to describe the attributes of the shapes. The data from the student papers and the cognitive labs demonstrated that the majority of students showed evidence of thinking about attributes of the figures (64% of the student papers), but most (70% of the student papers) did not use geometric vocabulary correctly in their responses. We ascribe students’ limited use of geometric vocabulary both to the task itself, which does not require geometric vocabulary, and to students’ poor understanding of measurement concepts (particularly perimeter and congruence).
When students did incorporate correct geometric vocabulary in their responses, it was not uncommon that the only geometric word used was “squares.” For example, in the cognitive labs, one student wrote that “they (the figures) are both five squares across and
three squares down,” another wrote “the two shapes both have 15 complete squars (sic),” and a third wrote “they both have 3 squars (sic) in each column.” Two students misunderstood the shapes to be squares. One of these students discussed how the shapes looked like squares and the other wrote that the “squares are the same but 1 is tilted a little.”
Rarely in the cognitive labs or in the student work did students identify the first figure as a rectangle and the second as a parallelogram. In addition, students seldom noted that the opposite sides of the two figures are parallel, or that adjacent sides are perpendicular in the rectangle but not in the parallelogram. Several of the teachers of the students who participated in the cognitive labs stated that their students had not yet studied geometry prior to the administration of the labs. Students’ lack of preparation in geometry could have contributed to their incorrect and inadequate use of geometric vocabulary in their responses.
The alignment procedure showed us that the task seemed to have two areas of strength and two areas of weakness. Specifically, the scores were high on comprehensibility and engagement, but low on expectations and alternative responses. That is, we would expect that students would not have difficulty comprehending the task, or understanding what they are being asked to accomplish. We would also expect that students would find the task relatively interesting. The analysis showed, however, that there were potential difficulties with expectations. That is, it might not be clear to students what would constitute a satisfactory response to this task, or even how to know when they were finished with the task. In addition, the task did not allow for more than one way to respond.
To determine whether these technical qualities were borne out in the experiences of students, we examined the data from the cognitive labs. In our protocol for the labs, we looked for evidence of each of these four technical qualities. Our results showed that the conclusions reached from the alignment procedure were supported by the data from the cognitive labs.
Of the 13 students who participated in the cognitive labs, only one seemed to have some difficulty comprehending the task. This particular student had a difficult time perceiving that there were actually two figures on the grid. He seemed to focus his attention on the trapezoidal-shaped “space” between the rectangle and the parallelogram,
describing that shape as “the keel of a boat.” Although he was directed toward the two shapes, he was unsuccessful at attending to particular attributes, likenesses or differences, of those shapes. This student’s difficulty was not necessarily with the task prompts, but with the visual presentation of the figures on the page. The other 12 students, however, had no difficulty understanding what the prompt meant, and all proceeded without any further assistance from the researcher. It is important to note that two of the participating students were English Language Learners. Neither of these two students had difficulties comprehending the task.
All of the students were enthusiastic about the task, and expressed this when asked. Most said that they thought other students would also find it interesting. When asked why they found it interesting, some students said it was because the task was “easy,” while others said it was because it was “hard,” but that the difficulty made it fun.
In the cognitive labs we asked the students when they knew to stop working on the task. Nearly all of them responded that they stopped when they could no longer think of anything to write. One researcher observed that the students also seemed to be influenced by the amount of space provided on the page for the answer, and were satisfied with stopping when the space for each question was filled up. One boy seemed intent on writing the same number of likenesses as differences. Clearly, students were unsure about how much was expected of them to score high on the task.