Excerpted from a chapter written by DAVID FOSTER, PENDRED NOYCE, AND SARA SPIEGEL; read the full chapter here.
How has assessment been used to inform instruction? A number of districts, challenged urban districts in particular, have responded to the need to boost student scores by increasing the frequency of benchmark assessments. Some districts developed assessments aligned with local curricula to help ensure consistent coverage and learning across schools. Other districts invested in technology-based programs that offer quarterly updates on student progress along a linear scale, based on easily scored (but often skills-oriented) computer-based multiple-choice assessments. These programs, while they may reassure a school’s staff about student progress or alert them to trouble ahead, do little to inform teachers about how students are thinking, what they understand, where they are falling down, and how, specifically, teachers might change their own instructional practices to address students’ difficulties.
A Brief History of the Mathematics Assessment Collaborative
In 1996, the Noyce Foundation formed a partnership with the Santa Clara Valley Mathematics Project at San Jose State University to support local districts with mathematics professional development. The new partnership was dubbed the Silicon Valley Mathematics Initiative. Its early work focused on providing professional development, establishing content-focused coaching in schools, and collaboratively examining student work to inform teachers of pupils’ understandings.
At that time, the state of California was beginning a long and turbulent battle over the establishment of new state curriculum standards [Jacob and Akers 2000; 2001; Jackson 1997; Schoenfeld 2002; Wilson 2003]. Following the state board’s adoption of standards in mathematics, the governor pressed to establish a high-stakes accountability system. For the first time, California would require a test that produced an individual score for every student. Because developing a test to assess the state standards was expected to take several years, the state decided in the interim to administer an off-the-shelf, norm-referenced, multiple choice test — Harcourt’s Stanford Achievement Test, Ninth Edition (known as the SAT-9) — as the foundation for the California Standardized Testing and Reporting (STAR) program.
In the spring of 1998, students in grades 2 through 11 statewide took the STAR test for the first time. In an effort to provide a richer assessment measure for school districts, the Silicon Valley Mathematics Initiative formed the Mathematics Assessment Collaborative (MAC). Twenty-four school districts joined the collaborative, paying an annual membership fee.
Selecting an Assessment
MAC’s first task was to create a framework characterizing what was to be assessed. Keeping in mind William Schmidt’s repeated refrain that the U.S. curriculum is “a mile wide and an inch deep,” MAC decided to create a document that outlined a small number of core topics at each grade level. The goal was to choose topics that were worthy of teachers’ efforts, that were of sufficient scope to allow for deep student thinking, and that could be assessed on an exam that lasted just a single class period. Using as references the standards developed by the National Council of Teachers of Mathematics, by the state of California, and by the local districts, teacher representatives from MAC districts met in grade-level groups to choose five core ideas at each grade level.
Once the core ideas document was created, the next task was to develop a set of exams that would test students’ knowledge of these ideas. MAC contracted with the Mathematics Assessment Resource Service (MARS), creators of Balanced Assessment, to design the exams. Each grade-level exam is made up of five tasks. The tasks assess mathematical concepts and skills that involve the five core ideas taught at that grade. The exam also assesses the mathematical processes of problem solving, reasoning, and communication. The tasks require students to evaluate, optimize, design, plan, model, transform, generalize, justify, interpret, represent, estimate, and calculate their solutions.
The MARS exams are scored using a point-scoring rubric. Each task is assigned a point total that corresponds to the complexity of the task and the proportional amount of time that the average student would spend on the task in relation to the entire exam. The points allocated to the task are then allocated among its parts. Some points are assigned to how the students approach the problem, the majority to the core of the performance, and a few points to evidence that, beyond finding a correct solution, students demonstrate the ability to justify or generalize their solutions. In practice, this approach usually means that points are assigned to different sections of a multi-part question.
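The allocation described above can be pictured with a small sketch. The task, part labels, and point values below are hypothetical and are not taken from an actual MARS rubric; the sketch only shows a task total being divided among an approach part, a core part, and a justification part, and then summed for one student.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricPart:
    label: str
    max_points: int


# Hypothetical 8-point task: labels and values are illustrative assumptions.
# Most points go to the core of the performance, a few to the approach
# and to justifying or generalizing the solution.
TASK_RUBRIC = [
    RubricPart("approach", 2),
    RubricPart("core performance", 5),
    RubricPart("justify or generalize", 1),
]


def task_total(rubric):
    """Return the maximum number of points available on the task."""
    return sum(part.max_points for part in rubric)


def score_task(rubric, awarded):
    """Sum the points a scorer awarded, checking each part against its maximum.

    `awarded` maps a part label to the points given; missing parts count as 0.
    """
    total = 0
    for part in rubric:
        points = awarded.get(part.label, 0)
        if not 0 <= points <= part.max_points:
            raise ValueError(f"{part.label}: {points} is outside 0..{part.max_points}")
        total += points
    return total
```

For example, a student who earns full credit for the approach and three of the five core points, but offers no justification, would receive 5 of the 8 points on this made-up task.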
The combination of constructed-response tasks and weighted rubrics provides a detailed picture of student performance. Where the state’s norm-referenced, multiple-choice exam asks a student merely to select from answers provided, the MARS exam requires the student to initiate a problem-solving approach to each task. Students may use a variety of strategies to find solutions, and most of the prompts require students to explain their thinking or justify their findings.
This aspect of the assessment seems impossible to duplicate with an exam that is entirely multiple choice. Details of the administration of the exams also differ from the state’s approach, in that teachers are encouraged to provide sufficient time for students to complete the exam without rushing. In addition, students are allowed to select and use whatever tools they might need, such as rulers, protractors, calculators, link cubes, or compasses.
The Assessment in Practice
In the spring of 1999, MAC administered the exam for the first time in four grades (third, fifth, seventh, and algebra courses) in 24 school districts. Currently the collaborative gives the exam in grades two through eight, followed by high school courses one and two. Districts administer the exam during March, and teachers receive the scored papers by the end of April, usually a couple of weeks prior to the state high-stakes exam.
Scoring the MARS exams is an important professional development experience for teachers. On a scoring day, the scoring trainers spend the first 90 minutes training and calibrating the scorers on one task and rubric each. After that initial training, the scorers begin their work on the student exams. After each problem is scored, the student paper is carried to the next room, where another task is scored. At the end of the day, teachers spend time reflecting on students’ successes and challenges and any implications for instruction. Scoring trainers check random papers and rescore them as needed. Finally, as a scoring audit, 5% of the student papers are randomly selected and rescored at San José State University. Reliability measures prove to be high: a final analysis across all grades shows that the mean difference between the original score and the audit score is 0.01 point.
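The audit step lends itself to a brief illustration. The sketch below is not the collaborative's actual procedure: the function names and data layout are hypothetical, and it assumes the reported figure is a signed mean (original score minus audit score), which the text does not specify.

```python
import random
from statistics import mean


def draw_audit_sample(paper_ids, fraction=0.05, seed=None):
    """Randomly select a fraction of papers (5% by default) for rescoring."""
    rng = random.Random(seed)
    ids = list(paper_ids)
    sample_size = max(1, round(len(ids) * fraction))
    return rng.sample(ids, sample_size)


def mean_score_difference(original_scores, audit_scores):
    """Mean of (original - audit) over the audited papers.

    Both arguments map a paper id to a total score. Assumption: the
    difference is signed, so over- and under-scoring can cancel out.
    """
    return mean(original_scores[pid] - audit_scores[pid] for pid in audit_scores)
```

With 200 papers, `draw_audit_sample(range(200))` would return 10 distinct paper ids, and a mean difference near zero would indicate that the original scorers and the auditors agree on average.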
Along with checking for reliability, the 5% sample is used to develop performance standards for overall score reporting. The collaborative has established four performance levels in mathematics: Level 1, minimal success; Level 2, below standards; Level 3, meeting standards; and Level 4, consistently meeting standards at a high level. A national committee of education experts, MARS staff members and MAC leaders conducts a process of setting standards by analyzing each task to determine the core of the mathematical performance it requires. The committee examines actual student papers to determine the degree to which students meet the mathematical expectations of the task, and it reviews the distribution of scores for each task and for the exam as a whole. Finally, the committee establishes a cut score for each performance level for each test. These performance levels are reported to the member districts, teachers, and students.
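Once cut scores exist, assigning a performance level is a simple lookup. The sketch below is only illustrative: the cut scores are invented, since the real ones are set by the committee for each test, and it assumes each cut score is the minimum total needed to reach the next level.

```python
from bisect import bisect_right

# Level names as given by the collaborative.
LEVEL_NAMES = {
    1: "minimal success",
    2: "below standards",
    3: "meeting standards",
    4: "consistently meeting standards at a high level",
}

# Hypothetical cut scores: the minimum totals for Levels 2, 3, and 4.
# Real values are set separately for each test by the standards committee.
CUT_SCORES = [12, 21, 31]


def performance_level(total_score, cut_scores=CUT_SCORES):
    """Return the level (1-4) for a total score, given ascending cut scores."""
    return 1 + bisect_right(cut_scores, total_score)
```

Under these made-up cut scores, a total of 20 falls in Level 2 and a total of 21 reaches Level 3, which `LEVEL_NAMES` labels "meeting standards".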
Once the papers are scored, they are returned to the schools, along with a copy of the master scoring sheets, for teachers to review and use as a guide for further instruction. Each school district creates a database with students’ scored results on the MARS exam, demographic information, and scores on the state-required exam. Using these, an independent data analysis company produces a set of reports that provide valuable information for professional development, district policy, and instruction.
Part of a WSVMI member district? You can access past years' MAC Assessments at svmimac.org.