Value Added Testing: Improving State Testing and Teacher Compensation in Wisconsin
By Mark C. Schug Ph.D. and M. Scott Niederjohn Ph.D.
This is the second of two reports on Wisconsin’s state testing program published by the Wisconsin Policy Research Institute. The first report—Mandated K-12 Testing in Wisconsin: A System in Need of Reform—documented how Wisconsin’s current testing requirement has drawn national criticism. That report recommended that Wisconsin’s testing regime be replaced or significantly modified.
This companion report—Value-Added Testing: Improving State Testing and Teacher Compensation in Wisconsin—describes what improvements should be made in student testing. New developments in testing have emerged and are now coming into widespread use across the nation. These new testing approaches not only could serve as a basis for changing state-required tests, but they could also pave the way to improvements in how Wisconsin’s teachers are compensated. These changes would have important implications for the teaching profession.
Historically, discussion of education finance has centered on inputs, primarily the amount of spending on schools. By any measure, Wisconsin spending on education is substantial. Consider that:
• Education spending consumes 38% of the state budget.
• Spending per student averages $11,413.
• Wisconsin per-student spending is 15th highest in the nation.
• In the 10 years between 1997 and 2007, Wisconsin’s spending per student increased by 50%.
Yet focusing exclusively on spending is inadequate. Research has repeatedly shown little correlation between how much is spent and educational effectiveness. A new approach to testing promises to clarify this debate by measuring not just performance but effectiveness. This new category of testing is called value-added testing. For the first time, value-added testing holds the promise of allowing schools, families, and the public to objectively assess the effectiveness of educational spending and methods.
The most promising new approach is known by a collective term, value-added methodologies (VAM). VAM has rocked the world of educational testing because it introduces an important shift in what is measured. Assessment tests to date have typically focused on achievement—on knowledge levels attained by students as revealed in their annual tests.
Wisconsin’s current tests—known as the Wisconsin Knowledge and Concepts Examination–Criterion Referenced Tests (WKCE-CRT)—are an example of this approach. By contrast, testing conducted via VAM describes growth in students’ test scores over a school year. Thus, value-added testing reveals what a year’s worth of learning actually achieved, regardless of whether a child passes the annual test. Because value-added tests focus on gains rather than raw scores, each student’s performance is gauged against his or her own past performance.
Several states have embraced value-added testing and have launched efforts to apply the approach statewide. Wisconsin is not among them. States leading the way include Tennessee, Ohio, Arkansas, Delaware, Florida, Louisiana, and South Carolina.
Unfortunately, the Wisconsin Department of Public Instruction (DPI) has been slow to show leadership in the value-added area. Yet many local school districts are interested. A survey of local school districts revealed that most of the responding school districts supplement state testing with additional testing they fund themselves. This fact suggests that they find the state assessments to be inadequate. Many districts report using growth or value-added methodologies.
As educators have begun to reconsider their testing programs, many also have begun to reconsider teacher compensation. There is a renewed interest in performance pay for teachers due in part to widespread use of value-added testing. Data obtained from value-added testing show very clearly that individual teachers can have a substantial effect on student achievement.
The effects revealed by value-added testing could be used as one outcome measure in performance-based compensation programs, nullifying the old argument that there is no reliable way to distinguish good teachers from poor teachers.
Nationwide, several serious efforts are now under way to dismantle the outdated salary schedules unions have traditionally negotiated with school districts and to replace them, or augment them, with incentive-based pay programs. The Center for Performance Incentives reports that programs are under way in 22 states; examples include Colorado (Denver, specifically), Florida, Texas, and Minnesota.
Now is the time for Wisconsin to reform its state testing program by moving quickly toward the use of value-added assessment. In order to accomplish this, we make the following recommendations:
• The state Legislature should act now to abolish statutory provisions that disallow the use of results from state testing in teacher evaluation.
• The state Legislature should create incentives to encourage school districts to develop alternatives to outdated compensation programs based on salary schedules.
• The WKCE-CR testing regime should be replaced or significantly modified. Testing students in the fall of the year makes it impossible to use test results in a timely manner for improving curriculum and instruction. The state should move toward a testing program with computer-based scoring so that results could be obtained and used promptly. One possibility would be to adopt part of the value-added testing program developed by the Northwest Evaluation Association, which many Wisconsin school districts have already adopted on their own.
• The DPI should publicly embrace value-added testing. The DPI should provide technical support to school districts that wish to move toward growth-oriented, value-added testing.
• Wisconsin should continue to develop a statewide student database to allow for comprehensive, in-depth VAM analysis. This database should link every student with his or her teacher in every grade and subject.
It appears that state testing is here to stay. In recent years, state testing has taken center stage in reforming public schools. In the early 1990s, the Wisconsin Legislature required the Department of Public Instruction (DPI) to implement the Wisconsin Knowledge and Concepts Examinations (WKCE). Districts were required to implement the WKCE in the 1993-1994 school year. State testing expanded significantly in the early 2000s, due largely to the passage of the federal No Child Left Behind (NCLB) Act of 2001. Among the many changes that were eventually required were that all students be tested in reading and math each year in grades three to eight, beginning in 2005-06, with science assessments once each in elementary, middle, and high school grades beginning in 2007-08. In 2006-2007, Wisconsin spent over $10 million on the DPI’s assessment programs.
The plans of the Obama administration remain vague when it comes to how NCLB might be modified in the reauthorization process delayed until sometime in 2010. Nonetheless, it seems clear that state testing will remain an important tool in efforts to make schools more accountable. Arne Duncan, the new secretary of Education, has supported high-stakes testing for students and schools in Chicago. Furthermore, there is evidence that President Obama understands the problems that many school districts face—especially urban schools—with regard to the current testing procedures approved by NCLB. In one talk on the campaign trail, he implied that value-added approaches are a key innovation.1
Value-added methodologies reorient testing programs fundamentally in order to provide the sort of data about achievement that schools need to improve classroom instruction. Value-added testing describes individual students’ growth in achievement in the course of a school year. It tells, for each student, how far he or she has progressed over the school year. Results from value-added testing show that students differ markedly in growth over a given school year. These differences can be correlated with the work of the teachers in question. Some teachers are shown to produce significant gains in their students’ achievement; others are shown to be less effective. And the differences cannot be explained away easily by reference to the students’ socioeconomic status, since each student’s growth is assessed by reference to his or her own starting point. Thus value-added testing underscores the importance of classroom instruction and the instructional skill of individual teachers.
Value-added methodologies provide (in versions such as the one developed by the Northwest Evaluation Association) teachers and school administrators with remarkable new tools to assist teachers in improving the achievement of their students. With computer-assisted scoring of value-added tests, schools no longer have to wait months to obtain detailed reports on student performance, and they no longer have to guess where students are weak and where they are strong. As we will demonstrate, many school districts in Wisconsin are embarking on this path despite the lack of leadership from the DPI.
Value-added testing is also a key component in new pay-for-performance compensation programs for teachers. Early research suggests that these programs—for example, in the form of bonus payments awarded for student-achievement gains—enhance student achievement.2 At present, however, teacher pay in Wisconsin is marked by widespread use of the old-fashioned salary schedule, with its exclusive reliance on years of experience and degrees held. Almost nowhere in Wisconsin have we seen a movement to add teacher performance criteria into the mix. However, new pay-for-performance programs are taking hold elsewhere across the nation. Our survey reveals that Wisconsin school districts are actually in a good position to begin to experiment with these new compensation systems.
Later in this report we discuss the results of a survey we conducted to learn what sorts of testing Wisconsin’s school districts are doing in addition to their use of the WKCE-CR tests. Our data show that many Wisconsin school districts—apparently recognizing the inadequacies of the WKCE-CR system—have chosen to use additional tests and other testing methodologies. In many cases, Wisconsin school districts are turning to value-added or growth testing.
Value-added methodologies (VAM) represent a profound shift in testing—away from the testing history documented in previous sections of this report. Student performance on assessment tests can be measured in at least two ways. Achievement tests describe knowledge levels attained by students in their end-of-year tests; Wisconsin’s WKCE-CR testing system is an example. Value-added or growth tests, by contrast, describe the progress in test scores that individual students make over a school year. One advantage in using value-added assessment is that it eliminates the vexing problem caused by the strong correlation between socioeconomic status and achievement.3 Because value-added testing focuses on gains rather than raw scores, each student’s performance is based on a comparison with his or her past performance.
Value-added assessment is a method of analyzing test results based upon growth in standardized test scores over multiple points in time. Based on a review of a student’s test score gains from previous grades, researchers can predict the amount of growth those students are likely to make in a given year. This likely or predicted growth is based entirely on the student’s prior academic achievement. The actual growth can then be compared to this benchmark to determine whether students performed above, below, or at their individual baseline. Using value-added testing, it is possible to look back over several years to measure the long-term impact that a particular district, school, curriculum, or teacher had on any given student’s achievement.
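As a rough illustration of this logic, here is a minimal sketch, with hypothetical scores, of projecting a student's growth from his or her own prior trajectory and comparing actual growth against that baseline. Real systems such as TVAAS use far more sophisticated mixed-effects models; this sketch only conveys the core idea.

```python
# Illustrative only: scores and the simple average-gain projection are
# hypothetical, not any state's actual value-added model.

def predicted_score(prior_scores):
    """Project this year's score by extending the student's average past gain."""
    gains = [b - a for a, b in zip(prior_scores, prior_scores[1:])]
    avg_gain = sum(gains) / len(gains)
    return prior_scores[-1] + avg_gain

def value_added(prior_scores, actual_score):
    """Positive = the student grew faster than his or her own baseline; negative = slower."""
    return actual_score - predicted_score(prior_scores)

# A student who scored 200, 210, 220 in prior years is projected to score 230.
print(value_added([200, 210, 220], 236))  # 6.0: above the individual baseline
print(value_added([200, 210, 220], 225))  # -5.0: below the individual baseline
```

Because the benchmark is the student's own history, the comparison is unaffected by whether the student started the year above or below grade level.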
Background and History
The pioneer of value-added testing is William L. Sanders, a senior research fellow with the University of North Carolina system and, for more than 34 years, a professor and director of the University of Tennessee’s Value-Added Research and Assessment Center. Professor Sanders developed the Tennessee Value-Added Assessment System (TVAAS) as a method of measuring the effectiveness of school systems, schools, and teachers. He developed the technique while working in the field of agricultural genetics at the University of Tennessee.
In the 1980s, when then Gov. Lamar Alexander was in search of a measure to evaluate schools and teachers and hold them accountable for student learning under a new education financing scheme, Sanders wrote a proposal for the governor and was given access to all test data for a county in Tennessee. In 1992, when the Tennessee Supreme Court demanded a more equitable funding system for the state’s schools, a new interest in accountability surfaced, and Sanders’ work became increasingly influential. Tennessee’s legislature passed the Tennessee Educational Improvement Act, and out of this education reform package the Tennessee Value-Added Assessment System was born. This system, still in effect, provides the following:
Data to the public on the performance of districts and schools, and data for appropriate administrators on the performance of teachers;
Information to teachers, parents and the public on how schools are doing in helping each child make academic gains each year;
Information to school administrators to help identify weaknesses in even the strongest schools.
Value-added assessment systems require significant databases. Students must be tested in each grade, every year, in each subject. Since 1992, a database developed by Sanders has been used to track each Tennessee student from 2nd through 12th grade. This database includes more than 10 million records with tests in every subject and links to every teacher. The state has linked rewards, aid, and sanctions to its school rating system.
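To illustrate the kind of student-teacher linkage such a database requires, consider the following minimal sketch. The field names, identifiers, and scores are invented for illustration; they do not reflect Tennessee's actual schema.

```python
# Hypothetical records: each row links one student's test score in one
# grade and subject to the teacher responsible for that classroom.
records = [
    # (student_id, grade, subject, teacher_id, score)
    (1, 4, "math", "T-10", 210), (1, 5, "math", "T-22", 224),
    (2, 4, "math", "T-10", 198), (2, 5, "math", "T-22", 205),
]

def gains_for_teacher(records, teacher_id, subject):
    """Average score gain of a teacher's students relative to their prior year."""
    prior = {(s, g, subj): score for s, g, subj, _, score in records}
    gains = [score - prior[(s, g - 1, subj)]
             for s, g, subj, t, score in records
             if t == teacher_id and subj == subject and (s, g - 1, subj) in prior]
    return sum(gains) / len(gains)

# Teacher T-22's students gained 14 and 7 points: an average gain of 10.5.
print(gains_for_teacher(records, "T-22", "math"))
```

The point of the sketch is that without the student-teacher link in every grade and subject, a gain cannot be attributed to any classroom, which is why the report's final recommendation calls for exactly that linkage.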
Other versions of value-added testing have been developed since Sanders began his work in the 1980s. The differences often originate in an effort to resolve various statistical issues that arise. Some models, in contrast to Sanders’ model, include student socioeconomic characteristics as control variables; this is true, for example, of the model developed at the Wisconsin Center for Educational Research (WCER). In general, the various VAM models can be classified in four categories.4 These include the Fixed-Effects Models (FEM), where school effects are taken to be fixed rather than random. An extension of this model is called the Simple Fixed-Effects Model (SFEM). There is also the Layered Mixed-Effect Model (LMEM) used by Sanders and the Hierarchical Linear Models (HLM), which assume that school effects are random.
Since the inception of these models, many other states and districts have followed in Tennessee’s footsteps, including Ohio and Pennsylvania.
Value-added models have been heavily researched. Studies have investigated the advantages and challenges of the methodology, the potential for improving instruction, and implications for teacher compensation. Much research, not surprisingly, has focused on the TVAAS, evaluating its methodology and utilizing data from the system to investigate various education questions.
In a series of research reports, Ted Hershberg of the University of Pennsylvania discusses the powerful implications that value-added assessments represent for school districts. He discusses the significance of VAM in making adjustments to curriculum, pedagogy, and professional development.5 Another study explains the benefits of VAM as a diagnostic tool school districts can use to focus their instruction where it can make the most impact.6
Sanders and his colleagues have utilized the massive database assembled in Tennessee to study a number of topics related to value-added assessment, including socioeconomic variables and teacher effects on student growth. These studies consistently yield two major findings: Student socioeconomic variables are poor predictors of student success, and teachers are the most important determinant for student academic growth. In a summary of the VAM research, Sanders and Horn7 state that “Differences in teacher effectiveness [are] the dominant factor affecting student academic gain. The importance of . . . certain classroom contextual variables (e.g., class size, classroom heterogeneity) appears to be rather minor….” And again, citing a 1997 study:8 “the two most important factors impacting student gain are the differences in classroom teacher effectiveness and the prior achievement level of the students. The teacher effect is highly significant in every analysis and has a larger effect size than any other factor in twenty of the thirty analyses.” Investigators for a major research report by the RAND Corporation9 have reached similar conclusions.
Others have examined the TVAAS program for accuracy, reliability, and validity. David Harville, a statistician at Iowa State University, evaluated the system and concluded that it is a sound approach.10 Another independent analysis of the system found the system valid, while also offering some recommendations.11 An evaluation by Walter Stroup, of the University of Nebraska, concluded that the “statistical model being used is reasonable, and the data are consistent with the assumptions of the model.”12
These favorable results do not mean that no problems arise in the use of VAM. One common problem has to do with randomization. VAM models are based upon the idea that students are randomly assigned to their classes and that teachers are randomly assigned to teach these classes. To the extent that this is not true, results from value-added testing can be misleading. The treatment of missing data, physical school conditions, district policies, and resources—all matters outside of a teacher’s control—also constitute potential limitations to the uses of VAM. A 2005 report by the Educational Testing Service (ETS) summarizes these limitations while also arguing that VAM models show great promise and should continue to be pursued.13
VAM and NCLB
Value-added analysis differs from the Adequate Yearly Progress (AYP) analysis called for by the No Child Left Behind (NCLB) Act. AYP requires that schools measure the performance of students at several grade levels, as well as the performance of several sub-groups of students (race, gender, disability and so forth), and then determine the proportion of students meeting a fixed standard. A fundamental problem with this approach is that some students will enter a grade with higher levels of achievement than others. Those students will obviously find it easier to meet the proficiency standards set by AYP. And students who are performing well below grade level might face great difficulty in meeting the standards set by AYP even if they do a great job of improving their performance over the school year. AYP focuses on achievement while excluding the analysis of growth.
In many cases, and for many schools, this will not cause any problems. For example, for schools with high proficiency (achievement) and high rates of growth, there isn’t a problem; AYP will properly label these schools as succeeding. It is similar for those schools with low achievement and low rates of growth; AYP will properly sort these schools into the failing category. The rub comes in, however, for a school that may show low achievement levels but high rates of growth—or, conversely, high achievement levels and low growth. The former schools may face sanctions under the AYP model even though the teachers are doing exactly what the law should be encouraging—improving the achievement of students who are far behind to begin with and thus will not reach a given proficiency level despite making a year of impressive gains. Further, AYP does nothing to sanction schools that start out with high achievement but do nothing to further their students’ growth.
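The four-way sorting described above can be sketched in a few lines. The thresholds and labels here are hypothetical, purely for illustration; actual AYP proficiency targets and growth benchmarks are set by statute and regulation.

```python
# Hypothetical cutoffs for illustration only.
PROFICIENT = 0.70   # share of students meeting the fixed AYP standard
GROWTH_MET = 1.0    # years of academic growth per school year

def compare_labels(proficiency_rate, growth):
    """Return (AYP verdict, growth-model verdict) for a school."""
    ayp_label = "succeeding" if proficiency_rate >= PROFICIENT else "failing"
    growth_label = "succeeding" if growth >= GROWTH_MET else "failing"
    return ayp_label, growth_label

# The two approaches agree on the easy cases...
print(compare_labels(0.85, 1.2))  # ('succeeding', 'succeeding')
print(compare_labels(0.40, 0.5))  # ('failing', 'failing')
# ...but diverge for low-achievement/high-growth schools, and vice versa.
print(compare_labels(0.40, 1.4))  # ('failing', 'succeeding')
print(compare_labels(0.85, 0.4))  # ('succeeding', 'failing')
```

The last two cases are exactly the schools the text describes: AYP sanctions the first despite strong teaching, and leaves the second untouched despite stagnant growth.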
Value-added analysis addresses this problem by comparing individual students to themselves—by reference to their achievement levels early and late in a school year. Such a methodology would recognize schools that are changing the trajectory of their students by converting static achievement scores to dynamic growth scores. In November of 2005, then Secretary of Education Margaret Spellings announced a new pilot program14 that will allow selected states to use growth models to determine whether their schools and districts are meeting NCLB performance targets. Nine states are currently participating in this pilot program.15
Value-Added Models Elsewhere
Several states have embraced value-added testing and launched statewide efforts to apply the approach. Wisconsin is not among them. The National Council on Teacher Quality, in its 2007 State Teacher Policy Handbook, identifies six areas in which states are evaluated. One of these areas is Teacher Evaluation and Compensation. Goal B in the relevant section states the following: “The state should install strong value-added instruments to add to schools’ knowledge of teacher effectiveness.” Wisconsin is ranked with 22 other states in a group that “meet a small part of the goal.” Tennessee is ranked first as a best-practice example, while Ohio, Arkansas, Delaware, Florida, Louisiana, and South Carolina also rank high.

The report does make note of work going on in the Milwaukee Public Schools, where, since 2000, the Wisconsin Center for Education Research (WCER) has been working with the district to use value-added testing to evaluate schools. The MPS value-added model calculates the average growth of the same groups of students in a school, controlling for variables such as prior academic achievement, ethnicity, gender, mobility, and economic status. Scores from reading, language arts, and mathematics WKCE/Terra Nova tests are analyzed. The report recommends that Wisconsin develop the capacity to link teacher and student records to enable the development of a value-added analysis in the state’s school districts. WCER is currently developing a web-based system that would enable school districts to pilot and implement value-added systems.
Value-added models have been developed around the country; many are in effect in other states. In addition to Tennessee, where value-added methodologies have been mandated for use in public schools since 2002, value-added testing is required for all school districts in Ohio and Pennsylvania, and for several hundred school districts in 21 states.16
Ohio is a nearby state whose experience with value-added models is worth noting. In 2003, its legislature passed legislation to add a value-added measure to the state’s accountability system. Beginning in August 2008, value-added testing was fully integrated into Ohio’s educational accountability system. State officials selected the value-added model developed by Sanders and his colleagues. Essentially, Ohio, while allowing school districts to meet AYP objectives in the traditional manner, now has a value-added alternative. The advantage of this approach is that it allows Ohio to identify schools and districts where performance is of greatest concern. These are schools and districts in which proficiency rates are low and in which students are failing to make gains sufficient to achieve proficiency during their enrollment.
All Ohio school districts have now received value-added reports for 4th- and 8th-grade reading and mathematics. The first report showed that achievement growth in 45 percent of its school districts exceeded expectations; achievement growth in 23 percent of the districts met expectations; and achievement growth in 32 percent of the districts failed to meet expectations. Data like these enable Ohio schools to improve their understanding of what districts, schools, and individual students are achieving, and to understand how achievement is changing over time.
Other states have followed similar approaches. Minnesota has added a value-added component to its assessment system. In 2004 the Minnesota legislature enacted a statute titled “Student Academic Achievement and Progress.” It requires that the state’s assessment of individual students’ progress must be, to the extent that annual tests are administered, based on indicators of achievement growth referenced to an individual student’s prior achievement. The Minneapolis Public Schools, as a result, have developed an elaborate system to measure the performance of schools and students in which performance level and growth indicators are used. The growth indicators report achievement information on students who were continually enrolled in the Minneapolis Public Schools across specified periods of time, usually across two testing periods. With value-added data, individual schools are compared with the district average to determine which schools are doing better.
Testing Beyond the WKCE-CR in Wisconsin
In order to determine what Wisconsin school districts are currently doing in the area of student achievement testing, we conducted a survey in the fall of 2007. With help from the Wisconsin Association of School Boards17 (WASB), we sent a survey on student achievement and testing to all 426 of Wisconsin’s school districts. The survey was addressed to the school district administrator and sent on WASB letterhead. We received completed surveys, via fax, from 272 school districts, or about 64 percent of the total Wisconsin school district population. The survey queried school districts on the types of testing they conduct, the grade levels in which various tests are conducted, the subjects tested, and how data obtained from the testing is used.
The results from this survey provide several insights into the status of student achievement testing in Wisconsin’s K-12 schools. Perhaps the most important finding is that most Wisconsin school districts consider the WKCE testing to be inadequate, so much so that it must be supplemented with additional testing. Just over 68 percent, or 186 school districts, reported conducting periodic student achievement testing beyond the required WKCE tests.
These 186 school districts were asked which tests they are currently using. While there was substantial disparity in the number of tests reported, some trends could be identified. About half (49.2 percent) of the districts doing testing beyond the WKCE use the Northwest Evaluation Association (NWEA) Measure of Academic Progress (MAP) assessment system.18 Another 14.5 percent of districts testing beyond WKCE report using district-developed assessments. Another 33.5 percent of the districts report using other tests, including Standardized Testing and Reporting (STAR) and Terra Nova, among others.
We asked the school districts that reported conducting testing beyond the WKCE to identify the grade levels and subjects for which they are testing. Responses are summarized in Tables 1 and 2. A total of 168 school districts provided information about the grade levels they test, while 90 school districts provided responses to the question about subjects tested. The most common grade levels tested are 4th (85.1 percent of districts) and 5th (85.7 percent of districts). Testing is also common in 2nd, 3rd, 6th and 7th grades. In terms of subjects tested, reading is by far the most common. Eighty-one of the districts, or 90 percent, reported testing beyond WKCE in reading. The next most common subject area for assessment beyond the WKCE is math. Seventy districts, or 77.8 percent of the total, reported testing in math. Other areas tested include English/language (37.8 percent), science (17.8 percent), social studies (5.4 percent), and other (23.3 percent).
Table 1: Grade Levels Tested Beyond WKCE (total responses = 168)
Table 2: Subject Areas Tested Beyond WKCE (total responses = 90)
We next asked about the use of assessment results for value-added analysis. Of the 186 school districts that reported testing beyond the WKCE, about 85 percent, or 158 districts, state that they use these tests to measure student achievement using growth or value-added testing methodologies.
We then asked these 158 districts what they judge to be the most important purpose(s) of analyzing student test data using growth or value-added analysis. Responses are summarized in Table 3. The most popular application of value-added methodology is in measuring student performance in particular subjects (93.8 percent). Large percentages of school districts report using value-added methodology to measure grade-level student performance (87.6 percent) and schoolwide student performance (80.7 percent). Fewer school districts use value-added methodology to measure schoolwide teacher performance (13.7 percent), grade-level teacher performance (16.8 percent), and teacher performance in particular subject areas (14.9 percent). Last, very few Wisconsin school districts (2.5 percent) report using value-added methodology to measure the performance of individual teachers with an eye toward using the results as one basis for teacher compensation.
Here is a summary of the results of the school district survey on student achievement testing:
• 186 out of the 272 school districts that responded to our survey (68.4 percent) report conducting testing beyond the WKCE.
• Of these 186 school districts, about half (49.2 percent) report using the Northwest Evaluation Association (NWEA) Measure of Academic Progress (MAP) assessment system. Another 14.5 percent report using district-developed assessments, while 33.5 percent report using another testing system.
• The most common grade levels tested are 4th (85.1 percent of districts) and 5th (85.7 percent of districts). Testing is also common in 2nd, 3rd, 6th and 7th grades. 81 of the districts, or 90 percent, report testing beyond WKCE in reading; 70 districts, or 77.8 percent of the total, report testing in math. Other areas tested include English/language (37.8 percent), science (17.8 percent), social studies (5.4 percent), and other (23.3 percent).
• Of the 186 school districts that reported testing beyond the WKCE, about 85 percent, or 158 districts, state that they use these test results to measure student achievement using growth or value-added methodologies.
• The most popular application of value-added methodology is in measuring student performance in particular subjects (93.8 percent). Many school districts report using value-added methodology to measure grade-level student performance (87.6 percent) and schoolwide student performance (80.7 percent).
• Fewer school districts use value-added methodology to measure schoolwide teacher performance (13.7 percent), grade-level teacher performance (16.8 percent), and teacher performance in particular subject areas (14.9 percent).
• Very few Wisconsin school districts (2.5 percent) report using value-added methodology to measure the performance of individual teachers with an eye toward using the results as one basis for teacher compensation.
In summary, it seems clear that the majority of Wisconsin school districts find the WKCE-CR testing to be inadequate—so much so that it must be supplemented with additional testing. Many districts report using growth or value-added methodologies. How does the use of value-added methodologies in Wisconsin compare to other states? Which Wisconsin school districts are leaders in the use of this new technique, and what do they do with the data? The answers to these questions are both interesting and revealing in terms of the future of state testing in Wisconsin.
Table 3: Uses of Value-Added Testing (total responses = 158)
• To measure schoolwide student performance (80.7 percent)
• To measure schoolwide teacher performance (13.7 percent)
• To measure grade-level student performance (87.6 percent)
• To measure grade-level teacher performance (16.8 percent)
• To measure student performance in particular subjects such as elementary reading, algebra, or history (93.8 percent)
• To measure teacher performance in particular subjects such as elementary reading, algebra, or history (14.9 percent)
• To measure the performance of individual teachers with an eye toward using the results as one basis for teacher compensation (2.5 percent)
Implications for the Future
The Department of Public Instruction—while not providing overt leadership in this area—has begun to move the state in the direction of value-added testing methodologies. The DPI has started developing a statewide database for student test data, has worked with outside consultants to evaluate various value-added growth models, and has formed a technical advisory committee. While these activities—conducted primarily behind the scenes—are a useful first step toward a more effective testing regime, more leadership is required. As evidence that Wisconsin’s school districts are looking for leadership in this area, several districts—as indicated by our survey—have been experimenting with growth-oriented, value-added models on their own.
In this respect, the Milwaukee Public Schools, as mentioned earlier, is almost certainly the leader in the state. Madison is another Wisconsin district that has made a commitment to value-added approaches.
Let’s examine the situation in Milwaukee. The stated goal of its testing programs is to measure the value-added performance of Milwaukee’s schools, teachers, programs, and policies, in an effort to increase accountability for performance and data-driven improvement. The MPS has worked for several years with the Wisconsin Center for Education Research at the University of Wisconsin-Madison to develop its highly sophisticated models of value-added analysis.
The MPS value-added model depends on the use of WKCE-CR data. It provides a great deal of information about student performance. It permits grade-by-grade comparisons over time. It has been used in several research projects. It allows the district to identify high-growth schools as well as low-growth schools. The MPS administration has used value-added models to assist it in making decisions regarding the school-closing process, planning involving the District in Need of Improvement status of the MPS, and decisions about differentiated management of schools.
It is worth noting here that a number of states are experimenting with pay-for-performance compensation programs for teachers and schools. Florida, Minnesota, and Texas are among the leaders. Wisconsin is nowhere to be found. Why? Wisconsin state law prohibits using the results of state testing to evaluate teachers’ performance, to discipline teachers, or as a reason for nonrenewal of contracts.19 State law thus places a significant roadblock in the way of using data from WKCE-CR tests to reward teachers who consistently produce achievement gains among their students.
Our school district survey indicated that the most widely used assessment system in Wisconsin outside of the WKCE is the one developed by the Northwest Evaluation Association (NWEA). The NWEA is a national nonprofit organization that provides assessments, professional training, and consulting services to improve teaching and learning. The NWEA works with more than 3,400 school districts around the nation. Its website lists 191 Wisconsin school districts as partners. The vast majority are public schools, although some charter schools and private schools also are listed.
The most widely used NWEA test is the Measures of Academic Progress (MAP). This is a computer-based test—that is, students can take the test online. It is aligned with state standards. MAP tests measure academic growth over time, independent of grade level or age. NWEA calls the MAP test “adaptive.” This means that the difficulty of a test is adjusted to the student’s performance, so each student sees different test questions. The difficulty of each question is based on how well the student has answered the questions up to that point. As the student answers correctly, the questions become more difficult. If the student answers incorrectly, the questions become easier. This makes the test particularly useful in that it is appropriate for students at many different academic levels.
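The adaptive mechanism described above can be sketched in a few lines of code. This is a deliberately simplified illustration of how item difficulty tracks performance, not NWEA’s actual item-selection algorithm; the function names, step sizes, and scale values are invented for the example.

```python
def run_adaptive_test(answer_fn, num_questions=20, start=200.0, step=10.0):
    """Sketch of an adaptive test: difficulty rises after a correct answer
    and falls after an incorrect one, homing in on the student's level.
    (Illustrative only; not NWEA's actual algorithm.)"""
    difficulty = start
    history = []
    for _ in range(num_questions):
        correct = answer_fn(difficulty)  # present an item at this difficulty
        history.append((difficulty, correct))
        difficulty += step if correct else -step
        step = max(step * 0.9, 1.0)      # take smaller steps as the estimate settles
    return difficulty, history
```

Simulating a student who reliably answers items below a “true” level of 230, the final difficulty settles near 230. The test meets each student at his or her own level, which is what makes an adaptive test appropriate for students at many different academic levels.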
Districts have the option of testing their students with MAP up to four times a year. Students typically take tests at the beginning and end of the school year. Some districts may also choose to test students in winter and summer. The MAP assessments are used to produce percentile scores, achievement scores, and growth scores. The NWEA uses a metric it calls a RIT (Rasch Unit) scale to measure a student’s progress. The RIT scale is an equal-interval scale that is used to chart a student’s academic growth from year to year.
While WKCE-CR results take months to come back to school districts, MAP test results are available to the school within 24 hours. The fact that this information is available immediately has important implications for curriculum and instruction. In fact, it represents a serious change in how teachers and administrators can monitor instruction. The data produced by MAP are extremely rich. Teachers and administrators can learn about changes in performance over time reported at the district, school, classroom, and individual student level. Teachers trained to use these data can learn with high precision how their individual students are performing. They can also identify specific steps that should be taken in order to accomplish projected growth targets. This capacity gives teachers an immediate game plan for how to improve achievement. This has implications for all schools, but especially for districts that are failing to produce achievement gains.
Performance-Based Pay and Value-Added Testing
During the past several decades, policymakers have grown increasingly interested in innovative compensation plans, including performance-based pay for K-12 educators. Much of this interest is driven by the widespread application of value-added testing models. Could the widespread use of value-added testing in Wisconsin set the stage for breaking away from the traditional salary schedule?
School districts have for many years clung to the old-fashioned salary schedule. Kershaw & McKean (1962) explain that the salary schedule emerged in the 1920s, long before the domination of teacher unions in negotiations of teacher pay. The authors maintain that salary-schedule systems were established largely in response to teacher complaints regarding the difficulty of individual teacher negotiations with school boards and due to a widespread sense that pay rates set through individual negotiations were arbitrary.
The salary schedule, however, could hardly be more arbitrary and unjust. It sets one salary for a generic teacher—as if all teachers had the same marketable skills and therefore the same opportunity costs. The salary specified by the schedule is insensitive to actual conditions in the labor markets in which school districts compete. Among its many problems, the salary schedule tends to reward mediocre (or worse) teaching while failing to reward teaching that produces achievement gains. It may also drive talented teachers away from the classroom: Hoxby and Leigh20 found evidence that high-ability women were “pushed” out of the classroom by pay compression, a result of traditional forms of teacher compensation. Finally, the salary schedule contributes to shortages of mathematics and science teachers and to surpluses of early childhood and social studies teachers because, in effect, it sets wages too low for teachers with highly specialized skills (mathematics teachers, for example) while setting wages too high for others (early childhood teachers, for example).
Economists and others believe that incentives are fundamental to understanding human behavior—that incentives matter a lot. The role of incentives in public education, however, has been a matter of longstanding debate. Do different forms of compensation matter? The landmark Coleman Report21 implied that school governance matters, such as incentives and teacher effort, may have little effect on academic achievement. The Coleman Report concluded that students’ family backgrounds trumped all other variables in terms of school outcomes. One inference was that poverty and ethnicity exerted a powerful influence on academic achievement: family background and neighborhood environment mattered more than school governance or teacher effort.
The Coleman Report touched off a debate that has continued for many years. Educational researchers and economists have often disagreed on the role of incentives in education. Studies of teacher incentives appear to fall into two groups. The first group includes advocates for alternative forms of teacher compensation to reward performance; this group offers little empirical support for its claims.22 The second group seeks to verify or refute Eric Hanushek’s controversial finding that “money doesn’t matter” in evaluating the impact of teacher compensation on student performance.23
Today, there is renewed interest in incentive-based compensation programs, and value-added testing has contributed to that interest. First, value-added studies have shown that individual teachers can have a substantial effect on student achievement: what teachers do in the classroom does, indeed, make a difference. Second, adding value-added measures to the mix of factors involved in teacher compensation significantly diminishes the criticism that there is no reliable way to distinguish good teachers (those who produce achievement gains) from poor ones (those who do not). Value-added testing programs, especially those yielding data collected over several years, can link the efforts of individual teachers to the achievement gains of their students. We think schools are long overdue in implementing pay-for-performance plans for teachers; the arguments for maintaining the old-fashioned salary schedule are outmoded.
Several efforts across the nation are under way to dismantle the salary schedule and replace or augment it with pay-for-performance compensation programs. While the exact structure of these new models for teacher compensation varies, a common pattern emerges. Districts choosing to implement pay-for-performance systems usually leave the traditional salary schedule in place; the salaries determined by this schedule might be regarded as base pay. Then a variable component is introduced, often in the form of a bonus payment awarded on the basis of students’ achievement gains. Compensation programs of this sort are not inexpensive. Minnesota’s Q-Comp program costs $86 million; by comparison, Wisconsin spent about $10 million on the WKCE-CR program. The costs are high for these new efforts, in part, because the bonus payments come as add-ons to salary-schedule salaries, which treat all teachers as if they faced the same labor market conditions.
Research at the University of Arkansas’ Department of Education Reform provides some evidence about the effect of bonus programs on student achievement.24 A bonus program in the Little Rock public schools linked teachers’ merit pay to student test scores. In this program, a 4 percent improvement in achievement scores earned teachers a $100 bonus payment per student, with the payment rising to $400 if the student gained 15 percent. The investigators found that providing teachers with bonuses based on test-score improvements significantly increased student math proficiency compared with the performance of students attending similar schools that were not participating in the bonus program.
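The arithmetic of such a bonus schedule is easy to make concrete. The sketch below encodes only the two endpoints the text reports ($100 per student at a 4 percent gain, $400 at 15 percent); the linear ramp between them is an assumption for illustration, not a detail of the actual Little Rock plan.

```python
def little_rock_style_bonus(gains_pct):
    """Total bonus for one teacher, given each student's percent score gain.
    Endpoints ($100 at 4%, $400 at 15%) are from the text; the linear
    ramp between them is an illustrative assumption."""
    def per_student(gain):
        if gain >= 15:
            return 400.0
        if gain >= 4:
            return 100.0 + (gain - 4) / (15 - 4) * 300.0
        return 0.0
    return sum(per_student(g) for g in gains_pct)
```

A teacher whose three students gained 15, 4, and 2 percent would earn $400 + $100 + $0 = $500 under this schedule.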
Podgursky and Springer provide an excellent summary of other recent pay-for-performance initiatives. We draw from their report below.25
Denver: Professional Compensation System for Teachers (ProComp). It offers bonus payments to teachers based on various criteria, including completion of advanced coursework, skills demonstration, market incentives, and a professional evaluation. Importantly, student growth is part of the compensation plan. The growth factor can result in significant bonuses for teachers.
Florida: Merit Award Program. In this program for teachers and administrators, up to 40 percent of the funds may be awarded to teachers based on principals’ assessments. However, at least 60 percent of the awards must be based on student performance.
Texas has launched several pay-for-performance initiatives. The Governor’s Educator Excellence Grant, for example, focuses on teachers of economically disadvantaged students. Seventy-five percent of the funding must be paid to full-time teachers based on a variety of measures of student performance.
Minnesota Q-Comp: Schools receive funds to reward teachers for excellence in student achievement. Districts receive $260 per student to implement the program. Forty-one school districts are currently participating.
Beginning in 2006, the U.S. Department of Education developed a program called the Teacher Incentive Fund. This program provided $196 million to fund 34 initiatives around the nation. The goals of the program are to develop incentive-based teacher compensation programs, to see whether such programs improve student achievement, and to attract and retain better teachers.
Podgursky and Springer26 have done an extensive review of current research on performance-based compensation plans. They reached the following conclusion:
The evaluation literature on performance-related compensation schemes in education is very diverse in terms of incentive design, population, type of incentive (group versus individual), strength of study design, and duration of the incentive program. While the literature is not sufficiently robust to prescribe how systems should be designed—for example, optimal size of bonuses, mix of individual versus group incentives—it is sufficiently positive to suggest that further experiments and pilot programs by districts and states are very much in order.
Four Milwaukee Compensation Case Studies
Today, pay-for-performance compensation programs in Wisconsin are as rare as hen’s teeth. Four such programs, however, stand out as exceptions—two were initiated at private voucher schools; the other two were started at inner-city charter schools authorized by the University of Wisconsin-Milwaukee. All are having success in the toughest areas of the city.
Case 1: St. Anthony School of Milwaukee. The pay-for-performance program developed in the Little Rock, Arkansas, schools was the model for initiating this program at St. Anthony School of Milwaukee. It may be the best example of pay-for-performance in Wisconsin.
St. Anthony’s is located on Milwaukee’s south side and serves many Latino students and families. It has the largest enrollment of the schools in Milwaukee’s School Choice program. St. Anthony School has a strong curriculum based on three reform models—Direct Instruction, Core Knowledge, and Renaissance Learning.
The pay-for-performance program is called the Achievement Reward Program (ARP). It was launched two years ago with the approval of the vast majority of its teachers. Here is how it works: Additional financial rewards are provided to teachers and staff for improved student achievement. The program involves two sorts of bonuses. First, administrators, teachers of special subjects (like art and music) and staff receive a bonus for overall school achievement gains. Second, teachers of math and reading—which includes nearly all regular classroom teachers—may earn an individual bonus based on the individual gain scores of their students. Bonus payments for reading and math teachers are linked to growth in Normal Curve Equivalent (NCE) points27 for each student.
There is anecdotal evidence of positive changes in the school culture stemming from the implementation of the ARP. The ARP appears to help motivate teachers to improve the performance of their students. Teachers seem to make better day-to-day decisions to make sure that students receive the maximum amount of instruction possible. For example, care is taken to make sure that school field trips are planned around times when reading and math are scheduled. Instructional time is treated as a valuable resource not to be squandered.
The ARP may also help to retain effective teachers by giving them a financial reward for exceptional work. It also creates an incentive for effective teachers to apply for teaching positions at St. Anthony’s. And, by rewarding merit, it helps to create a more positive work environment. It is this sort of change that can build the respect and professionalism that is so often denied Wisconsin teachers working under the arbitrary restrictions of the old-fashioned salary schedule.
Case 2: Milwaukee College Preparatory School. Another excellent example of a pay-for-performance plan is the one used at Milwaukee College Preparatory School, an inner-city charter school on Milwaukee’s north side. Milwaukee College Prep is authorized by the Office of Charter Schools of the University of Wisconsin-Milwaukee. The pay-for-performance plan is viewed as a cornerstone of the school’s strategy to attract and retain good teachers. Teacher compensation is competitive with the area’s largest employer, the Milwaukee Public Schools. The school also offers a competitive benefit package. Milwaukee College Prep teachers, however, may earn an additional 10 percent of their salary through its merit pay plan.
Here is how the program works. Teachers are provided bonus payments based on six criteria, including the following:
Meeting schoolwide goals for student academic performance.
Exceeding student growth targets based on student performance on the NWEA tests.
Recruiting a new teacher to work at the school.
Participating in enrichment activities such as running an after-school club.
Maintaining good attendance.
Achieving milestones such as remaining with the school for a period of time.
These six criteria are not equally weighted. Teachers whose students achieve high academic gains get much higher rewards. About half of the merit system is based on student performance.
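A weighted plan of this kind is straightforward to express in code. The weights below are illustrative assumptions chosen only so that the student-performance criteria carry about half the total, as the text describes; the actual Milwaukee College Prep weights are not published in this report.

```python
def merit_bonus(base_salary, scores, max_share=0.10):
    """Bonus under a Milwaukee College Prep-style plan: up to 10 percent of
    salary across six criteria. The weights are illustrative assumptions,
    set so student-performance criteria carry about half the total."""
    weights = {
        "schoolwide_goals": 0.25,   # student performance
        "growth_targets": 0.25,     # student performance (NWEA growth)
        "recruiting": 0.10,
        "enrichment": 0.15,
        "attendance": 0.10,
        "milestones": 0.15,
    }
    # each score is in [0, 1]: how fully the criterion was met
    earned = sum(w * scores.get(k, 0.0) for k, w in weights.items())
    return base_salary * max_share * earned
```

A teacher who meets every criterion fully earns the full 10 percent of salary; one who meets only the two student-performance criteria earns about half of that, mirroring the weighting the text describes.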
The school administration views recruiting and retaining high-quality and dedicated teachers as a fundamental problem in the environment of urban education, especially in high-performing, high-poverty schools that rely on a dedicated, passionate and talented teacher corps. The administration acknowledges that merit pay is only one factor in retaining good teachers. The school also strives to help teachers see how they are making a difference in the lives of children. It involves teachers in school decisions and offers relatively small class sizes and a strong school culture that allow teachers to spend a majority of their class time on academic tasks.
Case 3: Messmer Catholic Schools. Messmer Catholic Schools is an independent Catholic school system in Milwaukee’s central city. Approximately 95 percent of the Messmer students are African American or Hispanic, and most come from families with incomes near the national poverty level. Messmer initiated an incentive pay system during the 1999-2000 school year. The system was modeled after a bonus program at Johnsonville Foods Company (Sheboygan, Wis.), providing incentives for high-performing employees.
Messmer teachers can earn up to $3,000 in additional compensation (i.e., over and above base pay) on an annual basis. Incentive payments are based on a teacher’s performance. Performance is rated by reference to customer satisfaction (student and parent surveys); attainment of educator goals (professional development plans); student achievement (growth in test scores and grade improvements); professional accomplishments; service to school community (tutoring, committee assignments, and extracurricular activities); and administrative appraisal.
The Messmer incentive program provides teachers with the understanding that high performance and effectiveness in the classroom and school is valued and rewarded. Further, it places an emphasis on instituting best practices that have positive impacts on student learning and development. Teachers are responsible for demonstrating the achievement of students through the implementation of these best practices. Further, the incentive system causes Messmer teachers to view the parents and students as customers deserving a high-quality educational experience.
Case 4: The Business and Economics Academy of Milwaukee (BEAM). BEAM is a charter school authorized by the University of Wisconsin-Milwaukee Office of Charter Schools. It has established a pay-for-performance compensation program, currently in its second year of operation. BEAM is the only K-8 school in Milwaukee that specializes in teaching its students and parents how to become financially successful. It features an innovative business and economics curriculum, including a volunteer business and economics curriculum consultant, and after-school activities such as the Millionaire’s Club. While BEAM has a unique mission and organization, in some ways it is a typical inner-city school. The vast majority of its students live in the Metcalf Park area of Milwaukee’s north side. Ninety-five percent of them are African American and come from single-parent households.
In the 2007-2008 school year, BEAM initiated a pay-for-performance program to fully implement its business and economics curriculum at all grades. Among other obligations, each teacher is responsible for teaching a distinct set of business and economics lessons during the school year. All students are pre- and post-tested. The average schoolwide pre-test score for 2007 was 10.85, with a standard deviation of 4.98. The average schoolwide post-test score was 17.34, with a standard deviation of 8.837. These results show that students realized a mean gain score of 6.484—a statistically significant gain. Other analyses showed significant gains in raw scores at all grade levels. Gains were most impressive in grades 2 and 5. The data were also analyzed by individual teacher to determine how bonus payments should be allocated. All but two teachers produced statistically significant gains and received bonus payments according to the amount of the gain, with the top teacher receiving $1,000 plus a dinner for two.
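The significance claim in the paragraph above rests on a standard paired analysis of gain scores. The sketch below shows the form of that test on synthetic data generated to roughly match the reported summary statistics (pre-test mean 10.85, post-test mean 17.34); the individual scores are invented, so it illustrates the method, not BEAM’s actual data.

```python
import math
import random
import statistics

def paired_t(pre, post):
    """Paired t-test on gain scores (post - pre), the kind of analysis
    used to judge whether a mean gain is statistically significant."""
    gains = [b - a for a, b in zip(pre, post)]
    n = len(gains)
    mean_gain = statistics.fmean(gains)
    sd_gain = statistics.stdev(gains)
    t = mean_gain / (sd_gain / math.sqrt(n))
    return mean_gain, t

# Synthetic scores loosely matching the reported summary statistics;
# the individual values are invented for illustration.
random.seed(0)
pre = [random.gauss(10.85, 4.98) for _ in range(100)]
post = [p + random.gauss(6.49, 5.0) for p in pre]
mean_gain, t = paired_t(pre, post)
```

With a mean gain of this size relative to its spread, the t-statistic lands well above conventional significance thresholds, which is what a “statistically significant gain” asserts for the schoolwide results.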
The BEAM performance-based pay plan delivered tangible results in terms of student achievement. BEAM’s teachers also seemed to welcome it. They eagerly participated in an economics professional development program and quickly signed up for business and economics pre-testing of their students as soon as it became available in the 2008-2009 school year.
New developments in testing methodology have shifted the focus in student assessment from testing for achievement to testing for achievement growth. These developments point toward new directions for Wisconsin to consider in its state assessment program and, eventually, in compensation programs for Wisconsin teachers.
The new methodologies create the possibility for value-added testing—that is, testing to determine the extent to which individual students make achievement gains in the course of a school year. Focusing on achievement gains is important because it assesses students against their own starting points, not against the achievement levels of a norm group. Also, the focus on gains underscores the importance of the instruction provided by individual teachers. Gain scores can highlight the effective (or ineffective) instruction some teachers provide, regardless of contextual circumstances surrounding their students. Grasping the significance of these new prospects, several states have embraced value-added testing and launched statewide efforts to apply the approach. Wisconsin is not among them.
This is not to say that testing programs in Wisconsin are stagnant. Our 2007 survey of Wisconsin school districts revealed that most districts now supplement state testing with additional testing they pay for themselves. This suggests that they, too, find the state assessments to be inadequate.
In the additional testing they carry out, many districts use some variant of value-added testing. The most widely used alternative is the system offered by the Northwest Evaluation Association. Over 190 Wisconsin school districts use some part of this program.
This surge of interest in value-added testing sets the stage for experiments with new compensation models in the state.
Today, there is a renewed interest in performance pay for teachers, fueled in part by developments in value-added testing. Data obtained through value-added testing can be analyzed to determine the effects individual teachers have on student achievement, and these effects can serve as measures of performance in performance-based compensation programs. To the extent that value-added testing provides valid and reliable measures of performance, the argument for traditional salary schedules is nullified. Mindful of this, educators and legislators have launched several initiatives nationwide to dismantle the traditional salary schedule and replace it or augment it with incentive-based pay structures. Experiments are now under way in Denver, Florida, Texas, and Minnesota. Moreover, the U.S. Department of Education has developed a program called the Teacher Incentive Fund. Its goals are to develop incentive-based teacher compensation programs, to learn whether such programs bring about improvements in student achievement, and to attract and retain better teachers. Programs are under way in 22 states.
As indicated in our survey, many Wisconsin school districts have moved to embrace new trends in testing. Nonetheless, the Wisconsin Department of Public Instruction has been slow in providing leadership in this regard. While the DPI has taken some positive initial steps toward a statewide value-added assessment system—including the development of a statewide data system, working with outside consultants to consider growth-oriented models and forming a technical advisory committee—Wisconsin lags behind many states in the implementation of new value-added testing methodologies. Wisconsin is far behind in experiments regarding pay-for-performance for teachers.
We think that now is the time for Wisconsin to reform its state testing program by moving quickly toward the use of value-added assessment. In order to accomplish this, we make the following recommendations:
The WKCE-CR testing regime should be replaced or significantly modified. Testing students in the fall of the year makes it impossible to use test results in a timely manner for improving curriculum and instruction. The state should move toward a testing program with computer-based scoring so that results could be obtained and used promptly. One possibility would be to adopt part of the value-added testing program developed by the Northwest Evaluation Association; many Wisconsin school districts have already moved in this direction on their own.
The DPI should publicly embrace value-added testing. While the DPI has begun to study and evaluate various value-added models behind the scenes, it has provided little leadership to school districts. To that end, the DPI should provide technical support to school districts that wish to move toward growth-oriented, value-added testing.
Wisconsin should continue to develop a statewide student database to allow for comprehensive, in-depth VAM analysis. This database should link every student with his or her teacher in every grade and subject.
The state Legislature should act now to abolish statutory provisions that disallow the use of results from state testing in teacher evaluation. At a time when many districts have begun to use testing programs that go beyond the WKCE-CR, it makes little sense to prohibit them from taking into account the information they obtain from these programs in their evaluations of teachers’ effectiveness.
The state Legislature should create incentives to encourage school districts to develop alternatives to outdated compensation programs based on salary schedules. These programs should be replaced or augmented by performance-based-pay programs, with growth in student achievement serving as the most important criterion for assessing teachers’ performance.
1 YouTube - Obama on NCLB, Part 1 www.youtube.com/watch?v=NpRtHDYoD-o
2 Winters, Marcus A., Ritter, Gary W., Barnett, Joshua H., and Greene, Jay P. (2007). An Evaluation of Teacher Performance Pay in Arkansas. www.uark.edu/ua/der/Research/performance_pay_ar.html
3 For example, SAT scores and family income are highly related.
4 The technical details that make these models different are explained in Hibpshman, T.L. (2004). A Review of Value Added Models. Frankfort: Kentucky Education Professional Standards Board.
5 Hershberg, Ted, Simon, Virginia Adams, and Lea-Kruger, Barbara. (2004) “Measuring What Matters,” The American School Board Journal. and Hershberg Ted, “Value-added Assessment and Systemic Reform: A Response to America’s Human Capital Development Challenge,” Aspen Institute’s Congressional Institute The Challenge of Education Reform: Standards, Accountability, Resources and Policy. Feb. 2005.
6 Hershberg, Ted. “Value-Added Assessment: Powerful Diagnostics to Improve Instruction and Promote Student Achievement,” AASA Conference Proceedings. 2005.
7 Sanders, William L., and Horn, Sandra, P. Research Findings from the Tennessee Value-Added Assessment System (TVAAS) Database: Implications for Educational Evaluation and Research. Journal of Personnel Evaluation in Education 12:3, 247-256, 1998.
8 Wright, S.P., Horn, S.P., and Sanders, W.L. (1997). Teacher and Classroom Context Effects on Student Achievement: Implications for Teacher Evaluation. Journal of Personnel Evaluation in Education 11:1, 57-67.
9 McCaffrey, Daniel, Lockwood, J.R., Koretz, Daniel, and Hamilton, Laura. “Evaluating Value-Added Models for Teacher Accountability.” An Education Report by the Rand Corporation (prepared for the Carnegie Corporation), 2004.
10 Harville, David A. “A Review of the Tennessee Value-Added Assessment System (TVAAS)” Iowa State University. June 6, 1995.
11 Bock, R. Darrel, Wolfe, Richard, and Fisher, Thomas. “Review and Analysis of the Tennessee Value Added Assessment System.” (part one) Tennessee: Office of Education Accountability (Nashville, TN), 1996. and Fisher, Thomas H. “A Review and Analysis of the Tennessee Value-Added Assessment System.” (part two) Tallahassee, FL: Florida Department of Education: 1996.
12 Stroup, Walter. “Assessment of the Statistical Methodology Used in the Tennessee Value-Added Assessment System.” Knoxville, TN: Tennessee Value-Added Research and Assessment Center: 1995.
13 Braun, Henry I. “Using Student Progress to Evaluate Teachers: A Primer on Value-Added Models,” Educational Testing Service - Policy Information Center. September 2005.
14 For more information please see: http://www.cgp.upenn.edu/pdf/NCLB%20Growth%20models.pdf
15 As of June 2008, Tennessee, Ohio, North Carolina, Alaska, Arkansas, Arizona, Delaware, Florida, and Iowa had been approved by the Department of Education to participate in this pilot program.
16 State-by-state use of value added methodologies are summarized at: http://www.cgp.upenn.edu/ope_nation.html
17 For more information about the WASB please see: http://www.wasb.org/cms/
18 More information about the NWEA assessment system can be obtained from: http://www.nwea.org/system.asp
19 Wisconsin Statute 118.30 (2) 4 (c) states “The results of examinations administered under this section to pupils enrolled in public schools, including charter schools, may not be used to evaluate teacher performance, to discharge, suspend or formally discipline a teacher or as the reason for the nonrenewal of a teacher’s contract.”
20 Hoxby, C. M., & Leigh, A. (2004). “Pulled Away or Pushed Out? Explaining the Decline of Teacher Aptitude in the United States.” American Economic Review, 93(2), 236–240.
21 Coleman, J.S., Campbell, E.Q., Hobson, C.J., McPartland, J., Mood, A.M., Weinfeld, F.D., & York, R.L. (1966). Equality of Educational Opportunity. Washington, D.C.: U.S. Government Printing Office.
22 Odden, A. & Kelley, C. (2002). Paying Teachers for What They Know and Do: New and Smarter Compensation Strategies to Improve Schools. Thousand Oaks, CA: Corwin Press, Inc.
23 Hanushek, E., Kain, J.F., & Rivkin, S.G. (1999). Do Higher Salaries Buy Better Teachers? (Paper presented at the Annual Meeting of the American Economic Association, New York, NY.) ERIC Document Reproduction Service No. ED437710.
24 Winters, Marcus A., Ritter, Gary W., Barnett, Joshua H., and Greene, Jay P. (2007). An Evaluation of Teacher Performance Pay in Arkansas. www.uark.edu/ua/der/Research/performance_pay_ar.html
25 Podgursky, Michael, and Springer, Matthew G. Teacher Performance Pay: Working Paper 2006-01. National Center on Performance Incentives, November 2006.
26 Podgursky, Michael, and Springer, Matthew G. Teacher Performance Pay: Working Paper 2006-01. National Center on Performance Incentives, November 2006.
27 NCE scores are similar to percentile ranks; however, they are based on an equal-interval scale. This means that the difference between any two successive scores on the NCE scale has the same meaning throughout the scale. NCEs range from 1 to 99. They are useful for making meaningful comparisons between different achievement tests and for statistical computations, such as determining an average score for a group of students.