Three Questions and Measures for Assessment

“Assessment” has been an important aspect of teaching and learning (or perhaps more accurately, it has been a buzzword garnering much attention) for most of my career in education. Advocates for many positions (political as much as pedagogical) argue for the role of assessment in achieving their vision, thus “fixing the broken educational system” once and for all.

The reality, of course, is that assessment is a much more sophisticated and nuanced part of the educational experience than these advocates allow. Clearly, educators must determine what has been learned by the student, and (for many reasons) that learning must be reduced to a number of proxies, each designed to capture and reflect what the student has learned.

In many ways, the summaries we use to assess students’ learning are an attempt to reify what happens in schools. We reason, “my methods must work, because I observed these changes on these assessments.” Educators do not admit, however, that our instruments are weak (“aligning your assessments with your instruction” is worthwhile, but dubious), that they are subject to misuse (students don’t bother reading questions, educators’ biases affect their assessments), and that we can be quite unskilled at interpreting results.

The problem of defining and implementing appropriate assessment in schools is becoming more challenging as well. When print dominated, educators could be relatively certain of the skills that students needed. I have some of my grandfather’s college textbooks next to mine. We both studied science, which had largely changed in the 49 years between our graduation dates, but we both learned by reading textbooks and taking notes in those books. Today, students carry laptops and digital textbooks, and they are as likely to use video to study as they are to use textbooks. “Becoming educated” has been a more sophisticated endeavor for my children than it was for my grandfather and me. My experiences as someone who has succeeded in both of these worlds are interesting, but they are the topic of another post.

Largely because information (and other) technology is changing how individual humans understand, how we organize our institutions, and the norms society holds, educators cannot predict with the same certainty what students must learn and which proxies are appropriate for assessment purposes. This is a problem that has occupied my professional attention in recent years, and thanks to continued efforts to collaboratively design a comprehensive assessment method, colleagues and I have a much clearer, more complete, and simpler system for answering essential assessment questions.

First, we conclude that three questions are relevant to understanding what matters in students’ learning, and that each has equal value:

  • Does the student have the habits of effective learners and workers?
  • Can the student produce polished solutions to sophisticated problems?
  • How does the student compare to others?

These questions are answered in different ways, and all three comprise a reasonable and complete system for assessing students’ learning.

Figure: The three assessment tools

In course grades, we answer the question “Does the student have the habits of effective learners and workers?” Consider the typical classroom. Over the course of months, students participate in a variety of activities and complete a range of assignments and tasks. Teachers make professional judgments about the characteristics of each student and the degree to which he or she has mastered the material and is prepared to learn. Just as we do not always expect a supervisor to follow an objective instrument when judging workers’ performance, we should not expect educators to be completely objective.

Of course, as subjectivity enters the grading process, educators will find it necessary to defend decisions, which will motivate them to more deeply articulate expectations, observe learning, and record that learning. All of these are benefits of including educators’ judgments in course grades.

A performance is an activity in which we answer, “Can the student produce polished solutions to sophisticated problems?” Performances are the projects and products that working professionals would recognize as familiar outcomes, and professionals would be interested in the motivation for the performance, the nature of the work, and the quality of the result. Questions regarding a performance are best directed to the student because it was selected, planned, and carried out by the student.

Teachers do have a role in setting the context of a performance, guiding decisions, and facilitating the student’s reflection on the activity; but through a performance, a student demonstrates the capacity to frame and solve complex problems and to complete complex communication tasks. While the “projects” that are included in course grades contribute to students’ ability to complete these assessments, performances are typically independently constructed and fall outside of traditional curriculum boundaries.

Tests have been at the center of intense interest in educational policy in the 21st century. The political motivation for these tests has been challenged and is beyond the focus of this post. For the purposes of this essay, it is sufficient to recognize that large-scale tests (think the SAT, the ACT, SBAC, PARCC, AccuPlacer, and the like) can be used to determine how a particular student did in comparison to all of the others who took that test.

A few details are necessary to complete the picture of what these tests show. First, standardized tests were used almost exclusively for these purposes in the 20th century; this century, standards-based tests have become more common. A standardized test is a norm-referenced test, which means the scores are expected to follow a normal distribution (bell curve), and an individual’s score is understood in terms of where it falls within that distribution. On a standards-based test, an individual’s score is compared to the score he or she would be expected to earn if the standard has been met.
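As a rough illustration of the distinction, the short sketch below interprets the same raw score both ways; the mean, standard deviation, and cut score are hypothetical values chosen only to show the two interpretations, not figures from any actual test.

```python
# A minimal sketch contrasting norm-referenced and standards-based
# interpretations of one raw score. All numbers are hypothetical.
from statistics import NormalDist

raw_score = 72

# Norm-referenced: locate the score in an assumed normal distribution of
# all test takers and report it as a percentile.
norm_mean, norm_sd = 65, 10
percentile = NormalDist(mu=norm_mean, sigma=norm_sd).cdf(raw_score)
print(f"Norm-referenced: scored higher than about {percentile:.0%} of test takers")

# Standards-based (criterion-referenced): compare the same score to a fixed
# cut score that represents "meeting the standard."
cut_score = 75
verdict = "standard met" if raw_score >= cut_score else "standard not met"
print(f"Standards-based: {verdict}")
```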

Regardless of the exact nature of the tests, those interested in assessment of learning must recognize that these tests are administered for the purpose of comparison. Also, these tests are of dubious reliability. One of the fundamental ideas of all data collection is that measurements have errors, so a single measure taken with one instrument administered once is nearly meaningless. While the test results of a large group of students may allow us to draw conclusions about the group as a whole, a single student’s score cannot be used to draw reasonable conclusions about that student.
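To make the point about measurement error concrete, here is a minimal sketch comparing the uncertainty in one student’s single score with the uncertainty in a cohort’s mean; the reliability coefficient, score spread, and cohort size are assumed values for illustration only.

```python
# A minimal sketch of measurement error: a single score carries a large
# margin of error, while a large cohort's mean is far more stable.
# The reliability, standard deviation, and cohort size are hypothetical.
import math

sd = 10.0           # spread of observed scores on the test
reliability = 0.85  # assumed reliability coefficient of the test
score = 70          # one student's observed score

# Standard error of measurement (SEM) for a single administration:
sem = sd * math.sqrt(1 - reliability)
print(f"One student's 'true' score plausibly lies between "
      f"{score - 2 * sem:.0f} and {score + 2 * sem:.0f}")

# Standard error of the mean for a cohort of 400 students with the same mean:
n = 400
se_mean = sd / math.sqrt(n)
print(f"A cohort mean of {score} plausibly lies between "
      f"{score - 2 * se_mean:.1f} and {score + 2 * se_mean:.1f}")
```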

If we consider assessment as a method whereby educators can understand their program as much as they can understand students’ learning, then we see the three questions and the three types of assessments forming a meaningful and informative assessment system.

Being Data-Driven is Nothing to Brag About

(c) 2016 Dr. Gary L. Ackerman

“Data-driven” has been the mantra of educators for the last generation. This mantra captures the practice of using students’ performance on tests to make instructional decisions. This model can be criticized for several reasons including the dubious reliability and validity of tests, the lack of control over variables, and incomplete and inappropriate analysis. My purpose here, however, is to criticize the “what works” focus that accompanies “data-driven decisions.”

Ostensibly, educators adopt a data-driven stance to create a sense of objectivity; they can reason, “I am taking these actions because ‘it works’ to improve achievement.” The problem with this approach is that identifying “what works” is a superficial endeavor, and it can be applied in only very limited circumstances.

While designing physical systems, engineers can apply “what works” methods to improve their systems. Engineers can conceive and plan, build and test, then deploy their systems. At any point in the process they can change definitions of what “it works” means or abandon the project if “it works, but is too expensive” (or if other insurmountable problems arise). Ascertaining “what works” in educational settings is a far less controlled situation. Those who have tried to use others’ lesson plans and found the results disappointing have first-hand experience with this effect.

Understanding “Data-Driven” As a Scientific Endeavor

Humans have created two activities that are data-driven. In basic science, we use data to organize and understand nature so that we can support theories that allow us to predict and explain observations. In applied science, we gather data to understand how well our systems function.

Using data to refine systems and build “what works” is the approach used by technologists who work in applied science. Vannevar Bush, a science advisor to President Franklin Roosevelt during and after World War II, placed basic and applied research at opposite ends of a continuum. Basic science was undertaken to make discoveries about the world, and applied science was undertaken to use and control those discoveries to develop tools useful to humans.

If we place data-driven education along this continuum, it must be considered an applied science, as it is undertaken to build systems to instruct children. As it is typically undertaken, there is little attempt to understand why or how “it works,” as answering those questions is in the domain of basic science.

Figure 1. Continuum of basic to applied science as proposed by Vannevar Bush

This is a very dissatisfying situation for educators (both those who claim to be data-driven and those who make no such claim). Fortunately, we can reconcile that dissatisfaction by recognizing that the basic to applied science continuum does not accurately describe the landscape of education.

Use-Based Research

In 1997, Donald Stokes, a professor of public policy at Princeton University, suggested that the understanding basic researchers seek and the use applied researchers seek are different dimensions of the same endeavor; research is not simply either basic or applied. According to Stokes, the continuum of science should be replaced with the matrix shown in Figure 2.

Placing the question “Do the researchers seek to develop or refine systems?” along the x-axis and “Do the researchers seek to make new discoveries?” along the y-axis creates four categories into which one can place a science-like activity:

  • Pure research is Bush’s basic research, and it is undertaken to satisfy curiosity, so the researchers are not motivated to create useful systems.
  • Technology development is Bush’s applied research, and researchers seek to develop useful systems, but they do not seek to make generalizations beyond those needed to build their systems.
  • Purposeful hobbies are undertaken for entertainment, and hobbyists are not motivated to share the systems they use or to make discoveries.
  • Use-based research is the label applied to endeavors in which the researchers seek to both develop new systems and make discoveries about the world.
Figure 2. Stokes’ matrix of data activities

Stokes used the term “Pasteur’s quadrant” to capture the nature of work in the use-based research quadrant. He reasoned that Pasteur’s work in microbiology had multiple purposes: as Pasteur developed methods of preventing disease (the technologies he built), he also sought to discover how and why those technologies worked, and in doing so he established important details of microbial life.

Replacing Data-Driven Decisions

Educators who choose to adopt a more sophisticated approach to using data to drive decisions can adopt use-based research. This will require that they approach data, its collection, and its analysis in a more sophisticated manner. These educators will face more work, but it is more interesting and more efficacious than the data-driven methods I typically observe. Use-based research requires that educators:

  • Begin data projects and analysis with a question. The question cannot be “Which instruction is better?” It must be focused and precise: “Did the students who experienced intervention x perform better on test y?” They must also recognize that these questions can only be answered with large cohorts of students and statistical methods (a minimal sketch of such a comparison follows this list). Further, these answers (like all answers supported with data) cannot be known with certainty.
  • Seek a theory to explain the results they find in the answers to their questions. While the “data-driven educator” may be satisfied with knowing “what worked,” the educator who uses data as a use-based researcher will seek to elucidate the reasons and mechanisms, a theory, for “what worked.” This will leave them better prepared to develop and refine interventions for other settings and cohorts of students. A theory will also allow them to predict other observations that would confirm it.
  • Based on their predictions, seek other evidence to support their theories. This evidence cannot come from the same measurements. If, for example, we accept the dubious conclusion that SBAC (or PARCC) tests measure college and career readiness, then we should be able to devise other measures of college and career readiness, and the effects of the instruction that improves those test scores should be observable in other ways as well.
  • Become more critical of the measures they use (including those they are mandated to use) and recognize that we must be active consumers and evaluators of the data collected about our students and of the methods used to analyze it.
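As a rough sketch of the first point above, the code below compares two cohorts on a test with a two-sample t-test; the score lists are hypothetical stand-ins (real cohorts would be much larger), and the choice of Welch’s t-test is an assumption for illustration, not a prescribed method.

```python
# A minimal sketch of answering a focused question -- "Did students who
# experienced intervention x perform better on test y?" -- with statistics.
# The scores are hypothetical; real cohorts would be much larger.
from scipy import stats

intervention_scores = [74, 81, 69, 77, 85, 72, 79, 88, 70, 76]  # cohort with x
comparison_scores = [71, 68, 75, 66, 73, 70, 64, 72, 69, 74]    # cohort without x

# Welch's t-test compares the group means without assuming equal variances.
t_stat, p_value = stats.ttest_ind(intervention_scores, comparison_scores,
                                  equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# Even a small p-value only supports "the groups likely differ"; it is not
# certainty, and it says nothing about why the intervention worked.
```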

Reference

Stokes, D. E. (1997). Pasteur’s quadrant: Basic science and technological innovation. Washington, DC: Brookings Institution Press.
