Are we testing wisely? As teachers of reading, we always want our instruction to be informed by data. Assessment is a key ingredient in delivering the efficient instruction that will help students develop into proficient readers.
But even the best-intentioned testing can sometimes lead us astray—especially when extensive, ongoing assessments replace instruction or when we adopt a one-size-fits-all approach to assessment. Making data-driven decisions shouldn’t mean being driven mad by data!
In this blog, let’s take a closer look at assessments and explore the following questions:
- When should we assess?
- How much should we assess?
- Whom should we assess?
- What types of assessments are necessary?
Clarifying Types and Purposes of Assessments
Much confusion exists about the different purposes of various assessments. For example, teachers are often frustrated and confused when their internal program assessment results do not match the results they obtain when students take an external measure such as a Curriculum-Based Measurement.
Let’s start by unpacking some of the assessments schools may use.
What Is a Curriculum-Embedded Assessment?
A curriculum-embedded assessment or a program assessment is typically provided by the publishers of a curriculum. It is aligned to the sequence of instruction in that curriculum. A curriculum-embedded assessment is designed to answer the following question: Are my students learning what the program is teaching?
This type of publisher-created assessment usually examines student progress and mastery of concepts specifically sequenced and taught in a curriculum. Most strong curricula include such assessments of their own.
Mastery measures, which also are often built into a curriculum, typically test students on specific skills to determine if they have achieved mastery of those skills. The SIPPS program mastery tests are good examples of mastery measures.
What Is a Curriculum-Based Measurement or Universal Screener?
In contrast, Curriculum-Based Measurements (CBMs), often called universal screeners, are standardized measures that are quite independent of a specific published curriculum.
These assessments answer a different question: Are the students on a positive trajectory to meeting grade-level skill expectations or are they at risk?
A few examples of CBMs include:
- DIBELS 8
- AIMSweb
- Fastbridge
- Star Reading
- i-Ready
Other screening assessments include:
- MAP (NWEA)
- ROAR (Stanford University)
- Early Bird (Gaab Lab)
Many CBMs serve as universal screeners.

Understanding the Intersection Between Curriculum-Embedded Assessments and Curriculum-Based Measurements
Together, the two types of assessments—those embedded within a published curriculum and the CBMs that are independent of any curriculum—provide information not only about student progress or risk but also about the quality of the curriculum used and the implementation and overall effectiveness of instruction in a school or district.
If a student does well on a publisher’s curriculum-embedded assessment but does not perform well on a CBM or screener, we must check to see if:
- The specific content measured on the CBM is taught in the curriculum at all, or if
- The specific content is taught later in the curriculum than when the CBM was administered.
When the second scenario occurs, it is important NOT to change the instructional sequence of a strong, evidence-aligned curriculum. Instead, we should expect that the students will perform well on the screener later.
A Real-World Example
A good example of this second scenario involved the Open Court Reading program during the era of No Child Left Behind/Reading First.
In kindergarten, children receiving Open Court instruction did poorly on the phoneme segmentation tasks when tested on a CBM in winter. However, by springtime, these students performed well because those segmentation skills were taught in Open Court after the winter CBM testing window. Because the teachers knew this and understood the curriculum’s sequence, they did not veer off the Open Court instructional sequence, and their confidence paid off.
Rarely do we see cases where students do poorly on a publisher’s curriculum-embedded assessment but perform well on a CBM. However, we do see situations where students perform well on their program measures but do poorly on a CBM because of timing.
Unfortunately, sometimes students perform well on a curriculum-embedded assessment but never do well on a CBM or screener because the program is simply not teaching the important skills that are necessary to become a reader.
Armed with an understanding of the distinct purposes of the different types of assessments, teachers can make informed decisions about students and about curriculum and implementation issues.
Digging Deeper: Understanding Specific Types of Assessments
Let’s look at specific types of assessments more closely in order to understand what questions they answer, their purposes, and when and to whom they should be administered. The types to consider are:
- screening
- formative
- diagnostic
- placement
- progress monitoring
- summative or outcome
Each one serves a unique purpose. A curated menu at the Evidence Advocacy Center describes the purpose and use of many of these assessment types.
Screening Measures
General screening measures are usually administered three times a year, at least in the primary grades. They are designed to quickly and universally assess the current status of all individuals in a population.
Screeners can answer this question: Who is at risk in a broad area of performance and thus a candidate for further assessment and additional services and supports?
Many states have adopted universal screeners for K–2 for early identification of students potentially at risk for reading difficulties.
A list of various reliable screeners for literacy, math, and even behavior can be found on the Tools Charts at the National Center on Intensive Intervention (NCII). Such screeners enable educators to identify those individuals who may be determined, after further assessment, to not be meeting current expectations and are candidates for supplemental intervention.
Considerations When Interpreting Screening Results
If large numbers of students within a grade, class, or school have poor results on a screener, the problem may lie with overall instruction in Tier 1.
However, if such a school has a strong, evidence-aligned published curriculum—and the issue is not simply the misalignment of the curriculum’s skill sequence with the timing of the assessment as discussed in the previous section of this blog—the issue may be implementation, and teachers may require professional learning and coaching support to effectively implement the program.
That said, when large numbers of students are identified as at risk, the problem may lie with non-evidence-aligned practices and materials. In this case, a school may need to re-evaluate its instructional approaches, select evidence-aligned instructional materials, and better prepare and support teachers.
Screening measures can include general outcome measures, most often drawn from CBMs or standardized achievement tests. A common misconception is that a screener is the way to provide a dyslexia diagnosis. Dyslexia is one possible cause when a screener shows that a student is at risk, but another possible cause is ineffective or limited instruction. Further assessment must occur to clarify the student's actual learning needs.
As Dr. Michelle Hosp, a noted assessment expert, said, “Universal screeners can reveal there’s a problem but don’t pinpoint a solution” (Hosp, 2023).
Formative Measures
Formative assessments are all those activities and tasks that provide feedback to teachers about student performance and thus enable decisions about instructional modifications. These measures answer this question: How should I change my teaching or behavior plan to improve student outcomes?
Because formative assessments can include subskill mastery measures, informal assessments, and observations, they are usually ongoing. A good resource for understanding formative measures is available from NWEA.
Diagnostic Assessments
Diagnostic assessments measure specific content. They can determine the extent to which knowledge, skills, or strategies in a particular domain have been mastered. These measures can identify specific strengths and weaknesses students may have in a particular skill area. They can thus provide guidance to the teacher about what support and instruction students may need.
Diagnostic tests answer a couple of questions:
- What are my student’s specific strengths, prior knowledge, and not-yet-mastered skills that can inform my instruction or intervention?
- What are the root causes of my student’s challenges?
Two sources for further information can be found at the IRIS Center and at National Center on Intensive Intervention (NCII):
- IRIS Diagnostic Assessment
- NCII Diagnostic Data
Placement Tests
Some programs include built-in placement tests, which tell a teacher where to place students within the specific program as a starting point.
In a sense, such placement tests can be diagnostic: they identify the skills already mastered and those yet to be mastered. More often, however, built-in placement tests locate a student within the program’s skill sequence to determine the programmatic starting point. For example, the SIPPS foundational skills program has built-in placement tests to determine where to start a student in the SIPPS sequence.
Progress Monitoring Measures
A good definition of progress monitoring can be found from the Center on Multi-Tiered System of Supports.
It is the ongoing, frequent collection and use of formal data in order to:
- Assess students’ performance,
- Quantify a student’s rate of improvement or responsiveness to instruction or intervention,
- Evaluate the effectiveness of instruction and intervention using valid and reliable measures.
Educators use measures that are appropriate for the student’s grade and/or skill level. Progress monitoring measures, often within CBMs, answer these questions: Is this student improving and at an acceptable rate? and Is the intervention sufficiently effective for this student?
Again, the built-in program mastery assessments in programs such as SIPPS also serve as progress monitoring measures within the program.
Summative or Outcome Measures
Summative assessments evaluate what students have learned at the end of an instructional time period, such as the end of a unit, a semester, a course, or a school year. Administered by most states in grades 3 through 12, summative assessments answer these questions: Did the students achieve the expected outcome, and did the instructional program actually work?
Summative assessments are useful for schools to evaluate the overall effectiveness of their instructional program. They can also identify students who did and did not meet the goals. Such assessments are often used for grading students, as well as to “grade” a school. As a result, summative assessments are considered high-stakes tests.

Minimizing Over-Testing: A Gating Approach
A gating approach in assessment reduces the number of students who may need to take more specific assessments. Gating approaches use a series of sequential measures to narrow down which students may need more detailed, in-depth assessments. Such an approach can efficiently filter students to determine those who may need more specialized intervention.
For example, if a second grader performs at or above grade level on an oral reading fluency measure and is doing well with grade-level materials, a phonics measure may be unnecessary.
However, if the same second-grade student does not meet a threshold on the timed oral reading fluency measure, then the next step may be a phonics assessment.
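To make the gating logic concrete, here is a minimal sketch in Python. The cut point, measure names, and function are hypothetical illustrations for this blog, not benchmarks from any published screener; real decisions would use a screener’s own validated thresholds.

```python
# A minimal sketch of a single gate: an oral reading fluency (ORF)
# screen decides whether a phonics assessment is the next step.
# The cut point below is hypothetical; real gates use the screener's
# published, validated benchmarks.

ORF_CUT_WCPM = 72  # hypothetical words-correct-per-minute cut point


def next_assessment_step(orf_wcpm: int) -> str:
    """Return the next step in the gating sequence after an ORF screen."""
    if orf_wcpm >= ORF_CUT_WCPM:
        # Meets the fluency threshold: this gate indicates no
        # further phonics testing is needed.
        return "no further assessment needed"
    # Below the threshold: the next gate is a phonics assessment to
    # pinpoint the decoding skills that need instruction.
    return "administer phonics assessment"


print(next_assessment_step(85))  # no further assessment needed
print(next_assessment_step(40))  # administer phonics assessment
```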
A Two-Step Gating Process
Compton et al. (2010) described a two-step gated process that eliminated false positives and saved time in first-grade screening.
Using phonemic decoding in the first step significantly reduced the number of first graders requiring a larger battery of measures. If students in upper elementary grades and middle and high school are performing well on a standardized achievement test, assessing them further is probably unnecessary.
Even younger students who can read a passage with good accuracy, rate, and prosody may not need a test of phonemic awareness. Once students can decode most words, phonemic awareness measures may be redundant. This is because decoding well indicates successful application of the alphabetic principle—that print maps to speech.
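Under the same caveat, the two-stage structure Compton et al. describe can be sketched as a simple filter: only students who fall below a first-stage cut point move on to the fuller battery. All names, scores, and cut points below are invented for illustration.

```python
# Illustrative two-stage gated screening: a brief first-stage measure
# (e.g., phonemic decoding) filters the pool so only flagged students
# take the longer diagnostic battery. All data here are invented.

first_stage_scores = {
    "Student A": 34, "Student B": 12, "Student C": 28,
    "Student D": 9, "Student E": 31,
}
FIRST_STAGE_CUT = 20  # hypothetical risk cut point

# Stage 1: flag only students scoring below the cut point.
flagged = [name for name, score in first_stage_scores.items()
           if score < FIRST_STAGE_CUT]

# Stage 2: only flagged students receive the larger battery of measures.
print(f"Full battery needed for {len(flagged)} of "
      f"{len(first_stage_scores)} students: {flagged}")
# -> Full battery needed for 2 of 5 students: ['Student B', 'Student D']
```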
More research into the use of gating approaches will clarify the most efficient assessment procedures needed for most students.
Since individual assessments may address only one purpose, a school will need to build a comprehensive assessment system with measures available for different purposes. Students who fall below expectations on a screener may then need to be assessed using a diagnostic or placement measure to determine their precise instructional need and where to begin instruction.
Conclusion: Let’s Not Assess for Assessment’s Sake
The right type of data from the appropriate assessments for the right purposes will enable educators to better differentiate instruction early, support students who need more intensive instruction, and accelerate students who can move more quickly.
Let’s not assess for assessment’s sake. Instead, let’s understand how to use the right tests for the right purposes and spend more time teaching than testing.
Using data to inform teaching is vital, but only if we know what the data is telling us.
References
Compton, D. L., Fuchs, D., Fuchs, L. S., Bouton, B., Gilbert, J. K., Barquero, L. A., Cho, E., & Crouch, R. C. (2010). Selecting at-risk first-grade readers for early intervention: Eliminating false positives and exploring the promise of a two-stage gated screening process. Journal of Educational Psychology, 102(2), 327–340. https://doi.org/10.1037/a0018448
Hosp, M. (2023). The critical role of phonics assessment in the science of reading. Education Week.
Related Reading
Why Sufficient, Deliberate Practice Is a Critical Element of Literacy Learning and Retention
Can a Curriculum Support the Science of Reading and Science of Learning?