Voyager Sopris Learning EDVIEW360 Blog Series

Context of Assessments

Updated on October 19, 2016
  • Assessment
  • General Education
  • Literacy
  • Struggling Readers

The Importance of Context when Interpreting Assessment Data, Part 1 of 2

Often, “context” is discussed in terms of reading texts or passages. Context is so important that we teach students how to use context clues to understand new vocabulary words when reading. Context also makes a difference in ambiguous situations, which are easy to misunderstand if you don’t know what happened most recently in the passage or you don’t have the culturally relevant information that helps you understand what you are reading.

Context is important in many situations, not just reading, and I am going to make the case for context being important when interpreting assessment data.

Recently, I was asked to interpret a set of scores for a seventh-grade student. This student was enrolled in LANGUAGE!® Live, our product for struggling readers, and had completed all three benchmarks. We use three independent measures at each benchmark, the Progress Assessment of Reading (PAR), the Test of Silent Contextual Reading Fluency (TOSCRF)[1], and the Test of Written Spelling, 4th Edition (TWS-4)[2], to show the progress or growth students make across the academic year while enrolled and receiving instruction in our program. To see how these scores are useful, it helps to understand the assessments themselves; in other words, to have some context.

I could go through the technical information about these tests, but I am not sure that really gives the right context. So, let me try this a different way. I want to illustrate what makes these tests similar and different.

  • The PAR, created by MetaMetrics®, measures reading comprehension and requires students to do a lot of decoding of the texts to determine the right answer. This is a multiple-choice test, so the student picks one answer from four possibilities.
  • The TWS-4, published by PRO-ED, uses a dictated word format, like the one we experienced each Friday when we were in school. The administrator says a word, uses it in a sentence, says the word again, then the student types the word into the blank when using the online test. The TWS-4 measures encoding ability of students.
  • The TOSCRF, also published by PRO-ED, measures silent reading fluency and is a good measure of comprehension, as well. This test has an unusual format. The student sees passages printed in upper case, without punctuation or spaces between the words. The student has three minutes to draw a line between as many words as possible, identifying the words that make up the passage. This test requires the student to recognize words, but also to understand the meaning of the passages and to read and understand the material at a fast enough pace to make silent reading practical and, hopefully, enjoyable. While not explicitly stated by the publisher, this test is also measuring encoding ability.

Common Measures of the Three Tests

All three tests start from a raw score. For the PAR, the raw score is the number of items correct out of 34. For the TWS-4, the raw score is the number of words spelled correctly before hitting a ceiling of five missed words in a row. For the TOSCRF, the raw score is the number of words correctly identified in three minutes, in passages that become increasingly difficult. Raw scores are interesting when comparing one student’s performance on the same test across the year, but they are not very helpful when comparing across the three tests.
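To make the ceiling rule concrete, here is a minimal sketch of counting correct words until five misses in a row. It follows only the simplified description above, not the publisher's full scoring manual, and the function name and the sample responses are hypothetical.

```python
def raw_score_with_ceiling(responses, ceiling=5):
    """Count correct responses until `ceiling` consecutive misses occur.

    `responses` is a list of booleans in administration order
    (True = word spelled correctly). This is an illustration of the
    simplified rule in the text, not official scoring.
    """
    correct = 0
    misses_in_a_row = 0
    for is_correct in responses:
        if is_correct:
            correct += 1
            misses_in_a_row = 0  # a correct word resets the streak
        else:
            misses_in_a_row += 1
            if misses_in_a_row == ceiling:
                break  # ceiling reached: stop counting
    return correct


# Hypothetical student: 10 correct, 1 miss, 1 correct, then 5 misses in a row.
sample = [True] * 10 + [False, True] + [False] * 5 + [True] * 3
print(raw_score_with_ceiling(sample))  # 11
```

Words after the ceiling are ignored in this sketch, which is why the three correct responses at the end of the sample do not count.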

The raw scores on all three tests convert to scaled scores. The PAR raw score converts to a Lexile measure. The Lexile scale, according to MetaMetrics, ranges from 200L to 1600L, although actual Lexile measures can range from below 0L to above 2000L. The TOSCRF and TWS-4 raw scores convert to a standard score distribution that has a mean (average) of 100 and a standard deviation of 15. Averages or means are easy for most people, but standard deviations can seem scary. A standard deviation is a way to talk about the variation in the data points in a group. A low standard deviation means the data points are close together and close to the average. A high standard deviation means the data points are spread out. The standard score distribution used by the TOSCRF and TWS-4 is very common; many other tests use it.
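For readers who like to see the arithmetic, the short sketch below compares two made-up groups of scores that share the same average but differ in spread. The score lists are hypothetical and chosen only to illustrate a low versus a high standard deviation.

```python
from statistics import mean, pstdev

# Two hypothetical groups of five scores, both averaging 100.
tight_group = [98, 99, 100, 101, 102]    # scores close to the average
spread_group = [70, 85, 100, 115, 130]   # scores far from the average

for name, scores in [("tight", tight_group), ("spread", spread_group)]:
    # pstdev treats the list as the whole group (population standard deviation).
    print(name, "mean:", mean(scores), "standard deviation:", round(pstdev(scores), 1))
```

Both groups have a mean of 100, but the first has a standard deviation of about 1.4 and the second about 21.2, so the average alone hides how different the two groups are.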

All three tests are norm-referenced, and this is where we find the common measure among them. A norm-referenced test estimates where the student is positioned relative to a predefined population. For the PAR, the population is students in the same grade level as the student taking the test. For the TOSCRF and TWS-4, the population is students in the same age range as the student taking the test. Usually, there is a norms table showing each standard score and its corresponding place within the predefined population, represented as a percentile rank. Our seventh-grade student has a standard score of 88 on the TWS-4. Using a standard psychometric conversion table, an 88 converts to the 21st percentile rank, meaning our student scored higher than 21 percent of the predefined population and lower than 79 percent. We always want students to be moving toward the 50th percentile.
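The conversion from a standard score to a percentile rank can be approximated with a normal curve. The sketch below assumes the scores are perfectly normally distributed with a mean of 100 and a standard deviation of 15; published norms tables are built from real norming samples and may differ slightly. The function name is hypothetical.

```python
from statistics import NormalDist

# Assumption: an idealized normal curve for the standard score scale.
STANDARD_SCORE_SCALE = NormalDist(mu=100, sigma=15)


def percentile_rank(standard_score):
    """Approximate percent of the population scoring below this standard score."""
    return round(STANDARD_SCORE_SCALE.cdf(standard_score) * 100)


print(percentile_rank(88))   # 21
print(percentile_rank(100))  # 50
```

A score of 88 sits 12 points, or 0.8 standard deviations, below the mean, which is why it lands near the 21st percentile.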

Note: The downside of percentile ranks is that the distance between two ranks, measured in standard score points, increases the further the ranks are from the mean. The distance between the 20th percentile and the 30th percentile is relatively small. On a normal curve, the distance between the first and fifth percentile is about twice the distance between the 20th and 30th percentile, even though it covers only four percentile points instead of ten. That is because of the distribution of the group. There are fewer data points, students in our case, at the lower end of a normal curve than there are closer to the middle of the curve. For this reason, percentile ranks should not be averaged. The standard scores, which are the same distance apart all the way across the distribution, should be used to determine an average; that average standard score can then be used to determine the percentile rank.
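Both points in the note can be checked with a little arithmetic. The sketch below again assumes an idealized normal curve with a mean of 100 and a standard deviation of 15, and the two student scores at the end are hypothetical.

```python
from statistics import NormalDist, mean

scale = NormalDist(mu=100, sigma=15)  # assumption: idealized normal curve


def standard_score_at(percentile):
    """Standard score that corresponds to a given percentile rank."""
    return scale.inv_cdf(percentile / 100)


def percentile_of(standard_score):
    """Percentile rank (as a percentage) for a given standard score."""
    return scale.cdf(standard_score) * 100


# 1. Equal-looking percentile gaps are not equal in standard score points.
gap_20_to_30 = standard_score_at(30) - standard_score_at(20)
gap_1_to_5 = standard_score_at(5) - standard_score_at(1)
print(round(gap_20_to_30, 1), round(gap_1_to_5, 1))

# 2. Averaging percentile ranks gives a different answer than
#    averaging standard scores first and converting afterward.
scores = [70, 100]  # two hypothetical students
average_of_percentiles = mean(percentile_of(s) for s in scores)
percentile_of_average = percentile_of(mean(scores))
print(round(average_of_percentiles, 1), round(percentile_of_average, 1))
```

Under these assumptions, the first gap is a little under 5 standard score points and the second a little over 10. Averaging the two percentile ranks gives roughly 26, while the average standard score of 85 corresponds to roughly the 16th percentile.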

A common measure allows for comparison, but what does it mean? We’ll address this next week in Part 2 of this blog.



[1] Adapted from Test of Silent Contextual Reading Fluency, by D. Hammill, J. L. Wiederholt, and E. Allen, 2006, Austin, TX: PRO-ED. Copyright 2006 by PRO-ED. Adapted with permission. All rights reserved.

[2] Adapted from Test of Written Spelling, Fourth Edition, by S. Larsen, D. Hammill, and L. Moats, 1999, Austin, TX: PRO-ED. Copyright 1999 by PRO-ED. Adapted with permission. All rights reserved.

