Ohio’s Third Grade Reading Guarantee law is based on a Florida law that has been in place for many years. In Florida, a student must reach a certain level on the state test, the FCAT 2.0, to avoid being retained in third grade. As in Ohio, there are a variety of exceptions for English Language Learners and special education students. Florida has supposedly had a great deal of success with its law, and individuals from the Florida Department of Education even testified in Ohio when the legislation was under consideration in the General Assembly.
Let’s take a look at how Florida’s law works and how the state identifies when a third grade student needs to be retained.
The FCAT 2.0 is the generic name for all of the content area and grade level tests now administered in Florida. On the third grade reading test, students complete 45 multiple choice questions, and the results are converted to a “developmental scaled score” between 140 and 260. Depending on their scaled score, students are classified into one of five levels, described as follows:
Florida’s third grade students must obtain a scaled score of at least 182 out of 260, which places them in Level 2, to be eligible to advance to fourth grade. Any student scoring in Level 1 (scale score of 140-181) is retained in third grade.
I contacted the Florida Department of Education to find out how the raw scores (0-45, multiple choice questions only) are converted to scale scores, since the test varies from year to year. Here’s the response I received:
Recently, our office received your inquiry about the FCAT 2.0 Raw/Scaled Score Conversion. A student’s raw score is the number of questions the student correctly answered on the test which count toward the student’s final score. The score a student receives is called a developmental scale score for FCAT 2.0 Reading and Mathematics and a scale score for FCAT 2.0 Science.
The FCAT 2.0 is not scored using a percent- or number-correct scoring method. Students correctly answering the more difficult and discriminative items receive more credit than students answering easier and less discriminative items. In other words, the scoring model involves both the number and difficulty level of questions a student answers correctly. This type of scoring, which is referred to as pattern scoring or the Item Response Theory (IRT) method, produces more accurate scores (estimates of student ability) for individual students than the number-correct scoring method, as indicated by numerous publications in the educational measurement field.
Since each question has a different magnitude in scoring depending on the item parameters, it is not possible to say which percentage of the test items a student must correctly answer to earn a specific score or Achievement Level. For instance, students having the same number of correct responses might receive scale scores that are in different Achievement Levels.
This scoring method is described in the FCAT Handbook, posted at http://fcat.fldoe.org/handbk/complete.pdf. While the FCAT and FCAT 2.0 are different in that the FCAT 2.0 does not have performance tasks that are handscored and it only utilizes one score scale for each test (FCAT Reading and Mathematics used to use two kinds of scales), the IRT methodology used for scoring is the same.
For additional information, please call our main line and request to speak with Salih Binici, the psychometrician lead in our office.
Qian Liu, Ph.D.
Director, Scoring and Reporting
Bureau of K-12 Assessment
Florida Department of Education
Got that? Good. Looking in the handbook, you’ll find that items for the FCAT are pre-screened and analyzed using the aforementioned IRT method, which is “widely used because it produces the most accurate score estimates possible”.
So, in Florida, we don’t know exactly how many multiple choice questions or exactly which multiple choice questions a child must get correct in order to demonstrate proficiency, only that they must get a scale score of at least 182 out of 260 to reach Level 2 and move on to grade 4.
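To see why the same number of correct answers can land in different places under pattern scoring, here is a minimal sketch. It assumes a two-parameter logistic IRT model and three made-up items. The letter above does not say which IRT model Florida uses, so the model, the item values, and the simple grid search are all illustrative assumptions, not Florida’s actual procedure.

```python
import math

def p_correct(theta, discrimination, difficulty):
    """Probability of a correct answer under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response pattern at a given ability."""
    total = 0.0
    for (a, b), x in zip(items, responses):
        p = p_correct(theta, a, b)
        total += math.log(p) if x else math.log(1.0 - p)
    return total

def estimate_theta(items, responses):
    """Crude maximum-likelihood ability estimate via grid search on [-4, 4]."""
    grid = [i / 100 for i in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))

# Hypothetical items as (discrimination, difficulty); not real FCAT values.
items = [(0.5, -1.0), (1.0, 0.0), (2.0, 1.0)]

easier_pattern = [1, 1, 0]  # two correct, missed the hardest item
harder_pattern = [0, 1, 1]  # two correct, missed the easiest item

theta_easier = estimate_theta(items, easier_pattern)
theta_harder = estimate_theta(items, harder_pattern)
print(theta_easier, theta_harder)
```

Both students answered two of three items correctly, yet the student who got the harder, more discriminating items right ends up with the higher ability estimate. That is the behavior the letter describes when it says students with the same number of correct responses can fall into different Achievement Levels.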
If we take some liberties and try to look into the hidden math in Florida’s black box, we can see that Florida’s third graders have a scale score range of 140-260 (120 points), they have 45 multiple choice questions, and they must get at least a 182 to be deemed proficient at reading. Looking strictly at these numbers, we see that the children must get 42 scale score points (182 minus 140) out of 120 (260 minus 140) to reach Level 2. This essentially requires a scale score percentage of 35 (42 divided by 120).
Basically, in Florida, a student must earn at least 35% of the scale score points to be deemed a proficient enough reader to advance to fourth grade.
35%. Remember that number.
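For anyone who wants to check the arithmetic, here is a small sketch of the back-of-the-envelope calculation. The function name is my own, and treating the scale as a simple linear range is the same liberty taken in the text, not how Florida actually scores the test.

```python
def scale_fraction(cut_score, scale_min, scale_max):
    """Share of the scale range that must be earned to reach the cut score."""
    return (cut_score - scale_min) / (scale_max - scale_min)

# Florida grade 3 reading: Level 2 starts at 182 on a 140-260 scale.
florida = scale_fraction(182, 140, 260)
print(f"Florida: {florida:.0%} of the scale score range")
```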
Here in Ohio, since we’re copying Florida’s law to achieve the (alleged) success that Florida has experienced using a third grade reading guarantee, you’d think that we would try to replicate each component of the law, including the testing model and procedures, right?
Well, we’ve got a different methodology altogether.
First of all, our Grade Three Reading Ohio Achievement Assessment has an entirely different blueprint. While Florida’s test is composed entirely of multiple choice questions that focus on a child’s ability to read and select an appropriate answer, Ohio’s test includes:
- 29 multiple choice questions worth 1 point each
- 4 or 6 short answer questions worth 2 points each
- 2 or 3 extended response questions worth 4 points each
Depending on the year, Ohio’s test has 36 or 37 items worth a total of 49 points. And while Florida’s multiple choice questions stick to the concept of reading only, Ohio’s test adds in 20 points’ worth of short answer and extended response questions that measure a child’s ability to formulate answers on their own and convey those answers in writing, which is very different from simply reading and responding. While Florida’s multiple choice questions can be scored entirely by an objective computer, Ohio follows a different process that relies on a large number of subjective human scorers.
The OAA multiple-choice items are scored by computer, and constructed-response items are scored by trained scorers in central locations. These scorers work for the test contractors that support Ohio’s OAA testing programs. The test contractors for OAA are currently the American Institutes for Research (AIR) which is the overall contractor and Pearson which is the scoring contractor.
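The point totals in Ohio’s blueprint can be checked with a short sketch. The pairing of item counts below (4 short answer with 3 extended response, or 6 short answer with 2 extended response) is my inference: those are the only combinations of the listed counts that add up to 49 points and 36 or 37 items.

```python
# Point values per item type, as listed in the blueprint above.
POINTS = {"multiple_choice": 1, "short_answer": 2, "extended_response": 4}

# Inferred form layouts (an assumption; see the lead-in).
FORMS = {
    "4 short answer + 3 extended": {
        "multiple_choice": 29, "short_answer": 4, "extended_response": 3,
    },
    "6 short answer + 2 extended": {
        "multiple_choice": 29, "short_answer": 6, "extended_response": 2,
    },
}

def summarize(form):
    """Return (item count, total points, constructed-response points)."""
    items = sum(form.values())
    total = sum(POINTS[kind] * count for kind, count in form.items())
    constructed = total - POINTS["multiple_choice"] * form["multiple_choice"]
    return items, total, constructed

summaries = {name: summarize(form) for name, form in FORMS.items()}
for name, (items, total, constructed) in summaries.items():
    share = constructed / total
    print(f"{name}: {items} items, {total} points, {share:.0%} constructed response")
```

Either way, 20 of the 49 points (about 41%) come from questions that a human being has to score.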
Secondly, the FCAT 2.0 reading test for grade three is “administered in two 70-minute sessions over two days with a break in the middle of each session.” Florida breaks up the intensity of the testing situation into two separate days with a break on each day. Meanwhile, Ohio knocks it out in one fell swoop, keeping its 8- and 9-year-olds captive for one torturous 150-minute session:
Students have up to 2.5 hours to complete each test. Schools may decide to schedule a set amount of time (perhaps an hour or an hour and 15 minutes) to administer a test to all students. At the end of that time, students who are finished may be dismissed. However, any student who has not finished the test in this allotted time must be given additional time to complete the test, up to a total of 2.5 hours on that same day.
These testing components aside, the biggest difference between the tests is the final scoring process. As mentioned before, Florida field-tests all of its items and assigns them values prior to administration, keeping its developmental scale score consistent from year to year. Ohio? Not so much. Here’s the description of how Ohio students’ raw scores are converted to scale scores [emphasis added]:
Equating and Scaling: The Conversion of Raw Scores to Scaled Scores
Ohio uses the Rasch model (a single parameter logistic model) for computing item difficulties and examinee abilities. The Rasch model is based on the probability that a given examinee answers a given item correctly. This model is used because of its widespread acceptance, its ease of use, and the commercial availability of software for implementing it. The Rasch model provides estimates of the difficulty of each item and the ability of each examinee on a linear scale in log-odds units, or logits.
Equating and Test Form Construction
Equating is a process in which test forms comprised of different items are calibrated to the same scale. Because each test form is made up of items that have been field-tested (and, in some cases, used operationally on a previous test form), item difficulty estimates are available to pre-equate operational forms so that the forms are of approximately equal difficulty for each administration.
Common Item Equating
Following the May 2013 OAA administration, item difficulty values were estimated based on an early return sample for all tested grades. The early return sample was selected to be statistically representative of all Ohio public school students. Because item difficulty estimates were available for all operational items, all could potentially serve as anchor items in the equating process.
Calibrating, equating and linking proceeded through a series of steps. First, the operational administration difficulty values were computed from the early return sample and compared with the “bank” or reference difficulty values. The mean difference between the operational and bank difficulties of the items is called the equating constant, or EQK. The equating constant was added to each difficulty value for items on the current test so that the mean item difficulties were equal. When the equating process was complete, item difficulties from the current administration were calibrated to the bank scale, on which the performance standards are located.
Test scores and performance standards are expressed as scaled scores. Scaled scores are invariant across different forms of the same test while raw scores are not because they reflect differences in the difficulty of the test items. A scaled score of, for example, 400 on one administration implies the same overall performance as a scaled score of 400 on any other administration of the same test, but the number of raw score points corresponding to a 400 may shift slightly from administration to administration. Please note that scaled scores are not comparable across different grades or subjects.
After the May 2013 operational test administration, test items were calibrated and test forms were equated. Rasch ability estimates (called theta scores) were computed for each possible raw score. The Rasch ability estimates were then transformed to the appropriate OAA scaled score, all of which are calibrated so that the proficient standard is equal to a scaled score of 400.
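The two pieces of math in that passage, the Rasch probability and the equating constant, can be sketched in a few lines. The item difficulties below are invented for illustration, and the sign convention (bank mean minus operational mean) is my reading of “added to each difficulty value ... so that the mean item difficulties were equal.” The final step from theta to the OAA scaled score is not described in the passage, so it is left out.

```python
import math
from statistics import mean

def rasch_probability(theta, difficulty):
    """Rasch model: chance that a student of ability theta answers correctly.

    Both theta and difficulty are in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def equate_to_bank(operational, bank):
    """Shift this year's difficulties so their mean matches the bank mean."""
    eqk = mean(bank) - mean(operational)  # the equating constant (EQK)
    return eqk, [b + eqk for b in operational]

# Hypothetical logit difficulties for three anchor items (not real OAA data).
operational = [-0.6, 0.1, 0.8]  # estimated from this year's early return sample
bank = [-0.4, 0.2, 1.1]         # reference values from the item bank

eqk, equated = equate_to_bank(operational, bank)
print(f"EQK = {eqk:.2f}")
print(f"P(correct) when ability equals difficulty: {rasch_probability(0.0, 0.0):.2f}")
```

The takeaway is that the shift is computed from this year’s student responses, which is why the raw score needed for any given scale score is not known until after the test is given.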
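Crystal clear, right? In short, it’s obvious that Ohio’s process for testing is distinct from Florida’s, and many post-administration calculations take place in Ohio. Ohio uses the one-parameter Rasch model, in which every possible raw score maps to a single scale score, while Florida describes IRT pattern scoring, in which the particular questions a student answers correctly also matter. (Strictly speaking, Rasch is itself a member of the IRT family, but the two states apply it very differently.) As a result, while Florida’s scale scores remain constant from year to year, Ohio’s raw-to-scale conversion varies from one year to the next, with a specific number of raw points identified after each administration as equivalent to each scale score. Here are the raw/scale score conversions for the last seven years of Ohio’s third grade reading tests: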
For seven straight years, the raw score that has been equivalent to Proficient on the 3rd Grade OAA has changed, ranging from a low of 28 points to a high of 33. What is the raw score needed to hit Proficient in 2014? We won’t know until after the test has been administered.
More important to this story than the score needed to be Proficient is the score needed to attain a scale score of 392, the arbitrary number that Ohio’s State Board of Education chose as the mark a third grade child must reach to be deemed a proficient enough reader to move to fourth grade. The number of raw points needed to reach a 392 has also moved for seven straight years and is an unknown target for this year. In addition to the fluctuation in raw/scale score conversions over this time frame, the actual scale score range has changed every single year while the total number of points has remained the same. Look at the chart below that shows the ever-changing mark for third grade readers:
As we consider the score of 392 that Ohio’s third graders need to obtain in order to be deemed worthy of moving to grade four, we can see that the percentage of raw points needed to obtain that score keeps changing from year to year. In 2011, third graders would have needed 25 out of 49 points (51%), while those same third graders would have needed to get 29 out of 49 (59%) in 2008 and 2009.
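The raw score percentages quoted above are simple to reproduce. This sketch uses only the two data points mentioned in the text (25 and 29 raw points out of 49); the dictionary labels are mine.

```python
TOTAL_POINTS = 49  # total raw points on Ohio's grade 3 reading test

# Raw points needed to reach a scale score of 392, per the years cited above.
raw_needed = {"2011": 25, "2008 and 2009": 29}

percentages = {
    years: round(points / TOTAL_POINTS * 100)
    for years, points in raw_needed.items()
}
for years, pct in percentages.items():
    print(f"{years}: {raw_needed[years]} of {TOTAL_POINTS} points ({pct}%)")
```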
What’s more, the scale score range keeps changing while the 392 cut score is now set in stone (until it increases according to the Reading Guarantee law), meaning that the relative percentage of scale score points a student must earn is also a perennially moving target. Here’s how that chart looks with just the scale score percentages needed to obtain a 392 over the same seven-year period:
Since we’re dealing with larger numbers, the scale score percentages are closer together, but they still range from 53% in 2013 up to 58% just two years earlier. And we still don’t know what children will need to score this spring to hit that magic cut score of 392.
Now stop for a second and look at the percentages Ohio’s third graders have needed to prove that they are proficient enough readers (according to Ohio law) to move on to fourth grade. The raw score percentages have ranged from 51% to 59%, and the scale score percentages have ranged from 53% to 58%. While we don’t know this year’s exact numbers, it is safe to assume they will fall in these same ranges, all of which require the student to obtain a score greater than 50% to move on.
Now, remember that number from Florida?
In Florida, the magic number to be eligible to be promoted is around 35% on the state’s standardized test. But that’s not even the full story!
Consider these facts in their entirety about Ohio’s and Florida’s testing programs that lead to the “same” retention policy for third graders:
- In Florida, students take a 45-point test composed entirely of multiple choice questions
- In Ohio, students take a 49-point test on which 41% of the points are more challenging constructed-response items
Result: Florida’s test more accurately measures the skill of reading while Ohio’s test adds in an assessment of the student’s writing skills. Florida’s test also removes the subjective human factor in scoring student responses. Florida’s FCAT 2.0 is easier than the OAA.
- In Florida, the test (with fewer questions) is broken up over two days with breaks built into each day.
- In Ohio, the test is administered in one long session with no formal breaks permitted.
Result: Florida’s testing process, while disrupting two days of instruction, is administered at a pace that is much more developmentally appropriate to the attention span of 8- and 9-year-old children. Again, Florida’s FCAT 2.0 is easier than the OAA.
- In Florida, test questions have pre-determined values and the scale scores are the same from year to year.
- In Ohio, test questions are evaluated after the test administration to determine the required number of raw points that equate to corresponding scale scores, which are also determined post-administration.
Result: Florida’s FCAT 2.0 is much more predictable and its scores are much more stable than the OAA’s. The OAA has a different target every year, and that target is not known until after the test is taken.
- In Florida, students must get approximately 35% of the available points to move to fourth grade.
- In Ohio, students must get more than 50% of the available points to move to fourth grade.
Ohio has a harder test with more points on the line, a test administration process that evaluates a child’s writing and endurance, a scoring process that is created after the initial test results start coming in, and yet Ohio has set the bar for determining whether or not a child can move on to fourth grade over 15 percentage points higher than Florida’s.
So for all of those who have said that a child should be retained if they can’t read by the end of third grade, I ask you, “By whose measure?”
How is it that reading proficiency is so different in Florida than it is in Ohio? And why is Ohio’s requirement so much more stringent if it’s based on a Florida law that has had such great success? Why is it that students who will be retained here in Ohio would be passed on to fourth grade in Florida?
Do you still think the Third Grade Reading Guarantee is as simple as holding kids back who can’t read?
Contact the members of the House and Senate Education Committees to share this information and demand that immediate changes be made to the law to avoid irreparable harm to our children.