How Scientific is Reading First?

by Siegfried Engelmann
2006

In her interview with Dr. G. Reid Lyon (1/18/06, Effective Reading Programs Share Common Characteristics, EducationNews.org), Nancy Salvato asked a direct and reasonable question: "What particular instructional programs do you endorse in order for teachers to implement what you've learned through your research?"

Lyon's short answer to this question was, "I have never nor will I endorse a program."

As part of his long answer, Lyon asserted, "Everything I do comes from my scientific training."

If that's true, his scientific training was curious. In his long answer, he observed, "The value of any program is data driven and based on its impact on kids."

We know from reports like those published by the American Institutes for Research that there are two programs that have substantial evidence of effectiveness with whole-school reforms, Direct Instruction and Success for All. We assume that Lyon has this information. A combination of these facts would create an argument that goes something like this:

Programs are judged according to their impact on kids.
Program D creates a large positive impact on kids.
Therefore, I will never endorse program D.

The argument doesn't make a lot of sense because we assume that the programs an investigator would endorse are the programs that create a substantial impact on kids. We would recognize that Salvato's question was reasonable, something a thoughtful teacher might ask. If the teacher is working with at-risk kids, the chances are 9 out of 10 that her kids are failing. She is failing, and knows that she is failing. She wants her kids to have a chance. So she asks someone who has specific data on which programs create a great impact on kids, and the response is, in effect, "I know the answer, but I'm not going to tell you."

Lyon's illogic does not stop there. As part of his long answer, he indicates that Reading First was initially designed to endorse only those programs that had scientific evidence of effectiveness. "What we originally wanted in Reading First was that if you want to buy a program with federal money, it should have gone through clinical trials to be sure it is effective. But there weren't enough programs that went through that level of rigor; so many programs would be screened out and only a limited number of programs would be available. The Department of Education made the decision to make the criteria more general. Programs had to be comprehensive and the instructional interactions must be based upon principles supported by converging scientific evidence."

The main problem with Lyon's position is that it is what is called an argument from ignorance. For any program without experimental evidence of effectiveness, the reasoning goes like this:

We don't know if program A is effective or not.
Therefore, we'll assume that it is effective.

Translated into a response to the teacher who asks the question about what works, the answer now becomes something like this: "Well, I can tell you this much: There are at least two programs in this group that work, and some that we don't really know about, but instead of identifying which are which, I'm going to treat them all the same because they have some of the same features. So you just have to make your best guess. Good luck."

Viewed differently, it's the educational variation of Russian roulette, in which "at least one chamber is empty and the other chambers have some of the features of the empty chamber. Good luck."

In an article for Education Week (1/28/04, The Dalmatian and Its Spots: Why Research-based Recommendations Fail Logic 101), I pointed out the illogic of the argument type that Reid Lyon uses about programs that have the features of effective programs. It is as illogical as this argument:

If a dog is a Dalmatian, it has spots.
Therefore, if a dog has spots, it is a Dalmatian.

Lyon is saying:

If a beginning-reading program is highly effective, it has various features: phonics, phonemic awareness, and so on.
Therefore, if a program has these features, it will be highly effective.

No. Programs are effective only if they have been demonstrated to be effective. The features that Lyon has identified (phonemic awareness, phonics, etc.) are global features that do not determine the details of a successful program; they are merely the features that fairly naïve observers have noticed. In other words, someone who knows how to create effective programs could design a beginning-reading program that produced horrible results but met all the criteria that Lyon specifies.

Geoff Colvin and I have written a rubric for identifying authentic Direct Instruction programs. The rubric is over 120 pages long and lists over 40 criteria. All of these have been experimentally demonstrated to make a difference.

Consider Lyon's reasons for changing the selection criteria from programs that are successful to programs that share features of successful programs: "… there weren't enough programs that went through that level of rigor; so many programs would be screened out and only a limited number of programs would be available."

This reasoning seems to be based on the idea that there should be a large number of programs available, whether or not they have been demonstrated to work. Someone on Lyon's side might support this strange argument by saying, "Some of those programs that would be screened out might be able to show evidence of effectiveness. They just haven't been evaluated that way."

Consider the response that would result if this logic were applied to the drug industry. In addition to the drugs that have evidence of effectiveness, large numbers of drugs that have never reached "this level of rigor" should be included on the grounds that some of them might be able to demonstrate effectiveness if we tested them.

I think a majority of people would vote no on this practice.

Lyon's position about increasing the number of available programs, ultimately, is an example of a false dilemma—either we change the criteria from programs with demonstrated success to programs that have common features or we will have an unacceptably small number of programs. There is a middle ground, which would be to tell it as it is: Reading First would identify the programs that have significant data and acknowledge that the other programs on the list have some of the features of the programs with significant data. In this way, the answer to the teacher's question would be, "Well, I can't endorse a program, but I can tell you that the two programs with the asterisk after their names have significant data of effectiveness. The other programs don't, but they have basic features in common with the programs that are known to be effective. Your choice."

Lyon adds an abstract, historical layer to his argument. "It is important to note that we designed Reading First so that it would also stimulate publishers and program developers to develop and test programs scientifically to ensure their effectiveness. This is a very slow culture change, but there is some indication that the major publishers are beginning to move in this direction."

This perspective seems to favor a kind of affirmative action for publishers, designed to wean them slowly from their right to benefit from federal funds by supplying products with no evidence of effectiveness to at-risk classrooms. I suppose that if one considers the publishers more important than the kids, this position makes sense. If this is the case, a straight message to the teacher would be something like, "Understand that we are playing this game so that publishers who have unproven products don't suffer financially; therefore, you'll just have to subordinate your concern for your kids to our concern for these corporations."

In Lyon's defense, his position about never endorsing specific programs has a strong traditional basis and is apparently intended to avoid conflicts of interest. Yet the nature of the problem suggests that programs need to be named. The only thing a school or a teacher will use is some specific program, not information about phonemic awareness or phonics or guidelines about selecting programs with these features.

The slow cultural change that Lyon refers to is not encouraging, because it could have started with Project Follow Through in the 1970s. Follow Through, involving over 140 districts and 100,000 at-risk kids, showed what works with at-risk kids in K-3, but in the tradition of not naming specific programs, the winner was not named. Instead, the entire project was judged to be a failure, with the implication that all of the approaches tested in Follow Through had failed, which was false. Third graders who went through Direct Instruction outperformed kids in all other models in reading, language, math, and spelling. DI students performed near the 50th percentile in all subjects; the average of 13 other models was around the 18th percentile. If this information had been disseminated at the time, a generation or more of kids might have benefited. (Of course, the outcome might have been unacceptable because there was only one winner: too small a number.)

The fact that some publishers are "beginning" to do what they should have been doing 35 years ago does not generate much hope for at-risk kids who are in kindergarten and first grade now, and who will not benefit from a cultural change that may have an impact only after they have failed and dropped out of school. In the meantime, they will fail, like the millions who have failed since the '70s.

I wrote an article that defended Reading First (Reading First = Kids First, Oregon's Future, Winter 2005) on the grounds that Reading First required schools to take an important first step, using test results to determine whether programs are working and using back-up plans if they aren't working. This is a crude first step; however, I believe that Reading First is better than no Reading First. The tragic part is that Reading First uses teachers and kids as experimental subjects, although programs and training that would turn around the most seriously devastated schools are available now.

I would not have written the present article if Lyon had acknowledged that Reading First was a political compromise that had some potential because it required states and schools to accept responsibility for failure and to respond constructively to data. But to frame arguments for political compromise and folksy analysis of features as either science or best practice is burlesque. Thirty-five years ago, a colleague pointed out, "We have warnings and directions for usage on a bottle of aspirin, but not a word of warning about using instructional programs that have not been demonstrated to be effective with children of poverty."

Such a warning still does not exist, and it probably won't until the public recognizes that we need some kind of pure Food and Drug Administration for at-risk kids. However, the first step in real cultural change requires a simple resolution that says, "No, kids won't fail. We will consider them FIRST, not as mere victims in the slow development of cultural change, or grist for another effort that keeps commercial interests happy and current prejudices well fed. We will use what is shown to be effective and implement it well."