How Scientific Is Reading First?
by Siegfried Engelmann
2006
In her interview with Dr. G. Reid Lyon (1/18/06,
"Effective Reading Programs Share Common Characteristics,"
EducationNews.org), Nancy Salvato asked a direct and
reasonable question: "What particular instructional programs do you
endorse in order for teachers to implement what you've learned through your
research?"
Lyon's short answer to this question was, "I have
never nor will I endorse a program."
As part of his long answer, Lyon asserted, "Everything I do comes from my
scientific training."
If that's true, his scientific training was curious. In his long answer, he
observed, "The value of any program is data driven and based on its impact
on kids."
We know from reports like those published by the American Institutes for
Research that there are two programs that have substantial evidence of
effectiveness with whole-school reforms, Direct Instruction and Success for
All. We assume that Lyon has this information. A combination of these facts
would create an argument that goes something like this:
Programs are judged according to their impact on kids.
Program D creates a large positive impact on kids.
Therefore, I will never endorse program D.
The argument doesn't make a lot of sense, because we
assume that the programs an investigator would endorse are the programs that
create a substantial impact on kids.
We would also recognize that Salvato's question was reasonable, something a
thoughtful teacher might ask. If the teacher is working with at-risk kids, the chances are
9 out of 10 that her kids are failing. She is failing, and knows that she is
failing. She wants her kids to have a chance. So she asks someone who has
specific data on which programs create a great impact on kids, and the response
is, in effect, "I know the answer, but I'm not going to tell you."
Lyon's illogic does not stop there. As part of his long
answer, he indicates that Reading First was initially designed to endorse only
those programs that had scientific evidence of effectiveness. "What we
originally wanted in Reading First was that if you want to buy a program with
federal money, it should have gone through clinical trials to be sure it is
effective. But there weren't enough programs that went through that level of
rigor; so many programs would be screened out and only a limited number of
programs would be available. The Department of Education made the decision to
make the criteria more general. Programs had to be comprehensive and the
instructional interactions must be based upon principles supported by
converging scientific evidence."
The main problem with Lyon's position is that it is what is called an argument
from ignorance. For any program without experimental evidence of effectiveness,
the reasoning goes like this:
We don't know if program A is effective or not.
Therefore, we'll assume that it is effective.
Translated into a response to the teacher who asks
the question about what works, the answer now becomes something like this:
"Well, I can tell you this much: There are at least two programs in this
group that work, and some that we don't really know about, but instead of
identifying which are which, I'm going to treat them all the same because they
have some of the same features. So you just have to make your best guess. Good
luck."
Viewed differently, it's the educational variation of Russian roulette, in
which "at least one chamber is empty and the other chambers have some of
the features of the empty chamber. Good luck."
In an article for Education Week (1/28/04, "The Dalmatian and Its
Spots: Why Research-based Recommendations Fail Logic 101"), I pointed out
the illogic of the type of argument that Reid Lyon uses about programs that
share the features of effective programs. It is as illogical as this argument:
If a dog is a Dalmatian, it has spots.
Therefore, if a dog has spots, it is a Dalmatian.
Lyon is saying:
If a beginning-reading program is highly effective, it has various features: phonics, phonemic awareness, and so on.
Therefore, if a program has these features, it will be highly effective.
No. Programs are effective only if they have been
demonstrated to be effective. The features that Lyon has identified (phonemic
awareness, phonics, etc.) are global features that do not determine the details
of a successful program; they merely reflect the details that fairly naïve
observers have noticed. In other words, someone who knows how to create
effective programs could design a beginning-reading program that met all the
criteria Lyon specifies yet produced horrible results.
Geoff Colvin and I have written a rubric for identifying authentic Direct
Instruction programs. The rubric is over 120 pages long and lists over 40
criteria. All these have been experimentally demonstrated to make a difference.
Consider Lyon's reasons for changing the selection criteria from
programs that are successful to programs that share features of successful
programs: "… there weren't enough programs that went through that level of
rigor; so many programs would be screened out and only a limited number of
programs would be available."
This reasoning seems to be based on the idea that there should be a large
number of programs available, whether or not they have been demonstrated to
work. Someone on Lyon's side might support this strange argument by
saying, "Some of those programs that would be screened out might be able
to show evidence of effectiveness. They just haven't been evaluated that
way."
Consider the response that would result if this logic were applied to the drug
industry. Under this logic, in addition to the drugs that have evidence of
effectiveness, large numbers of drugs that have never reached "this level of
rigor" should be approved on the grounds that some of them might be able to
demonstrate effectiveness if we tested them.
I think a majority of people would vote no on this practice.
Lyon's position about increasing the number of available
programs, ultimately, is an example of a false dilemma—either we change the
criteria from programs with demonstrated success to programs that have common
features or we will have an unacceptably small number of programs. There is a
middle ground, which would be to tell it as it is: Reading First would identify
the programs that have significant data and acknowledge that the other programs
on the list have some of the features of the programs with significant data. In
this way, the answer to the teacher's question would be, "Well, I can't
endorse a program, but I can tell you that the two programs with the asterisk
after their names have significant data of effectiveness. The other programs
don't, but they have basic features in common with the programs that are known
to be effective. Your choice."
Lyon adds an abstract, historical layer to his argument.
"It is important to note that we designed Reading First so that it would
also stimulate publishers and program developers to develop and test programs
scientifically to ensure their effectiveness. This is a very slow culture
change, but there is some indication that the major publishers are beginning to
move in this direction."
This perspective seems to favor a kind of affirmative action for publishers,
designed to wean them slowly from their right to benefit from federal funds by
supplying products with no evidence of effectiveness to at-risk classrooms. I
suppose that if one considers the publishers more important than the kids, this
position makes sense. If this is the case, a straight message to the teacher
would be something like, "Understand that we are playing this game so that
publishers who have unproven products don't suffer financially; therefore,
you'll just have to subordinate your concern for your kids to our concern for
these corporations."
In Lyon's defense, his position about never endorsing
specific programs has a strong traditional basis and is apparently intended to
avoid conflicts of interest. Yet, the nature of the problem suggests that
programs need to be named. The only thing a school or a teacher will use is
some specific program, not information about phonemic awareness or phonics or
guidelines about selecting programs with these features.
The slow cultural change that Lyon refers to is not encouraging because it could have
started with Project Follow Through in the 1970s. Follow Through, involving
over 140 districts and 100,000 at-risk kids, showed what works with at-risk
kids in K-3, but in the tradition of not naming specific programs, the winner
was not named. Instead, the entire project was judged to be a failure, with the
implication that all of the approaches tested in Follow Through failed, which
was false. Third graders who went through Direct Instruction outperformed kids
in all other models in reading, language, math, and spelling. DI students
performed near the 50th percentile in all subjects; the average of 13 other
models was around the 18th percentile. If this information had been
disseminated at the time, a generation or more of kids might have benefited. (Of
course, the outcome may have been unacceptable because there was only one
winner: too small a number.)
The fact that some publishers are "beginning" to do what they should
have been doing 35 years ago does not seem to generate much hope for at-risk
kids who are in kindergarten and first grade now, and who will not benefit from
a cultural change that may have an impact only after they have failed and dropped out
of school. In the meantime, they will fail, like the millions who have failed
since the '70s.
I wrote an article that defended Reading First ("Reading First = Kids First," Oregon's
Future, Winter 2005) on the grounds that Reading First required schools to
take an important first step: using test results to determine whether programs
are working and using backup plans if they aren't. This is a crude
first step; however, I believe that Reading First is better than no Reading
First. The tragic part is that Reading First uses teachers and kids as
experimental subjects, although programs and training that would turn around
the most seriously devastated schools are available now.
I would not have written the present article if Lyon had acknowledged that Reading First was a political compromise that had
some potential because it required states and schools to accept responsibility
for failure and to respond constructively to data. But to frame arguments for
political compromise and folksy analysis of features as either science or best
practice is burlesque. Thirty-five years ago, a colleague pointed out, "We
have warnings and directions for usage on a bottle of aspirin, but not a word
of warning about using instructional programs that have not been demonstrated
to be effective with children of poverty."
Such a warning still does not exist, and it probably won't appear until the public
recognizes that we need some kind of pure Food and Drug Administration for
at-risk kids. However, the first step in real cultural change requires a simple
resolution that says, "No, kids won't fail. We will consider them FIRST,
not as mere victims in the slow development of cultural change, or grist for
another effort that keeps commercial interests happy and current prejudices
well fed. We will use what is shown to be effective and implement it well."