Science relies on the careful collection and analysis of facts. Science also benefits from human judgment, but that judgment isn't necessarily reliable. A study finds that scientists did a poor job forecasting whether a successful experiment would work on a second try.
That matters, because scientists can waste a lot of time if they read the results from another lab and eagerly chase after bum leads.
“There are lots of different candidates for drugs you might develop or different research programs you might want to invest in,” says Jonathan Kimmelman, an associate professor of biomedical ethics at McGill University in Montreal. “What you want is a way to discriminate between those investments that are going to pay off down the road, and those that are just going to fizzle.”
Kimmelman has been studying scientific forecasting for that reason. He realized he had a unique opening when other researchers announced a multimillion-dollar project to replicate dozens of high-profile cancer experiments. It's called the Reproducibility Project: Cancer Biology. Organizers wrote down the exact protocols they would use and promised not to deviate from them.
“This was really an extraordinary opportunity,” he says, because scientists so often change their experiments as they go along that it's hard to know whether a poor forecast was simply the result of the experiment having changed midstream.
Kimmelman and his postdoctoral fellow, Daniel Benjamin, asked nearly 200 professors, postdocs and graduate students to forecast the results from six of those repeated cancer experiments. The follow-up studies have now been done, and the results are in.
How did the 200 scientists do? According to a report published last week in PLOS Biology, not so hot.
“Most researchers overestimated the ability of repetition studies to have effects that were as significant as the original study,” Kimmelman says.
And that over-optimism ran pretty much across the board.
“There wasn’t really a big difference between trainees and experts,” he says. People with the most knowledge of the particular field actually did somewhat worse, while prominent scientists from related fields did somewhat better.
What’s going on?
“It’s possible that scientists overestimate the truth or the veracity of those original reports,” Kimmelman suggests. Or it’s possible that the scientists were simply too optimistic that independent labs would be able to follow an experimental protocol and get it to work properly.
Clearly, optimism is an important trait for a scientist, since most experiments on most days actually don’t yield exciting results. But optimism turns out to be a poor trait for scientific forecasters — certainly in this particular experiment.
Taylor Sells, a second-year graduate student who studies molecular biology and biophysics at Yale University, ended up being one of the very best forecasters in Kimmelman’s study.
She says she doubts she has any special mojo that lets her forecast experimental results. Instead, her approach was quite simple.
“Inherently as a scientist, we’re taught to be very skeptical of even published results,” she says. “And reproducibility has been a very important topic in science in general. So I approached it from a very skeptical point of view.”
She had no special insights into the actual experiments — in fact, she hadn’t performed those sorts of rodent experiments herself.
What Sells does draw from her lab experience is an acute awareness of how hard it is to get the same results consistently.
“We often joke about the situations under which things do work, like it has to be raining and it’s a Tuesday for it to work properly,” she says. “That is something we think about a lot.”
That’s on her mind when she reads other scientists’ papers and makes a judgment about how much to trust the results.
The beauty of science is that the truth comes out in the long run. But the process could be more efficient if scientists did a better job early on of picking the diamonds from the dross.
And, Kimmelman says, there's also a lesson for scientists who tend to trust their own intuitions: “At least using the survey methods we used, those intuitions did not seem to be reliable or sound.”