In the early weeks of the pandemic, New Yorkers paused every evening at seven o’clock to applaud the city’s frontline healthcare workers. The cheers, honking, and clattering of pots and pans could be heard from windows, fire escapes, and street corners as the city saluted those who repeatedly put themselves at risk for others. But as an August essay in the New York Times Magazine showed, many of these same frontline workers felt less supported in the hospitals where they were struggling to treat patients.
The story begins at a May conference meeting at Long Island Jewish Medical Center in Queens, which at the time had treated more Covid-19 patients than any other hospital in the country. The frontline physicians were exhausted and emotionally drained; they had spent months making grueling decisions about patient treatment, often at risk to their own health. Mangala Narasimhan, an intensive-care doctor, was anxious to return to her patients and knew that the discussion might be tense.
The meeting’s agenda included remarks by Alex Spyropoulos, a lead researcher and physician at the Northwell Health system, of which Long Island Jewish is a part. Like dozens of other Northwell doctors, he had joined the conference via video and addressed his colleagues from his home in Westchester. Spyropoulos had been running a clinical trial to determine the most effective dose of an anti-clotting drug for severely ill Covid patients, and a member of his team had run into problems with a doctor on Narasimhan’s unit. As the Times article explained,
Stella Hahn, a pulmonary-critical-care doctor, arrived at work the day before the meeting to find that a Covid-19 patient had gone into cardiac arrest. She knew that the patient was enrolled in the clinical trial and had been randomly assigned to receive either the standard dose of the anticoagulant or the higher one. As is always the case in the most rigorous trials, neither the patient nor Hahn was supposed to know to which group this woman belonged. Double-blind, randomized, controlled trials — R.C.T.s — are considered the gold standard in research because they do not allow findings to be muddied by any individual doctor’s biases or assumptions. But Hahn believed that the patient’s condition now called for the higher dose, which could potentially require the patient’s removal from the trial.
A colleague of Spyropoulos had called Hahn and urged her to reconsider for the sake of the trial, but Hahn had insisted that her responsibility was to the patient and that she had to rely on her own clinical judgment. During the meeting, Spyropoulos emphasized that scientific progress demanded rigorous adherence to controlled trials, “even in the very stressful environment of a pandemic that was overwhelming our hospitals.” He went so far as to say that physicians’ relying on their own judgment rather than awaiting the results of RCTs was tantamount to “witchcraft.”
To the frontline doctors, Spyropoulos’s comments were more than dispiriting: He seemed to be patronizing them while remaining detached from the medical emergencies at hand, placing data over the lives of individual patients.
This kind of fight was hardly limited to Northwell. As the Times article went on to explain, researchers across the country were pressing their practitioner counterparts “to slow down long enough to build a body of evidence that they knew with more certainty could help.” But to the physicians in emergency wards, every moment mattered. Pierre Kory, a critical-care doctor at the University of Wisconsin Hospital and Clinics, said that “it became like Republicans and Democrats. The two sides can’t talk to each other.”
Clashes like this have contributed to the uncertainty surrounding treatments for Covid-19. Of course, there have been many reasons for confusion about the virus: Early incorrect statements by the World Health Organization regarding the risk of contagion, a pair of infamously fraudulent studies in the Lancet and the New England Journal of Medicine, and panicked or blatantly political reactions by respected media outlets have all added to the general turmoil.
But the sense of disorientation also stems from a longstanding tension between two medical communities: practicing physicians and what might be called the academic-governmental establishment. Each group has its own sphere of influence, with practicing physicians acting as the authorities in immediate medical decision-making. Their academic counterparts in universities or government agencies, by contrast, are often removed from decisions that immediately affect patient care, yet they generally determine what tools and medicines are available to the public. Some doctors, such as those at university hospitals, may be part of both communities. But despite this overlap, there is a fundamental friction between the two, which often manifests in contrasting visions of how to evaluate evidence, approach patient care, and understand the role of the doctor.
While their educational paths are often similar, the professional ethics of academic and practicing physicians can differ greatly, as the New York Times Magazine article illustrates.
Academics are generally more inclined to pursue future, scalable benefits, and so are motivated to produce the most airtight research possible, even if it takes longer or some patients lose out along the way. In part for this reason, they are committed to the randomized controlled trial as the “gold standard” of proof of efficacy for a new therapy.
In an RCT, subjects are randomly assigned to either a treatment or a control group, with the latter typically receiving a placebo. The trials are often designed to be “double-blind,” meaning that neither the patients nor the physicians know whether a particular patient is receiving the treatment or the placebo. The patients are generally in a highly controlled environment and given the same diet and overall treatment regimen, apart from the new drug or placebo.
Withholding therapy from a control group is considered justified when there is genuine uncertainty about the merits of the therapy being tested, so that doctors are in a state of “equipoise” between giving their patients the therapy and giving them a mere placebo. Of course, researchers always have some reason to believe that a treatment being tested might work — whether based on theory, observation, or evidence from lab experiments — and so this state of equipoise cannot always be perfectly maintained. The placebo group is by design missing out on a potentially effective treatment.
Placebo-controlled trials thus raise complicated ethical problems even under ordinary circumstances. In the face of a global pandemic of a known fatal illness, RCTs are even harder to justify, since desperately ill people who are assigned to a placebo group would not be allowed to receive any pharmaceutical intervention — placing both their own health and that of others at risk.
In contrast to academic researchers, frontline physicians work directly with people who are suffering, and so tend to value access and speed. They feel personal responsibility to help patients as quickly as possible, and are willing to adjust things like diet and treatment to better serve them. They also tend toward what psychiatrist Norman Doidge, writing in Tablet magazine, calls an “all available evidence” approach to medical knowledge: They pay close attention not only to RCTs but to other forms of research such as observational studies and case histories. In addition, practitioners are inclined to learn as they go based on what seems to be working for individuals or groups. They take seriously the cumulative knowledge culled from their own years of technical training and in-person experience.
The goal of academic physicians is noble and crucial. They are devoted to advancing medicine by producing findings that leave little room for doubt. But as they tend not to assume personal responsibility for patients in the way practitioners do, they may risk prioritizing pristine data collection over the welfare of individual patients.
In an ideal world, each approach would function as a check on the excesses of the other. In some sense, one might think of one school of thought as primarily aimed at obtaining the best possible care for future patients, and the other as interested in good-enough treatment for patients now. Often, these two aims do balance each other out.
But there is also a clear hierarchy in medicine: Academic doctors, especially those who lead federal agencies like the U.S. Food and Drug Administration or the National Institutes of Health, are closer to major sources of funding and levers of power. Their approach is also buoyed by a culture that emphasizes “big data,” and so they exercise great influence over how medicine moves forward.
As Harvard professor Daniel Carpenter painstakingly documented in his 2010 book Reputation and Power, the FDA’s authority has not been limited to simply approving or rejecting new drugs. It also extends to shaping “standards of scientific evidence and the concepts defining them.” This power has not always served the country well.
In the book’s introduction, Carpenter focuses on a 1987 case in which the FDA rejected pharmaceutical company Genentech’s anti-clotting drug Activase, stunning financial markets and a host of editorial writers. Cardiologists across the country had been optimistic about the drug and complained bitterly about the ruling; the editor of Science wrote that when “a regulatory agency that licenses drugs for heart attacks stumbles, it may have not only egg on its face but blood on its hands.” Five months later, after results from further trials satisfied regulators, the FDA reversed course and approved the drug for marketing in the United States.
As Carpenter notes, “What counted as a ‘cure’ in the treatment of heart attacks would be defined not only by a broad community of cardiologists, but also (and perhaps primarily) by an organization of government scientists and regulators.” The episode is just one in a series of clashes between two kinds of highly trained medical professionals, each of which is bound to different aspects of science and evidence-gathering. Often in the press, as we will see, one of these viewpoints enjoys labels like “evidence-based medicine” or “waiting for solid science.” But in fact, each offers its own contrasting account of science, of what counts as reliable evidence.
Before the FDA came into being, voluntary associations created and disseminated formalized pharmaceutical compendia. Early drug regulation was conducted by the American Medical Association and the American Pharmaceutical Association. As a report from the Independent Institute on the history of FDA regulation notes, these associations were created by “medical men interested in advancing their crafts and the dignity of their professions.”
The FDA emerged to fill a gap in the role of voluntary associations, following the tragic deaths of nine children who had been given contaminated smallpox vaccines in 1901. Congress passed the Biologics Act of 1902, which required that every biological drug, such as vaccines and serums, receive approval from the federal government before going to market.
In 1906, Upton Sinclair’s muckraking novel The Jungle horrified the nation with its descriptions of the filthy conditions of Chicago’s meatpacking plants. Public-regulation advocates successfully lobbied for the Meat Inspection Act, as well as for the Pure Food and Drug Act, which set standards for the strength, quality, and purity of drugs and placed restrictions on “misbranding” them and on “false or misleading” labeling. These and later restrictions would generate legal challenges over freedom of speech and over who gets to decide what constitutes “false” or “misleading” medical information.
In 1938, with the passage of the Federal Food, Drug, and Cosmetic Act, the FDA was charged with reviewing drug safety, but with a major caveat: Medicines were considered to be automatically approved if the FDA took no action within sixty days; approval was the default. It wasn’t until 1962, when the Kefauver–Harris Amendments to that law were passed, that the modern iteration of the FDA came into being. Approval was no longer the default. Since then, the stakes for debates over medical knowledge have risen alongside the FDA’s growing influence.
These new amendments had been under discussion for years, but they were finally instituted in response to the global shock of “thalidomide babies.” Beginning in 1957, pregnant women worldwide had been taking a new drug for morning sickness — thalidomide — which was later found to cause severe birth defects or stillbirths. The lesson the medical community drew was to curtail giving new medicines to pregnant women, as it is extremely difficult to test the effects of drugs on fetuses. The amendments, however, granted the FDA sweeping additional powers, among which was the responsibility to determine not only a drug’s safety but its efficacy before allowing medicines to enter the market.
It is not clear how determining thalidomide’s efficacy would have prevented the tragedy — the drug indeed seemed quite effective in treating morning sickness. And as the report from the Independent Institute notes, “To a great extent, efficacy, which is sensitive to individual conditions and mediated by market process, had in the past always been judged jointly by doctors and consumers. A drug’s efficacy ought to be judged relative to the alternative therapies and is therefore constantly changing, being discovered, and being proven by medical-market experience, with the use of postmarket surveillance and research.” People respond quite differently to different drugs, for reasons that are often unclear. Medical certainty can be a moving target, and proving efficacy, in addition to safety, is time-consuming and expensive.
Efficacy is also sometimes discovered as a happy byproduct of using medicines for unrelated purposes. This fact leads us into one of the most troubled aspects of the modern FDA. In many cases, a drug that has already been approved for one purpose is discovered to be effective for another purpose. The practice of prescribing drugs in this way is known as “off-label use,” and the FDA has long been on a mission to restrict it.
Consider the drug minoxidil, which was originally approved by the FDA in 1979 for the treatment of high blood pressure. During trials, researchers noted that it also seemed to cause hair growth. This effect was so clear that doctors began prescribing the drug to treat hair loss almost immediately after its release. The FDA approved the drug for this purpose in 1988, but that made little difference to physicians who were already prescribing it.
The FDA, however, had begun to operate on the premise that it alone had the authority to decide drug efficacy for specific uses, prior to the experience and experimentation of the medical community. The 1962 legislation thus exacerbated the tension between practicing physicians and their academic and regulatory counterparts — first, over how to properly weigh risks posed by a medicine against those posed by a disease, and second, over how to establish “good enough” information.
Aggravating this tension, the FDA has moved in several ways to prevent pharmaceutical companies from sharing information with doctors about ongoing trials on off-label uses. Once a drug has been approved by the FDA as safe and effective for a given purpose (as when minoxidil was approved to treat high blood pressure), doctors may prescribe it for any purpose they deem medically appropriate (such as hair loss). Doctors, under the protection of free speech, are also permitted to discuss these off-label uses with their patients.
But the FDA has long sought to prohibit pharmaceutical companies from discussing off-label uses, under the reasoning that this constitutes misbranding. Many companies have paid heavy fines for this practice. And in one case in 2006, a doctor was subject to a long and destructive legal battle when federal prosecutors argued that he had discussed off-label use of a drug produced by a company to which he had ties.
The FDA’s drive to prohibit off-label marketing — and thereby keep physicians and patients in the dark regarding potential uses of approved drugs — has resulted in several lawsuits seeking to limit the agency’s powers. Most notably, in the 2012 case U.S. v. Caronia, a U.S. Court of Appeals ruled that under free-speech protection, the agency may not prohibit pharmaceutical companies and their representatives from “the truthful off-label promotion of FDA-approved prescription drugs.”
But the decision raised the question of what constitutes “truthful” information and who gets to determine it, if not the FDA. It came perilously close to undermining the agency’s authority to decide what is safe and effective medicine.
The tension between different methods of gathering medical knowledge, and between practicing physicians and academic or government researchers, has been brought to national attention during the Covid-19 crisis. It is particularly apparent in the stories of two hotly debated treatments: hydroxychloroquine and remdesivir.
The drugs have markedly different histories. Hydroxychloroquine was approved by the FDA in 1955. Remdesivir has not yet been formally approved, though the FDA in May granted an “emergency use authorization” for the drug — a way of allowing the use of an unapproved treatment under certain circumstances, in this case for severely ill Covid patients.
The remdesivir saga, at least as it relates to Covid in the United States, began in January, when a 35-year-old man checked into an urgent-care clinic in Snohomish County, Washington. He had just returned from visiting relatives in Wuhan, China, he explained, and realized that his recent cough and fever seemed to match the CDC’s description of the novel coronavirus. The clinic immediately notified local and state health departments, and CDC staff soon got involved. The patient tested positive for Covid — the first confirmed case in the United States. His condition worsened daily, until the otherwise healthy young man required supplemental oxygen.
On his seventh day in the hospital, the patient’s health continued to decline and doctors decided it would be worthwhile to administer the experimental drug remdesivir. By the next day, his condition had improved. The supplemental oxygen was discontinued, his appetite increased, and he exhibited no symptoms apart from an intermittent dry cough and runny nose. This remarkable overnight transformation, which was reported via a detailed case study in the New England Journal of Medicine, piqued the interest of many physicians searching for treatments.
The National Institutes of Health (NIH) took note too, and began preparing a randomized controlled trial whose results eventually contributed to the FDA’s decision to issue an emergency use authorization. The trial, involving about a thousand patients, was limited to those with severe cases of Covid-19. This decision seems to have been guided by the logic that, if the drug were found to help even very sick people, one could more easily attribute their recovery to the drug than to natural processes.
Of course, this logic could also be turned around: There was a chance that the very ill patients might have been beyond the help of a drug that could still be beneficial in milder cases. Finding no effectiveness for severely ill patients could easily be mistaken as a sign that the drug had no benefit for any patients. As it happened, the study concluded that both mortality rates and recovery time were lower in the group taking remdesivir.
Another reason for limiting the trial to severely ill patients was that they were most in need of an effective treatment. But running a placebo-controlled trial in such a population also meant that hundreds of severely ill patients were given no antiviral medicine at all.
Many frontline physicians were frustrated at the sense of lost time for these patients, and for others who might have benefited from remdesivir. During the months of the trial — February through April — research in other countries offered indications of remdesivir’s potential effectiveness. In late March, Jesse Greenberg, a cardiologist who had developed a severe case of the disease, set up a Twitter account from his hospital bed to plead for emergency access to remdesivir. Greenberg ended a string of tweets by insisting that “we need to make trial drugs or compassionate use more accessible now please!”
The NIH ultimately stopped the trial early, as it was showing positive results for remdesivir. The decision caused controversy among researchers over whether enough data had in fact been gathered to be confident that remdesivir saves lives. At the time, the median recovery period for patients in the remdesivir group was 11 days, compared to 15 days for the placebo group, a statistically significant difference. The mortality rate was also lower for the treatment group — 8 percent, compared to 11.6 percent for the placebo group — but this difference was not statistically significant.
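To see why researchers could disagree, it helps to look at how those mortality numbers behave statistically. The following is a crude, back-of-envelope sketch only: it assumes two arms of roughly 500 patients each, which is merely an approximation of the trial’s actual design, and it treats the reported percentages as simple counts.

```python
# Back-of-envelope two-proportion z-test for the mortality figures above.
# The arm sizes of 500 are assumed for illustration, not taken from the trial.
from math import sqrt, erf

def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                        # pooled proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # standard error under H0
    z = (p2 - p1) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal tail
    return z, p_value

# Mortality: roughly 8% of 500 on remdesivir vs. 11.6% of 500 on placebo.
z, p = two_proportion_z_test(x1=40, n1=500, x2=58, n2=500)
print(f"z = {z:.2f}, p = {p:.3f}")  # about p = 0.06: suggestive, but not below 0.05
```

Even taken at face value, a mortality gap of that size hovers just above the conventional 5 percent significance threshold, which is precisely the kind of ambiguity that fueled disagreement over whether the trial should have been stopped.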
Anthony Fauci, head of the NIH’s National Institute of Allergy and Infectious Diseases, explained that “when you know a drug works, you have to let people in the placebo group know so they can take it. What [this trial] has proven is a drug can block this virus.” While the decision was welcomed by patients, it surely provided little comfort to people whose loved ones had hoped to receive remdesivir on the trial but instead received a placebo.
The vexed ethics of allowing placebos in research on fatal illnesses has shown up even more prominently in the heated debate over hydroxychloroquine, and in turn over the right of physicians to prescribe drugs for off-label use.
Though it has only recently become a household name, hydroxychloroquine (HCQ) enjoyed relative anonymity for decades as a conventional, FDA-approved treatment for malaria, lupus, and rheumatoid arthritis. For years many physicians had believed that the drug’s mechanism of action for preventing malaria parasites from entering cells might also work for preventing viral infections. Early indications that HCQ might be effective in treating Covid-19 came from frontline physicians across the globe who had opted to prescribe it for their patients; a survey conducted in late March showed that many doctors considered it to be the most successful available treatment. The drug had also been shown to inhibit the virus in lab tests.
For many doctors, then, reaching for hydroxychloroquine in the early days of the pandemic seemed like common sense; its very familiarity was part of what made it attractive.
This instinct has been borne out by a number of studies, perhaps the most significant of which was conducted by the Henry Ford Health System. The retrospective analysis of over 2,500 hospitalized patients found that about 13 percent of those treated with hydroxychloroquine died, compared with about 25 percent of those who were not. Another retrospective analysis of over 3,700 patients found that, if administered early, a combination of HCQ and the antibiotic azithromycin led “to a significantly better clinical outcome and a faster viral load reduction than other treatments.”
Most randomized controlled trials have failed to find statistically significant benefits from hydroxychloroquine, though some have. However, many of these studies may have simply been too small to reach statistical significance. And considered in aggregate, the results of these trials have been suggestive of some benefit; in particular, a list of studies maintained by the website HCQ Meta shows that the drug has been found especially effective when used in the early stages of Covid-19.
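The point about trial size is easy to quantify. Below is a standard sample-size calculation for comparing two proportions; the 10 percent baseline mortality and the three-point absolute reduction are hypothetical figures chosen purely for illustration, not numbers drawn from any particular hydroxychloroquine study.

```python
# Roughly how many patients per arm are needed to detect a modest mortality benefit?
# The 10% baseline and the drop to 7% are hypothetical, illustrative figures.
from math import ceil

def n_per_arm(p1, p2):
    """Approximate patients per arm to distinguish p1 from p2 at alpha = 0.05, 80% power."""
    z_alpha = 1.96   # critical value for two-sided 5% significance
    z_beta = 0.84    # critical value for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

print(n_per_arm(0.10, 0.07))  # roughly 1,350 patients per arm
```

At that scale, a trial enrolling only a few hundred patients would more often than not fail to reach significance even if the drug delivered exactly that benefit, which is why a string of individually null results can still be consistent with a modest real effect.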
Despite its familiarity and longstanding safety record, the use of hydroxychloroquine in treating Covid was controversial almost from the start. Part of the academic opposition to the drug was based on the manner in which early evidence of its effectiveness was produced.
In March, the famously theatrical French microbiologist and physician Didier Raoult conducted a small study of thirty-six patients to test the effects of hydroxychloroquine. Twenty patients received HCQ (some combined with azithromycin) and sixteen patients were untreated. He did not need any special permission to conduct this study, as HCQ had been approved worldwide decades earlier. The study took place over ten days. By day six, 70 percent of the HCQ group no longer tested positive for Covid, compared to 13 percent of the control group. Patients who received both HCQ and azithromycin seemed to fare best: By day five, all tested negative. If nothing else, this meant that they were no longer spreading the disease.
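For what it is worth, the arithmetic behind Raoult’s claim is easy to check. The sketch below applies Fisher’s exact test to the reported day-six counts, with 14 of 20 treated patients testing negative versus roughly 2 of 16 untreated patients (the control count is an approximation of the reported 13 percent). Because the groups were not randomized, the test can only measure association; it cannot show that the drug caused the difference, and that distinction is exactly what his critics would seize on.

```python
# Fisher's exact test on the approximate day-six counts reported above.
# The untreated count (2 of 16) is inferred from the reported 13 percent,
# and the comparison is NOT randomized, so the p-value indicates association only.
from scipy.stats import fisher_exact

table = [[14, 6],    # HCQ group: tested negative, still positive
         [2, 14]]    # untreated group: tested negative, still positive
odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.1f}, p = {p_value:.4f}")  # p falls well below 0.05
```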
Raoult posted information about his research on YouTube in a video that has garnered well over two million views. His triumphant claims about the efficacy of hydroxychloroquine in treating Covid were seized upon by President Trump, who tweeted in late March that the drug had “a real chance to be one of the biggest game changers in the history of medicine.”
Though Raoult has been described as the most highly cited microbiologist in Europe, he frequently bucks academic convention. For example, he adamantly refuses to set up placebo groups when studying serious illnesses, as he believes withholding treatment in such cases to be unethical. In his study of HCQ, he compared his patients with a control group consisting of people who had either refused the drug or who were being treated at hospitals that didn’t offer it. He was immediately criticized for touting findings that were not rooted in a large-scale RCT. Robert Zaretsky, writing in Slate, called Raoult a “Trumpian.”
In a March op-ed in Le Monde, Raoult attacked those who objected to his lack of a placebo group, arguing that “the doctor can and must think like a doctor, and not like a methodologist.” The structures for obtaining medical knowledge, he claimed, have increasingly come to be defined by specialists in methods, rather than by physicians treating patients — to the point where “the form has come to take over the substance.” He noted the famous “parachute paradigm,” the absurd idea of testing the effectiveness of parachutes by having one group of people jump out of a plane wearing them and a second group jump wearing only knapsacks. To Raoult, giving placebos to a group of people suffering from a potentially fatal illness reflects the same shoddy logic. Unsurprisingly, this line of criticism, especially coming from a scientist celebrated by Donald Trump, did not meet with a warm welcome in many academic quarters and media outlets.
Despite the controversy, the evidence of hydroxychloroquine’s promise seemed strong enough that in late March, the FDA issued an emergency use authorization allowing the drug to be used in treating Covid. The authorization meant that the drug could be added to the Strategic National Stockpile for distribution to doctors treating Covid patients.
Then, in May, a study appeared in the premier science journal The Lancet, warning that HCQ posed serious risks for Covid patients. That study was swiftly found to be fraudulent, and was retracted. But the dramatic media attention around the study seemed to bolster the idea that prescribing drugs for off-label use is dangerous — the longtime position of the FDA.
Shortly after the study was retracted, the FDA revoked its emergency use authorization, citing what it took to be a disappointing result from another study: “Recent results from a large randomized clinical trial in hospitalized patients … demonstrated that hydroxychloroquine showed no benefit on mortality or in speeding recovery…. The totality of scientific evidence currently available indicate a lack of benefit.” The agency then went further, arguing in a July update not only that the drug lacked benefit in treating Covid but that it was potentially harmful.
In August, Senators Mike Lee (R-Utah), Ted Cruz (R-Texas), and Ron Johnson (R-Wisconsin) requested that the FDA provide a justification for its warning of potential harm. The FDA’s response included a useful list of studies, which reflected inconclusive evidence for the drug’s effectiveness. But none indicated that taking it resulted in harm to Covid patients.
Despite all this, the hydroxychloroquine controversy still rages. In November, for example, the Washington Post editorial board criticized the Trump administration for distributing “millions of ineffective, potentially dangerous hydroxychloroquine pills from the Strategic National Stockpile to cities and pharmacies.”
Ultimately, the drug was found to be neither a panacea nor a danger to Covid patients; if administered early, it may be of some help in treating the virus.
Yet even had hydroxychloroquine turned out to be a total dud, physicians’ legal off-label use of a familiar, FDA-approved drug to combat a virus we knew little about should not have caused the scandal that it did. HCQ has long been established as safe. A 2003 Lancet article encouraged physicians to consider it for treating various newer diseases such as HIV and SARS, noting that the drug had even been shown to be safe for pregnant women. Its affordability and low toxicity made it an appealing option for physicians searching for treatments for those illnesses. This is the kind of option doctors should have the discretion to try, without stoking national panic, especially at a time when we are all learning our way through an emergency.
As Sheryl Gay Stolberg reported in the New York Times, by May the political debate had become so heated that it was actually impeding researchers trying to study the drug. Not only was the HCQ firestorm pointless, but ironically, though it was supposedly concerned with making sure we had enough good data to use the drug, it actually left us with less.
In many ways, although remdesivir may prove more effective, hydroxychloroquine was more attractive to doctors, as they have used it for years and know it to be safe and affordable. The use of medicines over the course of decades or centuries is for them one of the most trusted methods of gathering information — a method less trusted among many academic researchers who put their trust in RCTs instead.
The first published medical RCT is usually considered to have been a 1948 paper in the British Medical Journal, testing whether the antibiotic streptomycin worked for treating tuberculosis. Before RCTs became standard, researchers might test a new drug by giving it to a group of people with some condition, then comparing their outcomes to those of a control group with the same condition. That control group might be in a different hospital or might be “historical” — that is, the researchers would compare the test group to individuals with the same illness who had been observed in the past and had not received the treatment under study.
The inventors of the RCT noted that without ensuring that the two groups had received virtually identical care — the same diet and overall treatment regimen in the same hospital — researchers could not be certain that one group had not improved due to, say, more nutritious food. RCTs also address the risk that, in the interest of proving their treatment works, researchers might “load” the treatment group with healthier patients. This “allocation bias” can also be unintentional: Factors such as sex, income level, or family medical history might skew results.
All these uncontrolled variables are known as “confounding factors.” To minimize their effects, researchers reasoned that patients should be randomly assigned to one group or the other, with the hope of canceling out differences between the groups. Finally, to ensure that control subjects and their physicians are fully “blinded,” the control subjects are often given placebos, so that they will not know whether they have received the treatment or not.
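A toy simulation makes this logic concrete. In the sketch below, “baseline severity” stands in for any confounding factor, measured or unmeasured; the numbers are invented solely to show how random allocation tends to balance such a factor across groups, while a biased allocation does not.

```python
# Minimal simulation of the allocation logic described above.
# "Baseline severity" is an invented confounder; all numbers are illustrative only.
import random
from statistics import mean

random.seed(0)
patients = [random.gauss(50, 10) for _ in range(1000)]  # hypothetical severity scores

# Random allocation: shuffle the patients, then split them in half.
shuffled = patients[:]
random.shuffle(shuffled)
random_treat, random_control = shuffled[:500], shuffled[500:]

# Biased allocation: the healthier (lower-severity) half is steered into treatment.
ranked = sorted(patients)
biased_treat, biased_control = ranked[:500], ranked[500:]

print(f"random allocation: treatment = {mean(random_treat):.1f}, control = {mean(random_control):.1f}")
print(f"biased allocation: treatment = {mean(biased_treat):.1f}, control = {mean(biased_control):.1f}")
# Randomization leaves the groups' average severity nearly identical;
# the biased split makes the treatment group look artificially healthy.
```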
In short, the creators of RCTs were guided by the conviction that both researchers and patients might be biased in ways they do not realize, and that studies might be skewed by factors one cannot readily measure. They recognized that people are not machines, that they respond differently to medicine for reasons that are often unclear. This acknowledgment by the researchers — that they did not know what they did not know — was not meant to be taken as a dismissal of other kinds of studies or as a guarantee of their own study’s perfection.
For all these reasons, RCTs are widely considered to produce superior data. Yet many academic researchers today are quick not only to discount but to dismiss information gathered from other methods, such as observational studies. The FDA, for example, relies almost exclusively on RCTs in evaluating drugs. But this approach breaks with that of the early designers of the RCT.
Despite their obvious benefits, randomized controlled trials have their own limitations and flaws. These are not always easy to recognize and can lead to too-hasty conclusions.
The RCT has come under increased scrutiny in recent years. Physicians both inside and outside the academy agree that RCTs are an invaluable tool, but many contend that they are just one among several, and are not the sole conduit to medical knowledge. As Harvard’s Donald Berwick, a former health advisor to President Obama, put it in 2013, “Randomized trials for some purposes is the gold standard, but only for some purposes…. Context does matter. We’re learning in a very messy world, and the context that neatens up that world may make it hard to know how to manage in the real world.”
While RCTs have great internal validity, meaning that the data they provide tend to be reliable under the conditions in which the trial took place, they frequently lack external validity, meaning that their results do not always translate well outside of the rigorously controlled environment of the study. As Norman Doidge points out, RCT subjects are often not representative of the whole population of interest. To borrow an example from Doidge, a research team testing a drug for depression may select subjects who do not also suffer from alcoholism or anxiety, thereby ensuring that the drug is tested only for its effects on lowering depression. But while this approach might help to assess the drug’s reliability for a narrow subset of patients, the group does not resemble most real-world patients — many of whom do not suffer only from depression. He also notes that drug companies have at times been found to manipulate the criteria for which patients are admitted into a study to make it seem like their drugs are more effective than they actually are.
Or consider an example from the Covid crisis. An RCT conducted by the World Health Organization concluded that neither HCQ nor remdesivir had a meaningful effect on hospitalized patients. Yet at least 78 percent of patients in the study had bilateral lung lesions at the start of the trial, almost certainly meaning they were severely ill with pneumonia. While the study’s findings offer important information about the drugs’ potential effectiveness in aggressive cases, we cannot infer from this whether the drugs might help in milder cases.
As we have seen, there are also grave moral repercussions to insisting on a placebo when testing potentially lifesaving treatments. In 2016, the Wall Street Journal published a series of reports on a group of children with lethal Duchenne muscular dystrophy who waited for years for the FDA to approve a promising drug they desperately wished to try. Clinical researchers had given the treatment to twelve boys. After four years on the drug, ten of them could still walk. In the control group of eleven boys, only one could still walk. The FDA refused to allow the boys in the control group access to the drug, suggesting that the trial was too small or that something may have skewed the results. In this case, the aims of obtaining near-perfect certainty and of protecting people from unproven medicine seem to have resulted in terrible and unnecessary suffering.
RCTs also require a kind of formality that is lacking in emergency circumstances. Physicians dealing with high numbers of patients in a crisis — and in the case of Covid, scrambling for protective equipment and ventilators — will likely focus on getting people well rather than maintaining controlled environments and carefully selected subject groups.
Writing in the New England Journal of Medicine in 2017, former CDC head Thomas Frieden forcefully argued that, though RCTs are often “presumed to be the ideal source for data on the effects of treatment,” not only are their weaknesses too often discounted, but they are sometimes inferior to other data sources in important ways.
So what other options are there? What have we learned from the divergent tales of hydroxychloroquine and remdesivir? And how might we restore a common sense of purpose in medical culture?
To begin, we must recall the proper ends of medical knowledge, and the epistemological humility that guided the development of early randomized controlled trials. RCTs are frequently referred to as the gold standard for “evidence-based practice.” But medical progress often relies instead on practice-based evidence — that is, learning from what we have already done.
For example, in epidemiological studies that assess risk factors for illnesses, observational studies are almost always used in place of RCTs. Thomas Frieden cites an episode from the 1980s in which the rate of Sudden Infant Death Syndrome (SIDS) rose sharply in New Zealand. Researchers conducted a retrospective observational study of detailed case information on hundreds of infants, some of whom had died and some of whom had not. They concluded that parents should not place sleeping babies on their stomachs, and launched an educational campaign warning against the practice. The incidence of SIDS markedly declined. As Norman Doidge notes, an RCT testing this SIDS “treatment” would have required a control group of infants allowed or made to sleep on their stomachs — a morally inconceivable scenario that would entail violating the Hippocratic Oath to do no harm.
Observational studies have also led to robust epidemiological data that indicate which public health approaches — taxes, laws, advertising campaigns — are most effective in reducing harms from smoking. But when it comes to actual medical interventions, observational studies are frequently ignored. The rationale here is that if a medicine turns out to be ineffective or harmful, those who approved it could be held responsible.
Understandable though this caution is, it can be taken to a counterproductive extreme, especially in cases where patients are suffering from rare or lethal illnesses. For many doctors, administering a familiar drug like HCQ, which had been associated with positive outcomes in observational studies, seemed akin to placing babies on their backs to avoid SIDS — not a surefire solution, but a sensible preventive measure. Patient wishes matter too. Many people, including highly informed patients such as the cardiologist Jesse Greenberg, would prefer to take their chances on a drug that has not yet been tested via RCT.
Case histories of individual patients can also lead to significant discoveries. These histories are extremely detailed, often providing hundreds of facts about a patient — diet, medical history, upbringing, occupation, symptoms, allergies, drugs administered, and so on. They offer a different sort of dataset from RCTs, which, by contrast, provide a few broad facts about hundreds or thousands of patients. Recall that with remdesivir, a case study of a single patient led to more in-depth research, including RCTs, that produced definitive results.
Norman Doidge’s call for an “all available evidence” approach could be taken further, toward a call for a culture that disseminates “all available information” to physicians. This culture would require a clear legal regime in which the FDA does not prevent pharmaceutical companies from sharing information about off-label uses of drugs. Arizona’s 2017 Free Speech in Medicine Act, which explicitly permits pharmaceutical manufacturers to engage in truthful promotion of off-label uses, is a promising step in this direction. Restricting the flow of information does little to advance medical progress or encourage goodwill within the broader medical community. When the credentials of the average U.S.-trained doctor do not differ all that much from those of the scientists employed at the FDA and academic institutions, the pervasive lack of trust between the two communities makes little sense.
Perhaps most important of all, academic physicians must be more willing to incorporate the perspectives of practicing physicians when designing trials, particularly those trials that are ethically fraught. There is an aphorism, widely attributed to the Canadian physician William Osler, that “the good physician treats the disease; the great physician treats the patient who has the disease.” We should hardly take away from this imperative a criticism of RCTs or of the academics who carry them out. Without their work, doctors might carry a great many more duds in their arsenal, and medicine would suffer. Yet neither can we lose sight of what all this data is ultimately for.
Medical data consist of more than numbers on a page. The tidiness of the research lab does not always work well in emergency medicine, and we must never forget that the aim of medicine is to help individual patients who have varied needs and who tend to care more about their own recovery than about spotless data.
If there will always be a tension in medicine between speed and caution, caution must be understood to have a cost. It is time to move beyond the view that a medicine known to be safe, administered by professionals who believe they might help an ailing patient, should be feared more than the disease itself.