Recently, a non-medical colleague I respect asked me a question about the latest wave of Covid-19. What was I seeing, they wanted to know. Any sign that this new variant is worse than past ones?
I thought about it. Poked around. Asked a few people I knew who were infected how they were doing. I went to the ER and worked a shift. I gave my answer.
The bigger question, though, is whether clinicians can ever really tell what is going on. Or are we just as prone to delusion-by-anecdote as those with less scientific minds?
I’ve been thinking about this for a while, running a number of scenarios.
My conclusions might surprise you—as they surprised me. What did I tell my colleague? And what does any of this have to do with the Boeing 737 Max?
First, I told my colleague that from what I can determine, there is no discernible difference between this latest Covid-19 variant and recent ones, in terms of severity.
But the question merited a longer discussion. Can clinicians reliably answer questions like, “What are you seeing in the hospital?”
Even if I qualify my opinion with disclaimers, no matter what I say, some people will take my answer without the “grain of salt” I insisted upon. They’ll take it as “the incontrovertible truth from the frontlines.”
But what if my experience is too dependent on randomness?
Uncertainty loves company. One defense against this is to check whether my own anecdotal experiences match those of my colleagues. If we’re all seeing the same thing, I reason, I’ll give my own observations more credence.
But there's also the risk of groupthink and the power of suggestion. I recently conducted an informal poll online where I instructed people to stay absolutely still for 10 seconds. I asked how many noticed their fingers tingling during those 10 seconds. Many did. Most hadn’t noticed that before. That sense of tingling—noticeable only when attending to it—is almost universally meaningless. Imagine though, if I told you (falsely) that this was a sign of some environmental toxin. Panic!
Math is better. Assuming I’m going it alone (i.e., not asking colleagues what they are seeing), the real answer to the “should I answer this question” debate lies in math and epidemiology. Yay. My favorite.
Whether I should answer the question hinges on:
The effect (cases, case severity, rate of hospitalization, death, etc.).
How the effect is measured. Are all hospitalized cases being detected, as they were for 2020-2022? How does one define “severity”?
But the most important variables involve epidemiology:
The baseline rate of events.
How large the change is.
Analysis #1. Death.
It’s pretty hard for most clinicians to know whether death rates are up in their hospital without seeing internal data.
Take Massachusetts General Hospital, the hospital with the most deaths per year in this state (due to its size and the complexity of cases its world-renowned experts attract).
Typical week: 30 deaths.
This week: 39 (30% increase).
Context: A 30% increase in all-cause mortality is very large. Outside of some Covid waves, it never happens on a state level.
If a patient dies anywhere on the premises, a “Code Blue” is called on the hospital-wide overhead PA system (unless that person is on hospice, in which case they die in peace without CPR). So, it’s possible that an ER doctor like me might notice a massive increase in Code Blues. But even a 30% increase really shouldn’t register.
Assuming I work 36 clinical hours per week, it would be hard to notice a difference between a typical week and this bad one. Would I really notice whether there’s a code every 4.3 hours instead of one every 5.6? It could take several 8-10 hour ER shifts before I would even suspect something was up. And due to randomness, I could hear 2 or 3 codes one day, and 0 the next.
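To see just how noisy this is, here’s a back-of-envelope simulation. It’s a sketch, not real hospital data: I assume deaths arrive at random through the week (Poisson), and that a doctor “hears” whatever codes are called during her 36 of the week’s 168 hours. The 30 and 39 deaths per week are the Mass General figures above; everything else is an assumption.

```python
import math
import random

# Assumptions: deaths are Poisson-distributed in time, and a doctor working
# 36 of the week's 168 hours hears that fraction of the hospital's codes.
HOURS_WORKED, HOURS_PER_WEEK = 36, 168
TRIALS = 100_000

def codes_heard(deaths_per_week: float) -> list[int]:
    """Simulate how many Code Blues a doctor hears in each of many work-weeks."""
    rate = deaths_per_week * HOURS_WORKED / HOURS_PER_WEEK  # expected codes heard
    threshold = math.exp(-rate)
    results = []
    for _ in range(TRIALS):
        # Knuth's method for drawing a Poisson variate (keeps this stdlib-only).
        k, p = 0, 1.0
        while True:
            p *= random.random()
            if p <= threshold:
                break
            k += 1
        results.append(k)
    return results

random.seed(0)
typical = codes_heard(30)  # mean ~6.4 codes heard per work-week
bad     = codes_heard(39)  # mean ~8.4 codes heard per work-week

# How often does a typical week sound at least as busy as a bad one?
overlap = sum(t >= b for t, b in zip(typical, bad)) / TRIALS
print(f"typical week sounds >= bad week about {overlap:.0%} of the time")
```

In my runs, a typical week “sounds” at least as busy as a bad one roughly a third of the time, which is why a 30% jump in mortality is so easy to miss from the floor.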
Long story short: outside of extreme circumstances, it’s hard for hospital clinicians to reliably detect even large increases in all-cause mortality, let alone from a single cause. Now, early Covid-19 was an exception. When mortality more than doubled in April of 2020 in Massachusetts, suddenly there would be more than 3 codes during an average shift, as opposed to the usual 0, 1, or maybe 2. In some New York hospitals, things were worse—with 6 times the mortality compared to usual, at the peak. Only in extreme circumstances, then, can doctors be sure that their anecdotal experience is meaningful with respect to changes in mortality rates.
Analysis #2: Hospitalization.
Can we really tell if Covid hospitalizations are up? Early in the pandemic, it was easy. Rates were off-the-charts high—with hospitalization rates exceeding 15% of infections in older people—while all other kinds of non-Covid emergencies decreased if not vanished. During the transition from Delta to Omicron it was also fairly easy to tell something was going on. The prevalence of Covid-related hospitalizations jumped 750% from early November to the peak in early January. Hospitals were full, and with far sicker patients than usual. Yes, we could reliably see a difference.
Putting numbers to this, in early December of 2021 (late Delta), around 35 patients per week were hospitalized at Mass General with Covid-19. A month later, during the peak of early Omicron, that number had swelled to 135 patients per week. That means that in December, an ER doctor working 36 hours per week would have seen around 21% of the patients (she’s there for 36 of the week’s 168 hours), translating to two patients hospitalized for Covid during each shift (assuming she either took care of all the Covid patients, or was aware of all cases in the department). By January, though, that same doctor would have seen 7 or 8 Covid-19 patients per shift. That’s noticeable and on a scale that was meaningful. But it was also extreme. More recent peaks have been far smaller than the one in January of 2022. So, it’s hard for frontline doctors to give you a reliable answer on this most of the time—even now.
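The per-shift arithmetic above is easy to check. A sketch, assuming four 9-hour shifts per week and that the doctor is aware of every Covid admission that occurs while she’s in the department:

```python
# Assumptions: four 9-hour shifts per week, and the doctor sees (or is aware
# of) every Covid admission that occurs while she is in the department.
HOURS_WORKED, HOURS_PER_WEEK = 36, 168
SHIFTS_PER_WEEK = 4

def covid_patients_per_shift(admissions_per_week: float) -> float:
    # Present for 36/168 of the week, she overlaps with ~21% of admissions.
    seen_per_week = admissions_per_week * HOURS_WORKED / HOURS_PER_WEEK
    return seen_per_week / SHIFTS_PER_WEEK

print(round(covid_patients_per_shift(35), 1))   # late Delta: ~1.9 per shift
print(round(covid_patients_per_shift(135), 1))  # Omicron peak: ~7.2 per shift
```

A jump from roughly two Covid patients per shift to seven is big enough to register on anyone’s internal radar; most variant-to-variant shifts are far smaller than that.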
What about when rare (or non-existent) diseases suddenly appear, but aren’t as prevalent as Covid?
Mpox is a perfect example. There have been 472 Mpox cases in Massachusetts since the outbreak began in 2022. There are over 600 infectious diseases doctors in this state (with around half working in a magnet hospital where the cases are likely referred by primary care). So maybe each ID doctor would expect to see a case or two, on average. It’s also important to remember that a few specialists received many Mpox referrals, while everyone else saw 0 or perhaps 1 case. What everyone might have noticed, though, were emails from hospital or practice leadership discussing cases that came through, reminders on the test and treatment algorithms, and general policies. But that is not the same as seeing cases. That’s essentially hearsay.
What about ER doctors like me? There are around 1,900 ER doctors in Massachusetts. Even if every case were diagnosed in an ER (which is untrue), on average 1 in 4 ER doctors would’ve seen a single case. Assuming even distribution, you could’ve asked 4 ER doctors whether Mpox was “really a big deal” and 3 of them would have never diagnosed a case. And yet, on a population level (specifically within high-risk behavior groups), the virus seemed to be everywhere.
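If each of the 472 cases were equally likely to land with any of the 1,900 ER doctors (an idealized assumption; in reality, referrals and high-volume departments cluster cases), the binomial distribution gives roughly the 3-in-4 figure above:

```python
import math

CASES, DOCTORS = 472, 1900
p = 1 / DOCTORS  # idealized: each case equally likely to reach any ER doctor

def prob_k_cases(k: int) -> float:
    """Binomial probability that one given doctor diagnoses exactly k cases."""
    return math.comb(CASES, k) * p**k * (1 - p)**(CASES - k)

p_zero = prob_k_cases(0)
print(f"P(a given ER doctor diagnosed zero Mpox cases) = {p_zero:.0%}")  # ~78%
```

So the modal ER doctor’s honest, first-hand answer about a statewide outbreak would have been “I’ve never seen a case.”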
In fact, you might recall that I personally diagnosed two cases out of the 472 cases statewide! How? Because I was convinced (long before it was widely accepted) that patients did not need to have a rash to spread the disease, nor have classic symptoms to warrant testing. So, I pushed for tests when others may not have (I believe I tested three people; two were positive). One of my Mpox patients had no rash. The other did, but it wasn’t “classic” Mpox. If you’d asked me, “What is going on with Mpox?” during the outbreak, my answer would have been, “It’s far more common in high-risk behavior groups than anyone realizes. But it’s still rare for everyone else.”
Analysis #3: Case severity.
Case severity trends are very hard for frontline doctors to assess. First, everyone’s definition of severe is different and doctors tend to ignore official criteria when answering these questions. Second, if you combine rates and a distribution of severity into one analysis, the numbers we each see become too small for drawing reliable conclusions.
Back to Covid as an example. Let’s say during a bad wave that I see 10 patients per shift with Covid-19. The distribution of severity for all cases might look like the green line. But let's say, by chance, the 10 cases shown as red stars show up.
Imagine five doctors see 2 of these cases each. One doctor would see the extremely severe case. Even if the other case that this particular doctor saw was moderate or mild, how might he answer if asked how severe this disease is? Another doctor who saw one mild and one moderate case would give a very different answer.
The point is that when our sample size is too low, it’s easy to be misled. This is especially true in hospitals, where mild cases are less likely.
Now imagine that, by chance, 2 out of 10 cases turn out to be extremely severe. (Unlikely given the green line, but it could happen!)
Depending on the roll of the dice, a doctor might see zero or both terrible cases. Whether he does will drastically influence his assessment of current disease severity. Even if 10 doctors see 3 cases of Covid per day, around half would not see an extremely severe case on a particular day. Does that mean the disease is not severe in the patient population? No. And if they saw 10 cases of Covid per day, they might notice a difference. But again, it would depend on chance. It would take dozens of cases to really sort this out—and then doctors would have to remember what things were like before. Not easy.
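To put a number on that “around half” claim: in this hypothetical, 2 of every 10 cases are extremely severe, so a doctor’s odds of seeing none of them depend sharply on how many cases she sees. (The 20% figure is purely illustrative, not a real severity rate.)

```python
# Hypothetical severity mix, for illustration only: suppose 2 of every 10
# hospitalized cases in this wave are "extremely severe" (20%).
P_EXTREME = 0.2

def p_sees_no_extreme(cases_seen: int) -> float:
    """Chance a doctor sees zero extremely severe cases among n independent cases."""
    return (1 - P_EXTREME) ** cases_seen

print(p_sees_no_extreme(3))   # ~0.51: about half of doctors miss them entirely
print(p_sees_no_extreme(10))  # ~0.11: at 10 cases/day, most doctors would notice
```

At 3 cases per day, half the doctors would honestly report “nothing extreme”; at 10 cases per day, only about one in nine would. Same disease, same wave, very different anecdotes.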
Another problem is that doctors are not comfortable telling you how many patients they have seen with a particular condition. In some instances, we over-estimate, because we might be aware of (or even involved in) severe cases that we were not directly caring for. (We try to help each other in tough cases.)
The asymmetry of anecdote. As I’ve written before, anecdotal evidence is asymmetric. If I see zero cases of a new and dangerous disease, that could be meaningful or meaningless depending on how common the disease is. If this disease supposedly affects 1 in 10,000 people in Boston, and I see 30 random ER patients in a given shift, I’m unlikely to see a case. My failure to see a case hardly means the problem isn’t real.
But if I see 3 cases of what is supposed to be a rare disease in one shift, I should be worried. Not enough to conclude that something terrible is certainly happening, but enough to be concerned. This is what I experienced when I saw my first 3 Covid cases in one night in March of 2020, and I wasn’t even allowed to test them all—they hadn’t traveled to Asia (I knew by then that didn’t matter). My level of concern for Covid skyrocketed: the disease was probably more common than anyone realized and we had nowhere near enough tests. That night changed my life.
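The asymmetry is stark once you run the numbers on that hypothetical 1-in-10,000 disease. Assuming 30 random patients per shift and treating case counts as Poisson:

```python
import math

PREVALENCE = 1 / 10_000      # the hypothetical 1-in-10,000 disease above
PATIENTS_PER_SHIFT = 30

lam = PREVALENCE * PATIENTS_PER_SHIFT  # expected cases per shift = 0.003

p_zero = math.exp(-lam)  # Poisson chance of seeing no cases at all
p_three_plus = 1 - sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(3))

print(f"P(zero cases in a shift) = {p_zero:.1%}")       # ~99.7%: seeing none proves nothing
print(f"P(3+ cases in one shift) = {p_three_plus:.1e}")  # ~4.5e-09: seeing three is a siren
```

Zero observations are almost no evidence against a rare disease; a three-case cluster in one shift is overwhelming evidence that the assumed prevalence is wrong.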
What does this all have to do with the Boeing 737 MAX? In October 2018, one of Boeing’s new 737 MAX 8 planes crashed in Indonesia, killing everyone on board. I remember being surprised but thinking it could just be bad luck (it was early in the plane’s rollout, and the 737 had a great safety record overall). When another 737 MAX 8 crashed in March of 2019, again killing everyone on board, I knew we had to assume that there was a gigantic and systemic problem. This is not because I’m an expert on planes. This is because I am an expert on risk assessment.
So, I looked up the numbers.
Data show that the commercial aviation industry loses around one jet plane to a catastrophic accident around every 6.5 million flights or so. When the first crash happened, the MAX had flown around 260,000 times (I’m estimating based on some published numbers; if I am wrong, someone tell me.) Given that, the odds of a catastrophic loss happening this soon were around 3.8%. Put another way, if Boeing rolled out 26 entirely new categories of plane like the MAX, one of them would have an accident this soon, just on bad luck alone. So, while alarming, the crash was not a statistical impossibility.
By the time the 2nd crash happened, around 400,000 MAX flights had been completed. The odds of two crashes occurring in the first 400,000 flights was far, far lower—0.18%. If Boeing rolled out 561 entirely new categories of plane, only one would be statistically “expected” to have two crashes that soon based on bad luck alone. Here’s a stat: out of the 387 new 737 MAX jets that had been delivered to commercial operators by March 2019, two had already been destroyed after plunging into the Earth. This was a statistical bombshell.
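Treating catastrophic losses as a Poisson process with a rate of one per 6.5 million flights, these probabilities take only a few lines to check. (The exact percentages wobble slightly depending on the assumed flight counts, which are themselves estimates.)

```python
import math

LOSS_RATE = 1 / 6_500_000  # ~one catastrophic jet loss per 6.5M flights (from above)

def p_at_least(k: int, flights: int) -> float:
    """Poisson probability of k or more catastrophic losses in `flights` flights."""
    lam = flights * LOSS_RATE
    return 1 - sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))

print(f"P(>=1 crash in 260,000 flights)  = {p_at_least(1, 260_000):.1%}")   # ~3.9%
print(f"P(>=2 crashes in 400,000 flights) = {p_at_least(2, 400_000):.2%}")  # ~0.18%
```

One early crash was unsettling but within the realm of bad luck; two crashes pushed the bad-luck explanation below one chance in 500. That’s the same logic a clinician should apply to a cluster of unusual cases.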
Based on the second anecdote—two fatal crashes in such a short time—the 737 MAX was grounded worldwide. The underlying problem was quickly identified and fixed. There hasn’t been a fatal incident since (including the door-plug blowout on an Alaska Airlines flight earlier this month).
Conclusion: When you ask doctors about what they’re seeing on the ground, think about how common the condition is and how many cases the doctor is likely to have seen themselves (and therefore, how prone to randomness their views are). Also think about what outcome you’re asking about. Sudden changes in disease severity and death rates may be hard to notice, while sudden fluxes in hospitalization rates may be more reliable (albeit, as we are seeing with Covid, it can depend on whether universal testing is in place). Also, be aware that we doctors are apt to conflate our personal clinical experience with what we’re seeing in our emails and hearing from our colleagues. Still, anecdotes can be meaningful. The problem is that by the time we have the data to confirm anecdotes early in a crisis, things might have already spiraled out of control.
Thanks to Benjy Renton for some data support on this post and to Dartmouth’s Anne Sosin for a discussion and shared insights which informed this post.
Questions? Comments? Let’s talk!
Good discussion, wrong conclusion. I was a family doc in (antique) solo practice in an urban area for 35 years and I have an MPH and good knowledge of epidemiology. I always thought my random biopsy of life (smaller than an ER's) would give me good insight if I paid attention. Some years I was the first in the County to report a flu case; I reported and closed a restaurant causing an infectious diarrhea outbreak; I stopped using the Dalkon shield long before it was taken off the market; after a case I fought and persuaded our DPH to establish a TBC control program; I have seen tertiary syphilis, and back in the day before there were any tests I was dxing HIV.
That said, we live in a third world country: the CDC should be running population surveys of COVID incidence, so docs (and people at risk) should not be guessing or using intuition about actual risk and we would have valid information about when to maskup, etc.
Your fine essay lets responsible officials and politicians off the hook.
As a hospitalist, I’ve always thought I might have a better chance at answering this question - at least with rates. When half of my list was COVID during 2020 and again at Delta, that was one thing. Then I didn’t see COVID for over a year- and then now I have 2-3 cases each day on my service. Suggests to me there’s either 1) more infections or 2) increased severity. I strongly suspect it’s number 1 based on how many people were sick over the past month. But anecdotally the COVID ones I’m seeing are the immune compromised and unvaccinated, and they do have severe disease. 🤷♀️