
Ben: Your criticism missed the mark by a mile.

> What happened to the remaining 11,285? No one knows.

Not true. Have a look at the online supplement of van Zwet, Schwab and Senn:

library(dplyr)  # for %>%, filter, group_by, sample_n

set.seed(123)  # for reproducibility

d = read.csv("CochraneEffects.csv")
d = d %>% filter(RCT == "yes")  # keep randomized controlled trials only
d = d[d$outcome.group == "efficacy" & d$outcome.nr == 1 & abs(d$z) < 20, ]  # first efficacy outcome, exclude |z| >= 20
d = group_by(d, study.id.sha1) %>% sample_n(size = 1)  # select a single outcome per study

> Let me reiterate: this paper about reproducibility is itself unreproducible.

Not true. We have made the data that we used available, and you may check them against the publicly available Cochrane Database of Systematic Reviews (CDSR). All the papers in the series have an online supplement with code, so everything is fully reproducible.

> These are not the reported z-statistics, but rather the derived z-statistics by a method of other authors.

Not true. The data were compiled by Simon Schwab who is an author of both papers. For RCTs with a numerical outcome, we have the means, standard deviations and sample sizes of both groups. For RCTs with a binary outcome, we have the 2x2 table. From these, we computed the estimated effects (difference of means or log odds ratio) together with their standard errors in the usual manner. From these, we computed the z-values. There is really no basis for accusing us of trying to deceive.
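
In sketch form (the helper functions below are just for illustration, not our actual code), the usual computations look like this:

# Numerical outcome: difference of means with its (unpooled) standard error
z_numeric <- function(m1, s1, n1, m2, s2, n2) {
  est <- m1 - m2                      # estimated treatment effect
  se  <- sqrt(s1^2 / n1 + s2^2 / n2)  # standard error of the difference
  est / se                            # z-value
}

# Binary outcome: log odds ratio from the 2x2 table
# (a, b = events / non-events in one group; c, d = events / non-events in the other)
z_binary <- function(a, b, c, d) {
  est <- log((a * d) / (b * c))       # estimated log odds ratio
  se  <- sqrt(1/a + 1/b + 1/c + 1/d)  # Woolf standard error
  est / se                            # z-value
}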

> Why is it plausible that a z-score in a clinical trial is a sample from a mixture of Gaussians? It’s not. It’s ridiculous.

Just think of it as a flexible density estimator which has some convenient mathematical properties.
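
As a rough illustration (the weights, means and standard deviations below are made up, not the values we fitted), a handful of normal components already gives a very flexible shape:

dmix <- function(z, w, mu, sigma) {
  # density of a finite normal mixture, vectorized over z
  dens <- 0
  for (k in seq_along(w)) dens <- dens + w[k] * dnorm(z, mu[k], sigma[k])
  dens
}
w     <- c(0.3, 0.4, 0.3)  # mixing weights (sum to 1)
mu    <- c(0, 1, 2)        # component means
sigma <- c(1, 2.5, 6)      # component SDs; the widest one supplies the heavy tails
curve(dmix(x, w, mu, sigma), from = -15, to = 15, xlab = "z", ylab = "density")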

> no one has any idea what the content or value of all of this data is.

The CDSR is a very well known resource that is carefully curated. It's not perfect, but it's also not some random scrape.

Apologies - I misremembered. The NEJM Evidence paper has an online supplement with the code, but the earlier paper with Schwab and Senn does not. However, I've just provided the code for the selection of those 23,551 trials.

But this is my point, Erik. I know that many people are concerned with the replication crisis, but if we want to write papers critical of others' replicability, then we need to hold ourselves to even higher standards of replicability. It's not great that these exclusion criteria were not in the original paper. And, moreover, *why* are you using these criteria? For instance, why exclude trials where the z-score is larger than 20?

You're barking up the wrong tree, Ben. Our paper is not critical of others' replicability. Quite the opposite. We're simply pointing out that in a field where the signal-to-noise ratio tends to be low, people expect too much from p<0.05. It's not so unexpected when a study with p=0.01 does not "replicate" (i.e., get p<0.05 again).

We're also not critical of the signal-to-noise ratio being low. In the paper with Schwab and Senn, we write: "The fact that achieved power is typically low does not imply that the usual sample size calculations aiming for 80% or 90% power are wrong. The goal of such calculations is to guarantee high power against a particular alternative that is considered to be of clinical interest. The fact that high power is often not achieved is merely an indication that treatments often do not provide the benefit that was hoped for."

The criteria for selecting the data are an attempt to get the primary efficacy outcome, and to ensure that each trial occurs only once in our dataset. The selection for |z|<20 is because such large z-values are extremely unlikely for trials that aim to test if the effect is zero. It also wouldn't have made a big difference if we had kept them in, because our mixture model has one component with very large variance that can accommodate large z-values.
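
If you want to check how many trials that filter actually removes, a quick sketch against the same CSV (assuming the columns used above) is:

d_all <- read.csv("CochraneEffects.csv")
d_all <- d_all[d_all$RCT == "yes" & d_all$outcome.group == "efficacy" & d_all$outcome.nr == 1, ]
dropped <- sum(abs(d_all$z) >= 20, na.rm = TRUE)  # trials excluded by the |z| < 20 criterion
cat(sprintf("Excluded by |z| < 20: %d of %d rows (%.1f%%)\n",
            dropped, nrow(d_all), 100 * dropped / nrow(d_all)))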
