The Amelioration of Uncertainty

## Randomization Doesn’t Solve the Problem

Imagine a medical researcher is running a trial on 1000 people to compare a drug with a placebo. The researcher randomly assigns 500 to the placebo group and the rest to the treatment group. The hope is that if some unknown variable influences the efficacy of the drug, it will be split evenly between the two groups, allowing the statistician to say whether the drug worked. Unfortunately, the belief that randomization removes the problem of unknown factors or influences is false.

To see the difficulty one only has to ask: how many unknown variables are there? Or to paraphrase, how many things can you measure about the human body? Thinking classically for convenience, one could measure the amount of Uranium-238 present in some portion of the body:

Or measure the total momentum in the same portion:

Since the sum is over an arbitrary subset of atoms in the body, there are at least as many factors as there are atoms in a typical person, or about 10^27. One could add many insights to this estimate, but any more careful or realistic count will only increase this number. So without getting bogged down in unrelated subtleties, I’ll assume there are 10^27 independent variables which can be measured in the typical human.

The upshot of a number like 10^27 is that after randomization an astronomical number of factors will be systematically different between the treatment and control groups. In fact, this is true no matter what procedure is used to divide them, random or not.

One objection is that researchers are only interested in “nice” variables like total momentum of the whole body. A factor like the momentum of a single protein in a person’s brain is too messy to be of concern. But there is a selection bias here. The total momentum of a person is easy to know, while the momentum of a single protein in living tissue is usually impractical to measure. Maybe lots of variables are important but we don’t know that because we only ever see the ones measured. Certainly when we can measure “weird” variables, like the amount of Uranium-238 in a tiny portion of the brain, they sometimes turn out to be highly important medically.

A related but more subtle objection is that only “relevant” variables are important. There may be 10^27 factors but only five that actually make a difference. In that case randomization will likely split the population into two groups that are balanced on those five.

Amazingly this kind of reduction does happen sometimes. You don’t need, for example, to take 10^27 measurements to know if someone’s pregnant. It only takes a few. (This is probably not an accident by the way. Reproduction is a repeatable and reliable process. To be so, it has to be highly robust, which is another way of saying “unaffected by the value of most factors”).

But to simply assume this reduction for the drug trial is a huge stretch given the astronomical number of variables and our ignorance about almost all of them.  About the only time you can make this assumption is when you already understand quite a bit about the underlying mechanism involved.

Unfortunately though, randomized trials are often used in the life and social sciences precisely when there is little or no knowledge of the underlying mechanism and the danger is greatest.

Moreover, there is growing evidence this isn’t just a theoretical problem. The error rate among peer reviewed statistical studies in the life and social sciences seems so great as to make a layman doubt any effect that isn’t large and obvious.

So why do statistics and trials at all? Well if your goal is to determine the efficacy of a specific drug you might get lucky and have only a few hidden relevant factors, in which case the conclusion will be valid.  However, if you’re unlucky there will be a wealth of unknown relevant factors, which will have different values when the trial is repeated, and you’ll get your conclusions overturned.

On the other hand, if your goal is to find and understand underlying mechanisms, then things aren’t so bad. If a result is later contradicted, then that implies there were some undiscovered relevant factors that differed between the two populations. Congratulations: you’ve just found a clue which may illuminate the underlying mechanism.

In this view, randomization is good if it helps reveal unexpected relevant variables. How good a job it does at this in practice I’ll leave for you to decide.

September 19, 2011
• September 23, 2011 Jan Galkowski

Random sampling assumes there’s a certain continuity in the sample space, that one sample is in some sense similar in characteristics to another. Continuity here is not a formal mathematical notion, but is more like stationarity. Rather than being seen as sampling from an easily visualized space, like a surface, it may be useful to imagine sampling from nodes on a network, where the local topology and connectivity strongly determine local properties. It’s not at all clear that the nodes across such a network give any kind of sensible sampling frame.

I think another way to think about this is that if across a population it makes easy sense to think of interpolating across individuals, or there are individuals “between” others, then random sampling may be informative.

I’m not saying this is rigorous or definitive, just that it may help illustrate some of the limitations.

• November 3, 2011 Corey

I really enjoy your blog. Your posts have been, almost without exception, insightful and entertaining.

Alas, this post is the exception I’m alluding to. As shown in Chapter 7 of Gelman’s text Bayesian Data Analysis 2nd ed., if the assignment mechanism is ignorable ( http://en.wikipedia.org/wiki/Ignorability ) then the unobserved factors cancel out of the posterior distribution. Bada bing bada boom.

• November 8, 2011 Joseph

Corey,

Thanks for the comment. I knew there were 6 readers of this blog from the site statistics, but only 5 comments so far. I’d been wondering who the other person was.

Whether an assignment mechanism is ignorable is not a hard fact. It depends on the probability model chosen (see the top of page 204 in Gelman’s book) and is subject to all the errors, assumptions, and differing priors that selecting a probability model always comes with. In practice, confidence in the model is limited unless you already know a great deal about the underlying mechanism you’re trying to observe, which agrees with what I said in the post.

Incidentally, when building a model of unobserved factors as in Gelman’s chapter 7, how do you know which unobserved factors to include? In principle, shouldn’t every unobserved factor in the universe be included? Thinking carefully about these two questions should highlight the absurdity of using this theorem to guarantee, as an experimental fact, that you can safely ignore everything in the universe that wasn’t measured in the experiment.

Before going too far down this rabbit hole though, it’s important to remain grounded in reality. Remember that after an assignment is made, the two populations either differ in important ways or they don’t. This is true independent of any probability theorems regarding the assignment mechanism. Furthermore, the only way you can know for certain whether they do differ in a given factor is to observe that factor. Again, there is no theorem of probability in Gelman’s book or elsewhere that denies this.

One thing that would shift my thinking is a proof that the assignment mechanism is ignorable when you include factors such as “the amount of Uranium in a given subset of atoms in the human body”. There are as many such factors as there are subsets of atoms, or about 2^(10^27). I look forward to the demonstration, ending either in the traditional “Q.E.D.” or the newer “Bada Bing Bada Boom”.

• November 15, 2011 JF

Interesting blog (what I’ve seen so far). Isn’t the justification for randomization that the omitted variables are both numerous and very small? (An assumption, to be sure, but at the level of individual particle momentum almost certainly true). So, by the Law of Large Numbers, 10^27 tiny factors become a random normal with mean m absorbed into the constant.

• December 11, 2011 ezra abrams

Isn’t the history of drug trials empirical data that can confirm or disprove your idea?
You also make the assumption that the reason the trial is conducted is to find out if a drug “works”.
But, of course, the reason the trial is conducted is to improve the ROI of the corporate sponsor, or the Intellectual Standing of the non-profit researcher.

PS you might attract more people to your site if you used font colors for the side stuff that have contrast with the background; there are (soso) free tools you can download as firefox add ins that will do contrast for you

• December 13, 2011 Joseph

JF,

Here is how Random Experiments are justified in Gelman & Hill (page 173):

“…we can think of the control group as a group of units that could just as well have ended up in the treatment group, they just happened not to get the treatment. Therefore, on average, their outcomes represent what might have happened to the control group had they been treated.”

No mention of “numerous and very small” variables. Moreover, momentum is about the unluckiest example you could have chosen, because Physicists know far too much about it to be fooled by Statisticians. This example would make a good blog post about the folly of confusing theorems in probability with laws of nature.

Without going into all the details, the Law of Large Numbers definitely does not apply to momentum. The reason is that the laws of physics introduce correlations between different particle momenta that negate the Law of Large Numbers. Even when particle momenta are distributed like a multivariate normal (called a “Maxwell-Boltzmann distribution” in statistical mechanics), this has nothing to do with the Law of Large Numbers; it is the result of the Kinetic Energy being a quadratic function of the momentum and of neglecting all particle interactions.

Ezra,

The history of drug testing confirms my hypothesis in spectacular fashion. It is a rule of thumb among pharmaceutical companies that at least 50% of published academic peer reviewed drug trials cannot be replicated. This folklore was recently put to the test and researchers found that 65% could not be replicated (see here).

I reiterate that this isn’t just a theoretical possibility which rarely happens in practice. I believe it happens quite a bit in the life and social sciences.

• February 9, 2012 fred

Well-run randomized trials produce unbiased estimates of causal effects. They do this by achieving independence between treatment assignment and absolutely all other factors (all 10^27 of them, if you like). And with a decent sample size the estimate is precise enough to reliably distinguish the treatment effect from the uncontrolled variability that’s due to the other factors… in other words, we get a small confidence interval. Exact P-values can be obtained using permutation, without any use of Laws of Large Numbers; see e.g. http://pluto.mscc.huji.ac.il/~mszucker/DESIGN/perm.pdf. Your critique seems a bit wayward.

Note that none of this rules out the play of chance in any particular trial; we can never rule out Type I or Type II errors, but then no decent statistician would claim otherwise. The literature you seem to be alluding to in your comments (Ioannidis etc) about error rates is critical of i) trials that are not randomized ii) investigators not doing honest analyses, e.g. data dredging iii) trial populations not reflecting populations in which treatments are actually used. While some of these are difficult to avoid in practice, none of them trouble the basic validity of well-run randomized trials.

A different problem is researchers who don’t take into account the fact that a positive/negative result may be a Type I/Type II error – but that’s just basic misunderstanding of how science works.
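The permutation idea mentioned in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `permutation_p_value`, the outcome lists, and the resample count are all hypothetical, and the sketch uses random re-shuffling (a Monte Carlo approximation) rather than enumerating every possible relabelling, which is what an exact permutation test would do.

```python
import random
from statistics import mean

def permutation_p_value(treated, control, n_resamples=10000, seed=0):
    """Approximate two-sided permutation p-value for a difference in means.

    Assumption: under the null hypothesis the group labels are
    exchangeable, so re-shuffling them shows how large a difference
    arises from the labelling alone.
    """
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabelling of all subjects
        diff = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)

# Made-up outcome scores, for illustration only.
treated_scores = [12, 15, 14, 16, 13, 17, 15, 14]
control_scores = [11, 12, 13, 12, 10, 13, 12, 11]
print(permutation_p_value(treated_scores, control_scores))
```

Note that the p-value describes the assignment procedure over all relabellings; it does not by itself say which unmeasured factors happen to differ between the two groups actually drawn, which is the point under dispute in this thread.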

• February 11, 2012 Joseph

The crux of the problem is your sentence:

“They do this by achieving independence between treatment assignment and absolutely all other factors”

There’s no way we could ever know, as a physical fact, that it’s actually true. You’re basically saying everything else in the universe is balanced between the two groups. We don’t even know all the factors out there. Hell, we may not even know a fraction of them. And we only ever measure a tiny fraction of the ones we do know about. There isn’t even a reason to suspect this has ever actually happened.

I suspect, though, that there’s no persuading you on this. So how about a different, more constructive tack? Would the following change your mind about anything?

Suppose there was a large collection of factors. Label this set of factors F. For simplicity assume each f in F takes on binary values: f=1 or 0 for each patient. And further suppose this set has the following property:

For any division of the 1000 test subjects into two groups of 500, there will be a factor f in F, such that f=1 for all patients in group A and f=0 for all patients in group B.

If such a set F existed, then no matter what randomization method you use, there will always be a factor which is perfectly correlated with the treatment. So there’s no way to tell from the experiment if any group difference is due to the treatment or f.

Could such an F exist? Well, there’s no proof in any Statistics book ever written to suggest it can’t happen, or even that it happens rarely. But consider this: the number of ways to divide the 1000 patients into two groups is going to be less than the number of subsets of 1000, or less than 2^1000. On the other hand there are about 2^(10^27) “uranium subset” factors that were mentioned in the response to Corey above.

So given the enormous disparity between these two numbers, what makes you so sure such F don’t exist?
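The counting behind this comparison can be checked directly. The sketch below is a rough illustration, not part of the original argument: it assumes roughly 10^27 atoms in a body (so the number of subset-style factors is 2 raised to that power), and it compares base-2 logarithms because the factor count itself is far too large to write out. The variable names are made up for the example.

```python
import math

n_patients = 1000
group_size = 500

# Exact number of ways to choose which 500 patients form group A.
n_splits = math.comb(n_patients, group_size)

# Upper bound used in the text: the number of subsets of 1000 patients.
n_subsets = 2 ** n_patients

# Compare sizes through base-2 logarithms.
log2_splits = math.log2(n_splits)
log2_subsets = n_patients  # log2(2**1000) is exactly 1000

# Assumption: about 10**27 atoms, hence about 2**(10**27) subset factors.
# Only the logarithm is stored; the number itself cannot be materialized.
log2_factors = 10 ** 27

print(f"log2(number of 500/500 splits) = {log2_splits:.1f}")
print(f"log2(number of patient subsets) = {log2_subsets}")
print(f"log2(number of subset factors)  = {log2_factors}")
```

The comparison only shows that there are vastly more candidate factors than possible splits, so a set F with the stated property is not ruled out by counting. It does not show that such a set actually exists.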

By the way, I was under the impression that the 65% of peer reviewed drug trials from academic laboratories that couldn’t be reproduced were randomized trials (see the link in response to Ezra above). Is that not true?

• March 13, 2012 Corey

You wrote, “Whether an assignment mechanism is ignorable is not a hard fact… confidence in the probability model is limited…” The whole point of randomization is to take total control of the assignment mechanism and force probabilistic independence (and hence ignorability) by fiat.

For the factors argument to go through, you need to argue strongly that factors on the scale of “uranium subsets” can have large effects in practical terms. But living organisms are, to a first approximation, homeostatic machines that by construction (that is, through the winnowing of evolution) are extremely robust to variations on that scale.

• March 13, 2012 Joseph

Corey,

I make the same point as your second paragraph in the post and discuss it at length. I won’t repeat it but just say that I don’t require them to have large effects for the point I was making (which you are likely misreading). Although I’m glad to hear that my body is extremely robust to the amount of radioactive material in, say, my thyroid.

As to the first point: the fact that we don’t include every unmeasured factor in the Universe in the missing data matrix is already a massive modeling assumption.

But what I don’t understand from your and others’ reactions to this post is that you actually seem to believe that we know every factor in the Universe really is balanced between the treatment group and control group. Every factor among the almost infinite unknowns of our surroundings, regardless of whether it was ever measured or even known about. This massive set of physical facts wasn’t learned from taking measurements or examining the real world, but from using a random number generator while sitting at your desk. Typically an entirely deterministic random number generator at that.

I really had no idea people thought like this and am more than a little shocked. All I can do is repeat the following:

(1) Once an assignment to the treatment and control group has been made it no longer matters what properties the assignment mechanism had. The only thing that matters is the physical properties of the actual treatment and control groups.

(2) The only way for us to know for certain whether a factor is correlated with the actual groups chosen is to measure that factor.

(3) Although the chance of a given factor being correlated is exceedingly small, there are so many factors out there that the chance of SOME factor being correlated is near certainty.

(4) The only chance we have of getting a reproducible result is if the number of “relevant” factors is small. That way there is a possibility that the relevant factors are balanced between the two groups and only irrelevant factors are correlated with them.

(5) If a result turns out to not be reproducible, which happens a surprising amount of the time in the life and social sciences, then one of the unknown factors correlated with the treatment was actually relevant.

(6) Randomization can be used in this way to find new relevant factors or to confirm that we know enough of the relevant factors. But that’s all it can do, and it’s probably not the best way to do even that.
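Point (3) can be illustrated with a small simulation. This is a toy sketch under strong assumptions that are not in the discussion above: each factor is an independent fair coin flip per patient (real physical factors would be correlated), and "imbalanced" is defined by an arbitrary threshold on the difference in counts between groups rather than by perfect correlation. The function name and all parameters are hypothetical.

```python
import random

def count_imbalanced(n_patients=1000, n_factors=2000, threshold=40, seed=0):
    """Count simulated binary factors that end up noticeably imbalanced.

    A factor counts as imbalanced when the number of patients with f=1
    differs between the two groups by at least `threshold`.
    """
    rng = random.Random(seed)
    patients = list(range(n_patients))
    rng.shuffle(patients)
    treated = set(patients[: n_patients // 2])  # random half gets the drug

    imbalanced = 0
    for _ in range(n_factors):
        diff = 0  # (treated with f=1) minus (control with f=1)
        for p in range(n_patients):
            if rng.random() < 0.5:  # assumption: fair-coin factor
                diff += 1 if p in treated else -1
        if abs(diff) >= threshold:
            imbalanced += 1
    return imbalanced

print(count_imbalanced())
```

Any single factor is rarely this far out of balance, but the count of such factors grows in proportion to the number of factors examined, so with enough factors some imbalance is all but guaranteed. Whether any imbalanced factor matters for the outcome is the separate question raised in points (4) and (5).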

• March 14, 2012 Joseph

Corey,

Incidentally suppose a researcher uses a random number generator to “achieve ignorability by fiat” as you put it. The researcher conducts the trial and all is well. After the fact you find out that the researcher used his birth year as the seed for the random number generator and he knew how the entirely deterministic “random” number generator worked from a math course he took in college.

Is the assignment mechanism still ignorable? The researcher knew from the seed and the generator exactly who was going to be in each group. It isn’t random any more.

What if he used his birth year, but didn’t know how the entirely deterministic random number generator worked?

What if the seed was picked using a roulette wheel and it just happened to be the researcher’s birth year? Is the assignment mechanism ignorable now?

If you gave different answers to these questions, just remember that the actual physical experiment performed is exactly the same in all three cases.
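The determinism in question is easy to demonstrate. The sketch below uses Python's standard `random` module as a stand-in for whatever generator the researcher used; the function name `assign_groups` and the seed value 1970 are hypothetical, chosen only for illustration.

```python
import random

def assign_groups(seed, n_patients=1000):
    """Split patient ids 0..n_patients-1 into two halves using a seeded generator."""
    rng = random.Random(seed)
    ids = list(range(n_patients))
    rng.shuffle(ids)
    half = n_patients // 2
    return sorted(ids[:half]), sorted(ids[half:])

birth_year_seed = 1970  # hypothetical seed standing in for a birth year

first_run = assign_groups(birth_year_seed)
second_run = assign_groups(birth_year_seed)

# Same seed and same generator give the same assignment on every run.
print(first_run == second_run)
```

Given the seed and the generator, the groups are fixed in advance, whether the seed came from a birth year or from a roulette wheel. Nothing in the resulting lists records how the seed was chosen.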

Ignorability is not a hard fact of the treatment and control groups. The only hard facts you know about them are the ones you’ve measured. And yet you’re trying to use ignorability to claim as a hard fact that every relevant factor in the entire Universe really is balanced between the two groups.

• March 16, 2012 Corey

Joseph,

Lots and lots of things to talk about. I’m a bit of a slow writer — I’d rather voice-chat. Email me if you’re interested.

It seems to me that you and I agree more than disagree about many things, but naturally, it’s our disagreements that attract our attention. Our backgrounds have a lot of overlap when it comes to statistics; most of the difference is that you have expertise in economics and physics that I lack (although I recently became deeply interested in finance+economics and I may catch up) and I have expertise in biology that you lack.

And, uh, don’t go and eat radioactive iodine or something just ’cause I said that living organisms are extremely robust to variations on that scale of uranium subsets — that would be foolish.

• April 10, 2012 Erin Jonaitis

Interesting post. Calls to mind that famous Churchill quote about democracy…