## A Challenge for Frequentists

Here is a simple challenge for the Frequentists out there. How do you include information about the range of a mean when estimating it from a series of measurements?

For some background consider the following from the renowned Philosopher of Frequentist Statistics, Dr. Mayo:

> If there is genuine knowledge, say, about the range of a parameter, then that would seem to be something to be taken care of in a proper model of the case—at least for a non-Bayesian.

The good Professor elaborated on her blog:

> However, quite a lot of background information goes into designing, carrying out, and analyzing inquiries into hypotheses regarded as correct or incorrect. For a Frequentist, that is where background knowledge enters. There is no reason to suppose that the background required in order sensibly to generate, interpret, and draw inferences about H should—or even can—enter through prior probabilities for H itself!

OK, suppose a person wishes to weigh themselves on a scale that gives errors, where the measurements are related to the true weight $\mu$ by

(1) $x_i = \mu + \epsilon_i$, with the errors $\epsilon_i$ independent draws from $N(0, \sigma^2)$

The estimator $\bar{x}$ will be distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. Given a sample mean of 201.3 (simulated data given at the end), a Frequentist would claim:

> I constructed a 95% confidence interval. If we repeated these measurements over and over again, then a similarly constructed interval would contain the correct weight 95% of the time.
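As a concrete sketch of that frequentist calculation, here is the textbook confidence interval computed from the simulated data in the appendix. The error standard deviation $\sigma = 10$ is an assumption on my part (it is consistent with the spread of the simulated data), as is $n = 10$:

```python
import math

# Simulated measurements from the appendix (R's rnorm output)
data = [203.1593, 206.4241, 203.0091, 187.6194, 193.0292,
        205.8849, 189.6029, 212.8243, 218.2531, 192.8713]

sigma = 10.0                     # assumed known scale error (not restated in the post)
n = len(data)
xbar = sum(data) / n             # sample mean, ~201.3
se = sigma / math.sqrt(n)        # standard error of the mean
z = 1.959964                     # 97.5th percentile of the standard normal

ci = (xbar - z * se, xbar + z * se)   # classical 95% confidence interval
print(round(xbar, 1), tuple(round(v, 2) for v in ci))
# prints: 201.3 (195.07, 207.47)
```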

A Bayesian using a uniform prior would give the same numerical answer but worded differently:

> Odds are 19-to-1 that the true weight is in the interval.

Now suppose that the individual walked over a bridge which could only hold 205 lbs, and the bridge held their weight. Further suppose the individual walked over a platform rigged to buzz if the weight is less than 195 lbs, and there is no buzz. Then we know as a fact that $195 < \mu < 205$.

How would a Frequentist include this information in the analysis?

The information is not frequency related, so it can’t be included as a distribution on $\mu$. Presumably the Frequentist would follow Dr. Mayo’s stringent advice and change the model (1) above, but how? The model is still valid; there’s nothing wrong that needs fixing, since the sampling distribution comes from the weighing device and not from our prior range for $\mu$.

Would the Frequentist simply restrict everything to the interval $(195, 205)$, intersecting the confidence interval with the known range? What happens when the confidence interval falls entirely outside the known range, as will happen some percentage of the time?

I don’t know how a Frequentist would include this information in the model. I do know how a Bayesian would, though. With a uniform prior over the interval $(195, 205)$, the Bayesian would report:

> Odds are 19-to-1 that the true weight is in the interval.
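A minimal sketch of this Bayesian calculation, under the same assumption as before ($\sigma = 10$ known): with a flat prior on $(195, 205)$ the posterior is just the unconstrained normal posterior truncated to the allowed interval, and the credible interval comes from its quantiles. `scipy.stats.truncnorm` does the bookkeeping:

```python
from scipy.stats import truncnorm

data = [203.1593, 206.4241, 203.0091, 187.6194, 193.0292,
        205.8849, 189.6029, 212.8243, 218.2531, 192.8713]
xbar = sum(data) / len(data)           # sample mean, ~201.3
se = 10 / len(data) ** 0.5             # standard error (sigma = 10 assumed known)
lo, hi = 195.0, 205.0                  # hard bounds from the bridge and the buzzer

# Posterior under a flat prior on (lo, hi): the unconstrained normal posterior
# N(xbar, se^2), truncated to the allowed interval.
a, b = (lo - xbar) / se, (hi - xbar) / se
cred = (truncnorm.ppf(0.025, a, b, loc=xbar, scale=se),
        truncnorm.ppf(0.975, a, b, loc=xbar, scale=se))
print(tuple(round(v, 1) for v in cred))   # a narrower interval than the unconstrained CI
```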

Using both the prior and the data delivers better results than any of the other methods mentioned. I await with some amusement a Frequentist contortion of the model that does the same. Just remember, though, that this is the simplest application of statistics to science; it all gets much more complicated from here.

**Update**

To make the issue clearer: the Frequentist could legitimately construct an interval lying entirely outside $(195, 205)$ and claim it is a 14% CI. There’s nothing stopping them from doing so. Yet we know from the prior information that this interval actually has 0% coverage. To avoid absurdities like this the Frequentist must include the prior information. But how? The Frequentist only uses the sampling distribution of the errors, which is totally unaffected by the prior information!

**Appendix**

The actual data points, generated by R’s rnorm function, were:

203.1593, 206.4241, 203.0091, 187.6194, 193.0292, 205.8849, 189.6029, 212.8243, 218.2531, 192.8713

**March 1, 2012 · Zack**

I have not thought about how I would answer this as a Frequentist, but the Bayesian result given above is clearly wrong. The error in the scale is unrelated to the (assumed perfect) limits above. The mistake Bayesians make is that they shrink their error bars based on superfluous information. The error in the scale is based on the scale’s own probability of giving different values. I would expect the scale to read < 195 lbs 5% of the time, independent of the “true” value (200); otherwise you have made a mistake in estimating the error of the scale.

There is a place for incorporating prior information into a measurement, but in real life, it is the non-statistically distributed systematic errors that tend to be important.

**March 1, 2012 · Joseph (author)**

Zack,

I’m not sure what “clearly wrong” means, since the Bayesian interval contains the true value and is smaller than all the others.

Nor do I have any idea what “superfluous” means. If the prior information had been that $\mu$ lies in a given interval, would you still consider that superfluous?

Moreover there is overwhelming literature showing that including such prior information can improve estimates and forecasts. Especially when there is little data.

But you do bring up an interesting point. What would have happened if there were a systematic error in the measuring device? Suppose the weighing machine was always adding 6 lbs. Then adding 6 to each data point would yield the following new intervals:

Frequentist 95% CI:

Bayesian 95%:

Remember that we know values greater than 205 are impossible and the true value is 200. So which interval looks “clearly wrong” to you now?

The point estimates are even more striking since the Frequentist would give an impossible value while the Bayesian point estimate is only off by a couple of pounds.

The Truth contained in the prior is offsetting some of the Falsity in the error model. This would seem to be an important safety feature and is yet another strong reason to use these kinds of priors.
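That point-estimate comparison can be sketched numerically under the same assumptions as before ($\sigma = 10$ known, flat prior on $(195, 205)$): shift the sample mean up by the hypothetical 6 lb systematic error and compare the two estimates. The posterior mean is just the mean of the truncated normal:

```python
from scipy.stats import truncnorm

se = 10 / 10 ** 0.5                # standard error (sigma = 10, n = 10 assumed)
lo, hi = 195.0, 205.0              # known physical bounds on the true weight
xbar = 201.26776 + 6.0             # sample mean after the hypothetical +6 lb systematic shift

freq_est = xbar                    # frequentist point estimate: > 205, physically impossible
a, b = (lo - xbar) / se, (hi - xbar) / se
bayes_est = truncnorm.mean(a, b, loc=xbar, scale=se)  # posterior mean under the flat prior

print(round(freq_est, 1), round(bayes_est, 1))
```

With these assumptions the frequentist estimate lands above 205 while the posterior mean stays inside the allowed region, a few pounds from the true value of 200.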

**March 15, 2012 · Corey**

“Presumably the Frequentist would follow Dr. Mayo’s stringent advice and change the model (1) above, but how? …I don’t know how a Frequentist would include this information in the model.”

Mayo got it wrong — the information doesn’t go into the model. It goes into the function that maps “random” data to an interval in parameter space. See Feldman and Cousins, A Unified Approach to the Classical Statistical Analysis of Small Signals.

Money quote: “In this paper, we use the freedom inherent in Neyman’s construction in a novel way to obtain a unified set of classical confidence intervals for setting upper limits and quoting two-sided confidence intervals. The new element is a particular choice of ordering, based on likelihood ratios, which we substitute for more common choices in Neyman’s construction. We then obtain confidence intervals which are never unphysical or empty.”

So confidence intervals don’t have a problem with incorporating boundaries in parameter space (at least the one-sided boundaries that Feldman and Cousins consider; I haven’t worked through the math, so I don’t know about your finite-interval-constrained parameter space). For me, the issue with confidence intervals is that any given confidence interval procedure is either equivalent to some Bayesian interval or it’s the right answer to the wrong question — or both, if it’s equivalent to a Bayesian interval derived from a wacky prior.

**March 15, 2012 · Corey**

Actually, a quick look at Figures 1 and 10 of the paper is sufficient to see that the Feldman and Cousins construction will work just fine in your finite-interval-constrained parameter space.
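For the curious, here is a rough numerical sketch of the Feldman–Cousins construction applied to this problem. It is my own illustration rather than code from the paper, and it assumes $\sigma = 10$, $n = 10$ (so the sample mean has standard error $\sigma/\sqrt{10}$), with the physical region $195 \le \mu \le 205$:

```python
import numpy as np
from scipy.stats import norm

SE = 10 / np.sqrt(10)                  # standard error of the sample mean (sigma = 10 assumed)
LO, HI = 195.0, 205.0                  # physically allowed region for mu

mu_grid = np.linspace(LO, HI, 401)     # candidate values of mu
x_grid = np.linspace(180.0, 220.0, 2001)
dx = x_grid[1] - x_grid[0]
mu_best = np.clip(x_grid, LO, HI)      # best physically allowed mu for each possible x

def acceptance(mu, cl=0.95):
    """Feldman-Cousins acceptance region (boolean mask on x_grid) for a given mu."""
    p = norm.pdf(x_grid, mu, SE) * dx                             # ~P(x | mu) on the grid
    r = norm.pdf(x_grid, mu, SE) / norm.pdf(x_grid, mu_best, SE)  # likelihood-ratio rank
    order = np.argsort(-r)             # admit x values in decreasing rank ...
    k = np.searchsorted(np.cumsum(p[order]), cl) + 1  # ... until the coverage is reached
    keep = np.zeros(x_grid.size, dtype=bool)
    keep[order[:k]] = True
    return keep

x_obs = 201.3                          # observed sample mean
i_obs = int(np.argmin(np.abs(x_grid - x_obs)))
ci = [m for m in mu_grid if acceptance(m)[i_obs]]  # mu values that accept x_obs
print(round(min(ci), 1), round(max(ci), 1))        # upper end sits at the physical boundary
```

By construction the resulting interval is never empty and never leaves the physical region, which is exactly the property the paper advertises.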

**March 23, 2012 · Joseph (author)**

Corey,

I believe you’re right that a Frequentist can’t do this by modifying the model. I posted this out of frustration with Dr. Mayo, who is by all accounts the most prominent philosopher of Frequentist statistics. She was trying to convince practicing statisticians to forget their ruinous priors and modify the model instead, despite evidently never having tried it herself.

No doubt it’s possible to patch this up with some ad-hoc procedure and there’s probably more than one such procedure in the literature. I suspect though that your comment about confidence intervals holds for these new modifications: namely they are either equivalent to the Bayesian result or answering the wrong question.

It’s hard to tell from the paper which is true for their results, because they apply it to the Poisson and Normal distributions. These are known to be cases in which Frequentist results most closely resemble Bayesian ones. It would be nice to see how their procedure, which relies heavily on the likelihood ratio, performs on the Cauchy distribution, which doesn’t have a monotone likelihood ratio or sufficient statistics.

I did, however, test the method in the paper on this problem. For the original problem in the post:

Frequentist:

Bayesian:

Improved Frequentist:

It’s amusing that the improved Frequentist answer is practically equivalent to dropping the data and only reporting the prior interval $(195, 205)$!

For the case where there is a systematic error in the likelihood, which Zack mentioned in the first comment is more important in practice, we get:

Frequentist:

Bayesian:

Improved Frequentist:

You would have been better off simply restricting the first Frequentist answer to the prior interval. It’s both computationally simpler and gives a less misleading expression of where the true value of $\mu$ lies.

Also note that if your prior information doesn’t give a hard boundary between the allowed and forbidden regions for $\mu$, then a Bayesian could easily include this in the prior, but the improved Frequentist method wouldn’t work at all. I guess they’d have to find an ad-hoc modification to their ad-hoc procedure.

**March 27, 2012 · Corey**

“an ad-hoc modification to their ad-hoc procedure”

After reading that paragraph, I thought one up pretty quickly. The original ordering principle uses the likelihood ratio $P(x|\mu)/P(x|\hat{\mu}(x))$. Introduce a plausibility measure $w(\mu)$ that takes an element of the parameter space as input; then use the weighted likelihood ratio $w(\mu)\,P(x|\mu)/P(x|\hat{\mu}(x))$ in the ordering principle. I wonder how that would do in practice in comparison with the equivalent posterior credible intervals…

**March 27, 2012 · Joseph (author)**

I like it!

Since you’re a fan of Cox’s Theorem, you probably know how all this is going to end. The “weighted likelihood ratio” is proportional to the posterior distribution. So if you keep finding problems with the ad-hoc frequentist methods and keep improving them, you just keep getting closer and closer to the Bayesian answer.
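This proportionality is easy to check numerically: for fixed data, the $P(x|\hat{\mu})$ denominator is the same for every $\mu$, so normalizing the weighted likelihood over $\mu$ recovers exactly the Bayesian posterior. A small sketch (the normal likelihood, the weight $w$, and the grid are all arbitrary choices of mine):

```python
import numpy as np

mu = np.linspace(190.0, 210.0, 2001)            # parameter grid (arbitrary range)
se, x_obs = 10 / np.sqrt(10), 201.3             # standard error and observed mean

lik = np.exp(-0.5 * ((x_obs - mu) / se) ** 2)   # P(x | mu), up to a constant in mu
w = np.exp(-0.5 * ((mu - 200.0) / 3.0) ** 2)    # an arbitrary plausibility weight

# Weighted likelihood ratio: the denominator P(x | mu_hat) is the same for every mu,
# so it drops out when we normalise over the parameter grid ...
wlr = w * lik / lik.max()
wlr /= wlr.sum()

# ... leaving exactly the Bayesian posterior: prior times likelihood, normalised.
post = w * lik / (w * lik).sum()

print(float(np.max(np.abs(wlr - post))))        # ~0: identical up to rounding
```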

**March 27, 2012 · Corey**

I was playing around with it a bit this afternoon using a normal model with a normal plausibility weight, and I realized the weight has to enter through the denominator, as $P(x|\mu)/\max_{\mu'} w(\mu')\,P(x|\mu')$. This is because it’s an ordering principle for data values, not parameter values, so the bits of it that don’t depend on the data are irrelevant.

I have an abiding interest in confidence interval methods that approach some Bayesian result. This is because I want to have interval estimates that have good calibration in the sense of Gneiting et al., but also satisfy Jaynes’s desiderata. Calibration in this sense is essentially equivalent to correct confidence coverage and hence is the defining characteristic of frequentist confidence intervals; however, Bayesian intervals are not calibrated in general. There are some priors which induce automatic calibration — these are called probability matching priors.

**April 17, 2012 · Zack**

I just found this link again. I think with your update you understand the point I was trying to make. Let me rephrase it this way:

What you actually have are 2 independent measurements. One is from the scale, whose measurement cannot possibly depend on what assumptions you make (Bayesian, frequentist, demonic, whatever). The scale will read what the scale reads. That is it. That was the point I was making.

Your 2nd measurement is a mystical perfect device that limits the range of results with a hard edge. (There are generally hidden underlying assumptions about smoothness, etc. of the measurement space, but let’s ignore them for now.)

Your question is really: How do I combine 2 independent measurements of a phenomenon? The solution you came up with, which is how we do it in practice, is to adjust the confidence level of the range.

My other point (which relates to your 95% C.L. argument elsewhere) is that there are no pure statistical systems in reality. There are always unknown systematic errors present. The only question is whether you believe they are under control, i.e., smaller than your known errors. In physics we tend to do experiments until those systematic errors dominate. We then quote “95% C.L.” limits, but we understand that those come with this caveat. It is also why particle physicists generally say they have not “discovered” something without at least a 5 standard deviation signal. We know from hard experience that even those signals can disappear due to the unknown systematics.

Honestly, I don’t know more than a handful of people who believe any statistical approach is really absolute. Bayesians have something to contribute, but logically, you are left with an inductive assumption that could simply be wrong. In which case you _think_ you have an improved result, but it is a mirage. You’ve merely shifted the problem to one of validating your assumptions: that is often difficult in practice.

Anyway, I enjoy your blog.

**March 26, 2013 · Fran**

Yes, We Can!

Simply use the truncated PDF.

Don’t shoot first and ask later, cowboy.

**March 28, 2013 · Entsophy**

Fran,

Some cowboys shouldn’t just shoot from the hip like this. The entire point was that a Frequentist isn’t free to change the PDF at all. The PDF models the data-generation mechanism, which in this case is the errors given off by the measuring device. The updated information doesn’t affect the measuring device at all, so for a Frequentist the PDF can’t change either. If you think it should, then welcome to the world of Bayesian statistics.

**March 28, 2013 · Fran**

It does not change the mechanism, but it limits the possible solutions, which means we can translate the scenario into one where the data has this limit. And yes, we do this ALL the time. For example, if you use a Normal distribution to model people’s weights, we know there cannot possibly be negative values, and we change the PDF to account for that.

And if you tell me nobody can weigh less than 20 kg, then just the same; and if you tell me nobody can weigh more than 600 kg, then again, the same. Whether this limit belongs to the inherent mechanism generating the data or is new information limiting the range of outcomes is irrelevant.

Put it this way: you generate a random number with U(0,1), then you tell me it is >= 0.5, and you’re telling me I can’t do U(0.5,1) without being Bayesian? Come on.

Data is just a shadow cast by reality; my models only try to explain that shadow, which means I am entirely entitled to update any mechanism as long as it better explains the shadows of reality.

**March 28, 2013 · Joseph (author)**

Fran,

I discussed what you’re talking about in detail in the post. There is no need to repeat it here.

**March 28, 2013 · Fran**

Joseph,

Oh, you discussed it, so I guess your word is final. Call the press… haha, you Bayesians are funny.

**April 1, 2013 · Joseph (author)**

Fran,

First, I added “[latexpage]” to the beginning of one of your previous comments so that the latex elements would appear as math instead of raw latex.

Second, I stand by my previous remarks which I will outline again here:

(1) With enough effort any Frequentist method can be improved so that it works well enough in practice. This example shouldn’t be an exception, but you simply haven’t done the work needed to show it. Unlike most die-hard Bayesians, I’d gladly read and acknowledge your Frequentist solution should it be provided. This is an interesting case because it’s very simple, yet in my opinion it would require some new Frequentist principles to solve in practice. I’d love to see how much trouble Frequentists will go through to avoid the simple and effective Bayesian solution.

(2) A Frequentist can’t change the pdf as you suggest, or in any other way, because the pdf is for the errors (and the mechanism which generated them), which is completely unaffected by any restriction on $\mu$. Truncating the pdf for the errors because of restrictions on $\mu$ is NOT done all the time as you suggest, because it makes no sense from a frequentist point of view. This is different from the Normal-distribution/weight example you cite: there, the distribution of weights really should be truncated, but here the distribution of errors really shouldn’t be.

(3) You could truncate the resulting confidence interval, which probably is done all the time, but in this case that amounts to throwing away essentially all the information in the measurements. An even bigger problem is that sometimes the confidence interval will lie entirely outside the known region for $\mu$, so no truncation is possible at all, unless you’re willing to report that 95% of the time $\mu$ lies in the empty set.
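The empty-intersection problem in (3) is easy to exhibit by simulation: put the true weight at the boundary of the allowed region, and the standard 95% CI misses the whole of $(195, 205)$ a nonnegligible fraction of the time. A sketch, again assuming $\sigma = 10$ and $n = 10$:

```python
import math
import random

random.seed(1)
SE = 10 / math.sqrt(10)          # standard error of the mean (sigma = 10, n = 10 assumed)
LO, HI = 195.0, 205.0
mu_true = 205.0                  # true weight at the edge of the allowed region

misses = 0
trials = 100_000
for _ in range(trials):
    xbar = random.gauss(mu_true, SE)
    lo, hi = xbar - 1.96 * SE, xbar + 1.96 * SE   # standard 95% CI
    if lo >= HI or hi <= LO:                      # CI disjoint from (195, 205)
        misses += 1

print(misses / trials)           # ~0.025: truncating these CIs would leave the empty set
```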

**April 2, 2013 · Fran**

Thanks for the reply, Joseph (and sorry for the latex thing; I could not edit it, nor did I know what the syntax for latex looks like on your blog).

(3) As you explain, this is wrong, but we should not blame the tool just because there are wrongdoers.

(2) Well, you see, this is a thing about Bayesians: they tell non-Bayesians how they should do things and, once they do so, they next explain why doing things the way they think they should be done proves non-Bayesians wrong.

Let’s see. It seems you are hooked on the “error mechanism” thing, as if there were any **practical** difference between the mechanism setting the limits and new information setting the limits.

Well, first, the error mechanism you provide in your example is a mere fantasy we use because it has convenient mathematical properties, **nothing more**. Nothing in this universe follows a Normal distribution, and certainly the error in the weights does not follow it either.

So, for some reason, you believe non-Bayesians are not allowed to change one convenient math fantasy for another, more convenient math fantasy once we have information for doing so.

The alleged advantage of Bayesians in setting these limits is rooted in the limitations and problems you run into when trying to establish a prior without limits. So this is just one more way you Bayesians sell a flaw as if it were a feature.

(1) I gave you the even simpler example where you generate a number with a mechanism following a uniform distribution U(0,1), then you look at the number and give me the information that it is greater than or equal to 0.5, and I model the information I have about the outcome with a uniform distribution U(0.5,1).

This is the same case: you have a mechanism, you give me information, and I update my convenient mathematical **fantasy** accordingly.

**April 2, 2013 · Joseph (author)**

Fran,

I’m not telling Frequentists how to do anything. The post asks for a Frequentist solution which makes sense from a Frequentist point of view. There is no attempt to impose anything on Frequentists other than what they impose on themselves. Nor was I blaming anybody in point (3); I was merely pointing out that, from a Frequentist perspective, such ad-hoc methods can’t be fundamentally right, however expedient they might sometimes be in practice.

There is a serious misunderstanding here which you are failing to get. If I know “A” is restricted to a given interval, then it’s fine to restrict the support of P(A) to that same interval. Your uniform example is like that, and the weight example given previously is like that. I have no objection to that whatsoever, but that’s not what’s going on here.

Here we have two terms, “A” and an unrelated “B”. We know that “A” is restricted to an interval, and you’re suggesting we truncate P(B) to that same interval. For example, if $\mu > 0$ then you’re saying P(error) should equal 0 for error < 0. This makes no sense! $\mu$ and the “errors” are unrelated to each other both in principle and in physical reality. $\mu$ is the true weight, while the errors come from the measuring device. Even if $\mu$ were known precisely, the errors would still vary from measurement to measurement exactly as if you were completely ignorant of $\mu$! If you really can’t see that these are two different things, and that knowledge of one doesn’t impose a truncation on the distribution of the other, then there’s simply no hope.

Finally, you spew forth another temper tantrum against Bayesians. Although you complain about the poor behavior of Bayesians, pretty much 90% of everything you say about them consists of bad-faith accusations filled with rank inaccuracies and stupid ad hominem attacks. All I can do is set the facts straight:

First, the improper uniform prior doesn’t cause any problems in this example. In fact, numerically it leads to exactly the Frequentist answer given in every elementary statistics textbook.

Second, the reason for including such information in the prior is that imposing common-sense limits on parameters in this way can significantly improve estimates. There is a massive literature backing this claim up, which anyone can find easily using Google.

**April 3, 2013 · Fran**

Joseph,

Aside from the fact that you should review what “ad hominem” means, and that when it comes to temper tantrums I’m not even close to Bayesians’… or yours, I find it fascinating that you insist “Frequentists” can’t do this when “Frequentists” tell you they can. Could you elaborate on why this is not a “Frequentist” thing to do? You keep explaining why *you* think it is wrong, but I’d like you to explain under what “Frequentist” law this can’t be done.

Let me put it a different way; maybe this time you’ll get it. Imagine we get data from the mechanism you describe (or even one a lot more complex), but that mechanism is unknown to us; we only know that data coming from it behaves like a normal distribution with its mu and sigma.

So, while we do not know the mechanism and the data seems to behave like a simple Normal distribution, *you* would agree we can truncate it if new information comes along limiting its values. But for some reason, once the mechanism is known, even though this knowledge is irrelevant to the behavior of the data, you freak out and now we cannot truncate the distribution anymore without being bad, bad “Frequentists”. No?

And about setting the facts straight:

First, the improper uniform distribution is not a probability distribution; that’s why you call it improper instead of calling it bullshit. And because it is not a probability distribution, it does not always yield proper posterior distributions either (even if in your example it does).

Second, the whole prior concept itself is flawed, which turns Bayesianism into a giant with feet of clay; you constantly need to do mathematical pirouettes and philosophical stunts to give the impression this concept makes sense.

Peace,

**April 3, 2013 · Joseph (author)**

Oh for the love of God, you really have no idea what you’re talking about.

There is no knowledge limiting the values of the errors anywhere in this problem! There is nothing limiting the actual errors given off by the measuring device in any way or in any sense. The correct limiting frequency distribution for the errors starts out Normal and never changes. Nothing in the problem changes the limiting frequencies of the errors even slightly, whether or not some “mechanism” is known, or whether we know something about $\mu$. It’s not OK in the Frequentist world to use a distribution P(error) which you know radically differs from the actual limiting frequencies. No other Frequentist had the slightest trouble understanding this.

You mouthed off in your initial comment without doing your homework, said something really stupid, and have been trying to defend it thereafter despite the fact that it’s obvious nonsense.

Improper priors are limits of proper priors. They are a convenience, no different in principle from using continuous probability distributions as limits of discrete distributions. In real problems we can always replace an improper prior with a proper one if it causes any trouble, just as we can always replace a continuous distribution with a discrete one if needed. Indeed, we probably should just stick to proper priors based on real prior information since, as stated before, it’s a known fact that they improve estimates, and nowadays they present no computational difficulties.
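That limiting claim is easy to illustrate: take a proper uniform prior on $(\bar{x} - w, \bar{x} + w)$ and widen it; the 95% credible interval converges to the textbook frequentist interval $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ ($\sigma = 10$, $n = 10$ assumed, as elsewhere in this thread):

```python
from statistics import NormalDist

nd = NormalDist()                       # standard normal helper from the stdlib
xbar, se = 201.3, 10 / 10 ** 0.5        # observed mean and standard error (sigma = 10)

def cred(width):
    """95% central credible interval under a uniform prior on (xbar - width, xbar + width)."""
    a, b = -width / se, width / se      # prior endpoints in standard units
    Fa, Fb = nd.cdf(a), nd.cdf(b)
    q = lambda p: xbar + se * nd.inv_cdf(Fa + p * (Fb - Fa))  # truncated-normal quantile
    return (q(0.025), q(0.975))

for w in (5.0, 20.0, 1000.0):
    print(w, tuple(round(v, 2) for v in cred(w)))
# As the prior widens, the interval approaches xbar +/- 1.96*se, i.e. (195.10, 207.50)
```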

The whole prior concept is simple, easy to use, and effective. Just because you don’t understand it doesn’t mean everyone else is as clueless. Frankly, I don’t believe you’ve spent more than five minutes studying Bayesian statistics, since you’re surprisingly ignorant of so many basic technical facts. Given that you don’t seem to be aware that Frequentists define probability distributions as limiting frequencies, one has to wonder whether you’ve studied any statistics at all.