This post will take a different tack. Rather than criticize the Severity Principle, I will attempt to patch it up. But as we try to fix problems with SEV, we’ll run up against Cox’s Theorem:
A measure of evidence like SEV will either be equivalent to using probabilities or have serious problems.
The mathematics of Cox’s Theorem isn’t in doubt, but it’s not clear the conditions of the theorem apply to SEV. So this makes for an interesting struggle between two philosophies put to mathematics.
If Cox’s Theorem does apply, then as Frequentists patch up their methods, they should move closer to posterior probabilities. Each new proposal should produce new problems, which in turn require new fixes. And with each iteration Frequentist methods get closer to being probabilities for hypotheses.
SEV is not the first iteration. SEV was created to fix previous iterations like p-values, and as a consequence it is much closer to posterior probabilities.
Which brings me to the first fix. Suppose we have two data points drawn IID from a Cauchy distribution with pdf

$$ p(x \mid \theta) = \frac{1}{\pi \left[ 1 + (x - \theta)^2 \right]} $$
Call the realized values x₁ and x₂. Given this data, what can we say about θ? Care must be taken because there are no sufficient statistics for the Cauchy distribution. No matter what test statistic T we use, there will be information in the data, not included in T, which may be relevant to θ. Therefore, if we don’t think carefully about how to combine results from different tests, there will always be data which makes the results from a single test look absurd.
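The lack of a sufficient statistic is easy to demonstrate numerically. The sketch below is my own illustration, with made-up data: it shows two datasets with the same sample mean whose Cauchy likelihood functions nevertheless rank values of θ differently, so any statistic that sees the data only through the mean has discarded information relevant to θ.

```python
import math

def log_likelihood(data, theta):
    # Cauchy log-likelihood for location theta, unit scale:
    # log p(x | theta) = -log(pi) - log(1 + (x - theta)^2)
    return sum(-math.log(math.pi) - math.log1p((x - theta) ** 2) for x in data)

# Two hypothetical datasets with the SAME sample mean (zero), so any
# test statistic equal to the mean treats them identically.
data_a = [-3.0, 3.0]
data_b = [-0.5, 0.5]

for theta in (0.0, 1.0, 2.0):
    print(f"theta={theta}: "
          f"logL(data_a)={log_likelihood(data_a, theta):.3f}  "
          f"logL(data_b)={log_likelihood(data_b, theta):.3f}")
```

The first dataset’s likelihood actually prefers θ = 2 over θ = 0 (the Cauchy likelihood for widely separated points is bimodal), while the second’s prefers θ = 0 — even though the mean cannot tell them apart.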
So consider the following three test statistics:

$$ T_1 = x_1, \qquad T_2 = x_2, \qquad T_3 = \frac{x_1 + x_2}{2} $$

All three have exactly the same sampling distribution (the average of two IID Cauchys is again Cauchy with the same location and scale), and none of their values is equivalent to knowing the full data (x₁, x₂). With these we can compute severity assessments for hypotheses about θ.
Let’s interpret these results according to the Severity Principle (page 162):
Severity Principle (full). Data x0 (produced by process G) provides good evidence for hypothesis H (just) to the extent that test T severely passes H with x0.
So depending on the test, we can support θ > 0, support θ < 0, or support neither. If you performed all three you wouldn’t know what to think, but if you performed T₃ plus one of the others you’d wrongly say there’s strong evidence for either θ > 0 or θ < 0.
If you performed T₁ and T₂ you’re really confused, because they directly contradict each other, yet the two tests are perfectly symmetrical. Anything claimed about one applies with equal force to the other. There’s no difference you could use to decide which is right.
To make the problem more acute, suppose someone looked at the two tests T₁ and T₂ while someone else used T₃. Since T₁ and T₂ can be used to calculate T₃, the two should arrive at consistent conclusions, but one set of results is incomprehensible and the other is merely ambiguous. Even worse, the less complete data, T₃, is the one that gives the better results.
As a way out, let’s introduce a notion of “informational content”. Instead of interpreting SEV as above, substitute:
Partial information T(x) is supportive evidence for H just to the extent that, if we only knew T(x), then H passes a severe test based only on that information.
In this way those Severity results above aren’t in contradiction. They’re evaluating H using different states of information, so it’s no big surprise they’re sometimes inconsistent. If all you knew was T₁, then the hypothesis it favors is a reasonable inference to draw.
Accordingly, we should replace SEV(H) with SEV(H ; T(x)) and understand that any conclusions drawn from it are actually conditional on T(x), as though the rest of the data were unknown. We are still left with the problem of how to combine test results, which is essential in this case because no single test statistic uses everything of value in (x₁, x₂). But it clears some things up.
This brings SEV one step closer to the posterior P(H | x) and immediately drives us towards the question of how to exploit all the information in the data, since you can’t easily compare results based on different information. This gives Bayesians home field advantage, but I don’t see any way around that.
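To see what exploiting all the information buys, here is a sketch — again my own, with hypothetical data x₁ = −1 and x₂ = 3, a flat prior, and crude numerical integration — comparing the posterior probability of θ > 0 computed from the full data against the one computed from T₃ alone:

```python
import math

def cauchy_pdf(x, theta):
    return 1.0 / (math.pi * (1.0 + (x - theta) ** 2))

def posterior_prob_positive(likelihood, lo=-50.0, hi=50.0, n=20001):
    # P(theta > 0 | data) under a flat prior, by a simple Riemann sum
    step = (hi - lo) / (n - 1)
    total = pos = 0.0
    for i in range(n):
        theta = lo + i * step
        w = likelihood(theta) * step
        total += w
        if theta > 0:
            pos += w
    return pos / total

# Hypothetical data, for illustration only:
x1, x2 = -1.0, 3.0

# The full-data likelihood uses both points; the T3-only likelihood uses
# just the sampling distribution of the average, again Cauchy(theta, 1).
full = lambda th: cauchy_pdf(x1, th) * cauchy_pdf(x2, th)
t3_only = lambda th: cauchy_pdf((x1 + x2) / 2, th)

print("P(theta > 0 | x1, x2) =", round(posterior_prob_positive(full), 3))
print("P(theta > 0 | T3)     =", round(posterior_prob_positive(t3_only), 3))
```

Under these assumptions, knowing only T₃ = 1 gives P(θ > 0 | T₃) ≈ 0.75, while the full data gives a different answer — the gap is exactly the information the single statistic throws away.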
If Error Statisticians find this acceptable, then maybe it’s also acceptable to introduce the following principle?
Two different legitimate Severity analyses, based on the same information, shouldn’t contradict each other.
Maybe that’s the subject of another post.