Pr(A|B) = Pr(B|A)Pr(A) / [ Pr(B|A)Pr(A) + Pr(B|notA)Pr(notA) ]

The alternative hypothesis that Jeff mentions enters through Pr(B|notA), the probability of the data under the alternative.

We’re told from the setup of the problem that Pr(B|A) is low. If it turns out that all 435 members of Congress are American, then Pr(B|notA)=0 and so Pr(A|B)=1.0.

But suppose that some members of Congress aren’t Americans. To take the extreme case, suppose that no member of Congress is American (some people argue none of them really are). Then Pr(B|A)=0 and Pr(B|notA)>0, and so Pr(A|B)=0. Of course, just because no member of Congress is American doesn’t mean one couldn’t be. But that aside, the interesting result is that Pr(A|B) just ends up being the share of members of Congress who are American. That’s entirely intuitive, which explains the knee-jerk reaction to the syllogism.
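Here is a quick sketch of that calculation, in case it helps to see the numbers. The population figures and the mixed 400/35 split are made-up round numbers purely for illustration, and the function name is my own; exact fractions are used so the cancellation is visible.

```python
from fractions import Fraction


def posterior_american(members_american, members_other, pop_american, pop_other):
    """Pr(A|B) by Bayes' rule, with A = 'is American', B = 'is a member of Congress'."""
    total = pop_american + pop_other
    pr_a = Fraction(pop_american, total)
    pr_not_a = Fraction(pop_other, total)
    pr_b_given_a = Fraction(members_american, pop_american)
    pr_b_given_not_a = Fraction(members_other, pop_other)
    numerator = pr_b_given_a * pr_a
    return numerator / (numerator + pr_b_given_not_a * pr_not_a)


# Hypothetical population sizes (assumptions, not real census figures).
POP_AMERICAN = 300_000_000
POP_OTHER = 7_000_000_000

# Every member is American: Pr(B|notA) = 0, so the posterior is 1.
all_american = posterior_american(435, 0, POP_AMERICAN, POP_OTHER)

# Hypothetical mixed case: 400 American members and 35 who are not.
mixed = posterior_american(400, 35, POP_AMERICAN, POP_OTHER)

# No member is American: Pr(B|A) = 0, so the posterior is 0.
none_american = posterior_american(0, 435, POP_AMERICAN, POP_OTHER)

print(all_american, mixed, none_american)
```

In each case the population sizes cancel, and what is left is just the share of members who are American (400/435 in the mixed case).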

On a more esoteric note, responding to Simon, the syllogism is relevant to hypothesis testing. Of course the syllogism holds under standard Classical (with a big C) hypothesis testing, which ignores the likelihood of the data under alternative hypotheses. This is why some folks advocate for Bayesian hypothesis testing instead, which gets you to Pr(A|B), just as Simon points out.

]]>I think the point is this. If I write down a statistical model with a null hypothesis and I know the distribution of data generated by the null, then I will reject the null if, according to that distribution, the sample I observed has low probability. That is problematic since, as you point out, it does not follow from the laws of conditional probability that the null hypothesis is (likely to be) invalid. It could be that the sample has even lower probability under the alternative hypothesis.

Obviously the syllogism is logically flawed, but it represents an example of a deduction that hypothesis testing would lead you to.
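To illustrate with a toy case (entirely hypothetical, not taken from the original problem): suppose 20 coin flips, a null of p = 0.5, a single alternative of p = 0.3, and 16 heads observed. The sketch below assumes equal prior weight on the two hypotheses, which is my choice and not something the argument requires.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical setup: all four numbers are assumptions for illustration.
N_FLIPS, HEADS = 20, 16
P_NULL, P_ALT = 0.5, 0.3

# Classical one-sided p-value: chance of 16 or more heads under the null.
p_value = sum(binom_pmf(k, N_FLIPS, P_NULL) for k in range(HEADS, N_FLIPS + 1))

# Probability of the observed sample under each hypothesis.
lik_null = binom_pmf(HEADS, N_FLIPS, P_NULL)
lik_alt = binom_pmf(HEADS, N_FLIPS, P_ALT)

# Posterior probability of the null, assuming equal prior weight on each.
prior_null = 0.5
posterior_null = lik_null * prior_null / (
    lik_null * prior_null + lik_alt * (1 - prior_null)
)

print(f"p-value under the null:      {p_value:.4f}")
print(f"Pr(sample | null):           {lik_null:.6f}")
print(f"Pr(sample | alternative):    {lik_alt:.6f}")
print(f"Pr(null | sample):           {posterior_null:.4f}")
```

The p-value falls below 0.05, so the classical test rejects the null, yet the sample is even less probable under the alternative and the posterior still strongly favors the null.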

]]>