How do you assess whether a probabilistic forecast was successful? Put aside the question of sequential forecasts updated over time. That’s a puzzle in itself, but on Monday night each forecaster will have its final probability estimate, and there remains the question of deciding, on Wednesday morning, which one was “right.”
Give no credibility to pronouncements by, say, 538 that they correctly forecast X out of 50 states. According to 538’s own model these are not independent events. Indeed, the distinctive feature of 538’s election model is that the statewide errors are highly correlated. That’s why they are putting Trump’s chances at 35% as of today, when a forecast based on independence would put that probability closer to 1%, given the large number of states where Clinton has a significant (marginal) probability of winning.
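To see how fast independence drives that probability down, here is a crude back-of-the-envelope sketch in Python. The numbers are made up for illustration and are not taken from 538 or anyone else.

```python
# Crude illustration with hypothetical numbers: suppose Trump must sweep six
# battleground states where his marginal chance averages about 45%. Treating
# the states as independent coin flips multiplies his chances down to under
# 1%; correlated errors (states breaking together) are what keep an aggregate
# figure like 35% on the table.
p_each, n_states = 0.45, 6
print(f"sweep probability under independence: {p_each ** n_states:.3f}")  # ~0.008
```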
So for 538 especially (but really for all the forecasters that assume even moderate correlation), Tuesday’s election is one data point. If I tell you the chance of a coin coming up Tails is 35%, and you toss it once and it comes up Tails, you certainly have not proven me right.
The best we can do is set up a horserace among the many forecasters. The question is how to decide which forecaster was “more right” based on Tuesday’s outcome. Of course if Trump wins then 538 was more right than every other forecaster, but we do have more to go on than just the binary outcome.
Each forecaster’s model defines a probability distribution over electoral maps. Indeed, they produce their estimates by simulating their models to generate that distribution and then counting the fraction of maps that come out with an Electoral College win for Trump. The outcome on Tuesday will be a map, and we can ask, based on that map, who was more right.
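Here is a minimal sketch of that simulate-and-count procedure, under an assumed toy model rather than any forecaster’s actual one: each battleground outcome is driven by a shared national shock plus state-specific noise, which is what correlates the state errors. The safe electoral-vote total, the per-state probabilities, and the correlation parameter are all hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

SAFE_TRUMP_EV = 191          # hypothetical electoral votes already "banked"
battlegrounds = {            # state: (electoral votes, Trump's marginal prob.)
    "FL": (29, 0.45), "PA": (20, 0.25), "NC": (15, 0.50), "OH": (18, 0.65),
    "MI": (16, 0.20), "NH": (4, 0.30), "NV": (6, 0.40), "CO": (9, 0.25),
    "WI": (10, 0.20), "IA": (6, 0.65), "AZ": (11, 0.65), "GA": (16, 0.75),
}
ev = np.array([v[0] for v in battlegrounds.values()])
p = np.array([v[1] for v in battlegrounds.values()])
z = stats.norm.ppf(p)        # latent threshold matching each marginal probability

def trump_win_prob(rho, n_sims=200_000):
    """Fraction of simulated maps giving Trump at least 270 electoral votes,
    with pairwise correlation rho between state errors (rho=0 is independence)."""
    national = rng.standard_normal((n_sims, 1))     # shock common to all states
    state = rng.standard_normal((n_sims, len(p)))   # state-specific noise
    latent = np.sqrt(rho) * national + np.sqrt(1 - rho) * state
    maps = latent < z                               # True where Trump carries the state
    totals = SAFE_TRUMP_EV + maps.astype(int) @ ev  # electoral votes per simulated map
    return float(np.mean(totals >= 270))

print("independent state errors:", trump_win_prob(0.0))
print("correlated state errors: ", trump_win_prob(0.6))
```

With rho set to zero the sketch reproduces the independence calculation above; raising rho fattens the tail in which the battlegrounds break together for the trailing candidate.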
What yardstick should be used? I propose maximum likelihood. Each forecaster should publish, on Monday night, its final forecasted distribution of maps. Then on Wednesday morning we ask which forecaster assigned the highest probability to the realized map.
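A minimal sketch of that scoring rule, assuming each forecaster publishes its Monday-night distribution as a large sample of simulated maps (a map here is just a tuple of state winners). The forecaster names and toy maps are hypothetical.

```python
from collections import Counter

def map_likelihood(simulated_maps, realized_map):
    """Fraction of a forecaster's simulated maps that equal the realized map."""
    return Counter(simulated_maps)[realized_map] / len(simulated_maps)

def rank_forecasters(forecasts, realized_map):
    """forecasts: {name: list of simulated maps}. Returns (name, score) pairs
    sorted by the probability each forecaster assigned to the realized map."""
    scores = {name: map_likelihood(maps, realized_map)
              for name, maps in forecasts.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy example with two battlegrounds and two hypothetical forecasters.
realized = ("R", "D")
forecasts = {
    "forecaster_A": [("R", "D")] * 35 + [("D", "D")] * 65,
    "forecaster_B": [("R", "D")] * 10 + [("D", "D")] * 90,
}
print(rank_forecasters(forecasts, realized))
```

One practical wrinkle: with 50-plus states the realized map may appear zero times in any finite sample, so in practice forecasters would need to report the probability their model itself assigns to the realized map rather than raw simulation frequencies.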
Maximum likelihood is not the only criterion one could use, of course, but (if you are listening, 538 et al.) whatever criterion forecasters are going to use to decide whether their model was a success, they should announce it in advance.
3 comments
November 6, 2016 at 4:18 pm
ZC
The problem is that state outcomes are highly correlated–after all, most news that pushes voters in the last days of the election is going to be broad news, e.g. the latest email scandal, that pushes voters in the same direction, rather than, say, news about the candidates’ economic policy stances, which might be differential across states. So a comparison like this probably isn’t *that* much better than comparing nationwide predictions. Andrew Gelman makes a good point along these lines in his blog post from today.
November 7, 2016 at 10:58 am
Anonymous
Just using the binary outcome from each state throws out a lot of information. How about asking forecasters for vote shares in each state and using Euclidean distance to the vote-share outcome? This doesn’t test whether the model of variance is correct at all, but I think that testing for the correct variance with one run is quixotic. And for vote shares the incentives are correct. For the proposed likelihood competition, if I know your model I can easily beat you .999 of the time. (Though this effect isn’t as bad with more than two competitors.)
Interestingly, with 538 sort of the acknowledged leader right now, everyone’s incentive is to act more confident than 538 about Blue; they are then favorites to “beat” 538. Incidentally, if Blue wins by much more than projected, will people remember that 538 was predicting high variance, with regular reminders that variance is symmetric? I think 538’s reputation would, unfairly, go down because the headline number is the win probability, and having a very unconfident-looking number will not look good if there is a landslide.
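A minimal sketch of the vote-share comparison suggested at the top of this comment: score each forecaster by the Euclidean distance between its predicted vote shares and the realized shares, state by state. Forecaster names and numbers are made up.

```python
import numpy as np

def euclidean_score(predicted_shares, realized_shares):
    """Straight-line distance in vote-share space; smaller is better."""
    return float(np.linalg.norm(np.asarray(predicted_shares)
                                - np.asarray(realized_shares)))

realized = [0.49, 0.47, 0.52]          # realized two-party shares in three states
forecasts = {
    "forecaster_A": [0.48, 0.48, 0.51],
    "forecaster_B": [0.46, 0.50, 0.49],
}
for name, predicted in forecasts.items():
    print(name, round(euclidean_score(predicted, realized), 3))
```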
November 8, 2016 at 1:23 pm
brunosalcedo
I have a more basic question. How can there be so much uncertainty about the outcome so close to the election?
Sure, there are correlated shocks and whatnot, but I doubt there are many people still making up their minds about who to vote for or whether to vote. And if there were, then the question is: how is it that so many people are drawn so close to being indifferent?
With so many polls around, I find it shocking that the best we can get is a 2:1 odds ratio.