Wednesday, November 9, 2016

How I Predicted The Election

Preface: I take no position on which candidate is best, or who deserved to win. This is purely objective.

Last Sunday I gave Trump a 71% chance of winning, with an expected electoral vote total of 301. He did, of course, win, and if the current counts hold up, he'll get 304 electoral votes.

"The triumphant vindication of bold theories - are these not the pride and justification of our life's work?" - Sherlock Holmes

None of the famous pollsters or pundits predicted this. On the very weekend I made my prediction, the New York Times gave Trump a 15% chance of winning and the Huffington Post gave him a mere 2% chance. Nate Silver, who knows something about uncertainty, was giving him a 35% chance, and he got excoriated for it. The Huffington Post said Silver was "putting his thumb on the scales" by adjusting the raw poll numbers (Silver doesn't do polls himself; he aggregates other polls) and "making a mockery of the very forecasting industry he popularized."

The Huffington Post has a naive idea of how forecasts are done. They seem to think you just call a thousand people at random, record their answers, and publish the number. But that would lead to very bad predictions. You have to correct your sample to match population characteristics such as likelihood of voting, geography, party affiliation, and a bunch of other stuff. Then, you have to account for the fact that you can only do the poll ahead of the election, and things are always changing. If your poll showed 25%, 35%, 45% in the three weeks preceding an election, you'd be kind of stupid to just run with the 45%.  Finally, there are individual decisions as to whether to accept or reject a data point - did the person answer all the questions that were asked, did he sound like he was giving obviously misleading answers, did he tell the truth about his age, party, and so on?
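To make the "correct your sample" step concrete, here is a toy sketch of post-stratification-style reweighting. Everything in it is invented for illustration; real pollsters weight on many more variables (turnout likelihood, geography, party, and so on) and in more sophisticated ways.

```python
# Toy illustration of correcting a raw sample to match the population
# (post-stratification-style reweighting). All numbers are invented.

sample = {                 # group -> (share of the sample, % supporting candidate A)
    "college":     (0.60, 48.0),
    "non_college": (0.40, 42.0),
}
population_share = {"college": 0.40, "non_college": 0.60}   # known true shares

raw_estimate = sum(share * support for share, support in sample.values())
weighted_estimate = sum(population_share[g] * support
                        for g, (_, support) in sample.items())

print(round(raw_estimate, 1))       # 45.6 - what the uncorrected sample says
print(round(weighted_estimate, 1))  # 44.4 - after reweighting to the population
```

The point is simply that the published number is never the raw tally; it is the tally after a chain of judgment calls like this one.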

All of those corrections get influenced by human nature. People have a personally desired outcome, and also they get cold feet if their numbers come out too far from the other polls. It's a form of groupthink.

So what did I do? I took Nate Silver's "polls plus" predictions of the margin of victory in each state (accounting for DC and the weird split votes in Nebraska and Maine). Then I did 24 separate projections: one for each integer combination of a 0-7 point adjustment factor (call it "x") in Trump's favor and a 0-2 point "tossup margin" (8 values of x times 3 tossup margins makes 24). The states that fell within the tossup margin I split evenly between Trump and Clinton. 301 electoral votes is Trump's average over those 24 projections.
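For concreteness, here is a minimal sketch of that projection procedure. The margins below are invented placeholders, not Silver's actual "polls plus" numbers; a real run would cover every state plus DC and the Nebraska/Maine district splits.

```python
# Minimal sketch of the 24-projection procedure described above.
# SAMPLE_MARGINS is illustrative only, NOT the real "polls plus" data.

from itertools import product

# state -> (Clinton's projected margin in points, electoral votes)
SAMPLE_MARGINS = {
    "Florida":        (0.7, 29),
    "Pennsylvania":   (3.7, 20),
    "Ohio":           (-1.9, 18),
    "North Carolina": (1.0, 15),
    "Wisconsin":      (5.3, 10),
}

def trump_ev(margins, x, tossup):
    """Trump's electoral votes after shifting every margin x points in his
    favor and splitting states that land inside the tossup band."""
    ev = 0.0
    for clinton_margin, votes in margins.values():
        adjusted = clinton_margin - x          # shift toward Trump
        if abs(adjusted) <= tossup:            # too close to call
            ev += votes / 2                    # split evenly
        elif adjusted < 0:                     # Trump ahead after the shift
            ev += votes
    return ev

# 8 integer values of x (0..7) times 3 tossup margins (0..2) = 24 projections
projections = [trump_ev(SAMPLE_MARGINS, x, t)
               for x, t in product(range(8), range(3))]
print(sum(projections) / len(projections))     # average over all 24 runs
```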

I didn't just invent the "x factor". There was talk of a "shy Trump" effect, which I thought had credibility. People didn't want to admit to a stranger over the phone that they were voting for Trump, because the media made it sound like voting for Trump was worse than armed robbery. They pounded on it all summer and into the fall. Further, they made it seem like Trump's chances were much worse than they really were. They were constantly saying Trump would quit, that his campaign was imploding, that there would be a credible spoiler candidate and so on. None of this would have stood up to the least bit of journalistic investigation, so do yourself a favor and ignore those people from now on.

There were other clues as well. Trump drew more primary voters than any Republican candidate in history, and his rallies were (sorry) HUGE. But those were only clues, not hard numbers.

I wanted something to back up the "x factor." My prior guess was 3 points, so I looked at two analyses that compared whether Trump did better in anonymous online polls than in phone polls. One was from Nate Silver, who aggregated a bunch of polls from the primary (that's his thing, aggregating stuff) and came up with a negligible difference. That one I ignored, because it was from the primary (a whole different set of voters) and comparing aggregated numbers is a poor way to detect bias. The other was a very recent one by Politico. This one was useful, and it demonstrated everything that went wrong with the pollsters.

First, the Politico headline was "Shy Trump Voters Are A Mirage" when the actual results said the opposite, so right away you can see some wishful thinking. (To be fair, the pollster probably didn't write the headline.) They compared a single phone poll to a single online poll and found a 2% "x factor." That was dismissed as not statistically significant; in science you would do a further study, but this pollster didn't. Then things get interesting. They revealed that among voters with a household income of more than $50,000 a year, the "x factor" was 10%! About half of all voters fit into that category, so even if there was zero "shy Trump" effect for the other half of voters, there would be about a 5% overall "x factor" - more than enough to flip the election. I suppose there could have been a negative "x factor" for voters with incomes under $50,000 to cancel it out, but that seems really farfetched, and they no doubt would have reported it.
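As a sanity check on that back-of-the-envelope math, the weighted average works out like this. The 50/50 split and the zero effect for the lower-income half are the assumptions stated above, not numbers from the poll itself.

```python
# Back-of-the-envelope weighted average, using the assumptions above:
# half of voters (>$50k households) show a 10-point shy-Trump gap,
# the other half are assumed to show none.
share_high_income = 0.5
gap_high_income   = 10.0   # points, from the Politico crosstab
gap_low_income    = 0.0    # assumption: zero effect for the rest
overall_gap = (share_high_income * gap_high_income
               + (1 - share_high_income) * gap_low_income)
print(overall_gap)         # 5.0 points - enough to flip the election
```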

A similar thing happened when they broke out the data for people with a B.S. degree or higher. So something obviously went wrong in the analysis. Somehow they adjusted away the "x factor" that was staring them in the face.

Now I'll admit my own bias. I got cold feet, too. I thought that even the conservative 5% estimate from the Politico poll was too big, and I didn't have much insight into what that poll had actually done. So I chose an "x factor" range (0-7 points) that averages out to only 3.5%. I figured that if I got the basic election call right, nobody would care about the exact numbers, and 3.5% was enough to flip it. So my near-exact prediction of the electoral vote owes something to luck.