Polls Gone Wild

Perhaps the biggest story to come out of yesterday’s primaries and caucuses–if we’re willing to exclude anything involving Donald Trump–is the massive upset Bernie Sanders achieved in Michigan. Not a single major poll predicted anything other than an easy Clinton win in that state. Instead, Sanders won by a small yet comfortable margin.

If you aren’t the sort of person who pays much attention to political polling, it may not sound like a big deal, but it is. Polls that evaluate two-person races and survey likely voters tend to be pretty reliable. They aren’t exact, of course–polling never is–but true results tend to be within the margin of error.

Before going deeper into what this might mean, it helps to put this outcome in perspective:

  * Although Sanders won the vote count, he and Hillary Clinton will get the same number of primary delegates from Michigan, so by that measure it's a tie.
  * Sanders still lags behind Clinton in overall delegates and will have to do much better than tie her in order to secure the nomination. Given the remaining primaries and delegate counts, there are currently no plausible predictions that he will win the nomination. (That said, it's not impossible--just statistically unlikely.)
  * The other big contest of the day was in Mississippi, which Clinton won with a whopping 83% of the vote.
  * Where Sanders is right now, delegate-wise, is more or less where Clinton was in 2008--which is to say, a respectable showing, but not a trajectory that results in nomination.

So, how did things go so terribly wrong? Even Nate Silver’s 538 blog, which famously predicted the outcome of the 2012 Presidential election through the use of a sophisticated polling analysis model, completely blew it. 538’s own navel-gazing tries to offer up some explanations:

> * **Pollsters underestimated youth turnout.**
> * **Pollsters underestimated Sanders’s dominance among young voters.**
> * **Pollsters underestimated the number of independent voters who would participate in the primary.**
> * **Pollsters underestimated Sanders’s support among black voters.**
> * **Pollsters missed a late break to Sanders by not polling after Sunday.**
> * **Some Clinton supporters chose to vote in the Republican primary.**
> * **Pollsters had little recent history to work with.**
> * **This is an outlier, a perfectly rotten combination of bad luck and bad timing.**

A few of these are particularly worth unpacking. A bad call on youth turnout is understandable–young voters are notoriously unreliable, showing up in droves to one election and staying home for the next. This makes interpreting their polling results extremely difficult.

Independent voters present a similar problem. Partisan voters are relatively easy to predict. Independents, by virtue of not having a party affiliation, may vote in one primary or the other or not at all.

As for Sanders’ surprisingly high black support in Michigan, Charles Pierce offers some intriguing thoughts as to why:

> The UAW members I talked to clearly considered HRC's use of the auto bailout against Sanders to be at best a half-truth, and a cynical attempt to win their support, and they were offended by what they saw as a glib attempt to turn the state's economic devastation into a campaign weapon. These were people who watched the auto industry flee this city and this state, and they knew full well how close the country's remaining auto industry came to falling apart completely in 2008 and 2009. They knew this issue because they'd lived it, and they saw through what the HRC campaign was trying to do with the issue.

In short: since Sanders voted against the auto bailout bill because it contained Wall Street bailouts, Clinton cynically tried to use it against him even though he had already voted for a standalone auto bailout. In trying to distort the truth to turn voters away from Sanders, she may in fact have alienated them from herself. On top of that, her recent high-profile gaffes involving black activists help solidify the idea that she’s cold and impersonal, and even that she takes black votes for granted. This is unfortunate considering her careful, long-term outreach to and cooperation with black communities, particularly in the South. The optics are, at the very least, not good.

Regarding party crossover votes: they usually make up a small percentage of the total, but sometimes that is enough to produce a different outcome than expected. Pollsters generally assume that crossovers cancel each other out, since neither party’s voters tend to cross over dramatically more than the other’s. It’s hard to say how much crossover voting influenced Tuesday’s outcome, but it’s a factor to consider.

The note about not having much recent history to work with may explain much more than one would assume:

> Michigan’s Democratic primary [was weird in 2008](http://fivethirtyeight.com/live-blog/michigan-mississippi-idaho-hawaii-primaries-presidential-election-2016/?#livepress-update-12403656) (Barack Obama wasn’t on the ballot), and the state party held caucuses in [2000](http://uselectionatlas.org/RESULTS/state.php?fips=26&f=1&year=2000&elect=1) and [2004](http://uselectionatlas.org/RESULTS/state.php?fips=26&f=1&year=2004&elect=1) that weren’t really competitive. So relying on voter history could lead pollsters astray. “Remember, we haven’t had a real Democratic presidential primary in Michigan lately,” said [Matt Grossmann](http://polisci.msu.edu/index.php/people/faculty/item/faculty/matt-grossmann), director of the Institute for Public Policy and Social Research at Michigan State University, which showed the tightest race of any late polls, with [Clinton leading by 5 percentage points](http://msutoday.msu.edu/news/2016/trump-leads-gop-field-in-michigan-democratic-race-close/?utm_campaign=media-pitch&utm_medium=email).

Michigan’s system is therefore too new and doesn’t have enough history established for reliable polling. Usually, polling models are adjusted after an election to account for how real returns differ from polled predictions. People change their minds, pollsters make bad assumptions, good sampling can be difficult, and so on. Baking in the difference between what was polled and what the real outcome was helps make for more accurate polling in the future–but there was no useful data for Michigan, so pollsters had much less to work with. This may account for a large share of the discrepancy.
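To make the idea of "baking in the difference" concrete, here is a minimal sketch in Python. It is not how 538 or any real pollster adjusts its models; the function names and the numbers in `hypothetical_history` are invented purely for illustration. It averages how far past polls missed and shifts a new poll by that amount, and it shows what happens when there is no history at all, which is roughly the situation in Michigan.

```python
from statistics import mean


def average_miss(history):
    """Average of (actual - polled), in percentage points, over past contests.

    `history` is a list of (polled_share, actual_share) pairs.
    """
    return mean(actual - polled for polled, actual in history)


def adjust(polled_share, history):
    """Shift a new poll number by the average past miss.

    With no usable history there is nothing to correct with,
    so the raw poll number comes back unchanged.
    """
    if not history:
        return polled_share
    return polled_share + average_miss(history)


# Hypothetical numbers, invented for illustration only:
# (what the polls said, what the candidate actually got)
hypothetical_history = [(52.0, 49.0), (47.0, 45.5), (55.0, 53.5)]

if __name__ == "__main__":
    print(adjust(50.0, hypothetical_history))  # corrected using past misses
    print(adjust(50.0, []))                    # no history: no correction possible
```

The second call is the interesting one: with an empty history the pollster is left with the raw number and no way of knowing which direction it tends to be off.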

The last point is well worth considering, too. Sometimes polls are just wrong and there is no good explanation for it. This is frustrating for people who care about rigorous polling–it’s better to have mistakes to learn from than mistakes that make no sense at all–but it happens. All polls come with a margin of error, which is reported at a stated confidence level, usually 95%. The two are related, but one isn’t computed from the other by simple arithmetic: the margin of error depends mainly on how many people were surveyed. For instance, a poll of about 1,500 respondents has a margin of error of roughly +/- 2.5% at the 95% confidence level. That level means you would expect the real result to fall outside the margin of error about 5% of the time (and inside it 95% of the time) through sampling chance alone. The polls were at least that confident, in this case, of a Clinton win. Did we just hit that odd 5%? It’s entirely possible, though if it happens again during this cycle there may be bigger problems with the polling models in play.
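For anyone who wants to see where a margin of error comes from, here is a small Python sketch using the standard textbook formula for a simple random sample. That is an assumption: real polls weight their samples, which usually makes the true error somewhat larger. The function names are mine, and 1.96 is the usual multiplier for a 95% confidence level.

```python
import math

Z_95 = 1.96  # standard normal multiplier for a 95% confidence level


def margin_of_error(sample_size, share=0.5, z=Z_95):
    """Margin of error (as a fraction) for a simple random sample.

    share=0.5 is the worst case and is what pollsters typically report.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(share * (1 - share) / sample_size)


def sample_size_for(margin, share=0.5, z=Z_95):
    """Smallest simple random sample that achieves the given margin."""
    if margin <= 0:
        raise ValueError("margin must be positive")
    return math.ceil(z ** 2 * share * (1 - share) / margin ** 2)


if __name__ == "__main__":
    for n in (400, 1000, 1500):
        print(f"n={n}: +/- {margin_of_error(n) * 100:.1f} points")
```

Note how slowly the margin shrinks: going from 400 respondents to 1,500 only cuts it roughly in half, which is part of why most polls stop at a sample of about a thousand.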

I have seen some Sanders supporters hail the Michigan results as proof that polling is bunk, but that’s throwing the baby out with the bathwater. Some polling is bunk, but good polling–which exists!–is essential. I’m not ready to dismiss the 538 team just yet, though they certainly need to avoid failing so badly in the future if anyone is to see their modeling as useful.

And if you want to read some more about all this, check out Bob Cesca’s defense of Silver–and math. One thing’s for sure: Sanders’ performance in Michigan last night will be studied by pollsters and data scientists for months if not years to come. This is a once-in-a-generation polling upset and it’s essential to determine why it happened, if we can.

Photo by jimmiehomeschoolmom