
THE SCIENCE OF PREDICTION

29.05.21

Can you believe it’s been more than six months since that nail-biting US presidential election? For many of us “outsiders”, the choice between the two candidates seemed clear-cut, but the incoming results proved anything but.


Onlookers around the globe endured a slow, torturous few days glued to CNN and the like before the rightful victor was called.

For anyone with only a passing interest in US politics, such a complex voting system can be incredibly confusing. The Guardian explains:


“The winner of the election is determined through a system called the electoral college. Each of the 50 states, plus Washington DC, is given a number of electoral college votes, adding up to a total of 538 votes. More populous states get more electoral college votes than smaller ones. A candidate needs to win 270 electoral college votes (50% plus one) to win the election.


“In every state except two – Maine and Nebraska – the candidate that gets the most votes wins all of the state’s electoral college votes.


“Due to these rules, a candidate can win the election without getting the most votes at the national level. This happened at the last election, in which Donald Trump won a majority of electoral college votes although more people voted for Hillary Clinton across the US.”
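To see how winner-takes-all rules can split the electoral and popular results, here is a minimal sketch. The three states, their electoral votes and their vote counts are all invented for illustration, and the Maine and Nebraska exceptions are ignored.

```python
# Hypothetical three-state country; every number here is invented.
states = {
    # name: (electoral votes, votes for candidate A, votes for candidate B)
    "Northland": (10, 510_000, 490_000),
    "Midland": (10, 505_000, 495_000),
    "Southland": (9, 200_000, 800_000),
}


def tally(states):
    """Return (electoral votes, popular votes) per candidate under winner-takes-all."""
    electoral = {"A": 0, "B": 0}
    popular = {"A": 0, "B": 0}
    for electoral_votes, votes_a, votes_b in states.values():
        popular["A"] += votes_a
        popular["B"] += votes_b
        # The state's whole block of electoral votes goes to its plurality winner.
        # (A tie would go to B here; the sketch does not model tie-breaking.)
        winner = "A" if votes_a > votes_b else "B"
        electoral[winner] += electoral_votes
    return electoral, popular


electoral, popular = tally(states)
print("Electoral votes:", electoral)
print("Popular votes:  ", popular)
```

In this made-up example, candidate A narrowly carries two states and takes 20 of the 29 electoral votes, while candidate B piles up a large margin in the third state and finishes with more votes overall.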


Thankfully, the predictions – a victory for Democrats Joe Biden and Kamala Harris – proved correct.


As early as 27th October, artificial intelligence had predicted a Biden win, but we’d have to wait until 7th November for confirmation – four days after the in-person polls closed.


Hernan Makse – a statistical physicist at City University of New York – runs the Complex Networks and Data Science Lab at the Levich Institute in Manhattan. Using social media traffic – including over one billion mined tweets per project – Makse’s lab utilises AI to predict the outcomes of international elections. In late October, Makse’s AI models had shown Biden with a healthy lead over Trump in the national vote, but only a “very, very small” advantage in the electoral college.


The Independent reported:


“In some ways, AI suffers the same flaws as traditional polling and surveying options surrounding elections and political campaign cycles nationwide.


“The same voters who live in rural communities and are otherwise unlikely to be reached by a pollster are the same people who do not often access social media platforms like Twitter, Mr Makse said, effectively excluding them from the data.”


The article continues:


“Predicting elections in other countries is a somewhat easier process since the national vote often determines the outcome. In the US, however, machines must be trained to learn different models for the electoral college that coincide simultaneously with the national vote predictions.


“Mr Makse said AI models like his are still in the process of learning how to rescale and predict outcomes with sampling biases and other such limits, and that, one day, machines would be able to easily make up for that lack of knowledge.”
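The "rescaling" mentioned here can be illustrated with a simple reweighting step. This is a generic sketch of weighting a skewed sample by known population shares, not a description of how Makse's models work, and the groups and figures are hypothetical.

```python
# Hypothetical figures: rural voters are 40% of the population
# but only 10% of the people who appear in the sample.
population_share = {"urban": 0.60, "rural": 0.40}
sample = {
    # group: (people sampled, of whom support candidate A)
    "urban": (900, 540),
    "rural": (100, 30),
}


def raw_support(sample):
    """Support for A if every sampled person counts equally."""
    total = sum(n for n, _ in sample.values())
    supporters = sum(s for _, s in sample.values())
    return supporters / total


def weighted_support(sample, population_share):
    """Support for A after weighting each group by its real population share."""
    return sum(
        population_share[group] * supporters / n
        for group, (n, supporters) in sample.items()
    )


print(f"Raw sample:      {raw_support(sample):.0%} for A")
print(f"After weighting: {weighted_support(sample, population_share):.0%} for A")
```

With these invented numbers, the unweighted sample shows A ahead, while the weighted estimate puts A below half. The correction only works if the under-sampled group shows up in the data at all and its true share of the population is known.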


Makse contends that one day AI will be able to successfully predict the outcome of the electoral college in any given scenario. With AI learning from every new data point, it’s surely a matter of when, not if.


It certainly seems that predictive analytics have evolved since the 2016 election. Quite how much remains to be seen.

In the wake of an unexpected Trump victory, The New York Times published an eye-opening editorial entitled “How Data Failed Us in Calling an Election”:


“It was a rough night for number crunchers. And for the faith that people in every field - business, politics, sports and academia - have increasingly placed in the power of data.


“Donald J. Trump’s victory ran counter to almost every major forecast - undercutting the belief that analyzing reams of data can accurately predict events. Voters demonstrated how much predictive analytics, and election forecasting in particular, remains a young science.”


The article continues:


“…data science is a technology advance with trade-offs. It can see things as never before, but also can be a blunt instrument, missing context and nuance. All kinds of companies and institutions use data quietly and behind the scenes to make predictions about human behavior. But only occasionally - as with Tuesday’s election results - do consumers get a glimpse of how these formulas work and the extent to which they can go wrong.”


Erik Brynjolfsson, a professor at the Sloan School of Management at MIT, is quoted as saying “the key thing to understand is that data science is a tool that is not necessarily going to give you answers, but probabilities.” In the run-up to the 2016 election, Hillary Clinton’s chances of winning were put at anywhere between 70 and 99 percent, leaving Trump a 1 to 30 percent chance of taking power. And he took it.
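Brynjolfsson's point about probabilities is easy to check with a quick simulation. The sketch below assumes each forecast is perfectly calibrated and simply replays the "election" many times. The function name, trial count and random seed are arbitrary choices for illustration.

```python
import random


def upset_frequency(p_upset, trials=100_000, seed=0):
    """Fraction of simulated elections in which the underdog wins."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    upsets = sum(rng.random() < p_upset for _ in range(trials))
    return upsets / trials


# The two ends and the rough middle of the 1 to 30 percent range.
for p in (0.01, 0.15, 0.30):
    print(f"Forecast upset chance {p:.0%}: simulated upset rate {upset_frequency(p):.1%}")
```

An underdog given a 30 percent chance wins roughly three simulated elections in ten, so a single upset does not by itself show that a forecast was wrong.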


The New York Times attributed the forecasting error to shortcomings in polling, analysis and interpretation. Data scientists also blamed the inherent weakness of election models, with Dr. Mutalik of Yale claiming: “Even with the best models, it is difficult to predict the weather more than 10 days out because there are so many small changes that can cause big changes. In mathematics, this is known as chaos.”


When it comes to weather predictions, however, historical data dates back a lot further than 1972 – the starting point used to calibrate FiveThirtyEight’s election model. On 24th October 2016, that model put Clinton’s chances of winning at a whopping 85 percent.


In a February 2017 article, Forbes shared their take on the faulty forecasting:


“In the days after the election, some analysts speculated that the faulty poll findings had failed to heed weak signals that, in retrospect, should have been loud and clear. Instead, the results were interpreted selectively to fit a cognitively biased model that the pollsters assumed to be correct. Cognitive psychologists call this a confirmation bias: the tendency to interpret new evidence as confirmation of what one already believes is true.


“For example, many pollsters were convinced that Trump wouldn't capture votes from the educated white middle class, which compelled them to discount evidence to the contrary. Another weak signal was Secretary Clinton’s uncertain support in key districts that President Obama had won in the previous two elections. Similarly, the apparent fact that a larger proportion of the country was shifting toward conservatism and populism also escaped the attention it merited.”


The article continues:


“In data science, such weak signals are referred to as ‘dark data’ - data that is collected and stored but not properly identified for analysis. For predictive modeling to achieve its intent, the dark data must be illuminated. Otherwise, confirmation bias can creep in and businesses, too, will get run over.”


In Oscar Wilde’s ‘An Ideal Husband’, readers are told: “To expect the unexpected shows a thoroughly modern intellect.” On that basis, to overlook or ignore the “unexpected” signifies the opposite. But pollsters weren’t going to be burned twice. Trump’s 2016 success represented more than just a change at the top – it changed the game entirely, leading to much less bullish predictions this time around.


What’s more, firms such as Advanced Symbolics have already identified areas for their model Polly to improve in 2024:


“We need to include more ethnic and regional ‘factors’ for the next election. Amplifying errors make them easier to uncover - finding where Polly went astray, issue by issue, state by state.”


VentureBeat explains:


“Firms like KCore Analytics claim their AI models are superior to traditional polling because they can be scaled up to massive groups of potential voters and adjusted to predict outcomes with sampling biases (like underrepresented minorities) and other limits. They correctly predicted the U.K. would vote to leave the European Union in 2016, and they correctly predicted about 80% of the winners in Taiwan’s parliamentary elections, as well as close regional races in India and Pakistan.


“But they aren’t infallible… none of these models takes into account the way legal challenges, faithless electors (members of the electoral college who don’t vote for the candidate they’d pledged to), or other confounders might affect the outcome of a race. And with Polly as a case study, these approaches - like traditional polls - appear to have underestimated voter enthusiasm for Trump in 2020, particularly among Black and Latinx voters and members of the LGBTQ community.”


If nothing else, these AI models serve to illustrate our incredibly high expectations of technology – often far higher than our expectations of the politicians whose fortunes we are trying to predict.