by Arndt Leininger, Research Fellow at the Chair for Political Sociology of Germany at Freie Universität Berlin
The world is awaiting the US presidential election with anticipation. This, of course, also applies to political science, where colleagues have been working on election forecasting models for many years. In the past, such forecasts often predicted the outcome with surprising accuracy weeks or even months before the election. But in 2016, forecasters, like the general public, were taken by surprise. Most forecasters correctly predicted that the Democratic candidate, Hillary Clinton, would win the so-called popular vote. However, the Republican candidate, Donald Trump, won a majority of the votes in the Electoral College and was elected the 45th President of the United States. Against the backdrop of the surprising 2016 result, as well as the coronavirus pandemic and the ensuing economic downturn, the 2020 US presidential election seems to be the most challenging to forecast yet. In this blog post, guest author Arndt Leininger (FU Berlin) presents the scientific election forecasts for the upcoming election that were published in the October issue of the journal ‘PS Political Science and Politics.’
Forecasting is a small but steadily growing field of research in political science. Broadly speaking, forecasting refers to statistical models that aim to predict phenomena relevant to political science before they happen. Forecasting models can be found in many subdisciplines, where they are used to predict, for instance, armed conflicts or court decisions. But election forecasting models are among the most widespread and most fully developed forecasting models in political science. Scientific election forecasts originated in the United States in the late 1970s, just as vote intention polls had decades before. Since then, more and more political scientists have constructed election forecasting models, which are now regularly published by the journal “PS Political Science and Politics” in the run-up to an election.
Election forecasting differs from polling in that polls, like the well-known “Sonntagsfrage” (literally “Sunday question”) in Germany, only capture a snapshot of public opinion at one point in time. As such, they are, strictly speaking, not forecasts, especially when the election is still far away. Pollsters themselves never tire of emphasizing this, but their warnings are often ignored by the media and the wider public. In contrast, scientific election forecasts explicitly aim to predict the actual event before it happens. This sets them apart from the retrospective, explanatory perspective that dominates the discipline. In fact, Michael Lewis-Beck, one of the pioneers of election forecasting, sees no significant difference between constructing a forecasting model and classical research: both consist of consulting theories, expressing them in an estimation equation, collecting data, estimating the equation, and determining the empirical fit. However, as Philip Schrodt argues, predictive validity is more difficult to achieve than (retrospective) explanation and is therefore (still) neglected in the social sciences.
And indeed, in light of the surprising outcome of the 2016 election, the 2020 coronavirus pandemic, and the concomitant economic downturn, simply achieving accuracy appears to be particularly difficult this time around. The “2020 Presidential Election Forecasting” symposium in the most recent issue of the journal ‘PS Political Science and Politics,’ featuring twelve different forecasting models, was published on October 15, although some of the models it includes were completed weeks or even months earlier. Most, but not all, of them predict that Donald Trump will have to leave office after only one term.
Correlating the past to predict the future
The majority of the forecasting models presented in the symposium are so-called structural models. In its simplest form, a structural model is an OLS regression estimated on a time series of national election results and further covariates. Structural models estimate the correlations between past election results and explanatory factors and extrapolate these correlations into the future. Such models usually forecast the incumbent party candidate’s share of the national vote, based on some measure of the state of the economy in the election year and the popularity of the sitting president.
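To make the mechanics concrete, here is a minimal sketch of such a structural model in plain Python: an OLS regression of the incumbent party’s vote share on two predictors (economic growth and presidential approval), fitted on a national time series and then extrapolated to a new election year. The data are synthetic, generated from an invented equation, and the helper names (`solve`, `ols`, `predict`) are hypothetical; this does not reproduce any of the symposium models.

```python
import random


def solve(a, b):
    """Solve the square linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """OLS coefficients [intercept, b1, b2, ...] from the normal equations X'X b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # add a constant column for the intercept
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Fitted value for one observation of the predictors."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], row))


# SYNTHETIC history of 18 "elections": (GDP growth %, approval %).
# Hypothetical data-generating equation, invented for illustration:
#   vote share = 35 + 1.5 * growth + 0.25 * approval + noise
random.seed(1)
history = [(random.uniform(-2, 5), random.uniform(35, 65)) for _ in range(18)]
votes = [35 + 1.5 * g + 0.25 * a + random.gauss(0, 1.0) for g, a in history]

coefs = ols(history, votes)
print("intercept, growth, approval:", [round(c, 2) for c in coefs])

# Extrapolate to a hypothetical election year: 1% contraction, 42% approval.
forecast = predict(coefs, (-1.0, 42.0))
print("forecast incumbent vote share:", round(forecast, 1))
```

The short national time series is the weak point this sketch makes visible: with only a handful of past elections, every extra predictor eats into the few degrees of freedom available.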
Erikson and Wlezien’s model is typical of this approach: it includes two variables capturing economic conditions and one variable capturing aggregate values from national polls, such as the share of respondents approving of the US president, and regresses the popular vote on them. Based on their model, they forecast that the incumbent, Donald Trump, will receive 45% of the vote nationwide. Brad Lockerbie also uses political and economic variables, but for the latter he relies on the population’s subjective assessment of the state of the economy, as expressed in the Survey of Consumer Attitudes and Behavior. He forecasts 55% of the popular vote for Donald Trump. Andreas Graefe’s ‘Issues and Leaders’ model, in contrast, relies solely on polling aggregates based on questions about the candidates’ issue-handling competence and leadership skills. Based on his model, he forecasts a national vote share of 48.6% for Trump.
Alan Abramowitz’s ‘Time for Change’ model is one of the longest-running forecasting models; he has used it to forecast US presidential elections since the 1980s. The 2020 election, he argues, is unlike any before. He has therefore changed his model specification, the sample on which the model is estimated, and the dependent variable, now forecasting the electoral vote instead of the popular vote. Given the latest figures from Gallup’s presidential approval poll, his model predicts about 205 electoral votes for Donald Trump, over 60 votes short of a majority in the Electoral College, which comprises 538 votes in total. In contrast, Lewis-Beck and Tien believe that any effect of the pandemic and the economic downturn is already factored into their model through its economic growth and presidential popularity measures. Unlike Abramowitz, they conclude that their ‘Political Economy’ model does not require any modifications. They forecast 43.3% of the popular vote for Trump. In 2016, their popular vote forecast was off by only a tenth of a percentage point. However, it was the electoral vote, which did not line up with the popular vote, that decided the election.
A forecast of the two-party vote share is only meaningful as long as the winner of the popular vote also wins a majority in the Electoral College. This has almost always been the case. However, in two of the past five elections, George W. Bush in 2000 and Donald Trump in 2016, the candidate who won the presidency lost the popular vote. In 2016, Hillary Clinton won more votes nationwide than Donald Trump. While Clinton won by clear margins in populous states such as California, Trump won by very narrow margins in several closely contested states, such as Michigan, Pennsylvania, and Wisconsin, thereby collecting all of those states’ electoral votes and securing a majority in the Electoral College. In this respect, the 2016 forecast of Norpoth’s “Primary Model” that Trump would win the election was not entirely correct, as Norpoth himself critically notes: it wrongly saw Trump as the winner of the popular vote.
This is why Abramowitz replaced the dependent variable of his model. It is also why Lewis-Beck and Tien add a forecast of the electoral vote, based on the correlation between the electoral vote and the popular vote. According to their model, Trump will garner only 68 electoral votes. This would imply a landslide for Biden, reminiscent of Ronald Reagan’s 1980 victory over Jimmy Carter, in which Carter collected only 49 electoral votes. Norpoth, in the 2020 update of his ‘Primary Model’, applies the same methodology as the models mentioned above but uses only political variables and no economic ones. And, for the first time, Norpoth forecasts the electoral vote instead of the popular vote. He forecasts 362 electoral votes for Donald Trump and, thereby, a comfortable win for the incumbent president.
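One simple way to turn a popular-vote forecast into an electoral-vote forecast, in the spirit of what is described above, is a one-predictor regression of past electoral votes on past popular-vote shares. The sketch below assumes exactly that; the “history” is invented for illustration (not real results) and the function names are hypothetical. In these made-up numbers, one percentage point of popular vote is worth roughly 30 electoral votes, which shows how a popular-vote forecast in the low 40s can translate into an electoral landslide.

```python
# INVENTED history: incumbent party's two-party vote share (%) and the
# electoral votes it won. Not real election results.
past_vote_share = [44.0, 46.5, 48.0, 50.5, 52.0, 54.5, 57.0]
past_electoral_votes = [90, 170, 215, 290, 340, 410, 480]

TOTAL_ELECTORAL_VOTES = 538


def simple_ols(x, y):
    """Intercept and slope of a one-predictor least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = simple_ols(past_vote_share, past_electoral_votes)


def electoral_vote_forecast(vote_share):
    """Translate a popular-vote forecast into electoral votes, clamped to 0..538."""
    raw = intercept + slope * vote_share
    return max(0, min(TOTAL_ELECTORAL_VOTES, round(raw)))


print(f"slope: {slope:.1f} electoral votes per percentage point")
for share in (43.0, 47.0, 50.0, 53.0):
    print(share, "->", electoral_vote_forecast(share))
```

The clamp is a crude guard: a straight line knows nothing about the 0 to 538 bounds, so forecasts far outside the historical range should be read with caution.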
Going down to the state level to get the electoral vote right
Another way to deal with a possible disconnect between the popular vote and the electoral vote in 2020 is state-level forecasting. Rather than using a single national time series, state-level forecasting models are estimated on a panel of states. They forecast the result of the presidential election in each of the 50 states as well as in the District of Columbia, which contributes three votes to the Electoral College. This approach also addresses a common problem of structural models: the small sample size due to the limited number of previous elections. Because a panel of states contains many more observations than a national time series, state-level models can also include more explanatory variables. Furthermore, by aggregating the forecasts for individual states, state-level models can estimate both the electoral and the popular vote.
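The aggregation step can be sketched as follows, under stated simplifications: every state is treated as winner-takes-all (in reality Maine and Nebraska can split their electoral votes), and the states, turnout figures, and vote shares are invented. The `StateForecast` class and `aggregate` function are hypothetical names. The made-up numbers are chosen so that the Republican wins most electoral votes while losing the national popular vote, mirroring the 2016 pattern described earlier.

```python
from dataclasses import dataclass


@dataclass
class StateForecast:
    name: str
    electoral_votes: int
    expected_votes: int   # forecast number of two-party votes cast in the state
    rep_share: float      # forecast Republican share of the two-party vote (0..1)


def aggregate(states):
    """Return (Republican electoral votes, Republican national two-party share)."""
    # Winner-takes-all: a state's electoral votes go to whoever wins it.
    # A forecast of exactly 0.5 is counted as a Democratic win (simplification).
    rep_ev = sum(s.electoral_votes for s in states if s.rep_share > 0.5)
    # National popular vote: weight each state's share by its expected turnout.
    total_votes = sum(s.expected_votes for s in states)
    rep_votes = sum(s.expected_votes * s.rep_share for s in states)
    return rep_ev, rep_votes / total_votes


# Five made-up states: one large, lopsided Democratic state and several narrow Republican wins.
states = [
    StateForecast("Big Blue", 55, 14_000_000, 0.35),
    StateForecast("Swing A", 20, 6_000_000, 0.51),
    StateForecast("Swing B", 16, 4_800_000, 0.51),
    StateForecast("Swing C", 10, 3_000_000, 0.51),
    StateForecast("Mid Red", 38, 9_000_000, 0.53),
]

total_ev = sum(s.electoral_votes for s in states)
rep_ev, rep_share = aggregate(states)
print(f"Republican electoral votes: {rep_ev} of {total_ev}")
print(f"Republican national two-party share: {rep_share:.1%}")
```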
Bruno Jérôme, Véronique Jérôme, Philippe Mongrain, and Richard Nadeau construct such a state-level structural model, relying exclusively on economic and political predictors measured at the state level. They predict that Trump will receive only 230 of 538 votes in the Electoral College and 48.31% of the popular vote. Jay DeSart provides an early forecast with his Long-Range State-Level Forecast. His forecast, which puts Trump at 188 electoral votes and 45.2% of the popular vote, was made a full year before the election. Because state-level polling is not available for all states that far in advance of an election, he relies on national polling aggregates in addition to state-level variables. Enns and Lagodny, in contrast, collected presidential approval data for every state, albeit at the expense of a shorter lead time. Their State Presidential Approval/State Economy Model, produced 104 days before the election, forecasts a much closer result than DeSart’s: Trump will win 248 electoral votes, just 22 short of a majority in the Electoral College, but only 45.5% of the vote nationwide.
More information through aggregation
All models presented so far use regression to estimate correlations over past elections, which are then extrapolated to produce a forecast for the upcoming election. Yet regression models are not all there is to forecasting. In the remaining paragraphs, I present three further approaches to scientific election forecasting: prediction markets, citizen forecasting, and forecast aggregation.
After the Iowa Presidential Stock Market successfully predicted George H. W. Bush’s victory in the 1988 US presidential election, prediction markets also became a topic in political science. Thomas Gruca and Thomas Rietz report on the Iowa Electronic Market, which allows any member of the public to trade futures contracts on the outcome of the presidential election. Because participants invest real money, they have an incentive to think hard about who they think will win, not who they want to win. In the “vote-share” market, a “UDEM20_VS” contract pays, in cents, the vote share received by the Democratic candidate. In the “winner-takes-all” market, a “REP20_WTA” contract pays one dollar if Donald Trump wins reelection and nothing at all if Joe Biden wins. Current prices in the “vote-share” market can thus be interpreted as a forecast of the popular vote, while prices in the “winner-takes-all” market can be interpreted as the predicted probability of the respective candidate winning the election. On August 26, 2020, the Republican futures contract traded at 49.95%, essentially predicting a toss-up.
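The reading of prices as forecasts, and the incentive argument, can be sketched in a few lines. This is an illustration under assumptions, not the Iowa Electronic Markets’ own methodology: the quotes are invented, the candidate labels are placeholders rather than real contract names, and normalizing complementary prices so they sum to one is a common convention assumed here. The helper names are hypothetical.

```python
def implied_probabilities(prices_cents):
    """Read winner-takes-all prices (cents) as win probabilities.

    Each contract pays 100 cents if its candidate wins and 0 otherwise, so a
    price of p cents suggests a probability of roughly p / 100. Because the
    prices of complementary contracts need not sum to exactly 100 cents,
    this sketch normalizes them so the probabilities sum to 1 (an assumption).
    """
    total = sum(prices_cents.values())
    return {candidate: price / total for candidate, price in prices_cents.items()}


def expected_profit(price_cents, believed_probability):
    """Expected profit in cents from buying one winner-takes-all contract."""
    return 100 * believed_probability - price_cents


# Hypothetical quotes, not actual Iowa Electronic Markets data.
wta_prices = {"Republican": 50.5, "Democrat": 51.0}
print(implied_probabilities(wta_prices))

# A vote-share contract pays the candidate's vote share in cents, so a price
# of 52 cents reads as an expected vote share of about 52%.

# A trader who believes the Republican wins with probability 0.40 expects to
# lose money by buying at 50.5 cents, whatever candidate they would prefer.
print(expected_profit(50.5, 0.40))
```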
Andreas Murr and Michael Lewis-Beck polled 2,483 Americans, at least 30 in each state, and asked them not whom they will vote for but who they think will win the presidency and who will win their state. They call this approach “citizen forecasting.” Unlike vote intention polls, it does not require a representative sample of the population, provided respondents can be assumed to have a reasonably informed understanding of their fellow citizens’ partisan leanings. Interestingly, when aggregating the raw data, they predict a sort of replay of the 2016 election: Biden will win the popular vote with 51.3%, but Trump will win the Electoral College with 334 votes. When they extrapolate the correlation between citizens’ expectations in past elections and the actual outcomes, their forecast becomes even less optimistic for Biden, now also predicting a popular vote win for Trump.
Finally, J. Scott Armstrong and Andreas Graefe incorporate all of the above forecasts, and more, in their PollyVote. The PollyVote averages the available forecasts, giving equal weight to each forecasting method. The reasoning behind it is that no single method will always outperform all others, so each approach deserves equal weight. Furthermore, averaging over all available forecasts should cancel out random errors in individual forecasts. At the time of writing, the PollyVote forecast 47.9% of the popular vote for Donald Trump. As the PollyVote is continuously updated as new polls and forecasts come in, its interactive website is worth a visit.
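One reading of “equal weight to each forecasting method” is a two-stage average: first average the forecasts within each method, then average the method means. The sketch below assumes that reading; the method groupings and numbers are made up and do not reproduce the actual PollyVote. It also computes a flat average over all individual forecasts to show that the two schemes differ when methods contribute different numbers of forecasts.

```python
from statistics import mean


def combine(forecasts_by_method):
    """Two-stage average: mean within each method, then mean across methods."""
    method_means = {method: mean(values)
                    for method, values in forecasts_by_method.items() if values}
    return mean(method_means.values()), method_means


# Made-up vote-share forecasts (%) grouped by method, for illustration only.
forecasts = {
    "structural models": [45.0, 47.0, 49.0],
    "prediction markets": [50.0],
    "citizen forecasts": [48.0, 50.0],
}

combined, by_method = combine(forecasts)
flat = mean(v for values in forecasts.values() for v in values)
print(by_method)
print(f"method-weighted average: {combined:.2f}")
print(f"flat average of all forecasts: {flat:.2f}")
```

Weighting by method keeps a method that happens to produce many forecasts (here, the structural models) from dominating the combined figure.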
It’s easy to be wise after the event
The forecasts presented here are diverse not just in their approaches but also in their predicted outcomes. Figure 1 summarizes the different forecasts. They range from a Democratic landslide, with only 68 electoral votes for Donald Trump, predicted by Lewis-Beck and Tien’s ‘Political Economy’ model, to a decisive victory for the incumbent president with 362 electoral votes, predicted by Helmut Norpoth’s ‘Primary Model.’ This variety of approaches and predicted outcomes inevitably raises the question of which forecasting model is the most trustworthy. And some critical readers might add: what is the point of scientific forecasting at all if it produces wildly divergent forecasts?
Which forecast should one trust? This is not an easy question to answer. We could look at the different models’ past performance to form an idea of their potential future performance. Here, Enns and Lagodny, as well as Erikson and Wlezien, fared quite well, missing the national election results from 1996 to 2016 by less than two percentage points on average, as Enns and Lagodny themselves document. However, as Andreas Graefe warns, it is very unlikely that any single forecast will always outperform all others. Political scientists will certainly want to take a closer look at the methodology before choosing their preferred forecast.
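“Missing the results by less than two percentage points on average” is naturally read as a mean absolute error over past elections. A minimal sketch of that calculation follows, assuming that reading; the six forecasts and outcomes are invented and are not the published track record of any model.

```python
def mean_absolute_error(forecasts, outcomes):
    """Average absolute miss, in percentage points."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equally long, non-empty lists")
    return sum(abs(f - o) for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Made-up forecasts and outcomes for six elections (not real results).
forecasts = [50.0, 52.0, 48.0, 47.0, 53.0, 49.0]
outcomes = [51.5, 50.0, 49.0, 47.5, 52.0, 51.0]
print(f"MAE: {mean_absolute_error(forecasts, outcomes):.2f} percentage points")
```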
Even though we will all know after November 3rd which forecast was the most accurate, we must keep in mind that accuracy is not everything. Consider the following example: one forecast was made six months before the election and was off by three percentage points; another missed the actual result by only two percentage points but was made just two weeks before the election. Is the second forecast really better? That depends on how you value accuracy relative to lead time. In conflict research, for instance, forecasters aim for lead times of at least one year, since otherwise their predictions would be of little use for foreign and security policy.
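To make this trade-off concrete, one can assume, purely for illustration, that each month of lead time is worth $\lambda$ percentage points of error, and score a forecast by $S = e - \lambda t$ (lower is better), where $e$ is the absolute error in percentage points and $t$ the lead time in months. Treating two weeks as half a month, forecast A (three points, six months) and forecast B (two points, two weeks) compare as follows:

```latex
\begin{aligned}
S_A &= 3 - 6\lambda, \qquad S_B = 2 - 0.5\lambda,\\
S_A < S_B &\iff 3 - 6\lambda < 2 - 0.5\lambda \iff 1 < 5.5\lambda \iff \lambda > \tfrac{2}{11} \approx 0.18.
\end{aligned}
```

Under this assumed linear trade-off, the earlier but less accurate forecast is preferable whenever a month of warning is worth more than about 0.18 percentage points of accuracy. The linear form is a simplification chosen for the example, not an established scoring rule.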
Finally, as for the great variance in forecasts, political scientists will hardly be surprised that different scientists come up with different and competing findings. When interpreted as predictive tests of theories of voting behavior, forecasts can make a direct contribution to our scientific knowledge about elections. Structural models, in particular, are explicitly theory-based. They are less affected by specific developments during the election campaign than opinion polls, which is also why they risk being thrown off by idiosyncratic events shortly before the election. But forecasts provide a kind of expected, typical election result against which the actual election can be assessed, thereby allowing us to learn what was special about this particular election. Arguably, forecasts can have explanatory value and contribute to our scientific knowledge even if they are wrong in some cases. Hence, while we still wait for the votes to be cast and counted, one thing is certain: we will all be wiser come November 3rd.
Arndt Leininger is a research fellow at the Chair for Political Sociology of Germany at Freie Universität Berlin. In the summer term of 2020, he served as interim professor at the Center for Data and Methods at Universität Konstanz, taking over the teaching duties of the Professorship for Survey Research. Arndt’s research focuses on political behavior and applied quantitative methods. He is interested in direct democracy, turnout, young people in politics, election forecasting, economic voting, and electoral studies more broadly.