Introduction
This week’s post builds on last week’s and introduces the structure I will follow in each post leading up to the election. I will develop two models: one to predict the national two-party popular vote and one to forecast the Electoral College outcome. These models will be updated with new data, such as recent polls, and enhanced with additional variables and prediction methods to fine-tune my projections.
National Two-Party Vote Predictions
I decided to shift away from the weighted polling average and instead use a simple average of polls from weeks 5 to 8 before the election. The weighted average, which I constructed by regressing historical results on each week's polls, produced an artificially high R-squared of around 0.95 once recent poll data was added. This happened because the weights were fit to the same outcome that the model then predicts, so the outcome was effectively used twice. The resulting model looked highly accurate but was likely overfitting to past data. A simple average over weeks 5 to 8 is a more straightforward metric and should offer better predictive power without inflating the model’s apparent performance. I explain why I chose this specific period in the state-level section.
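The simple average described above can be sketched in a few lines. This is only an illustration: the function name `average_poll_window`, the `(weeks_left, support)` layout, and the example polls are hypothetical, and it assumes each poll counts equally within the window.

```python
from statistics import mean

def average_poll_window(polls, first_week=5, last_week=8):
    """Unweighted mean of poll support for polls taken between
    first_week and last_week weeks before the election (inclusive)."""
    window = [support for weeks_left, support in polls
              if first_week <= weeks_left <= last_week]
    if not window:
        raise ValueError("no polls fall inside the requested window")
    return mean(window)

# Hypothetical polls: (weeks left until the election, Democratic support in %).
example_polls = [(10, 47.0), (8, 48.0), (7, 49.0), (6, 50.0), (5, 49.0), (3, 51.0)]

# Only the polls from weeks 8, 7, 6, and 5 enter the average.
window_average = average_poll_window(example_polls)
print(window_average)
```

Unlike the weighted version, nothing here is fit to past election results, so the predictor itself cannot leak the outcome into the regression.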
As a result, this week’s linear regression model, which is otherwise identical to last week’s, has a lower adjusted R-squared. However, I believe it is a better predictor of future outcomes. The share it predicts for Harris has risen from last week to about 53%, likely because of recent polling trends in her favor over Trump. I am unsure why Q2 RDPI growth is no longer statistically significant in this model (p = 0.237). My main theory is that polls closer to the election are more strongly correlated with the actual two-party vote share, which leaves less variation for Q2 RDPI growth to explain.
Democrat National Two Party Vote Share
Predictors | Estimates | std. Error | CI | p |
---|---|---|---|---|
(Intercept) | 20.36 | 5.09 | 8.85 – 31.87 | 0.003 |
weighted_avg_poll | 0.60 | 0.11 | 0.34 – 0.86 | <0.001 |
RDPI_growth_quarterly | 0.35 | 0.27 | -0.27 – 0.97 | 0.237 |
incumbent_party | 3.56 | 1.66 | -0.19 – 7.30 | 0.060 |
Observations | 13 | | | |
R2 / R2 adjusted | 0.788 / 0.718 | | | |
Prediction | Lower.Bound | Upper.Bound |
---|---|---|
53.30% | 46.56% | 60.04% |
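As a quick sanity check on the regression table, the adjusted R-squared can be recomputed from the reported R-squared, the 13 observations, and the 3 predictors. The helper name `adjusted_r_squared` is my own; the formula is the standard adjustment for the number of predictors.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjustment: penalize R-squared for each predictor used."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Values taken from the national regression table (intercept not counted as a predictor).
national_adj = adjusted_r_squared(0.788, n_obs=13, n_predictors=3)
print(round(national_adj, 3))
```

This gives roughly 0.717, which agrees with the reported 0.718 up to rounding of the published R-squared. With only 13 elections, each added predictor costs a noticeable share of the fit.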
For comparison, I also built a random forest model using the same covariates and time period. Its in-sample fit produced an MSE of around 7 (in squared percentage points), which is higher than the linear regression’s MSE of around 5.
The random forest model also predicts a popular vote victory for Harris, but by a much smaller margin. In fact, her predicted share of 51.63%, only 1.63 points above an even split, suggests that losing the Electoral College could be a real possibility. To fully assess the outcome of this election, we must look more closely at individual states.
year | Actual | Predicted |
---|---|---|
1968 | 49.60% | 47.63% |
1972 | 38.21% | 43.61% |
1976 | 51.14% | 50.76% |
1980 | 44.84% | 47.47% |
1984 | 40.88% | 44.91% |
1988 | 46.17% | 47.26% |
1992 | 53.62% | 50.76% |
1996 | 54.80% | 51.98% |
2000 | 50.26% | 49.90% |
2004 | 48.73% | 47.52% |
2008 | 53.77% | 49.79% |
2012 | 51.92% | 51.63% |
2016 | 51.16% | 50.19% |
year | Prediction |
---|---|
2024 | 51.63% |
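The in-sample error of around 7 quoted for the random forest can be checked directly against the Actual and Predicted columns above. The sketch below copies those 13 rows and computes the mean squared error; the variable and function names are my own.

```python
# Actual and predicted Democratic two-party shares (%), 1968-2016, from the table above.
actual = [49.60, 38.21, 51.14, 44.84, 40.88, 46.17, 53.62,
          54.80, 50.26, 48.73, 53.77, 51.92, 51.16]
predicted = [47.63, 43.61, 50.76, 47.47, 44.91, 47.26, 50.76,
             51.98, 49.90, 47.52, 49.79, 51.63, 50.19]

def mean_squared_error(y_true, y_pred):
    """Average of squared differences, in squared percentage points."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

mse = mean_squared_error(actual, predicted)
print(round(mse, 2))
```

The result is about 7.09 squared percentage points. The largest single miss is 1972, where the model overshoots by more than five points, and the fitted values as a whole are pulled toward the middle of the range, as is typical for averaged tree predictions.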
State Level Predictions
I would like to begin by discussing the states I will be covering in my predictions. As we discovered in week 1, only seven states could realistically be won by either party in this election based on historical trends. This is further supported by the fact that these seven states are the only ones currently listed as toss-ups by both the Cook Political Report and Sabato’s Crystal Ball. It is therefore impractical to predict the outcome in any other state, where the winner is almost certain. This means we start with Harris holding 226 electoral votes and Trump holding 219. The remaining 93 electoral votes are in play across Arizona, Georgia, Michigan, Nevada, North Carolina, Pennsylvania, and Wisconsin.
To predict the outcomes in these swing states, I built a linear regression model using polling averages, Q2 RDPI growth, the incumbent party, and a separate term for each state. I use polling averages from 8 to 5 weeks before the election, both here and in the national model, because this was the smallest, most recent time frame in which every state had available polls for all elections measured from 1968 to 2016. Another choice I made was to use national RDPI growth instead of state-level data. While state-specific economic indicators could offer additional granularity, many voters tend to view the economy in national terms. Even if their own state is doing well, hearing about broader economic struggles in the media may sour their perception of the national economy. For this reason, I decided to stick with national Q2 RDPI growth in the model.
Democrat Two Party Vote Share
Predictors | Estimates | std. Error | CI | p |
---|---|---|---|---|
(Intercept) | 27.19 | 3.61 | 19.94 – 34.43 | <0.001 |
avg_poll_8_5 | 0.43 | 0.10 | 0.23 – 0.62 | <0.001 |
stateGeorgia | -0.79 | 2.05 | -4.89 – 3.31 | 0.700 |
stateMichigan | 4.13 | 1.94 | 0.24 – 8.03 | 0.038 |
stateNevada | 0.18 | 2.06 | -3.95 – 4.32 | 0.929 |
stateNorth Carolina | -2.23 | 1.92 | -6.08 – 1.61 | 0.250 |
statePennsylvania | 3.94 | 2.05 | -0.17 – 8.05 | 0.060 |
stateWisconsin | 3.37 | 2.00 | -0.64 – 7.39 | 0.098 |
RDPI_growth_quarterly | 0.32 | 0.24 | -0.15 – 0.80 | 0.175 |
incumbent_party | 2.30 | 1.36 | -0.42 – 5.02 | 0.096 |
Observations | 65 | | | |
R2 / R2 adjusted | 0.561 / 0.489 | | | |
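To make the state terms in this table easier to read, here is a rough sketch of how a prediction is assembled from the coefficients. Arizona has no term of its own, so it acts as the reference state and every `state...` coefficient is a shift relative to Arizona. The input values below (a poll average of 48, RDPI growth of 1.0, and `incumbent_party` coded as 1) are hypothetical placeholders, not the values used in the actual forecast, and the 0/1 coding of `incumbent_party` is an assumption.

```python
# Point estimates copied from the state-level regression table.
COEFS = {
    "(Intercept)": 27.19,
    "avg_poll_8_5": 0.43,
    "stateGeorgia": -0.79,
    "stateMichigan": 4.13,
    "stateNevada": 0.18,
    "stateNorth Carolina": -2.23,
    "statePennsylvania": 3.94,
    "stateWisconsin": 3.37,
    "RDPI_growth_quarterly": 0.32,
    "incumbent_party": 2.30,
}

def predict_share(state, avg_poll, rdpi_growth, incumbent_party):
    """Linear prediction of the Democratic two-party share (%)."""
    # The reference state (Arizona) has no coefficient, so its shift is 0.
    state_shift = COEFS.get("state" + state, 0.0)
    return (COEFS["(Intercept)"]
            + COEFS["avg_poll_8_5"] * avg_poll
            + state_shift
            + COEFS["RDPI_growth_quarterly"] * rdpi_growth
            + COEFS["incumbent_party"] * incumbent_party)

# Hypothetical inputs, identical for both states, to isolate the state term.
arizona = predict_share("Arizona", avg_poll=48.0, rdpi_growth=1.0, incumbent_party=1)
michigan = predict_share("Michigan", avg_poll=48.0, rdpi_growth=1.0, incumbent_party=1)
state_gap = michigan - arizona
print(round(state_gap, 2))
```

With identical inputs, the gap between the two predictions is exactly the `stateMichigan` coefficient. In other words, the model gives Michigan a fixed head start over Arizona regardless of what the polls say.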
This model has Harris winning Michigan, Nevada, Wisconsin, and Pennsylvania, which would push her over the 270 electoral votes needed and result in her winning the presidency. The adjusted R-squared is less than 0.5, suggesting that this model leaves much of the variance in state-level results unexplained.
state | year | Prediction | Lower.Bound | Upper.Bound |
---|---|---|---|---|
Arizona | 2024 | 49.82% | 40.89% | 58.75% |
Georgia | 2024 | 49.18% | 40.26% | 58.09% |
Michigan | 2024 | 54.49% | 45.77% | 63.21% |
Nevada | 2024 | 50.20% | 41.37% | 59.04% |
North Carolina | 2024 | 47.80% | 39.07% | 56.54% |
Pennsylvania | 2024 | 54.21% | 45.47% | 62.95% |
Wisconsin | 2024 | 53.96% | 45.23% | 62.69% |
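The Electoral College total implied by these predictions can be tallied as follows. The 226 and 219 starting totals come from earlier in this post; the per-state electoral vote counts are the 2024 apportionment figures, which I am adding here for illustration. The function name `tally` is my own.

```python
# Electoral votes per swing state (2024 apportionment; added for this sketch).
ELECTORAL_VOTES = {
    "Arizona": 11, "Georgia": 16, "Michigan": 15, "Nevada": 6,
    "North Carolina": 16, "Pennsylvania": 19, "Wisconsin": 10,
}

# Predicted Democratic two-party share (%) from the linear model's table above.
PREDICTED_DEM_SHARE = {
    "Arizona": 49.82, "Georgia": 49.18, "Michigan": 54.49, "Nevada": 50.20,
    "North Carolina": 47.80, "Pennsylvania": 54.21, "Wisconsin": 53.96,
}

def tally(predicted_share, safe_dem=226, safe_rep=219):
    """Give each swing state's electoral votes to whoever is above 50%."""
    dem, rep = safe_dem, safe_rep
    for state, share in predicted_share.items():
        if share > 50.0:
            dem += ELECTORAL_VOTES[state]
        else:
            rep += ELECTORAL_VOTES[state]
    return dem, rep

harris_ev, trump_ev = tally(PREDICTED_DEM_SHARE)
print(harris_ev, trump_ev)
```

Under these assumptions the tally comes to 276 for Harris and 262 for Trump. Note that it uses only the point predictions; every state's interval in the table spans 50%, so the tally says nothing about how confident the call is.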
As with the national two-party vote, I also built a random forest model to predict the state-level outcomes. It produced the same state-level winners except for Arizona, which flipped back to Harris by an extremely thin margin.
state | year | Prediction |
---|---|---|
Arizona | 2024 | 50.001% |
Georgia | 2024 | 49.822% |
Michigan | 2024 | 52.429% |
Nevada | 2024 | 51.497% |
North Carolina | 2024 | 49.798% |
Pennsylvania | 2024 | 52.190% |
Wisconsin | 2024 | 52.410% |
Conclusion
Overall, all of my models predict a Harris victory in both the Electoral College and the national two-party popular vote, though by slim margins. I look forward to refining these models further as we get closer to Election Day.