Introduction

This week’s post builds on last week’s and introduces the structure followed in each post leading up to the election. I will develop two models to predict the national two-party popular vote and forecast the Electoral College outcome. These models will be updated with new data, such as recent polls, and enhanced with additional variables and prediction methods to fine-tune our projections.

National Two-Party Vote Predictions

I decided to shift away from a weighted polling average and instead use a simple average of polls from weeks 5 to 8 before the election. The main reason is that the weighted polling average, whose weights came from regressing historical outcomes on each week's polls, produced an artificially high R-squared of around 0.95 once recent poll data was added. The fit was that high because the weights were themselves estimated from the same outcome the model then predicts, so the outcome was effectively used twice. The model looked highly accurate but was likely overfitting to past data. A simple average of polls from weeks 5 to 8 is a more straightforward metric that should predict better out of sample without inflating the model's apparent performance. I will later explain why I chose this specific period.
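A minimal sketch of the simple-average idea, assuming polls are stored as pairs of (weeks before the election, Democratic support). The poll values and the helper name `window_average` are hypothetical and only illustrate the calculation; they are not the data behind this post.

```python
from statistics import mean

# Hypothetical poll readings: (weeks before the election, Democratic support in %).
# These numbers are illustrative only and are not taken from the post's data.
polls = [(10, 47.0), (8, 48.5), (7, 49.0), (6, 48.0), (5, 49.5), (3, 50.0)]


def window_average(polls, first_week=5, last_week=8):
    """Unweighted mean of polls taken between first_week and last_week weeks out."""
    in_window = [support for weeks_left, support in polls
                 if first_week <= weeks_left <= last_week]
    if not in_window:
        raise ValueError("no polls fall inside the requested window")
    return mean(in_window)


avg_poll = window_average(polls)
print(f"Simple average, weeks 5-8: {avg_poll:.2f}")
```

Every poll inside the window counts equally, so no weights are fitted to past election results.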

As a result, this week's linear regression model, which is otherwise identical to last week's, has a lower adjusted R-squared. However, I believe it is a better predictor of future outcomes. The share it predicts for Harris has risen to about 53%, up from last week, likely because recent polling has trended in her favor over Trump. I am unsure why Q2 RDPI growth is no longer statistically significant in this model. My main theory is that polls closer to the election are more strongly correlated with the actual two-party vote share, which leaves less of the outcome for Q2 RDPI growth to explain.

Linear Regression Model for Democrat Two Party Vote Share (In-sample MSE: 4.9637)
	Democrat National Two Party Vote Share
Predictors	Estimates	std. Error	CI	p
(Intercept)	20.36	5.09	8.85 – 31.87	0.003
weighted_avg_poll	0.60	0.11	0.34 – 0.86	<0.001
RDPI_growth_quarterly	0.35	0.27	-0.27 – 0.97	0.237
incumbent_party	3.56	1.66	-0.19 – 7.30	0.060
Observations	13
R² / R² adjusted	0.788 / 0.718

2024 Democrat Two Party Vote Share via Linear Regression
Prediction	Lower.Bound	Upper.Bound
53.30%	46.56%	60.04%

For comparison, I also fit a random forest model using the same covariates and time period. Its in-sample MSE is about 7.08, higher than the linear regression's 4.96 (both measured in squared percentage points).

The random forest model also predicts a popular vote victory for Harris, but by a much smaller margin. Her predicted share of 51.63% is only 1.63 points above 50%, a two-party margin of roughly 3.3 points, which suggests that losing the Electoral College could be a real possibility. To fully assess the outcome of this election, we must look more closely at individual states.

In-sample Fit of Democrat National Two Party Vote Share via Random Forest (In-sample MSE: 7.0798)
year	Actual	Predicted
1968	49.60%	47.63%
1972	38.21%	43.61%
1976	51.14%	50.76%
1980	44.84%	47.47%
1984	40.88%	44.91%
1988	46.17%	47.26%
1992	53.62%	50.76%
1996	54.80%	51.98%
2000	50.26%	49.90%
2004	48.73%	47.52%
2008	53.77%	49.79%
2012	51.92%	51.63%
2016	51.16%	50.19%
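The in-sample MSE can be recomputed from the table as the average squared gap between the actual and predicted shares. The sketch below uses the rounded values printed in the table, so the result should land close to, but not exactly on, the reported 7.0798.

```python
# Actual and predicted Democratic two-party shares (%), copied from the table above
# in year order (1968 through 2016).
actual = [49.60, 38.21, 51.14, 44.84, 40.88, 46.17, 53.62,
          54.80, 50.26, 48.73, 53.77, 51.92, 51.16]
predicted = [47.63, 43.61, 50.76, 47.47, 44.91, 47.26, 50.76,
             51.98, 49.90, 47.52, 49.79, 51.63, 50.19]

# Mean squared error: average of the squared prediction errors.
squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
mse = sum(squared_errors) / len(squared_errors)

# The square root puts the typical miss back into percentage points.
rmse = mse ** 0.5
print(f"In-sample MSE: {mse:.4f} (RMSE: {rmse:.2f} points)")
```

Because MSE is in squared percentage points, the root of the value is easier to read as a typical error in points.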

2024 Democrat National Two Party Vote Share via Random Forest Model
year	Prediction
2024	51.63%
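A two-party share converts to a margin by subtracting the other party's share, which is simply 100 minus the first. The short sketch below applies that to the two national predictions reported in this post; the function name is just illustrative.

```python
def two_party_margin(dem_share):
    """Democratic lead in points when the two party shares sum to 100."""
    return dem_share - (100.0 - dem_share)


# National predictions reported above.
national_predictions = {"Linear regression": 53.30, "Random forest": 51.63}

for label, share in national_predictions.items():
    margin = two_party_margin(share)
    print(f"{label}: share {share:.2f}%, margin {margin:+.2f} points")
```

A share that sits x points above 50% therefore corresponds to a margin of 2x points.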

State Level Predictions

I would like to begin by discussing the states I will cover in my predictions. As we discovered in week 1, only seven states could realistically be won by either party in this election based on historical trends. This is further supported by the fact that these seven states are the only ones currently listed as toss-ups by both the Cook Political Report and Sabato’s Crystal Ball. As a result, there is little value in modeling any other state, where the winner is almost certain. This means we start with Harris holding 226 electoral votes and Trump holding 219. The remaining 93 electoral votes are in play across Arizona, Georgia, Michigan, Nevada, North Carolina, Pennsylvania, and Wisconsin.
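The electoral vote arithmetic can be checked directly. This sketch uses each swing state's electoral votes under the 2024 apportionment together with the 226 and 219 starting totals, and works out how many more votes each candidate needs to reach 270.

```python
# Electoral votes for the seven swing states under the 2024 apportionment.
SWING_STATE_EV = {
    "Arizona": 11, "Georgia": 16, "Michigan": 15, "Nevada": 6,
    "North Carolina": 16, "Pennsylvania": 19, "Wisconsin": 10,
}
HARRIS_BASE, TRUMP_BASE = 226, 219  # votes treated as safe for each candidate
TOTAL_EV, TO_WIN = 538, 270

in_play = sum(SWING_STATE_EV.values())
# The safe totals plus the swing states should account for every electoral vote.
assert HARRIS_BASE + TRUMP_BASE + in_play == TOTAL_EV

harris_needs = TO_WIN - HARRIS_BASE
trump_needs = TO_WIN - TRUMP_BASE
print(f"In play: {in_play}; Harris needs {harris_needs}, Trump needs {trump_needs}")
```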

To predict the outcomes in these swing states, I built a linear regression model using polling averages, Q2 RDPI growth, the incumbent party, and an indicator for each state. The decision, carried over to the national model, to use polling averages from 8 to 5 weeks before the election comes from the fact that this was the shortest, most recent window in which every state had polls available for every election from 1968 to 2016. Another choice I made was to use national RDPI growth instead of state-level data. While state-specific economic indicators could offer additional granularity, many voters tend to view the economy in national terms. Even if their own state is doing well, media coverage of broader economic struggles may sour their perception of the national economy. For this reason, I decided to stick with national Q2 RDPI growth in the model.

Linear Regression Model for Swing States
	Democrat Two Party Vote Share
Predictors	Estimates	std. Error	CI	p
(Intercept)	27.19	3.61	19.94 – 34.43	<0.001
avg_poll_8_5	0.43	0.10	0.23 – 0.62	<0.001
stateGeorgia	-0.79	2.05	-4.89 – 3.31	0.700
stateMichigan	4.13	1.94	0.24 – 8.03	0.038
stateNevada	0.18	2.06	-3.95 – 4.32	0.929
stateNorth Carolina	-2.23	1.92	-6.08 – 1.61	0.250
statePennsylvania	3.94	2.05	-0.17 – 8.05	0.060
stateWisconsin	3.37	2.00	-0.64 – 7.39	0.098
RDPI_growth_quarterly	0.32	0.24	-0.15 – 0.80	0.175
incumbent_party	2.30	1.36	-0.42 – 5.02	0.096
Observations	65
R² / R² adjusted	0.561 / 0.489
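To show how the state terms enter a prediction, here is a sketch that applies the rounded coefficients from the table. Two things are assumptions: Arizona is treated as the reference state because it has no row of its own, and `incumbent_party` is treated as a 0/1 indicator. The inputs (a 48% poll average, 1.0% RDPI growth) are hypothetical, so the printed numbers are not this post's forecasts.

```python
# Rounded coefficients copied from the swing-state regression table above.
INTERCEPT = 27.19
COEF_POLL = 0.43
COEF_RDPI = 0.32
COEF_INCUMBENT = 2.30

# Assumption: Arizona has no row in the table, so it is the reference state (offset 0).
STATE_OFFSET = {
    "Arizona": 0.0, "Georgia": -0.79, "Michigan": 4.13, "Nevada": 0.18,
    "North Carolina": -2.23, "Pennsylvania": 3.94, "Wisconsin": 3.37,
}


def predict_state_share(state, avg_poll, rdpi_growth, incumbent_party):
    """Predicted Democratic two-party share (%) for one swing state."""
    return (INTERCEPT
            + COEF_POLL * avg_poll
            + STATE_OFFSET[state]
            + COEF_RDPI * rdpi_growth
            + COEF_INCUMBENT * incumbent_party)


# Hypothetical inputs, identical for both states, to isolate the state offset.
for state in ("Arizona", "Michigan"):
    share = predict_state_share(state, avg_poll=48.0, rdpi_growth=1.0, incumbent_party=1)
    print(f"{state}: {share:.2f}%")
```

With identical inputs, two states differ only by their offsets, which is why Michigan comes out 4.13 points above Arizona here.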

This model has Harris winning Michigan, Nevada, Pennsylvania, and Wisconsin, which would bring her to 276 electoral votes, past the 270 needed to win the presidency. The adjusted R-squared is below 0.5 (0.489), so the model explains less than half of the variance in state-level results.
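Adjusted R-squared penalizes the plain R-squared for the number of predictors relative to the number of observations. As a check, the standard formula can be applied to the figures in the two regression tables (3 predictors and 13 observations nationally; 9 predictors and 65 observations for the swing states). Since the inputs are rounded, the results only match the reported values approximately.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# R-squared, observations, and predictor counts read off the tables above.
national_adj = adjusted_r_squared(0.788, n_obs=13, n_predictors=3)
swing_adj = adjusted_r_squared(0.561, n_obs=65, n_predictors=9)

print(f"National model: {national_adj:.3f}")
print(f"Swing-state model: {swing_adj:.3f}")
```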

2024 Democrat Two Party Vote Share via Linear Model in Swing States
state	year	Prediction	Lower.Bound	Upper.Bound
Arizona	2024	49.82%	40.89%	58.75%
Georgia	2024	49.18%	40.26%	58.09%
Michigan	2024	54.49%	45.77%	63.21%
Nevada	2024	50.20%	41.37%	59.04%
North Carolina	2024	47.80%	39.07%	56.54%
Pennsylvania	2024	54.21%	45.47%	62.95%
Wisconsin	2024	53.96%	45.23%	62.69%
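One way to read the bounds is to ask whether each interval contains 50%, the point at which the state (or the national vote) changes hands. The sketch below runs that check on the linear-model predictions reported in this post.

```python
# (prediction, lower bound, upper bound) in percent, copied from the tables above.
intervals = {
    "National": (53.30, 46.56, 60.04),
    "Arizona": (49.82, 40.89, 58.75),
    "Georgia": (49.18, 40.26, 58.09),
    "Michigan": (54.49, 45.77, 63.21),
    "Nevada": (50.20, 41.37, 59.04),
    "North Carolina": (47.80, 39.07, 56.54),
    "Pennsylvania": (54.21, 45.47, 62.95),
    "Wisconsin": (53.96, 45.23, 62.69),
}

# An interval that contains 50% cannot rule out either party winning.
uncertain = [name for name, (_, low, high) in intervals.items() if low < 50.0 < high]

for name, (pred, low, high) in intervals.items():
    half_width = (high - low) / 2
    print(f"{name}: {pred:.2f}% +/- {half_width:.2f}")
print(f"Intervals containing 50%: {len(uncertain)} of {len(intervals)}")
```

Every interval here spans 50%, so the point estimates favor one side without excluding the other.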

As with the national two-party vote, I also built a random forest model to predict the state-level outcomes. It produced the same state-level calls except for Arizona, which flipped back to Harris by an extremely thin margin.

2024 Democrat Two Party Vote Share via Random Forest Model in Swing States
state	year	Prediction
Arizona	2024	50.001%
Georgia	2024	49.822%
Michigan	2024	52.429%
Nevada	2024	51.497%
North Carolina	2024	49.798%
Pennsylvania	2024	52.190%
Wisconsin	2024	52.410%
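The state predictions translate into an Electoral College total by awarding each swing state's votes to Harris when her predicted two-party share exceeds 50%. This sketch assumes the 2024 electoral vote counts for the seven states and the 226 starting total used earlier in the post.

```python
# Electoral votes for the seven swing states under the 2024 apportionment.
SWING_STATE_EV = {
    "Arizona": 11, "Georgia": 16, "Michigan": 15, "Nevada": 6,
    "North Carolina": 16, "Pennsylvania": 19, "Wisconsin": 10,
}
HARRIS_BASE, TOTAL_EV = 226, 538

# Predicted Democratic two-party shares (%), copied from the two tables above.
linear_model = {
    "Arizona": 49.82, "Georgia": 49.18, "Michigan": 54.49, "Nevada": 50.20,
    "North Carolina": 47.80, "Pennsylvania": 54.21, "Wisconsin": 53.96,
}
random_forest = {
    "Arizona": 50.001, "Georgia": 49.822, "Michigan": 52.429, "Nevada": 51.497,
    "North Carolina": 49.798, "Pennsylvania": 52.190, "Wisconsin": 52.410,
}


def tally(predictions):
    """Return (Harris, Trump) electoral votes implied by the state predictions."""
    harris = HARRIS_BASE + sum(ev for state, ev in SWING_STATE_EV.items()
                               if predictions[state] > 50.0)
    return harris, TOTAL_EV - harris


for label, predictions in (("Linear model", linear_model),
                           ("Random forest", random_forest)):
    harris, trump = tally(predictions)
    print(f"{label}: Harris {harris}, Trump {trump}")
```

The two models differ only on Arizona, which is worth 11 electoral votes.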

Conclusion

Overall, all of my models predict a Harris victory in both the Electoral College and the national two-party popular vote, though by slim margins. I look forward to refining these models further as we get closer to Election Day.

Post #5: State Level Predictions

Avi Agarwal

2024/10/07
