08-Final Prediction
Overview of Method
Similar to Bafumi et al. (2018), my model consists of two steps. First, I predict the Democratic popular vote share and run simulations to obtain a set of 5,000 potential outcomes at the nation level (Model 1). Second, I predict the Democratic performance in each district relative to the nation-level Democratic performance (Models 2a and 2b). In the second step, the dependent variable is the ratio of the Democratic vote share in a district in election t to the Democratic popular vote share in election t. In other words, I build a model to predict the extent to which the Democratic candidate in each district overperforms or underperforms the nation-level Democratic vote share. To predict the Democratic vote share in each district, I multiply this predicted ratio by each of the 5,000 simulated Democratic popular vote shares obtained in Step 1.
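The two steps can be written compactly. The notation below is introduced here only for illustration and does not come from the original models: V_{d,t} is the Democratic vote share in district d in election t, N_t is the Democratic popular vote share in election t, hats denote predictions, and s indexes the 5,000 simulations.

```latex
R_{d,t} = \frac{V_{d,t}}{N_t},
\qquad
\hat{V}_{d}^{(s)} = \hat{R}_{d} \times \hat{N}^{(s)},
\quad s = 1, \dots, 5000
```

For example, a predicted ratio of 1.10 combined with a simulated popular vote share of 48.52 gives a predicted district vote share of about 53.4.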
This method allows me to fully take into account the impact of redistricting. Especially in districts whose composition of the electorate changed greatly due to redistricting, it is inappropriate to rely on district-level historical data to predict the outcomes in 2022. In the second step of my model, I use a pooled model whereby I fit one regression across all districts instead of fitting a regression for each district. This means that the prediction for a certain district is based on historical data from different districts instead of the historical data in that particular district. Thus, even if the characteristics of a certain district changed greatly due to redistricting, we can rely on historical data from all districts to predict how well a Democratic candidate will perform in that district relative to the nation-level Democratic performance.
Step 1: Predict the Nation-Level Outcome
Model Validation
In-Sample Model Fit
The adjusted R-squared of Model 1 is 0.805. As seen in Figure 1, the predicted values based on Model 1 roughly match the actual values. The root-mean-squared error (RMSE) is 1.45.
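As a rough illustration of how these two fit statistics are computed, the sketch below defines them for arbitrary actual and fitted values. The function names and the toy numbers are hypothetical and are not the data behind Model 1. The RMSE here divides by the number of observations, which is an assumption; the value reported above may follow a different convention (for example, a degrees-of-freedom correction).

```python
import numpy as np

def rmse(actual, predicted):
    """Root-mean-squared error, dividing by the number of observations."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def adjusted_r2(actual, predicted, n_predictors):
    """Adjusted R-squared for a model with `n_predictors` slopes plus an intercept."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    n = len(actual)
    return float(1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1))

# Toy values for illustration only (not the election data).
toy_actual = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_fitted = [1.1, 1.9, 3.2, 3.8, 5.0]
print(rmse(toy_actual, toy_fitted), adjusted_r2(toy_actual, toy_fitted, n_predictors=1))
```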
Out-Of-Sample Testing
I test how well Model 1 predicts the Republican vote share in 2018 after holding out the 2018 election. The error is 1.005 percentage points. I also conduct 1,000 runs of cross-validation: in each run, I randomly withhold 8 observations, fit the model on the remaining observations, and evaluate how well it predicts the observations that were held out. Figure 2 shows the distribution of the mean residual in each run. The mean residual tends to fall between -1 and 1.
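The repeated hold-out procedure can be sketched as follows. This is a minimal illustration under stated assumptions: it fits an ordinary least squares regression with NumPy, and both the function name `repeated_holdout` and the synthetic data are hypothetical stand-ins for Model 1 and its inputs.

```python
import numpy as np

def repeated_holdout(X, y, n_runs=1000, n_holdout=8, seed=0):
    """Return the mean held-out residual from each of `n_runs` random splits."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    # Prepend an intercept column to the predictor(s).
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    mean_residuals = np.empty(n_runs)
    for i in range(n_runs):
        # Withhold `n_holdout` observations at random, without replacement.
        test_idx = rng.choice(len(y), size=n_holdout, replace=False)
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        # Fit OLS on the remaining observations.
        coef, *_ = np.linalg.lstsq(X[train_mask], y[train_mask], rcond=None)
        # Residual = actual minus predicted, on the held-out observations.
        residuals = y[test_idx] - X[test_idx] @ coef
        mean_residuals[i] = residuals.mean()
    return mean_residuals

# Synthetic data for illustration only (not the election data).
rng = np.random.default_rng(1)
x = rng.normal(size=40)
y = 48 + 2 * x + rng.normal(scale=1.5, size=40)
cv_means = repeated_holdout(x, y)
print(cv_means.shape)
```

A histogram of `cv_means` would play the role of Figure 2.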
Prediction for 2022
Based on Model 1, the predicted Democratic popular vote share in 2022 is 48.52. The prediction interval is [45.36, 51.68]. Data on the latest generic ballot polls was retrieved from FiveThirtyEight (2022) on November 5. The polls are weighted based on recency in the same way as described above.
Lastly, I obtain a set of 5,000 potential Democratic popular vote shares by drawing a sample of size 5,000 from a normal distribution whose mean is the predicted Democratic vote share and whose standard deviation is the RMSE of Model 1. Democrats are predicted to win the popular vote in only 749 of the 5,000 simulations, or about 15% (Figure 3).
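The simulation step can be sketched as below, using the predicted share (48.52) and the RMSE (1.45) reported above. The random seed is arbitrary, so the count will not reproduce the 749 reported here exactly. Treating a share above 50 as a popular-vote win is an assumption of this sketch.

```python
import numpy as np

PREDICTED_SHARE = 48.52  # predicted Democratic popular vote share (from the text)
RMSE_MODEL_1 = 1.45      # in-sample RMSE of Model 1 (from the text)
N_SIMS = 5000

rng = np.random.default_rng(2022)  # arbitrary seed, chosen only for reproducibility

# Draw 5,000 potential popular vote shares from Normal(mean=48.52, sd=1.45).
national_sims = rng.normal(loc=PREDICTED_SHARE, scale=RMSE_MODEL_1, size=N_SIMS)

# Assumption: Democrats "win" the popular vote when their share exceeds 50.
dem_wins = int((national_sims > 50).sum())
print(f"Democrats win the popular vote in {dem_wins} of {N_SIMS} simulations")
```

Under this normal distribution, the probability of a share above 50 is roughly 15%, which is consistent with the 749 out of 5,000 reported above.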
District-Level Prediction for 2022
To predict the district-level outcomes, I first use Models 2a and 2b to predict the extent to which the Democratic candidate in each district will overperform or underperform the nation-level Democratic vote share. Data on expert ratings comes from Ballotpedia (2022) as well as the data provided in class. Data on Joe Biden’s vote share in each post-redistricting district was retrieved from Dave’s Redistricting (2022). Next, I obtain a set of 5,000 potential Democratic vote shares in each district by multiplying the predicted ratio in each district by the 5,000 potential Democratic popular vote shares obtained in Step 1. In doing so, I add a disturbance that is randomly drawn from a normal distribution whose mean is 0 and whose standard deviation is the in-sample RMSE of Model 2a or Model 2b, in order to take into account the uncertainty in Step 2 of my model.
Through these steps, I obtain a set of 5,000 simulated outcomes, and for each of the 5,000 simulations, I count the number of districts where Democrats are predicted to win (Figure 8). The median of the predicted number of Democratic seats is 198. Democrats are predicted to retain their majority in the House in 427 of the 5,000 simulations (about 8.5%), meaning that they are likely to lose their majority in the House.
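A minimal sketch of this step follows. The predicted ratios and RMSE values are hypothetical placeholders for three made-up districts, not output from Models 2a and 2b. The sketch also assumes that the disturbance is added to the predicted ratio before multiplying (the RMSE of a model whose dependent variable is a ratio is in ratio units), and that each district receives an independent draw in each simulation.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
N_SIMS = 5000

# Step 1 output: simulated national popular vote shares (values from the text).
national_sims = rng.normal(48.52, 1.45, size=N_SIMS)

# Hypothetical Step 2 output for three made-up districts.
predicted_ratio = np.array([0.85, 1.03, 1.30])
# Hypothetical in-sample RMSE (in ratio units) of the model used for each district.
ratio_rmse = np.array([0.08, 0.04, 0.08])

# One disturbance per simulation and district, drawn from Normal(0, RMSE).
disturbance = rng.normal(0.0, ratio_rmse, size=(N_SIMS, len(predicted_ratio)))

# Rows are simulations, columns are districts.
district_sims = (predicted_ratio + disturbance) * national_sims[:, None]
print(np.median(district_sims, axis=0))
```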
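Counting seats per simulation can be sketched as below. The function name `summarize_seats` and the toy matrix are hypothetical. The threshold of 218 is the number of seats needed for a majority in the 435-seat House, and a vote share above 50 is assumed to mean a Democratic win.

```python
import numpy as np

def summarize_seats(district_sims, majority=218):
    """Summarize seat counts from a (simulations x districts) vote-share matrix."""
    sims = np.asarray(district_sims, dtype=float)
    # Seats won in each simulation: districts with a Democratic share above 50.
    seats = (sims > 50).sum(axis=1)
    return {
        "median_seats": float(np.median(seats)),
        "majority_sims": int((seats >= majority).sum()),
        "n_sims": int(sims.shape[0]),
    }

# Toy example: 3 simulations, 5 districts, so a majority is 3 seats.
toy_sims = np.array([
    [51.0, 52.0, 49.0, 48.0, 47.0],   # 2 seats
    [55.0, 51.0, 50.5, 40.0, 30.0],   # 3 seats
    [49.0, 49.0, 49.0, 49.0, 60.0],   # 1 seat
])
toy_summary = summarize_seats(toy_sims, majority=3)
print(toy_summary)
```

With the full 5,000 x 435 matrix and the default `majority=218`, the same function would return the kind of figures reported above (a median seat count and the number of simulations with a Democratic majority).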
Competitive Districts
The advantage of my model is that I can obtain a set of potential district-level vote shares for each district, taking into account the uncertainty in both Model 1 and Model 2a/2b. In California’s 22nd Congressional District, the median of the predicted Democratic vote share is 49.78, and Rudy Salas (D) is predicted to win in 2,300 of the 5,000 simulations, or 46% (Figure 7). This suggests that the race is a pure toss-up.
Figure 10 shows the distribution of the predicted Democratic vote shares in 30 competitive districts. The median predicted Democratic vote share exceeds 50 in only 7 of the 30 districts, meaning that Republicans may be slightly favored in most of the competitive districts. However, in all 30 districts, a Democratic two-party vote share of 50 is well within the range of the simulated Democratic vote shares, which confirms that all 30 are competitive races that could go either way. Note that the distribution is wider for districts where polling data was not readily available (e.g., AK-01, AZ-02). This makes sense because Model 2a yields larger errors, so the randomly generated disturbance for these districts tends to be larger.
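The per-district summaries discussed in this section (median vote share, share of simulations won, and range) can be computed from a simulations-by-districts matrix along the following lines. The function name `district_summary` and the toy values are hypothetical, and a vote share above 50 is assumed to mean a Democratic win.

```python
import numpy as np

def district_summary(district_sims):
    """Per-district summaries from a (simulations x districts) vote-share matrix."""
    sims = np.asarray(district_sims, dtype=float)
    return {
        "median_share": np.median(sims, axis=0),
        "win_share": (sims > 50).mean(axis=0),  # fraction of simulations won
        "min_share": sims.min(axis=0),
        "max_share": sims.max(axis=0),
    }

# Toy example: 4 simulations for a toss-up district and a safer district.
toy = np.array([
    [49.0, 55.0],
    [51.0, 53.0],
    [48.0, 54.0],
    [50.5, 56.0],
])
summary = district_summary(toy)
print(summary["median_share"], summary["win_share"])
```

A district is competitive in the sense used above when 50 lies between its `min_share` and `max_share` and its `win_share` is not close to 0 or 1.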
References
Abramowitz, A. (2018). Will Democrats Catch a Wave? The Generic Ballot Model and the 2018 US House Elections. PS: Political Science & Politics, 51(S1), 4-6. doi:10.1017/S1049096518001567
Achen, C. H., & Bartels, L. M. (2017). Democracy for realists: Why elections do not produce responsive government. Princeton University Press.
Bafumi, J., Erikson, R., & Wlezien, C. (2018). Forecasting the 2018 Midterm Election using National Polls and District Information. PS: Political Science & Politics, 51(S1), 7-11. doi:10.1017/S1049096518001579
Ballotpedia. (2022). United States Congress elections, 2022. https://ballotpedia.org/United_States_Congress_elections,_2022
Campbell, J. (2018). Introduction: Forecasting the 2018 US Midterm Elections. PS: Political Science & Politics, 51(S1), 1-3. doi:10.1017/S1049096518001592
Dave’s Redistricting. (2022). https://davesredistricting.org/maps#home
FiveThirtyEight. (2022, November 5). Do Voters Want Republicans or Democrats in Congress? https://projects.fivethirtyeight.com/polls/generic-ballot/
Gelman, A., & King, G. (1993). Why Are American Presidential Election Campaign Polls So Variable When Votes Are So Predictable? British Journal of Political Science, 23(4), 409-451. doi:10.1017/S0007123400006682
Nir, D. (2020, November 19). Daily Kos Elections’ presidential results by congressional district for 2020, 2016, and 2012. Daily Kos. https://www.dailykos.com/stories/2012/11/19/1163009/-Daily-Kos-Elections-presidential-results-by-congressional-district-for-the-2012-2008-elections