Tracking America’s Industrial Renaissance

Laraib Kamal
Published in AlphaGeo Insights
8 min read · May 28, 2024

Introduction

In August 2022, President Biden signed the Inflation Reduction Act (IRA) into law. The act was a significant milestone in American policy, focused on energy security and climate resilience. This comprehensive bill introduced more than 20 tax incentives aimed at:

- boosting clean energy and manufacturing,
- strengthening domestic supply chains,
- reducing household energy costs,
- cutting greenhouse gas emissions, and
- creating jobs.

The IRA also extended clean energy tax incentives to tax-exempt entities and included further measures to strengthen supply chain resilience.

As a result of the IRA, the US economy has seen a surge in private and public investment in clean energy, semiconductors, EV batteries, and biomanufacturing, among other sectors. Large organizations such as General Motors began building plants and investing more in clean energy and semiconductors. Foreign investment targeting these industries also increased. The White House began tracking these private investments, recording the investment amount, company, and industry (The White House). Figure 1 shows the investment flow of the top 15 companies (selected by frequency) as tracked by the White House.

Figure 1. Top 15 companies’ investment flow as of 1st April 2024

Research Objectives

AlphaGeo, a pioneering company in risk and resilience, has embarked on an in-depth analysis of private investment trends using the White House data, focusing on three fundamental questions.

Question #1: How can AlphaGeo’s data and techniques be applied to assess the financial performance of publicly listed companies, based on the resilience factors (per our framework) of the specific geographic locations where they invest?

Question #2: Do public companies that are deploying capital in resilient locations perform better than others, across short- and medium-term investment durations?

Question #3: Can any thematic drivers be generalized from the data we are collecting and the methods we are applying?

‘Resilient locations’ is a broad term encompassing many dimensions. At AlphaGeo, we define resilient locations as geographies that may benefit from disruption or changes in the natural or human-caused environment.

Methodology

We structured our research into two distinct phases. This section will briefly outline the experiments and results in each phase.

Phase 1.0: Correlation and Regression Analyses

We began our analysis by identifying the geographic coordinates for the top 15 companies as selected by a Natural Language Processing (NLP) algorithm applied to data tracking planned capital allocation resulting from IRA (and other related) capital commitments. We used the NLP algorithm to count the number of times a company’s name appeared in the White House list and sorted by frequency to generate the top 15 list (Table 1). We then pulled select financial information for the top 15 firms as defined by our NLP algorithm.
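The post does not describe the NLP algorithm itself. The frequency step it feeds can be sketched as follows, assuming company names have already been extracted from the tracker rows. The function name `top_companies_by_frequency` is hypothetical.

```python
from collections import Counter

def top_companies_by_frequency(company_mentions, n=15):
    """Count how often each company name appears and return the n most frequent.

    Assumes `company_mentions` is a list of names already extracted from the
    White House tracker (one entry per investment row). Only surrounding
    whitespace is stripped; merging aliases would need an extra step.
    """
    counts = Counter(name.strip() for name in company_mentions if name and name.strip())
    # most_common sorts by count; equal counts keep first-seen order
    return counts.most_common(n)

rows = ["General Motors", "Ford", "General Motors", "Panasonic", "Ford", "General Motors"]
print(top_companies_by_frequency(rows, n=2))
```

With the toy rows above, the two most frequent names are General Motors (3 mentions) and Ford (2 mentions).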

Financial indicators included share price, earnings per share (EPS) and earnings before interest, taxes, depreciation, and amortization (EBITDA) by quarter and year (2015–2023).

Table 2 contains all socio-economic indicators used in our analysis and Table 3 contains climate risk data collected by AlphaGeo. Indicators in Table 2 are obtained from various sources including the White House, ACS Census data, Bureau of Transportation, and others.

Table 3 indicators are curated by AlphaGeo after extensive research, data analysis, and machine learning (ML) modeling. The data in Table 3 reflects historical climate trends but is not time-series data, which limits its use at this stage.

The original socio-economic data came at different geographic levels, mostly counties and census tracts. The climate data curated by AlphaGeo was on H3, a discrete global grid system consisting of a multi-precision hexagonal tiling of the sphere with hierarchical indexes. All data was aggregated to the ZIP Code level for analysis. We did this using either a Python library or the crosswalk files provided by the US Department of Housing and Urban Development (HUD). We used the most recent year of data available (no older than 2020) for the correlation and regression analyses.
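A crosswalk-based aggregation can be sketched as a weighted average. This assumes each crosswalk row gives the share of a ZIP Code's (for example, residential) addresses that fall in a tract, as in a ZIP-to-tract crosswalk. It also assumes the indicator is a rate or level, such as median income or a percentage, rather than a count. The function name and toy IDs are hypothetical. A weighted average of tract medians is only an approximation of a true ZIP-level median.

```python
from collections import defaultdict

def tract_to_zip(tract_values, crosswalk):
    """Aggregate a tract-level indicator to ZIP Codes using crosswalk weights.

    tract_values: {tract_id: value}
    crosswalk: iterable of (tract_id, zip_code, ratio) rows, where ratio is
        the share of the ZIP's addresses located in that tract (assumed).
    Returns {zip_code: weighted average of the tract values}.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for tract, zip_code, ratio in crosswalk:
        value = tract_values.get(tract)
        if value is None or ratio <= 0:
            continue  # skip tracts with no data or no overlap
        weighted_sum[zip_code] += value * ratio
        weight_total[zip_code] += ratio
    # Renormalize so ZIPs with some missing tracts are still proper averages
    return {z: weighted_sum[z] / weight_total[z] for z in weighted_sum}

tracts = {"T1": 50000.0, "T2": 70000.0}
xwalk = [("T1", "Z1", 0.75), ("T2", "Z1", 0.25), ("T2", "Z2", 1.0)]
print(tract_to_zip(tracts, xwalk))
```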

Using these indicators, we ran correlation analyses and cross-sectional Ordinary Least Squares (OLS) regression models to understand the relationship between the climate and socio-economic indicators and actual EPS.

Correlation Analyses:

We observe a significant positive correlation between investment and each of the following:

- the percentage of the population that is privately insured,
- the percentage of the population with at least a bachelor’s degree, and
- median household income.

We also observe a moderately significant positive correlation between investment and the availability of renewable energy sources. These relationships indicate that investment is flowing into areas with more favorable socio-economic profiles.

When we assess financial performance (via EPS), we observe a significant positive correlation with wind and household debt. Other factors show weaker, but still significant, correlations.
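The post does not say which coefficient or significance test was used. Assuming Pearson's r with the standard t-test, each indicator/EPS pair can be scored as sketched below. Function names are illustrative. To judge significance, compare the t value with the critical value of Student's t distribution at n − 2 degrees of freedom.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric series."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)); Student's t with n - 2 df under H0: rho = 0."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)  # perfect correlation
    return r * math.sqrt((n - 2) / (1 - r * r))

indicator = [1, 2, 3, 4, 5]
eps = [2, 1, 4, 3, 5]
r = pearson_r(indicator, eps)
print(round(r, 3), round(t_statistic(r, len(eps)), 3))
```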

Cross-Sectional OLS Regressions:

The first regression experiment used only the social indicators, with Actual EPS as the dependent variable. The second experiment included the deconstructed climate risk indicators. The third combined the socio-economic indicators with aggregated climate scores.

The third model, combining socio-economic indicators and climate scores, was the most promising. With all variables included (Model 1), the OLS model exhibited a high R² (0.989). However, it was unreliable because of multicollinearity and an excessive number of variables. To address this, we used Random Forest feature importance to select the top 50% of indicators for Model 2, which yielded a lower but more appropriate R² (0.74). Model 2 identified variables that were statistically significant at the 90% confidence level. These included non-correlated climate and socio-economic indicators that correlate well with Actual EPS and financial performance.
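The selection-then-refit step can be sketched as follows. This assumes importance scores are already available, for example from a fitted random forest's `feature_importances_` in scikit-learn. The helper names and toy data are hypothetical. It keeps the top half of features by importance, then fits OLS with an intercept and reports R².

```python
import numpy as np

def top_half_features(names, importances):
    """Keep the top 50% of features ranked by importance (ties keep original order)."""
    order = sorted(range(len(names)), key=lambda i: importances[i], reverse=True)
    keep = max(1, len(names) // 2)
    return [names[i] for i in order[:keep]]

def ols_r2(X, y):
    """Fit y = b0 + X @ b by least squares; return (coefficients, R^2)."""
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot

names = ["income", "debt", "wind", "bridges"]
print(top_half_features(names, [0.1, 0.4, 0.3, 0.2]))
coef, r2 = ols_r2([[1], [2], [3], [4]], [3, 5, 7, 9])
print(coef, r2)
```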

Further analysis showed minimal multicollinearity (Variance Inflation Factor, VIF < 5) and improved Mean Squared Error (MSE) values. This indicates that Model 2 is effective at predicting outcomes beyond the training data. The model can serve as a foundation for more robust predictive models and could be applied to other sectors and regions.
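For reference, VIF for each predictor is 1 / (1 − R²), where R² comes from regressing that predictor on all the others. The minimal NumPy sketch below computes VIF and a plain MSE for held-out predictions; function names are illustrative. Predictors with VIF below 5, the threshold quoted above, are commonly treated as acceptably uncorrelated.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return np.array(out)

def mse(y_true, y_pred):
    """Mean squared error, e.g. on a held-out test set."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

print(vif([[1, 0], [0, 1], [1, 1], [0, 0]]))              # uncorrelated columns
print(vif([[1, 2.1], [2, 3.9], [3, 6.2], [4, 8.0]]))      # nearly collinear columns
```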

Phase 2.0: Location Analytics, Timeseries OLS Regression Model and Predictive Modeling

In Phase 2, we turned our attention exclusively to time-series data to analyze historical trends (Table 4). Quarterly values for the socio-economic indicators were calculated by dividing each year’s value by 4.
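The annual-to-quarterly rule described above, dividing each year's value by 4, amounts to the following sketch. The function name and the `(year, quarter)` key layout are illustrative choices.

```python
def annual_to_quarterly(annual):
    """Spread each year's value evenly over its four quarters: {(year, q): value / 4}."""
    return {
        (year, q): value / 4
        for year, value in sorted(annual.items())
        for q in range(1, 5)
    }

print(annual_to_quarterly({2020: 400.0}))
```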

Using the indicators in Table 4, we conducted three experiments: location analytics, a time-series OLS regression model, and a random forest regressor model.

Location Analytics:

Domain knowledge and differences in state policy tell us that states perform differently, so ZIP Codes spread across the US should not be expected to behave alike. We selected four ZIP Codes in four different states and ran a basic correlation between financial and socio-economic indicators. The locations selected were Greenville, NC; Kalamazoo, MI; Pearl River, NY; and Gresham, OR.

Based on the correlation analysis and separate OLS regression models for each of the four locations, we saw that indicators interacted quite differently in each place. For example, the correlation between income growth rate and actual EPS varied by location:

- Greenville, NC: positive (0.3)
- Kalamazoo, MI: negative (-0.31)
- Pearl River, NY: positive (0.43)
- Gresham, OR: slightly negative (-0.09)

We found similar variation in how other indicators interacted.

Timeseries OLS Regression:

Our analysis of actual EPS used 916 of 1,200 observations. It yielded an R² of 0.21 and identified several statistically significant indicators:

- investment
- median household income
- income growth rate
- employment rate
- education level
- Gini index
- public insurance coverage
- household debt
- percentage of well-maintained bridges (a proxy for infrastructure)

Refining the model to the top 50% of indicators reduced R² to 0.17 but highlighted the most influential factors more clearly. The significant indicators in the refined model were:

- median household income
- house price to income ratio
- income growth rate
- employment rate
- public insurance coverage
- percentage of well-maintained bridges
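Using 916 of 1,200 observations is consistent with dropping incomplete rows before fitting, though the post does not say so. The sketch below assumes listwise deletion followed by pooled OLS on the remaining rows. Column names are hypothetical, and no correction for serial correlation is attempted.

```python
import math
import numpy as np

def complete_cases(rows, columns):
    """Keep only rows where every listed column is present and not NaN."""
    def present(v):
        return v is not None and not (isinstance(v, float) and math.isnan(v))
    return [r for r in rows if all(present(r.get(c)) for c in columns)]

def fit_ols(rows, y_col, x_cols):
    """Listwise-delete incomplete rows, fit OLS with an intercept, return (n_used, coefs, R^2)."""
    data = complete_cases(rows, [y_col, *x_cols])
    y = np.array([r[y_col] for r in data], dtype=float)
    X = np.column_stack([np.ones(len(data))] + [[r[c] for r in data] for c in x_cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    centered = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    return len(data), dict(zip(["const", *x_cols], coef)), float(r2)

panel = [
    {"eps": 3, "income": 1}, {"eps": 5, "income": 2}, {"eps": None, "income": 3},
    {"eps": 9, "income": 4}, {"eps": 11, "income": float("nan")},
]
print(fit_ols(panel, "eps", ["income"]))
```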

Random Forest Regressor:

For the top 15 companies (Table 1), we trained a random forest regressor, an ML algorithm that predicts numerical outcomes by combining multiple decision trees. The model was trained on 80% of the data and tested on the remaining 20%. We used location-specific time-series data from 2016–2021 and filled missing values with each variable’s mean. For testing, we predicted the actual EPS for Q4 2021 and compared it with the publicly reported value.
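The data preparation described above can be sketched in plain Python. This sketch makes two assumptions not confirmed by the post. First, the 80/20 split is chronological, which fits predicting the final quarter. Second, column means are computed on the training rows only, to avoid leaking test information. The prepared rows would then be passed to a regressor such as scikit-learn's `RandomForestRegressor`, which is not shown here. Function names are illustrative.

```python
import math

def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def chronological_split(rows, time_key="quarter", train_frac=0.8):
    """Sort rows by time and put the earliest train_frac of them in the training set."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]

def mean_impute(train, test, columns):
    """Fill missing values with the column mean computed on the training rows only."""
    means = {}
    for col in columns:
        observed = [r[col] for r in train if not is_missing(r.get(col))]
        if not observed:
            raise ValueError(f"no observed training values for {col!r}")
        means[col] = sum(observed) / len(observed)

    def fill(row):
        # Return a copy of the row with only the missing entries replaced
        return {**row, **{c: means[c] for c in columns if is_missing(row.get(c))}}

    return [fill(r) for r in train], [fill(r) for r in test]

quarters = [
    {"quarter": f"{2016 + i // 4}Q{i % 4 + 1}", "income": None if i == 3 else float(i)}
    for i in range(10)
]
train, test = chronological_split(quarters)
train_f, test_f = mean_impute(train, test, ["income"])
```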

In evaluating the predicted values for each company, we conducted three key comparisons:

  1. First, we examined the direction of the predicted values to discern trends and patterns. For most of the selected companies, the direction of the predicted EPS matched that of the EPS reported for Q4 2021. This shows promise for using these variables to analyze future financial trends.
  2. Second, we examined the difference between the reported estimated EPS and the actual EPS. We also examined the difference between the reported EPS and our average predicted EPS, to gauge the accuracy of our predictions. Again, for most companies in the top 15 list, these differences were small, which is encouraging (Figure 2).
  3. Last, we assessed the quarterly match rate of the top 15 companies to gauge how precise our model is, given its limitations. The overall match rate across all companies was quite low. Integrated with our broader analysis, this finding indicates that our estimates are reasonable.
Figure 2

Together, these comparisons gave a fuller picture of our predictive framework’s performance and reliability.
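The three comparisons can be sketched as small helper functions. The post does not define "direction" or "match rate" precisely, so this sketch adopts two definitions of its own. "Direction" is the sign of the change from the previous quarter's reported EPS. A "match" is a quarter where the prediction falls within a tolerance of the reported EPS; the 0.05 default is a hypothetical choice. All names are illustrative.

```python
def direction_agrees(prev_reported, reported, predicted):
    """True if predicted EPS moves the same way (up/down/flat) from last quarter as reported EPS."""
    def sign(v):
        return (v > 0) - (v < 0)
    return sign(predicted - prev_reported) == sign(reported - prev_reported)

def abs_error(reported, predicted):
    """Absolute gap between a reported EPS and a prediction or analyst estimate."""
    return abs(reported - predicted)

def match_rate(reported, predicted, tol=0.05):
    """Share of quarters where |predicted - reported| <= tol (tolerance is hypothetical)."""
    pairs = list(zip(reported, predicted))
    if not pairs:
        return 0.0
    hits = sum(1 for r, p in pairs if abs(p - r) <= tol)
    return hits / len(pairs)

print(direction_agrees(1.0, 1.2, 1.1))                       # both rose from 1.0
print(match_rate([1.0, 1.2, 0.8, 1.5], [1.02, 1.4, 0.79, 1.5]))
```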

Conclusion

The findings provide valuable insights into the interplay among climate, socio-economic factors, and financial performance. AlphaGeo’s analysis demonstrates statistically significant skill in building models and financial tracking indices that incorporate climate and socio-economic data, underscoring the relevance of these factors to share prices.

One of the key conclusions drawn from our study is the significance of location on the financial performance of companies. Our analysis suggests that geographies highlighted in committed or planned deal flow, particularly in regions leading the energy transition, play a pivotal role in shaping financial outcomes. This reinforces the importance of considering location-specific factors in investment decision-making processes.

Our prediction model shows promise, as evidenced by trends in hit rate, beat/miss performance, and estimated EPS reported over the study period. However, some limitations suggest that past trends may not fully capture future realities. These include the lack of time-series climate scores and the potential impact of recent high-impact events such as the pandemic and climate volatility.

Despite the challenges and limitations encountered in our study, the findings provide an optimistic outlook for the future of sustainable investing. By uncovering the significant impact of climate and socio-economic factors on financial performance, our research illuminates new pathways for asset managers to steer towards more resilient and profitable investment portfolios. While acknowledging the uncertainties of the future, the insights gleaned from this study offer hope, suggesting that with informed decision-making and adaptive strategies, investors can not only mitigate risks but also capitalize on the vast opportunities presented by the transition to a more sustainable economy.