Stock Analysis Methods For Amateurs

Ethan Meyer
INST414: Data Science Techniques
6 min read · Dec 16, 2023

Amateur investors face a fundamental problem: analyzing stock market data and recognizing the patterns and correlations in it that are crucial for informed decision-making. The amateur investor understands the technical indicators that characterize a stock’s health but lacks the resources to put these metrics to work in their own decisions. Those resources are often sophisticated tools and software that the amateur investor is unwilling to pay for, or that offer far more algorithmic complexity than a handful of annual trades requires. Websites like AlphaGo and Polygon.io typically require a monthly or yearly subscription to use their data and tools. Yet in many cases the amateur investor needs this data only once or twice a year, and a day trader who is just starting out does not want to commit a portion of their capital to one of these products.

The investing world is inherently volatile. It is shaped by an array of economic indicators, company performance, global events, and government regulations, and technical indicators are an attempt to summarize these forces into measures of a stock’s health. This is why companies market their sophisticated tools to amateur investors, but for some amateur investors this route is not an option. This is where my project comes in as a solution to the amateur investing problem. The project retrieves technical stock data through APIs such as the Athena API and YahooFinance and applies statistical methods to find correlations and patterns among the technical factors. Each factor is measured against the rate of change in the stock’s price; a fuller walkthrough of the analysis follows later in this article.

The dataset used in this analysis consists of data retrieved from the Athena and YahooFinance APIs. Both were needed to assemble the complete set of technical factors: the YahooFinance API only provided volume, high, low, open, and close data for each stock, while the Athena API supplied the more specialized technical factors such as moving average, volume, resistance levels, Bollinger Bands, Relative Strength Index (RSI), Price to Earnings (P/E), Price to Book (P/B), and Moving Average Convergence Divergence (MACD). The data from the two APIs were merged and normalized to a weekly cadence, taking a frozen snapshot of the technical data every Friday for the past decade. For my sample I chose the companies currently in the S&P 500, regardless of how long each company has been in the index. Joining the data on weekly timeframes streamlined the normalization, because both APIs use the same timestamp format.
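
To make the weekly merge concrete, here is a minimal pandas sketch of the idea. It is not the project's actual code: the function names, column names, and values are made up for illustration, and it assumes both sources have already been loaded into date-indexed DataFrames.

```python
import pandas as pd

def weekly_snapshot(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the last available row of each week ending on Friday."""
    return df.resample("W-FRI").last()

def merge_sources(prices: pd.DataFrame, indicators: pd.DataFrame) -> pd.DataFrame:
    """Inner-join two date-indexed frames on their shared Friday timestamps."""
    return weekly_snapshot(prices).join(weekly_snapshot(indicators), how="inner")

# Made-up values for a single hypothetical ticker.
prices = pd.DataFrame(
    {"Close": [100.0, 101.5, 102.0, 103.2], "Volume": [1000, 1200, 900, 1100]},
    index=pd.to_datetime(["2023-01-05", "2023-01-06", "2023-01-12", "2023-01-13"]),
)
indicators = pd.DataFrame(
    {"RSI": [55.0, 61.0], "MACD": [0.4, 0.7]},
    index=pd.to_datetime(["2023-01-06", "2023-01-13"]),
)

merged = merge_sources(prices, indicators)
print(merged)
```

An inner join keeps only the Fridays present in both sources, which is one reasonable way to avoid introducing missing values at the merge step.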

The next phase of the project involves data cleaning, a critical step to ensure the integrity and reliability of the dataset. This process is divided into three key parts: handling missing values, detecting outliers, and identifying data duplicated across the two APIs. Handling missing values was the most important, because a null or zero value would skew averages or prevent the script from running properly.
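
The three cleaning steps could look roughly like the sketch below. The column names (`Ticker`, `Date`, `Close`), the sample values, and the choice of a 1.5 × IQR rule for outliers are assumptions for illustration; the write-up does not say which outlier rule the project used.

```python
import numpy as np
import pandas as pd

def clean_weekly(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicates and missing prices, then flag outliers per ticker."""
    # 1. Remove rows that appear twice for the same ticker and date.
    out = df.drop_duplicates(subset=["Ticker", "Date"]).copy()
    # 2. Treat a zero price as missing, then drop rows without a price.
    out["Close"] = out["Close"].replace(0, np.nan)
    out = out.dropna(subset=["Close"])
    # 3. Flag (rather than delete) prices outside 1.5 * IQR for each ticker.
    grouped = out.groupby("Ticker")["Close"]
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    out["is_outlier"] = (out["Close"] < q1 - 1.5 * iqr) | (out["Close"] > q3 + 1.5 * iqr)
    return out.reset_index(drop=True)

# Made-up rows for a hypothetical ticker "AAA".
raw = pd.DataFrame(
    [
        ("AAA", "2023-01-06", 100.0),
        ("AAA", "2023-01-13", 101.0),
        ("AAA", "2023-01-13", 101.0),  # duplicated across sources
        ("AAA", "2023-01-20", 99.0),
        ("AAA", "2023-01-27", 0.0),    # zero used as a placeholder
        ("AAA", "2023-02-03", 102.0),
        ("AAA", "2023-02-10", None),   # missing value
        ("AAA", "2023-02-17", 100.0),
        ("AAA", "2023-02-24", 500.0),  # suspicious spike
        ("AAA", "2023-03-03", 101.0),
    ],
    columns=["Ticker", "Date", "Close"],
)

cleaned = clean_weekly(raw)
print(cleaned)
```

Flagging outliers instead of deleting them keeps the decision visible, since a genuine price spike and a data error look the same at this stage.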

After collecting the weekly technical factors for the 500 U.S. stocks in the S&P 500, I saved them to a CSV file and used them to run a sample analysis of which technical factors had a significant influence on a stock’s price, and to what degree. This sample analysis serves as a compass for amateur investors, identifying the significant influences and their respective degrees of impact on stock prices. It is broken down into obvious and non-obvious insights, using correlation and rate of change to find patterns among complicated figures.

First, the regression analysis revealed several technical factors that have substantial influence over stock prices. Unsurprisingly, the ‘Close’ price emerged as a pivotal determinant of future stock prices. The coefficient associated with ‘Close’ represents the estimated change in the dependent variable for a one-unit change in this factor while holding the other variables constant. Additionally, trading ‘Volume’ and the ‘Moving Average’ had statistically significant coefficients, underlining their importance in predicting stock price movements.
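
As a rough illustration of how such coefficients are estimated, the sketch below fits an ordinary least squares regression with NumPy. Everything in it is an assumption for demonstration: the feature names `volume_z` and `ma_gap` are hypothetical, and the data is synthetic, generated from known coefficients so the fit can be checked. It does not compute significance tests, which would need a statistics package.

```python
import numpy as np

def fit_ols(features: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Return [intercept, coef_1, ..., coef_k] from an ordinary least squares fit."""
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(features)), features])
    coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coefs

rng = np.random.default_rng(0)
n_weeks = 200

# Two hypothetical, standardized technical factors.
volume_z = rng.normal(size=n_weeks)
ma_gap = rng.normal(size=n_weeks)

# Synthetic target built from known coefficients (no noise), so the
# fit should recover an intercept of 1.0 and slopes of 2.0 and -0.5.
price_change = 1.0 + 2.0 * volume_z - 0.5 * ma_gap

coefs = fit_ols(np.column_stack([volume_z, ma_gap]), price_change)
for name, value in zip(["intercept", "volume_z", "ma_gap"], coefs):
    print(f"{name}: {value:+.3f}")
```

Each slope is read as the estimated change in the target for a one-unit change in that factor with the other factor held fixed, which is the interpretation described above.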

A particularly fascinating aspect of the analysis involves unraveling the complex interaction among different technical factors. Take, for example, the ‘Bollinger Bands,’ which act as a measure of price volatility. They revealed an inverse correlation with stock prices, indicating that as volatility increased, stock prices underwent more pronounced fluctuations. This underscores the significance of market stability in predicting price movements. Likewise, the ‘Relative Strength Index (RSI)’ displayed a non-linear relationship with stock prices, shedding light on the subtle effects of overbought or oversold conditions on market dynamics.
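
For readers unfamiliar with these two indicators, the sketch below shows one common way to compute them from closing prices. In the project they were retrieved from the API rather than calculated, so this is only an illustration of what they measure. The RSI here uses a simple moving average of gains and losses, which is an assumption; other variants use a different smoothing, and the sample prices are made up.

```python
import pandas as pd

def bollinger_bands(close: pd.Series, window: int = 20, num_std: float = 2.0) -> pd.DataFrame:
    """Rolling mean plus and minus a multiple of the rolling standard deviation."""
    middle = close.rolling(window).mean()
    spread = num_std * close.rolling(window).std(ddof=0)
    return pd.DataFrame(
        {"middle": middle, "upper": middle + spread, "lower": middle - spread}
    )

def simple_rsi(close: pd.Series, window: int = 14) -> pd.Series:
    """RSI from simple moving averages of gains and losses (scale 0 to 100)."""
    delta = close.diff()
    avg_gain = delta.clip(lower=0).rolling(window).mean()
    avg_loss = (-delta.clip(upper=0)).rolling(window).mean()
    relative_strength = avg_gain / avg_loss
    return 100 - 100 / (1 + relative_strength)

# Made-up closing prices; short windows are used only to fit the tiny sample.
closes = pd.Series([10.0, 11.0, 10.0, 12.0, 11.0])
bands = bollinger_bands(closes, window=3)
rsi = simple_rsi(closes, window=4)

print(bands)
print(rsi)
```

The distance between the upper and lower bands widens as recent prices become more volatile, and RSI readings near the top or bottom of the 0 to 100 scale are what traders usually describe as overbought or oversold.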

The regression study also went beyond individual technical indicators to probe the relationship between stock prices and fundamental valuation data. In particular, subtle effects on stock prices were seen for the “Price to Earnings (P/E)” and “Price to Book (P/B)” ratios, which are commonly used to assess a business’s valuation. A lower P/E ratio, frequently a sign of undervaluation, was associated with higher stock prices. However, this relationship depends heavily on the state of the market as a whole and on investor sentiment. These contextual subtleties highlight how important it is to pay close attention to the overall dynamics of the market. A thorough understanding of these valuation measures is therefore essential for investors, giving them the insight to navigate ever-changing markets.

The code acts as the bridge between statistical abstraction and practical insight, translating mathematical intricacies into tangible understanding. By inspecting the coefficients in the code output, analysts and investors can see the quantitative impact of each technical factor on stock prices. A positive coefficient, as demonstrated in the code, signifies a positive association: when the technical factor increases, stock prices tend to increase as well. Conversely, a negative coefficient implies an inverse relationship, indicating that an increase in the technical factor is associated with a decrease in stock prices.

The limitations of this project can be divided into two groups: external and internal. The stock market revolves around human interpretation and speculation, which opens the door to external factors that can move the market independently of technical factors. External factors, especially geopolitical events, present formidable challenges. Sudden political shifts, regulatory changes, or global crises can exert profound influences on financial markets, beyond the predictive power of quantitative models. Human behavior in response to geopolitical events adds another layer of complexity, introducing unpredictable dynamics that quantitative analysis may struggle to capture. Human behavior also plays a significant role in market sentiment. Emotions, speculative activity, and irrational decision-making by market participants introduce an element of unpredictability that quantitative models find hard to capture. The interplay of psychological factors, herd behavior, fear, and confidence underlies the quantitative analysis performed by this project and many like it.

As mentioned, the market’s inherent complexity means that quantitative models can be useful for speculation and prediction but fall short of capturing the entirety of market dynamics.

In terms of internal limitations, the most challenging was working around the call limits of the APIs. Free APIs only allow a certain number of calls, so I had to split my data retrieval into portions so that each call could be executed. Another limitation was the graphical complications that arose during this project. I was able to see the linear regression for the Relative Strength Index but failed to visually represent the relationships and correlations between the variables. On another platform, such as Bloomberg, better visualization of the data and analysis could be achieved more readily.
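
One way to retrieve data in portions under a call limit is sketched below. It is a generic pattern, not the project's code: `fetch_one` stands in for whatever function actually calls the API, and the batch size and pause length are placeholder values that would depend on the provider's limits.

```python
import time
from typing import Callable, Dict, Iterator, List

def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def fetch_in_batches(
    tickers: List[str],
    fetch_one: Callable[[str], dict],
    batch_size: int = 5,
    pause_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, dict]:
    """Fetch each ticker, pausing between batches to stay under a call limit."""
    results: Dict[str, dict] = {}
    batches = list(batched(tickers, batch_size))
    for index, batch in enumerate(batches):
        for ticker in batch:
            results[ticker] = fetch_one(ticker)
        # No pause is needed after the final batch.
        if index < len(batches) - 1:
            sleep(pause_seconds)
    return results

# Demo with a fake fetch function and a recorded (not real) sleep.
pauses: List[float] = []
demo_tickers = ["T1", "T2", "T3", "T4", "T5", "T6", "T7"]  # hypothetical symbols
demo = fetch_in_batches(
    demo_tickers,
    fetch_one=lambda ticker: {"ticker": ticker},
    batch_size=3,
    pause_seconds=60.0,
    sleep=pauses.append,
)
print(len(demo), "tickers fetched with", len(pauses), "pauses")
```

Passing the sleep function as a parameter keeps the demo instant while leaving the real pause in place for actual runs.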

In conclusion, this project tackles a crucial issue faced by amateur investors: the accessibility and usability of technical indicators. By leveraging APIs like Athena and Yahoo Finance, the dataset spans a decade of weekly stock data for S&P 500 companies. The regression analysis uncovers valuable insights into how technical factors influence stock prices, and the code serves as the linchpin that translates statistical complexity into practical understanding.

The project has limitations: external factors such as geopolitical events and human behavior, along with internal challenges like API call limits and visualization difficulties, add layers of complexity. Despite these challenges, the project stands as a step toward democratizing financial data, offering amateur investors actionable insights for informed decision-making in the dynamic world of the stock market.

Link to GitHub:
https://github.com/EthanMeyer41/INST414Final
