A Real-World Experiment with ChatGPT-Generated Strategies

Showcasing the true potential of AI and optimization in trading

Austin Starks
7 min read · Oct 10, 2023

When I posted my article on the ChatGPT-generated strategy outperforming the market, reactions were predictably mixed. Redditors in particular on subreddits like /r/OpenAI and /r/Quant were extremely skeptical. “ChatGPT is just a language model!” some pointed out. “If this ChatGPT-based strategy is such a game-changer, why aren’t you sipping champagne on a yacht somewhere?”

NexusTrade — AI-Powered Finance

While the internet is a tough crowd to please, what got lost in the chatter was the very essence of what I aimed to showcase: the utility of Large Language Models (LLMs) like ChatGPT in the formulation of trading strategies. See, while it’s interesting that ChatGPT can outperform Buy and Hold on a backtest, what’s more interesting is how fast I was able to create such a strategy.

Nevertheless, I am curious… Is ChatGPT a better trader than human traders?

Psst! The original article was posted on Aurora’s Insights. Subscribe to stay up-to-date with AI, Finance, and the intersection between the two!

So today, I’m here to take the experiment to its next logical step: deploying it into the real world, or at least as real as paper trading can get. We are running an experiment comparing three fundamentally different approaches: Buy and Hold, a ChatGPT-generated strategy, and optimized versions of that strategy. In this experiment, we’ll show how simple it is to configure 5+ different portfolios and deploy all of them live to the market. We’ll then follow up and see how well those portfolios do live in the market.

For more context on how I developed a strategy that beat the market with ChatGPT, check out this article: ChatGPT generated my algorithmic trading strategy. It beat the market.

A Tool, But A Powerful One

It’s natural that people were skeptical — I mean, the idea that ChatGPT is secretly a Wall Street veteran is a little absurd. Nonetheless, the utility that GPT provides for algo-trading is a little more subtle. Let me break it down.

ChatGPT can produce text that serves as ready-to-go configurations for trading platforms like NexusTrade. This means that it can generate backtest settings, optimization variables, and even core strategy configurations. In other words, ChatGPT is an invaluable tool for traders that dramatically speeds up the testing and deployment process.

To put it simply, ChatGPT is nothing more than a tool. An extremely powerful tool, but a tool nonetheless. Similar to how a calculator doesn’t transmute you into a mathematician, LLMs aren’t going to transform you into a Wall Street Wizard overnight. Rather, a savvy trader using these tools can generate strategies faster, test them more meticulously, and bring them to the market more efficiently. A process that would have taken a trader months can now take them minutes. That’s impressive.

Want to learn more about NexusTrade? Check out this article: I created an open-source automated trading platform. Here’s how much it’s improved in a year.

The Road to Deployment: A Detailed Walkthrough

So let’s demonstrate the true utility of ChatGPT — the ability to configure and test multiple different trading strategies with ease.

For this venture, we are pitting three distinct methodologies against each other: classic Buy and Hold, a ChatGPT-designed strategy, and optimized variants of that strategy. All strategies will exclusively use technical indicators, steering clear of company fundamentals for this round.

The Experiment

Our experimental design pits a ChatGPT-generated strategy against traditional methods such as Buy and Hold, with a twist: We’re also including several optimized versions of the original strategy for good measure. After all, what’s an experiment without a few variables to shake things up?

All strategies will be performed using technical indicators. Company fundamentals, which are often utilized to formulate trading strategies, will not be used for this experiment. A future experiment could incorporate this type of data to see if it would improve the results.

The Control Groups

Control 1: Buy and Hold of SPY

This is our baseline — a straightforward purchase and hold strategy for SPY, the ETF that mirrors the S&P 500.

Control 2: Buy and Hold of TQQQ

Here we hold TQQQ, a leveraged ETF that tracks the NASDAQ-100. This is our ‘high-stakes’ control.

The Underdog: ChatGPT-Generated Strategy

ChatGPT generating a trading strategy by itself

The core strategy here is generated by ChatGPT. Based on its recommendations, we’ve designed a strategy that trades TQQQ, offering a level playing field against our ‘high-stakes’ control.

The Optimized Portfolios

The optimization process generates several optimized portfolios

Optimized Portfolio 1 — One and Done Optimization

This strategy takes the original ChatGPT-generated approach and optimizes it once, based on historical data. The objective is to assess whether a one-time optimization can improve performance.
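Later in this article the optimizer is described as genetic optimization. To give a feel for what that means, here is a minimal sketch of a genetic algorithm tuning two numeric parameters. It is not NexusTrade's optimizer: the `evolve` function, the parameter bounds, and especially `toy_fitness` (a stand-in for "run a backtest and score the result") are all assumptions made for illustration.

```python
import random


def toy_fitness(params):
    """Hypothetical stand-in for a backtest score; peaks at (30, 70)."""
    return -((params[0] - 30.0) ** 2 + (params[1] - 70.0) ** 2)


def evolve(fitness, bounds, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Tiny genetic algorithm: keep the fitter half, breed the rest."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # survivors carry over unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each parameter comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: occasionally nudge a parameter, clamped to its bounds.
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    nudged = child[i] + rng.gauss(0, (hi - lo) * 0.1)
                    child[i] = min(hi, max(lo, nudged))
            children.append(child)
        population = parents + children

    return max(population, key=fitness), history


if __name__ == "__main__":
    best, history = evolve(toy_fitness, bounds=[(0.0, 100.0), (0.0, 100.0)])
    print("best parameters:", best)
    print("best fitness per generation never decreases:", history[0], "->", history[-1])
```

Because the fitter half of each generation survives unchanged, the best score can only stay the same or improve from one generation to the next. In a real setup, the fitness function would be a backtest metric, which is exactly where the risk of overfitting to historical data comes from.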

Optimized Portfolio 2 — Expanding Window Optimization

Here, we take the portfolio from One and Done Optimization and continually optimize it based on an expanding window. Periodically, we’ll re-optimize the portfolio keeping the same start date but expanding the end date. The goal is to see if incorporating more historical data over time can lead to better results.

Optimized Portfolio 3 — Sliding Window Optimization

This strategy also involves continual optimization of the portfolio from One and Done Optimization, but the window of past data slides forward in time, disregarding older data. This will test if focusing on more recent data offers an advantage.
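The difference between the expanding and sliding windows is easiest to see as date ranges. The sketch below is only an illustration: the start date, the 30-day re-optimization interval, and the function names are assumptions, not the settings used in this experiment.

```python
from datetime import date, timedelta


def expanding_windows(start, first_end, step_days, n_runs):
    """Same start date every run; only the end date moves forward."""
    return [(start, first_end + timedelta(days=step_days * i)) for i in range(n_runs)]


def sliding_windows(start, first_end, step_days, n_runs):
    """Fixed-length window; start and end both move forward each run."""
    windows = []
    for i in range(n_runs):
        shift = timedelta(days=step_days * i)
        windows.append((start + shift, first_end + shift))
    return windows


if __name__ == "__main__":
    # Hypothetical dates, chosen only to make the output easy to read.
    start, first_end = date(2020, 1, 1), date(2023, 1, 1)
    for label, make in [("expanding", expanding_windows), ("sliding", sliding_windows)]:
        print(label)
        for s, e in make(start, first_end, step_days=30, n_runs=3):
            print(f"  optimize on {s} .. {e}")
```

With the expanding window, each re-optimization sees everything the previous one saw plus the newest data. With the sliding window, the amount of data stays constant and the oldest days drop out.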

The statistics for the optimized portfolio we selected

Evaluation — How will we compare our results?

At the end of the experiment, we’ll be comparing our portfolios on several key metrics including percent change, Sharpe ratio, and maximum drawdown.
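For readers unfamiliar with these metrics, here is a rough sketch of how each can be computed from a series of daily portfolio values. These are common textbook definitions, not necessarily the exact formulas NexusTrade uses; the zero risk-free rate and the 252-trading-day annualization are assumptions.

```python
import math


def percent_change(values):
    """Total return over the whole period, in percent."""
    return (values[-1] / values[0] - 1.0) * 100.0


def sharpe_ratio(values, risk_free_rate=0.0, periods_per_year=252):
    """Annualized mean excess return divided by its standard deviation."""
    returns = [b / a - 1.0 for a, b in zip(values, values[1:])]
    excess = [r - risk_free_rate / periods_per_year for r in returns]
    if len(excess) < 2:
        return float("nan")  # not enough data for a sample deviation
    mean = sum(excess) / len(excess)
    variance = sum((r - mean) ** 2 for r in excess) / (len(excess) - 1)
    std = math.sqrt(variance)
    if std == 0:
        return float("nan")  # undefined when returns never vary
    return mean / std * math.sqrt(periods_per_year)


def max_drawdown(values):
    """Largest peak-to-trough decline, in percent of the peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst * 100.0


if __name__ == "__main__":
    # Made-up portfolio values, for illustration only.
    values = [100.0, 110.0, 99.0, 120.0]
    print(percent_change(values), sharpe_ratio(values), max_drawdown(values))
```

Percent change rewards raw growth, while the Sharpe ratio and maximum drawdown penalize a bumpy ride, which matters when comparing anything against a leveraged ETF like TQQQ.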

Initial Impressions: The Backtest Results

Even though the purpose of this experiment is to see how well the ChatGPT-generated portfolios do, it wouldn’t be fun if we didn’t have some initial guesses! First, let’s see how each of these portfolios performs during backtests.

Unoptimized ChatGPT Portfolio vs SPY

The ChatGPT-generated portfolio underperformed compared to Buy and Hold of SPY

The unoptimized portfolio is profitable, but significantly underperforms Buy and Hold for SPY. With this one example, we can see that ChatGPT isn’t a Wall Street Pro. Nevertheless, let’s see how the optimized versions fare.

Optimized ChatGPT Portfolio vs SPY

The ChatGPT-generated portfolio vastly outperformed Buy and Hold of SPY

Now we’re talking! The optimized portfolio performs significantly better than buying and holding the S&P 500 (SPY). While these results look spectacular, let’s take a look at a comparison vs TQQQ.

Optimized ChatGPT Portfolio vs TQQQ

The ChatGPT-generated portfolio performed comparably to Buy and Hold of TQQQ

The optimized portfolio does slightly better than Buy and Hold on TQQQ. This is still a pretty exciting result to see; the optimization process at least leads to a better portfolio than the baseline during backtests.

Want to generate algorithmic trading strategies using natural language? Check out the NexusTrade AI-Powered Chat! It’s fast, powerful, and free to try!

Caveats of the AI-Generated Portfolios

Before we get ahead of ourselves, there are some important caveats about the AI-generated portfolio. One of these is that the ChatGPT-generated portfolio was overly simple. It consisted of exactly two rules: a buy condition and a sell condition for a single asset. These rules weren’t intricate either, and used very basic technical indicators. One could likely get better results with an algorithm that was a bit more sophisticated.
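To make "two rules on a single asset" concrete, here is a sketch of what such a strategy can look like in plain Python. The specific rules (buy when the price is above its simple moving average, sell when it falls below) are hypothetical and chosen only for illustration; they are not the conditions ChatGPT generated for this experiment, and the prices are made up.

```python
def sma(prices, window):
    """Simple moving average of the last `window` prices, or None if too few."""
    if len(prices) < window:
        return None
    return sum(prices[-window:]) / window


def run_two_rule_strategy(prices, window=3, cash=1000.0):
    """Rule 1 (buy): price is above its SMA while we hold cash.
    Rule 2 (sell): price is below its SMA while we hold shares."""
    shares = 0.0
    trades = []
    for day, price in enumerate(prices):
        average = sma(prices[: day + 1], window)
        if average is None:
            continue  # not enough history yet
        if shares == 0 and price > average:
            shares, cash = cash / price, 0.0  # go all in
            trades.append(("buy", day, price))
        elif shares > 0 and price < average:
            cash, shares = shares * price, 0.0  # sell everything
            trades.append(("sell", day, price))
    final_value = cash + shares * prices[-1]
    return final_value, trades


if __name__ == "__main__":
    # Made-up prices for a single asset.
    prices = [10, 10, 10, 12, 13, 9, 8, 11]
    final_value, trades = run_two_rule_strategy(prices)
    print(trades)
    print(round(final_value, 2))
```

A strategy this small has only a handful of tunable numbers (here, just the window length), which is part of why the optimizer has limited room to improve it.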

Want to see a more sophisticated portfolio? Subscribe to get exclusive access to the latest insights of AI. Aurora’s Insights discusses Artificial Intelligence, Finance, and the intersection between the two.

Another caveat is that the portfolio was only optimized for an hour. If we ran the optimization overnight on more powerful hardware, we could potentially converge to an even better solution. However, for cost reasons, we stopped the optimization early.

Will AI Supersede Tradition?

We’re exploring whether the ChatGPT-crafted strategy, particularly its optimized incarnations, can transcend the venerable Buy and Hold method. Given that 80–90% of traders struggle to outperform the S&P 500, even a modest win for any of the ChatGPT portfolios would be groundbreaking. It would underscore the game-changing role of generative AI in modern finance.

Whatever the results are, it’ll be exciting to see the outcomes of this experiment. As it stands, there are no comparable studies that test the effects of genetic optimization on portfolios for real-time trading.

If nothing else, this experiment has indisputably demonstrated one thing: OpenAI’s technology has drastically reduced the time and effort required to formulate, test, and optimize trading strategies. Just think of the labor it would take to manually code those seven strategies in Python, deploy them to the market, and present them in a user-friendly interface!

Thanks for reading! If you want to see the results of this portfolio, be sure to subscribe to Aurora’s Insights — NexusTrade blog. To subscribe, you can create an account on NexusTrade or provide your email on the subscription page. If you’re interested in AI, trading, or the intersection between the two, reach out to me on my social media!

📸 Catch us on Instagram

🎵 Dive into our TikTok

📚 Follow our thought pieces on Medium

🤝 Connect with me on LinkedIn


Austin Starks

https://nexustrade.io/ Highly technical and ambitious. Building a no-code algotrading platform and an ecosystem of AI applications. https://nexusgenai.io