I changed my Analytical Data Storage to BigQuery and Holy Hell!

The Difference was Night and Day

Austin Starks
Technology Hits
6 min readJul 12, 2024


NexusTrade – AI-Powered Trading

I’m a big advocate of “move fast”, “break things”, and “keep it simple, stupid”. For my SaaS project, where I’m the sole engineer and employee, I focus on two key quality attributes: simplicity for the user and maintainability for the developer.

Other attributes like scalability have to take a backseat. Don’t get me wrong – I always design with scalability in mind. But when it comes to implementing a queue system, I will use MongoDB over a Kafka queue 10/10 times. This mindset has served me very well.

Unfortunately, you will eventually run into bottlenecks, just like I did. When I first designed my AI-Powered Stock Screener, I didn’t know if it could work at all. The concept was extremely novel, and I hadn’t seen anything like it before.

And so I implemented it using MongoDB. And to my surprise, it worked!

Want to read the technical details of the MongoDB Solution? Check out the following article:

Want to read about 5 fundamentally-strong AI stocks as identified by Aurora? Check out this article.

However, this solution suffered a number of huge drawbacks.

  • Inaccurate queries: the MongoDB aggregation pipeline is much more complex than an equivalent SQL query, and large language models seem to struggle with it just as humans do.
  • Slow response times: sometimes queries took seconds or even minutes to finish executing, which is unacceptable for a real-time chat application.
  • Difficult debugging: because of the complexity of aggregation pipelines, debugging and improving the prompt engineering was a difficult task.
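To make the first point concrete, here is a hypothetical side-by-side of the same simple screen (“top 5 tech stocks by market cap”) in both query languages. The collection, table, and field names are made up for this sketch, not NexusTrade’s real schema:

```python
# Hypothetical illustration: the same screen ("top 5 tech stocks by
# market cap") as a MongoDB aggregation pipeline vs. a SQL query.
# Collection/field names are invented for this sketch.
pipeline = [
    {"$match": {"sector": "Technology"}},       # filter by sector
    {"$sort": {"marketCap": -1}},               # highest market cap first
    {"$limit": 5},                              # top 5 only
    {"$project": {"_id": 0, "ticker": 1, "marketCap": 1}},  # trim fields
]

sql = """
SELECT ticker, market_cap
FROM stocks
WHERE sector = 'Technology'
ORDER BY market_cap DESC
LIMIT 5
"""
```

And this is a trivial screen. Once you add joins across fundamentals tables or computed ratios, the pipeline grows stages much faster than the SQL grows clauses, and an LLM has correspondingly more places to go wrong.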

And thus, due to the potential of this feature as a differentiator, I decided to rebuild it using BigQuery.

And holy hell, is this solution much better.

NexusTrade’s AI-Powered Stock Screener

While companies slap the “AI” label onto everything, NexusTrade’s stock screener is actually pretty cool.

It works by taking the criteria the user specifies and searching the database for matching stocks. For example, if you want to find the AI stocks with the highest market cap, you can simply ask the model, and it will give you the results!

From there, you can take the stocks you found and instantly perform historical analysis. For example, if you found 5 stocks based on their historical free cash flow, you can instantly create a portfolio with those stocks.

Read: Mr. Wonderful’s Favorite Trading Strategy: Investing in companies with the highest cash flow

You can then create a backtest and see how these stocks performed during paper trading.

See the backtest for this portfolio here

However, due to the issues I mentioned above (inaccurate queries, slow response times, and difficult debugging), this feature was not as useful as it could’ve been. The worst thing that could happen to an investor is receiving wrong information because the AI generated an inaccurate query!

So I decided to fix this.

From Leaf to Lightning: Upgrading from MongoDB to BigQuery

Costs and Complexities: The Fine Print of our Data Upgrade

I’ll discuss the cons first, because the rest of this article will be overwhelmingly positive. Setting up Google Cloud (particularly navigating its permission system) was the biggest headache for a developer like me.

Don’t get me wrong – for an enterprise product, having fine-grained control over which organizations have access to which APIs is extremely important.

But holy hell is it a nightmare!

My next big complaint is the complexity of BigQuery. If I didn’t use it every day at my day job, my god would I be lost!

Like, what the heck is a dataset? What is a project? Who knows!

(Of course I could just read the documentation, but as a “move fast, break things” engineer, who really has time for that? 🙄)

Lastly, I’ll briefly discuss the errors I encountered (and boy, were there a lot). The most common were insertion errors, because the data I was trying to insert didn’t match the schema I had defined.

As all MongoDB users know, this just doesn’t happen there! We insert whatever we want, whenever we want. SQL-based databases do not agree!
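A minimal sketch of the strictness behind those insertion errors: a SQL-style database rejects any row that doesn’t match the declared schema, while MongoDB accepts it without complaint. The field names and types below are hypothetical, not my real schema:

```python
# Sketch of SQL-style schema enforcement. Field names are hypothetical.
SCHEMA = {"ticker": str, "market_cap": float, "report_date": str}

def validate_row(row: dict) -> list:
    """Return a list of schema violations for one row (empty = valid)."""
    errors = []
    for field, expected_type in SCHEMA.items():
        if field not in row:
            errors.append(f"missing field: {field}")
        elif not isinstance(row[field], expected_type):
            errors.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(row[field]).__name__}"
            )
    return errors

# MongoDB would insert this row without complaint; a schema-enforcing
# database rejects it (wrong type for market_cap, missing report_date).
bad_row = {"ticker": "AAPL", "market_cap": "3 trillion"}
```

Annoying at insert time, but it’s exactly this strictness that keeps the analytical tables clean enough to query reliably later.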

As you can see, these aren’t super serious cons. I wouldn’t let my nitpicks deter you from BigQuery; in fact, I wish I had considered it as a solution much earlier, and here’s why.

Lightning in a Bottle: BigQuery’s Performance Paradigm

BigQuery is Google’s powerful, serverless analytical database. It excels at big data processing. I thought it would be a suitable replacement for the following reasons:

  • Relatively easy to set up: I know I complained about this above, but compared to standing up a Postgres/Timescale database instance (which wasn’t guaranteed to solve my performance problems), being able to quickly try BigQuery was a massive plus.
  • Lightning fast: at my day job, I’ve seen BigQuery process millions of rows of claims data in a few seconds (or less). I knew it was lightning fast.
  • Easy to maintain: unlike a Postgres database, I don’t have to fiddle with indices to get acceptable performance. BigQuery is fast out of the box.

And while I haven’t finished the integration entirely, initial testing is blowing my mind!

The Request/Response for my prompt engineering

To manage my prompts, I’ve created my own prompt-management platform called NexusGenAI. This platform lets me integrate with LLMs and manage my prompts without hardcoding them in the source code.
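The core idea is simple: wrap the user’s plain-English screen, plus a description of the table schema, in a prompt that asks the model for one SQL query. The schema description and template below are a hypothetical sketch, not NexusGenAI’s actual prompts:

```python
# Hypothetical sketch of assembling a SQL-generation prompt.
# The schema description and wording are invented for illustration.
SCHEMA_DESCRIPTION = """
Table `stocks`: ticker STRING, name STRING, industry STRING,
market_cap FLOAT64, free_cash_flow FLOAT64
"""

def build_screener_prompt(user_request: str) -> str:
    """Wrap the user's plain-English screen in a SQL-generation prompt."""
    return (
        "You are a financial stock screener. Given the BigQuery schema "
        "below, write ONE SQL query that answers the user's request.\n"
        f"Schema:\n{SCHEMA_DESCRIPTION}\n"
        f"Request: {user_request}\n"
        "Return only the SQL, with no explanation."
    )
```

Keeping templates like this outside the source code means I can tweak the wording and redeploy the prompt without shipping a new build.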

The end result of this (extremely complicated) request is a query that just works!

The result of this query in BigQuery

In all honesty, this integration has made me appreciate the beauty of SQL. Trying to do the same thing with the MongoDB Aggregation Pipeline is an absolute nightmare. With BigQuery, it just works.

And it works very well.

The BigQuery Breakthrough: Transforming NexusTrade’s Data Future

Now, the BigQuery solution isn’t perfect. Sometimes, the model generates basic syntax errors. Other times, it generates the wrong response for complicated requests (particularly when using weaker models like GPT-3.5).

But the difference is undeniably night and day. Right or wrong, BigQuery queries terminate in seconds, compared to the minutes (or more!) an equivalent query would take in MongoDB.

And most importantly, Large Language Models are very good at reading and writing SQL. These results were obtained with limited prompt engineering. Layering on additional LLM techniques like integration with RAG pipelines or fine-tuning is bound to give even better accuracy than these initial tests!
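One cheap way to catch the model’s occasional bad output before it ever touches the warehouse is a pre-flight check on the generated SQL. The sketch below is a hypothetical guard, not what NexusTrade actually ships; in production you could also use BigQuery’s dry-run mode to have the engine itself validate syntax without running the query:

```python
import re

def is_safe_select(sql: str) -> bool:
    """Hypothetical pre-flight check for LLM-generated SQL: allow only a
    single SELECT (or WITH) statement and reject DML/DDL keywords."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # more than one statement
        return False
    if not re.match(r"(?i)^\s*(select|with)\b", stripped):
        return False
    # Crude keyword blocklist; a real guard would parse the SQL properly.
    forbidden = re.compile(
        r"(?i)\b(insert|update|delete|drop|alter|create|merge|truncate)\b"
    )
    return not forbidden.search(stripped)
```

It’s deliberately conservative (a legitimate column named `created_at` would sneak past, but a column literally named `create` would trip it), which is the right trade-off when the query author is a language model.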

Combined with the improved speed and the ease of debugging, I’m overall extremely excited. I’m getting ready to build a feature that I would personally use every day – a Siri-like financial assistant: an accurate, extremely fast, large language model-based stock screener.

How cool is that?

Thank you for reading! If you’re interested in algorithmic trading and AI, subscribe to Aurora’s Insights! Want to be among the first to know when this feature is released? Create an account on NexusTrade today!

Follow me: LinkedIn | X (Twitter) | TikTok | Instagram | Newsletter
