Introducing ArcticDB: Powering data science at an active investment management firm with US$ 143.3bn* AUM

Matthew Hertz
Published in ArcticDB
Mar 16, 2023

Back in 2012 we created Arctic, a datastore for numeric data that was designed to address the numerous data challenges we faced at Man Group.

Traditional relational databases simply couldn’t scale to the multi-terabyte datasets we were dealing with, NoSQL databases didn’t work well with time-series data, and Hadoop-based MapReduce solutions were clunky and hard to use. We needed a solution.

We had recently migrated our front office research and trading platforms to Python and wanted a database that could integrate with the data science stack that our quants and engineers were using, day in, day out.

Enter Arctic. Arctic solved all of those problems for us and we’re proud to say it has been successful even beyond the firm, with a global user community outside Man Group and nearly 3,000 stars on GitHub. Fast forward 11 years, though, and the data demands of our business have grown: we needed something even better than Arctic. Today we are excited to release its successor, ArcticDB.

We started working on ArcticDB in 2018 and it has rapidly become the de facto database for the front office at Man Group, operating at petabyte scale for live trading, risk and research systems.

Our quants and engineers love the fact that they can collaborate on data problems without ever having to leave the Python data science ecosystem. ArcticDB is versatile enough to handle both small batch data loads and large-scale streaming pipelines, which removes the cognitive load of switching between database technologies. With bi-temporality built in, users can rewind time to assess the impact of data restatements or to align with previous versions of their models. Because ArcticDB is in active development, we are continually adding new features to meet the needs of the wider ArcticDB community.
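
To make the "rewind time" idea concrete, here is a toy sketch of a versioned store in plain Python. It is only an illustration of the concept and says nothing about how ArcticDB is implemented: the VersionedStore class, its write and read methods, the recorded_at and as_of parameters, and the sales figures are all made up for this example. Every write is kept as a new version, so a reader can ask what the data looked like before a restatement arrived.

```python
from bisect import bisect_right
from datetime import datetime


class VersionedStore:
    """Hypothetical toy store: each write appends a version, nothing is overwritten."""

    def __init__(self):
        # symbol -> list of (recorded_at, rows) in the order they were written
        self._versions = {}

    def write(self, symbol, rows, recorded_at):
        history = self._versions.setdefault(symbol, [])
        if history and recorded_at <= history[-1][0]:
            raise ValueError("writes must move forward in time")
        history.append((recorded_at, dict(rows)))

    def read(self, symbol, as_of=None):
        history = self._versions[symbol]
        if as_of is None:
            # No rewind requested: return the most recent version.
            return dict(history[-1][1])
        # Find the last version recorded at or before `as_of`.
        times = [recorded_at for recorded_at, _ in history]
        position = bisect_right(times, as_of)
        if position == 0:
            raise KeyError(f"{symbol} had no data as of {as_of}")
        return dict(history[position - 1][1])


store = VersionedStore()

# First publication of some made-up quarterly figures.
store.write("sales", {"2022-Q3": 110.0, "2022-Q4": 120.0},
            recorded_at=datetime(2023, 1, 26))

# A later restatement revises the Q4 figure.
store.write("sales", {"2022-Q3": 110.0, "2022-Q4": 135.0},
            recorded_at=datetime(2023, 2, 23))

latest = store.read("sales")
before_restatement = store.read("sales", as_of=datetime(2023, 2, 1))
print(latest["2022-Q4"], before_restatement["2022-Q4"])  # prints: 135.0 120.0
```

The two time axes are the quarter a figure describes and the moment it was recorded. A model fitted on 1 February can be re-run against exactly the numbers it saw then, while current work picks up the restated value.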

Written in C++ and optimised for modern cloud-oriented object storage, ArcticDB is fast. This means it can work with massive datasets consisting of hundreds of thousands of tables, millions of columns and billions of rows.

What makes ArcticDB different is that this level of performance is achieved without a database server, so it’s possible to scale out horizontally to fully realise the performance of storage and compute. Much like its predecessor, ArcticDB has a simple yet powerful Pythonic API that makes storing and retrieving data using the Python and Pandas ecosystem incredibly easy, but with all the power of a fully featured database: it is a DataFrame database. All of this means that ArcticDB is really easy to set up and use. How easy? Just run pip install arcticdb, point it at your S3 storage and go!

Over the coming weeks we will release a series of posts looking at ArcticDB in detail, covering topics such as architecture, performance and the roadmap for the product. So if this all sounds interesting then please watch this space.

*as at 31 December 2022

If you want to join the ArcticDB community, simply follow us on Medium and Twitter, and join our Slack workspace. If you want to contribute to the continued development of the platform, check out our GitHub repository.
