Six Months Later | Proper Tangible Unique kdb Technology

Steve Wilcockson
6 min read · Jan 3, 2023


I joined KX in May 2022. KX develops kdb+ and kdb Insights, time-series database and analytics technologies built in conjunction with the ultra-optimized analytics programming language q. Unless you’ve worked in capital markets, you may not have come across it, but with my capital markets background, I know it as mainstream high-frequency data analytics tech.

I left a great open source outfit to join KX, and my move delighted and worried colleagues and friends in equal measure. Guidance veered from the positive “you’ll work with proper tangible unique tech” to the less positive “isn’t kdb+ an anachronism in an open source, Python and cloud-centric world?” Truth be told, I joined KX with my eyes somewhat open: considerable doubts about the tech, albeit a positive gut feel. Now, six difficult months in, I’m more excited than ever about the unique tech and feel compelled to discuss why. But first, I’ll explore my early doubts, which, to my mind, reflect market realities, perceptions and dynamics around programming languages, industry verticals, analytics and open source.

My Two Initial Doubts

Doubt 1: I love open source. Counterintuitively, I left an organization dedicated to open source to join KX, a company reputed for proprietary tech. In addition, at my prior organization, I’d spent time with many open-source environments serving time series in some form or other: Cassandra, Hazelcast, Ignite, Druid, Kafka, Pulsar, Hadoop, Spark, and more.

Doubt 2: I’d tracked the impressive Confluent-led Kafka developer circuit. Kafka, for those of you who don’t know it, is the world’s leading event streaming platform, and it plays very closely with the big data, time-series and real-time analytics stack, including kdb. I’d attended April’s awesome Kafka Summit in London just before joining KX, one of my last gigs for my prior open source organization. Everyone who was anyone in the time-series, database and streaming analytics world exhibited, with one exception: KX. Why weren’t they there? I was surprised, and I actually reconsidered joining them because of this.

It got worse; only a few attendees I spoke with at the conference had even heard of KX/kdb+. I counted two volunteered instances: a one-time Morgan Stanley q developer and a pub/sub specialist. I pressed a third mention from an alleged KX competitor: “nah, don’t see it. I heard about it once when we visited a bank.” Why had so few at this big data and event streaming event heard of kdb or KX?

This didn’t bode well for my intended move. Yet I also saw many exhibiting organizations at the Kafka Summit, my own at the time included, demonstrating sticking-plaster solutions to cover the performance, capability, support and infrastructure inefficiencies, inadequacies and incurred workflow costs of mainstream big data infrastructures. My perception of kdb was its ultra-efficiency: my prior company proudly shouted about 40% performance and cost improvements, while KX quietly claimed up to 100x for critical time-series queries. Yet I still hadn’t clocked why KX was neither at the event nor giving its perspectives. The event needed them there.

Some Answers Six Months On

First, that kdb+ technology has been predominant in capital markets is both a blessing and a curse. Like laddish Seann William Scott as Stifler in American Pie, or affable Martin Freeman in The Lord of the Rings franchise or The Office, kdb tech got typecast as the preserve of fast capital markets data analytics. The interdependency between capital markets and kdb was convenient, while beyond capital markets, with some exceptions (e.g., manufacturing/IoT), kdb was simply less prevalent. Most firms at the Kafka Summit did not focus on capital markets, but rather on manufacturing, public sector, retail, payments, defense, travel, online platforms and core banking. In American Pie typecasting terms, Seann William Scott was probably neither invited to, nor even aware of, the Nomadland audition.

Second, KX is built differently from the majority of the tech on display at Kafka Summit. Kdb is not built on general-purpose enterprise languages like Java or C++, and as I discovered in my prior (Java) work life, those languages are partly what binds this big streaming data developer network together. Indeed, this Kafka Summit was pretty much a homage to the uneasy marriage between Java/OpenJDK (most tech built around the Kafka ecosystem utilizes Java and the Java Virtual Machine) and Python, the predominant language for the data science that delivers business impact.

Kdb integrates with those languages and technologies, yes, but as kdb is not quite from the same Java or C++ stable, it hasn’t automatically been part of that party; it has been tucked away on Wall Street, in Singapore and in the City of London.

And that is an important point. Kdb tech is built on q, an interpreted, hyper-efficient language which runs on smaller footprints (kilobytes, not megabytes, unlike its peers) to power computationally and memory-intensive analytics processes. Unlike Java and C++, it does not carry the overheads of object orientation and other protocols not needed for data analytics, vector mathematics or columnar data processing. By all means, it can, should and does interoperate with those languages and technologies, but big data queries, whether rudimentary or complex, are so compute- and memory-intensive that it makes sense not to run unnecessary bloat. That’s why it has underpinned capital markets for so long. For cloud in particular, that matters. Small, efficient kdb instances can combine real-time and historical data and run anywhere, whether in or alongside a centralized cloud data warehouse, or right at the heart of the institution, where data enters or exits the organization: no waste, less cost, and fast.
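To make the columnar point concrete, here is a minimal sketch in plain Python (not q, and not any KX API; all names are illustrative). It runs the same analytical query over a row-oriented layout and a columnar layout: the columnar version only ever touches the two columns the query needs, which is the access pattern a column store like kdb exploits at scale.

```python
# Illustrative sketch: why columnar layout suits analytical queries.
# Query: average trade price where size > 100.

# Row-oriented layout: one record per trade; the query must walk
# every record, touching fields it does not need.
rows = [
    {"sym": "AAPL", "price": 150.0, "size": 200},
    {"sym": "MSFT", "price": 250.0, "size": 50},
    {"sym": "AAPL", "price": 151.0, "size": 300},
]
selected = [r["price"] for r in rows if r["size"] > 100]
row_avg = sum(selected) / len(selected)

# Columnar layout: each field is a contiguous vector; the query scans
# only the "size" and "price" columns, ignoring "sym" entirely.
cols = {
    "sym":   ["AAPL", "MSFT", "AAPL"],
    "price": [150.0, 250.0, 151.0],
    "size":  [200, 50, 300],
}
mask = [s > 100 for s in cols["size"]]
picked = [p for p, keep in zip(cols["price"], mask) if keep]
col_avg = sum(picked) / len(picked)

print(row_avg, col_avg)  # both layouts agree: 150.5
```

The toy lists stand in for what, in kdb, would be memory-mapped column vectors operated on by whole-vector primitives rather than per-row interpretation.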

I know from my prior experience the challenges Java and OpenJDK have had with cloud adoption for big data. Complex Java Virtual Machines bring compute and memory overheads, making technologies like Kafka, Spark, Cassandra, Druid and others difficult to maintain in-house and to optimize for resources. Garbage collection, anyone? That is partly why commercial organizations like Confluent, Databricks, Aiven, DataStax, Imply and others provide managed services alongside support: they conveniently abstract away the complexity of transitioning older enterprise applications built in the pre-cloud era with traditional languages (like Kafka, Spark, Cassandra, Druid) to modern cloud data pipelines, but those overheads are instead carried in the costs of their services.

Why use kdb in a cloud-first world?

With slimline, low-overhead kdb instances on and off the cloud, you can get up to 100x more efficient time-series analytics, meaning:

  • In-memory analytics for both real-time and historical queries for data scientists and key lines of business, e.g., predictive healthcare, risk management, predictive maintenance
  • Ultra-fast performance for speed-dependent lines of business in capital markets, payments fraud, real-time embedded markets such as medical devices, and Formula 1
  • Considerable cost savings for “big” data analytics running in the cloud for CFOs, Architects, DevOps & FinOps
  • Reduced carbon footprints for sustainability officers. No need for expensive Snowflake cloud data warehouse time-series computations when kdb technology runs orders of magnitude more efficiently, right where the data is.

And Open Source?

KX company culture matters and has not disappointed. First, I love its dedication to Python. Python at KX is increasingly THE user interface (UI) for kdb. Think of kdb as a time-series plug-in for Python, like Pandas in many respects but lightning-fast, so great for production use and testing. Run your notebooks as before, but turbocharge them with larger datasets, managed and analysed at speed by kdb.

Second, the corporate culture at KX is open and developer-centric, both for those dedicated to q and for new joiners like myself who advocate open source technologies like Python and R, along with community-driven, collaborative development. Does this mean kdb will ship with a permissive open source license like OpenJDK, R or Python? Who knows? I personally hope so, but that goes above my pay grade. In lieu of that, kdb+ is freely available for personal use, which has not been the case for other commercial applications I’ve worked with.

While KX wasn’t at the Kafka Summit and isn’t yet ever-present on the big data developer circuit, look out for us at future events! If the Kafka community will have us, that is. And yes, KX should have been there in April 2022. Imagine a Kafka broker streaming into a kdb instance running machine learning inference on live production data in real time, exporting output to a Pythonic training algorithm powering a centralized Snowflake warehouse or a Databricks data lakehouse.
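The pipeline imagined above can be sketched end to end in a few lines of plain Python. This is a stand-in simulation, not real Kafka, kdb or Snowflake code: a stdlib queue plays the Kafka topic, a trivial scoring function plays the deployed model, and a list plays the downstream warehouse sink. Every name here is hypothetical.

```python
# Minimal simulation of: broker -> real-time inference -> warehouse sink.
from queue import Queue

def score(tick: dict) -> float:
    """Toy inference step: flag unusually large trades."""
    return 1.0 if tick["size"] > 1000 else 0.0

topic = Queue()  # stand-in for a Kafka topic on a broker
for tick in [{"sym": "AAPL", "size": 1500}, {"sym": "MSFT", "size": 10}]:
    topic.put(tick)
topic.put(None)  # sentinel marking end of the stream

warehouse_sink = []  # stand-in for the Snowflake/Databricks destination
while (msg := topic.get()) is not None:
    msg["alert"] = score(msg)    # inference applied as events arrive
    warehouse_sink.append(msg)   # scored events exported downstream

print(warehouse_sink)
```

In the real architecture, the consume-score-export loop is where the kdb instance would sit, holding the live and historical data the model queries, while the sink feeds the batch training side of the loop.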

So despite my initial doubts, after six difficult months working at KX and with a long road yet to travel, I’m really excited to work with hyper-efficient kdb technology, built differently for massive computational efficiency and speed. Given its unique situation in the market, it’s ideal for cloud-first analytics and, except among the great unwashed of capital markets, remains yet to be discovered by the many.

Did I make the right move? Let me know your thoughts.

https://kx.com/kdb-personal-edition-download/



Steve Wilcockson

Loves the intersection of quantitative tech, data science and society, plus all things postcolonial, from a First Nations-sympathetic white guy’s perspective.