TimescaleDB vs. InfluxDB: purpose built differently for time-series data

Mike Freedman
Timescale
22 min read · Aug 14, 2018

An in-depth look into how two leading time-series databases stack up in terms of data model, query language, reliability, performance, ecosystem, operational management, and company/community support

Time-series data is emerging in more and more applications, including IoT, DevOps, Finance, Retail, Logistics, Oil and Gas, Manufacturing, Automotive, Aerospace, SaaS, even machine learning and AI. Until recently, the focus of time-series databases was narrowly on metrics and monitoring; today it’s become clear that software developers really need a true time-series database designed for a variety of operational workloads.

If you are investing in a time-series database, that likely means you already have a meaningful amount of time-series data piling up quickly and need a place to store and analyze it. You may also recognize that the survival of your business will depend on the database you choose.

How to choose a time-series database

There are several factors to consider when evaluating a time-series database for your workload:

  1. Data model
  2. Query language
  3. Reliability
  4. Performance
  5. Ecosystem
  6. Operational management
  7. Company and community support

In this article, we compare two leading time-series databases, TimescaleDB and InfluxDB, to help software developers choose the time-series database best suited for their needs.

Typically database comparisons focus on performance benchmarks. Yet performance is just a part of the overall picture; it doesn’t matter how good benchmarks are if a database can’t be used in production because it has an incompatible data model or query language, or if it lacks reliability. With that in mind, we begin by comparing TimescaleDB and InfluxDB across three qualitative dimensions, data model, query language, and reliability, before diving deeper with performance benchmarks. We then round out with a comparison across database ecosystem, operational management, and company/community support.

Yes, we are the developers of TimescaleDB, so you might quickly disregard our comparison as biased. But if you let the analysis speak for itself, you’ll find that it tries to stay objective — indeed, we report several scenarios below in which InfluxDB outperforms TimescaleDB.

Also, this comparison isn’t a purely theoretical activity for us. Our company began as an IoT platform, where we first used InfluxDB to store our sensor data. However, owing to most of the differences listed below, we found InfluxDB unsatisfactory. So we built TimescaleDB as the first time-series database that satisfied our needs, and then discovered others who needed it as well, which is when we decided to open source the database. Today, less than 1.5 years later, TimescaleDB has already been downloaded hundreds of thousands of times and is deployed in production all around the world. (More on the genesis of TimescaleDB.)

In the end, our goal is to help you decide which is the best time-series database for your needs.

What about “scalability”?

If you look closely at the list above, you’ll see that “scalability” or even “clustering” is missing. What we’ve found is that when developers ask for either of these requirements, what they are really asking for is some combination of (a) performance, (b) high availability, and/or (c) storage capacity. We find it more helpful to speak about these three topics independently, instead of lumping them into a catch-all term, which is what we’ve done below.

Data model

Databases are opinionated. The way a database chooses to model and store your data will determine what you can do with it.

When it comes to data models, TimescaleDB and InfluxDB have two very different opinions: TimescaleDB is a relational database, while InfluxDB is more of a custom, NoSQL, non-relational database. What this means is that TimescaleDB relies on the relational data model, commonly found in PostgreSQL, MySQL, SQL Server, Oracle, etc. On the other hand, InfluxDB has developed its own custom data model, which, for the purpose of this comparison, we’ll call the tagset data model.

Relational data model

The relational data model has been in use for several decades now. With the relational model in TimescaleDB, each time-series measurement is recorded in its own row, with a time field followed by any number of other fields, which can be floats, ints, strings, booleans, arrays, JSON blobs, geospatial dimensions, date/time/timestamps, currencies, binary data, or even more complex data types. One can create indexes on any one field (standard indexes) or multiple fields (composite indexes), or on expressions like functions, or even limit an index to a subset of rows (partial index). Any of these fields can be used as a foreign key to secondary tables, which can then store additional metadata.

An example is below:
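
A minimal sketch of such a schema, using Python's built-in sqlite3 module purely as a runnable stand-in for PostgreSQL/TimescaleDB. The table names, column names, and sample values are hypothetical, and SQLite supports only a subset of the PostgreSQL data types mentioned above. It illustrates a time column followed by other fields, a foreign key to a metadata table, a check constraint, a composite index, and a partial index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys
conn.executescript("""
    -- Normalized metadata lives in its own table and can be updated at any time.
    CREATE TABLE devices (
        device_id INTEGER PRIMARY KEY,
        location  TEXT NOT NULL
    );
    -- One row per reading: a time column followed by any number of fields.
    CREATE TABLE readings (
        time        TEXT    NOT NULL,
        device_id   INTEGER NOT NULL REFERENCES devices(device_id),
        temperature REAL    CHECK (temperature > -273.15),
        cpu         REAL
    );
    -- Composite index on two fields.
    CREATE INDEX readings_device_time ON readings (device_id, time);
    -- Partial index limited to a subset of rows.
    CREATE INDEX readings_hot_cpu ON readings (time) WHERE cpu > 0.9;
""")

conn.execute("INSERT INTO devices VALUES (1, 'office')")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [("2018-08-14T00:00:00Z", 1, 21.5, 0.42),
     ("2018-08-14T00:00:10Z", 1, 21.6, 0.95)],
)

# Metadata can change after the fact without rewriting any readings.
conn.execute("UPDATE devices SET location = 'lab' WHERE device_id = 1")

rows = conn.execute("""
    SELECT r.time, d.location, r.cpu
    FROM readings r JOIN devices d USING (device_id)
    WHERE r.cpu > 0.9
""").fetchall()
print(rows)
```

The join returns only the high-CPU reading, labeled with the updated location, which is the kind of after-the-fact metadata change the relational model allows.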

The advantage of this approach is that it is quite flexible. One can choose to have:

  • A narrow or wide table, depending on how much data and metadata to record per reading
  • Many indexes to speed up queries or few indexes to reduce disk usage
  • Denormalized metadata within the measurement row, or normalized metadata that lives in a separate table, either of which can be updated at any time (although it is easier to update in the latter case)
  • A rigid schema that validates input types or a schemaless JSON blob to increase iteration speed
  • Check constraints that validate inputs, for example checking for uniqueness or non-null values

The disadvantage of this approach is that to get started, one generally needs to choose a schema and explicitly decide whether or not to have indexes.

Note: In the past several years it’s been popular to criticize the relational model by claiming that it is not scalable. However, as we have already shown, this is simply not true: relational databases can indeed scale very well for time-series data.

Tagset data model

With the InfluxDB tagset data model, each measurement has a timestamp, and an associated set of tags (tagset) and set of fields (fieldset). The fieldset represents the actual measurement reading values, while the tagset represents the metadata to describe the measurements. Field data types are limited to floats, ints, strings, and booleans, and cannot be changed without rewriting the data. Tagset values are indexed while fieldset values are not. Also, tagset values are always represented as strings, and cannot be updated.

An example is below:
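
As a rough illustration, a tagset point can be written in InfluxDB's text-based line protocol, which has the shape "measurement,tags fields timestamp". The helper below is a hypothetical sketch, not an InfluxDB client API: the function name and sample values are assumptions, and it skips the escaping of special characters that a real client would need.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Render one point as a line-protocol string (no escaping; illustration only)."""
    def render_field(value):
        if isinstance(value, bool):   # check bool first: bool is a subclass of int
            return "true" if value else "false"
        if isinstance(value, int):
            return f"{value}i"        # integer fields carry an 'i' suffix
        if isinstance(value, float):
            return repr(value)
        return '"' + str(value) + '"'  # string fields are double-quoted

    # Tags are always strings; sorting by key gives a stable rendering.
    tag_part = "".join(f",{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(f"{key}={render_field(value)}" for key, value in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


point = to_line_protocol(
    "cpu",
    tags={"host": "server01", "region": "us-west"},     # indexed metadata (tagset)
    fields={"usage_user": 0.64, "cores": 8},            # unindexed readings (fieldset)
    timestamp_ns=1534204800000000000,
)
print(point)
# cpu,host=server01,region=us-west usage_user=0.64,cores=8i 1534204800000000000
```

Note how the tag values carry no type information, while each field's type is implied by how its value is written.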

The advantage of this approach is that if one’s data naturally fits the tagset model, then it is quite easy to get started, as one doesn’t have to worry about creating schemas or indexes. Conversely, the disadvantage of this model is that it is quite rigid and limited, with no ability to create additional indexes, index continuous fields (e.g., numerics), update metadata after the fact, or enforce data validation. In particular, even though this model may feel “schemaless”, there is actually an underlying schema that is auto-created from the input data, which may differ from the desired schema.

Data model summary

If your data fits perfectly within the tagset data model, and you don’t expect that to change in the future, then you should consider using InfluxDB as this model is easier to get started with. However, the relational model is more versatile and offers more functionality, flexibility, and control. This is especially important as your application evolves. And when planning your system you should consider both its current and future needs.

Query language

Generally in the world of database query languages, there have been two extremes: full SQL support on one end, and completely custom languages (sometimes known as “NoSQL”) on the other.

From the beginning, TimescaleDB has firmly existed at the SQL end of the spectrum, fully embracing the language from day 1, and later further extending it to simplify time-series analysis. This has enabled TimescaleDB to have a minimal learning curve for new users, and allowed it to inherit the entire SQL ecosystem of 3rd party tools, connectors, and visualization options, which is larger than that of any other time-series database.

In contrast, InfluxDB began with a “SQL-like” query language (called InfluxQL), placing it in the middle of the spectrum, and has recently made a marked move towards the “custom” end with its new Flux query language. This has allowed InfluxDB to create a new query language that its creators would argue overcomes some SQL shortcomings that they had experienced. (Read the Flux announcement, the Hacker News reaction, and our comparison of SQL vs. Flux.)

At a high-level, here’s how the two language syntaxes compare, using the computation of an exponential moving average as an example:

TimescaleDB (SQL)

-- assumes exponential_moving_average has been defined as a custom aggregate
SELECT time,
       exponential_moving_average(value, 0.5) OVER (ORDER BY time)
FROM metrics
WHERE measurement = 'foo' AND time > now() - interval '1 hour';

InfluxDB (Flux)

from(db:"metrics")
|> range(start:-1h)
|> filter(fn: (r) => r._measurement == "foo")
|> exponentialMovingAverage(size:-10s)

For more, please read our recently published detailed comparison on SQL vs. Flux.

To summarize: For most use cases, we believe that SQL is the right query language for a time-series database.

While Flux may make some tasks easier, there are significant trade-offs to adopting a custom query language like it. The fact is that new query languages introduce significant overhead and reduce readability. They force a greater learning curve onto new users and possess a scarcity of compatible tools.

And they may not even be a viable option: rebuilding a system and re-educating a company to write and read a new query language is often not practically possible, particularly if the company is already using SQL-compatible tools on top of the database, e.g., Tableau for visualization.

This is also why SQL is making a comeback as the query language of choice for data infrastructure in general.

Reliability

Another cardinal rule for a database: it cannot lose or corrupt your data. This is a dimension where there is a stark difference in the approaches TimescaleDB and InfluxDB have taken, which has implications for reliability.

At its start, InfluxDB sought to completely write an entire database in Go. In fact, it doubled down on this decision with its 0.9 release, which again completely rewrote the backend storage engine (the earlier versions of Influx were going in the direction of a pluggable backend with LevelDB, RocksDB, or others). There are indeed benefits from this approach, e.g., you can build domain-specific compression algorithms that are better suited for a particular use case, as InfluxDB has done with its use of Facebook’s Gorilla encoding.

Yet these design decisions have significant implications that affect reliability. First, InfluxDB has to implement the full suite of fault-tolerance mechanisms, including replication, high availability, and backup/restore. Second, InfluxDB is responsible for its on-disk reliability, e.g., to make sure all its data structures are both durable and resist data corruption across failures (and even failures during recovery from earlier failures).

TimescaleDB, on the other hand, relies on the 25+ years of hard, careful engineering work that the entire PostgreSQL community has done to build a rock-solid database that can support truly mission-critical applications.

In fact, this was at the core of my co-founder’s launch post about TimescaleDB: When Boring is Awesome. Stateless microservices may crash and reboot, or trivially scale up and down. In fact, this is the entire “recovery-oriented computing” philosophy, as well as the thinking behind the new “serverless” design pattern. But your database needs to actually persist data, and should not wake you up at 3am because it’s in some broken state.

So let us return to these two aspects of reliability.

First, programs can crash, servers can encounter hardware or power failures, and disks can fail or experience corruption. You can mitigate but not eliminate this risk, e.g., through robust software engineering practices, uninterruptible power supplies, and disk RAID; it’s a fact of life for systems. In response, databases have been built with an array of mechanisms to further reduce such risk, including streaming replication to replicas, full-snapshot backup and recovery, streaming backups, robust data export tools, etc.

Given TimescaleDB’s design, it’s able to leverage the full complement of tools that the Postgres ecosystem offers and has rigorously tested, and all of these are available in open-source: streaming replication for high availability and read-only replicas, pg_dump and pg_restore for full database snapshots, pg_basebackup and log shipping / streaming for incremental backups and arbitrary point-in-time recovery, WAL-E for continuous archiving to cloud storage, and robust COPY FROM and COPY TO tools for quickly importing/exporting data with a variety of formats.

InfluxDB, on the other hand, has had to build all these tools from scratch. In fact, it doesn’t offer many of these capabilities even today. It initially offered replication and high availability in its open source, but subsequently pulled this capability out of open source and into its enterprise product. Its backup tools have the ability to perform a full snapshot and recover to this point-in-time, and only recently added some support for a manual form of incremental backups. (That said, its approach of performing incremental backups based on database time ranges seems quite risky from a correctness perspective, given that timestamped data may arrive out-of-order, and thus the incremental backups since some time period would not reflect this late data.) And its ability to easily and safely export large volumes of data is also quite limited. We’ve heard from many users (including Timescale engineers in their past careers) who had to write custom scripts to safely export data; asking for more than a few tens of thousands of datapoints would cause the database to run out of memory and crash.

Second, databases need to provide strong on-disk reliability and durability, so that once a database has committed to storing a write, it is safely persisted to disk. In fact, for very large data volumes, the same argument even applies to indexing structures, which could otherwise take hours or days to recover; there’s good reason that file systems have moved from painful fsck recovery to journaling mechanisms.

In TimescaleDB, we made the conscious decision not to change the lowest levels of PostgreSQL storage, nor interfere with the proper function of its write-ahead log. (The WAL ensures that as soon as a write is accepted, it gets written to an on-disk log to ensure safety and durability, even before the data is written to its final location and all its indexes are safely updated.) These data structures are critical for ensuring consistency and atomicity; they prevent data from becoming lost or corrupted, and ensure safe recovery. This is something the database community (and PostgreSQL) has worked hard on: what happens if your database crashes (and will subsequently try to recover) while it’s already in the middle of recovering from another crash.

InfluxDB had to design and implement all this functionality itself from scratch. This is a notoriously hard problem in databases that typically takes many years or even decades to get correct. Some metrics stores might be okay with occasionally losing data; we see TimescaleDB being used in settings where this is not acceptable. In fact, across all our users and deployments, we’ve had only one report of data being corrupted, which on investigation turned out to be the fault of the commercial SAN the user was employing, not TimescaleDB (and their recovery from backup was successful). InfluxDB forums, on the other hand, are rife with such complaints: “DB lost after restart”, “data loss during high ingest rate”, “data lost from InfluxDB databases”, “unresponsive due to corruption after disk disaster”, “data messed up after restoring multiple databases”, and so on.

These challenges and problems are not unique to InfluxDB, and every developer of a reliable, stateful service must grapple with them. Every database goes through a period when it sometimes loses data because it’s really, really hard to get all the corner cases right. And eventually, all those corner cases come to haunt some operator. But PostgreSQL went through this period in the 1990s, while InfluxDB still needs to figure these things out.

These architectural decisions have thus allowed TimescaleDB to provide a level of reliability far beyond its years, as it stands on the proverbial “shoulders of giants”. Indeed, just one month after we first released TimescaleDB in April 2017, it was deployed at the operator-facing dashboards in 47 power plants across Europe and Latin America. And so while InfluxDB (2013) was released several years before TimescaleDB (2017), we believe it still has many years of dedicated engineering effort just to catch up, specifically because it was built from scratch.

Performance

Now, let’s get into some hard numbers with a quantitative comparison of the two databases across a variety of insert and read workloads.

Note: We recently released all the code and data used for the below benchmarks as part of the open-source Time Series Benchmark Suite (TSBS) (Github, announcement).

We used the following setup for each database:

  • TimescaleDB version 0.10.1, InfluxDB version 1.5.2
  • 1 remote client machine, 1 database server, both in the same cloud datacenter
  • Azure instance: Standard DS4 v2 (8 vCPU, 28 GB memory)
  • 4 1-TB disks in a raid0 configuration (EXT4 filesystem)
  • Both databases were given all available memory
  • Dataset: 100–4,000 simulated devices generated 1–10 CPU metrics every 10 seconds for 3 full days (~100M reading intervals, ~1B metrics)
  • A batch size of 10K was used for inserts on both databases
  • For TimescaleDB, we set the chunk size to 12 hours, resulting in 6 total chunks (more here)
  • For InfluxDB, we enabled the TSI (time series index)
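
The dataset and chunk figures in the setup above can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch for the largest configuration (4,000 devices, 10 metrics each); the variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60

devices = 4_000            # largest simulated fleet
metrics_per_reading = 10   # CPU metrics recorded per reading
interval_s = 10            # one reading every 10 seconds
days = 3
chunk_hours = 12           # TimescaleDB chunk size used in the benchmark

readings_per_device = (SECONDS_PER_DAY // interval_s) * days
total_rows = devices * readings_per_device          # "reading intervals"
total_metrics = total_rows * metrics_per_reading
total_chunks = (days * 24) // chunk_hours

print(f"{total_rows:,} rows, {total_metrics:,} metrics, {total_chunks} chunks")
# 103,680,000 rows, 1,036,800,000 metrics, 6 chunks
```

This is consistent with the roughly 100M reading intervals, roughly 1B metrics, and 6 chunks quoted in the list.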

Insert performance

On inserts, the results are fairly clear: For workloads with extremely low cardinality, InfluxDB outperforms TimescaleDB by over 2x. However, as cardinality moderately increases, InfluxDB performance drops dramatically due to its reliance on time-structured merge trees (which, similar to the log-structured merge trees they are modeled after, suffer with higher-cardinality datasets). This of course should be no surprise, as high cardinality is a well known Achilles heel for InfluxDB (source: Github, Forums). In comparison, TimescaleDB only sees a moderate drop off as cardinality increases, and very quickly surpasses InfluxDB in terms of insert performance.

That said, it is worth doing an honest analysis of your insert needs. If your insert performance is far below these benchmarks (e.g., if it is 2,000 rows / second), then insert performance will not be your bottleneck, and this comparison becomes moot.

Note: These metrics are measured in terms of rows per second (in the case of InfluxDB, defined as a collection of metrics recorded at the same time). If you are collecting multiple metrics per row, then the total number of metrics per second can be much higher. For example, in our [4,000 devices x 10 metrics] test, you would multiply [rows per second] by [10], resulting in 1.44M metrics/sec for TimescaleDB and 0.56M metrics/sec for InfluxDB.
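
The conversion in the note above is a single multiplication. In the sketch below, the rows-per-second inputs are not quoted directly in the text; they are back-computed from the stated metrics/sec results (1.44M and 0.56M divided by 10) and should be read as implied values.

```python
def metrics_per_second(rows_per_second, metrics_per_row):
    """Convert an insert rate in rows/sec into metrics/sec."""
    return rows_per_second * metrics_per_row


# Implied rows/sec for the [4,000 devices x 10 metrics] test (back-computed, see above).
implied_rows_per_second = {"TimescaleDB": 144_000, "InfluxDB": 56_000}

for database, rows in implied_rows_per_second.items():
    rate = metrics_per_second(rows, metrics_per_row=10)
    print(f"{database}: {rate / 1e6:.2f}M metrics/sec")
# TimescaleDB: 1.44M metrics/sec
# InfluxDB: 0.56M metrics/sec
```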

Insert performance summary

  • On inserts, for workloads with very low cardinality (e.g., 100 devices sending 1 metric), InfluxDB outperforms TimescaleDB.
  • As cardinality increases, InfluxDB insert performance drops off faster than that with TimescaleDB.
  • For workloads with moderate to high cardinality (e.g., 100 devices sending 10 metrics), TimescaleDB outperforms InfluxDB.
  • Be aware of your insert needs; these limits may not be your bottleneck.

Read latency

On read (i.e., query) latency, the results are more complex. This is because, unlike inserts which primarily vary on cardinality size (and perhaps also batch size), the universe of possible queries is essentially infinite, especially with a language as powerful as SQL. With that in mind, we’ve found that the best way to benchmark read latency is to do it with the actual queries you plan to execute.

That said, internally we use a broad set of queries to mimic the most common query patterns. The results are below, using the same workloads we used for inserts. Latencies in this chart are all shown as milliseconds, with an additional column showing the relative performance of TimescaleDB compared to InfluxDB (highlighted in orange when TimescaleDB is faster, in blue when InfluxDB is faster).

SIMPLE ROLLUPS
For simple rollups (i.e., groupbys) aggregating metrics by time: When aggregating one metric across one host for 1 or 12 hours, or multiple metrics across one or multiple hosts (either for 1 hour or 12 hours), TimescaleDB generally outperforms InfluxDB at low-to-medium cardinality, but at high-cardinality this reverses. The one exception is when aggregating multiple metrics across one host for one hour, where TimescaleDB outperforms InfluxDB regardless of cardinality. When aggregating one metric across several hosts, InfluxDB outperforms TimescaleDB, although the delta decreases as cardinality increases.

DOUBLE ROLLUPS
For double rollups aggregating metrics by time and another dimension (e.g., GROUPBY time, deviceId): When aggregating one metric, InfluxDB outperforms TimescaleDB. But when aggregating multiple metrics, TimescaleDB outperforms InfluxDB.

THRESHOLDS
When selecting rows based on a threshold, TimescaleDB outperforms InfluxDB, except in the one case of computing threshold on one device with a high cardinality dataset.

COMPLEX QUERIES
For complex queries that go beyond rollups or thresholds, there really is no comparison: TimescaleDB vastly outperforms InfluxDB here (in some cases over thousands of times faster). The absolute difference in performance here is actually quite stark: While InfluxDB might be faster by a few milliseconds or tens of milliseconds for some of the single-metric rollups, this difference is mostly indistinguishable to human-facing applications.

Yet for these more complex queries, TimescaleDB provides real-time responses (e.g., 10–100s of milliseconds), while InfluxDB sees significant human-observable delays (tens of seconds). It is also worth noting that there were several other complex queries that we couldn’t test because of lack of support from InfluxDB: e.g., joins, window functions, geospatial queries, etc.

Read latency performance summary

  • For simple queries, the results vary quite a bit: there are some where one database is clearly better than the other, while others depend on the cardinality of your dataset. The difference here is often in the range of single-digit to double-digit milliseconds.
  • For complex queries, TimescaleDB vastly outperforms InfluxDB, and supports a broader range of query types. The difference here is often in the range of seconds to tens of seconds.
  • With that in mind, the best way to properly test is to benchmark using the queries you plan to execute.
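
For readers who want to benchmark their own queries, a small timing harness is often enough to get started. The sketch below is an assumption-laden illustration and not part of TSBS: `measure_latency_ms` is a hypothetical helper, and `stand_in_query` is a placeholder for a function that would run one of your real queries through your database client.

```python
import statistics
import time


def measure_latency_ms(run_query, repetitions=10, warmup=2):
    """Time repeated executions of run_query and summarize latency in milliseconds."""
    for _ in range(warmup):          # warm caches before measuring
        run_query()
    samples = []
    for _ in range(repetitions):
        started = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - started) * 1000.0)
    return {
        "runs": len(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


def stand_in_query():
    # Placeholder workload; replace with a call that executes your actual query.
    return sum(range(100_000))


report = measure_latency_ms(stand_in_query)
print(report)
```

Reporting the median alongside the maximum helps separate typical latency from occasional slow outliers.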

Stability issues during benchmarking

It is worth noting that we had several operational issues benchmarking InfluxDB as our datasets grew, even with TSI enabled. In particular, as we experimented with higher cardinality data sets (100K+ tags), we ran into trouble with both inserts and queries on InfluxDB (but not TimescaleDB).

While we were able to insert batches of 10K into InfluxDB at lower cardinalities, once we got to 1M cardinality we would experience timeouts and errors with batch sizes that large. We had to cut our batches down to 1–5K and use client-side code to deal with the backpressure incurred at higher cardinalities. We had to force our client code to sleep for up to 20 seconds after requests received errors writing the batches. With TimescaleDB, we were able to write large batches at higher cardinality without issue.
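
The client-side workaround described above might look roughly like the following. This is a hypothetical sketch, not the benchmark's actual client code: the function name, the doubling delay, and the retry limit are assumptions, while the 5K batch size and the 20-second cap come from the text. `write_batch` stands in for whatever call sends one batch to the database.

```python
import time


def write_with_backoff(write_batch, rows, batch_size=5_000,
                       max_sleep_s=20.0, max_attempts=6, sleep=time.sleep):
    """Send rows in small batches; after an error, pause (capped) and retry the batch."""
    sent = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        delay = 1.0
        for attempt in range(1, max_attempts + 1):
            try:
                write_batch(batch)
                break
            except IOError:
                if attempt == max_attempts:
                    raise                      # give up on this batch
                sleep(min(delay, max_sleep_s))  # never wait longer than the cap
                delay *= 2                      # assumed policy: double the wait each time
        sent += len(batch)
    return sent


# Usage with a stand-in writer that fails twice before succeeding.
calls = {"n": 0}

def flaky_write(batch):
    calls["n"] += 1
    if calls["n"] <= 2:
        raise IOError("simulated timeout")

pauses = []  # record requested pauses instead of actually sleeping
sent = write_with_backoff(flaky_write, list(range(12_000)), sleep=pauses.append)
print(sent, pauses)
# 12000 [1.0, 2.0]
```

Injecting the `sleep` function keeps the example fast to run and makes the pause behavior easy to inspect.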

Starting at 100K cardinality, we also experienced problems with some of our read queries on InfluxDB. Our InfluxDB HTTP connection would error out with a cryptic ‘End of File’ message. When we investigated the InfluxDB server we found out that InfluxDB had consumed all available memory to run the query and subsequently crashed with an Out of Memory error. Since PostgreSQL helpfully allows us to limit system memory usage with settings like shared_buffers and work_mem, this generally was not an issue for TimescaleDB even at higher cardinalities.

Stability issues summary

  • Even with TSI, InfluxDB has stability and performance issues at high (100K+) cardinalities.

Ecosystem

The database can only do so much, which is when one typically turns to the broader 3rd party ecosystem for additional capabilities. This is when the size and scope of the ecosystem make a large difference.

TimescaleDB’s approach of embracing SQL pays large dividends, as it allows TimescaleDB to speak with any tool that speaks SQL. In contrast, the non-SQL strategy chosen by InfluxDB isolates the database, and limits how InfluxDB can be used by its developers.

Having a broad ecosystem makes deployment easier. For example, if one is already using Tableau to visualize data, or Apache Spark for data processing, TimescaleDB can plug right into the existing infrastructure due to its compatible connectors.

Here is a non-exhaustive list of 1st party (e.g., the components of the InfluxData TICK stack) and 3rd party tools that connect with either database, to show the relative difference in the two database ecosystems.

For the open-source projects below, to reflect the popularity of the projects, we included the number of GitHub stars they had as of publication in parentheses, e.g., Apache Kafka (9k+). For many of the unofficial projects for InfluxDB, for example, the unofficial supporting project was often very early (very few stars) or inactive (no updates in months or years).

View full spreadsheet for project links.

Operational Management

Even if a database satisfies all the above needs, it still needs to work, and someone needs to operate it.

Based on our experience, operational management requirements typically boil down to these categories: high availability, resource consumption (memory, disk, CPU), and general tooling.

High availability

No matter how reliable the database, at some point your node will go down: hardware errors, disk errors, or some other unrecoverable issue. At that point, you will want to ensure you have a standby available for failover with no loss of data.

TimescaleDB supports high availability via PostgreSQL streaming replication (as explained in this tutorial). At one point, open source InfluxDB offered high availability via InfluxDB-relay, but it appears to be a dormant project (last update November 2016). Today InfluxDB HA is only offered by their enterprise version.

Resource consumption

Memory usage
For memory utilization, cardinality again plays a large role. Below are some graphs using the same workloads we used earlier for measuring insert performance.

At low cardinality (100 devices sending one metric), InfluxDB requires less memory than TimescaleDB:

Note: Both databases are inserting the same volume of data but take different amounts of time, which is why both line plots above and below don’t end at the same time.

However, as cardinality increases (100,000 devices sending 10 metrics), InfluxDB memory consumption far outpaces that of TimescaleDB (and with more volatility):

In particular, as far as we could tell, there was no way to limit total memory consumed by the InfluxDB TSI. So at higher cardinalities, InfluxDB would run out of memory on inserts, which would lead to the database crashing and restarting.

Disk usage
InfluxDB, like most databases that use a column-oriented approach, offers significantly better on-disk compression than PostgreSQL and TimescaleDB.

With the dataset used for the performance benchmarks, here’s how the two databases fared at the varying cardinalities:

  • 100 devices x 1 metric x 30 days: InfluxDB (12MB) vs. TimescaleDB (700MB) = 59x
  • 100 devices x 10 metrics x 30 days: InfluxDB (113MB) vs. TimescaleDB (1400MB) = 12x
  • 4,000 devices x 10 metrics x 3 days: InfluxDB (769MB) vs. TimescaleDB (5900MB) = 8x

Note: Disk size benchmarks were run using ZFS. Numbers do not include WAL size, as that is configurable by the user.

If minimizing disk storage is a primary requirement for your workload, then this is a big difference, and you may want to consider InfluxDB.

However, as we saw earlier, depending on your workload InfluxDB may also require much more memory. Given that memory is typically 100x-1000x more expensive than disk, trading off high disk usage for lower memory may be worthwhile for certain workloads.

TimescaleDB also allows one to elastically scale the number of disks associated with a hypertable without any data migration, which is another option to offset the higher disk consumption, particularly in SAN or cloud contexts. Users have scaled a single TimescaleDB node to 10s of TB using this method.

The other cost of InfluxDB’s better on-disk compression is that it required the developers to rewrite the backend storage engine from scratch, which raises reliability challenges.

CPU usage
According to an external comparison by DNSFilter, using TimescaleDB resulted in 10x better resource utilization (even with 30% more requests) when compared to InfluxDB:

Source: DNSFilter Comparison (March 2018)

General tooling

When operating TimescaleDB, one inherits all of the battle-tested tools that exist in the PostgreSQL ecosystem: pg_dump and pg_restore for backup/restore, HA/failover tools like Patroni, load balancing tools for clustering reads like Pgpool, etc. Since TimescaleDB looks and feels like PostgreSQL, there are minimal operational learning curves. TimescaleDB “just works”, as one would expect from PostgreSQL.

For operating InfluxDB, one is limited to the tools that the Influx team has built: backup, restore, internal monitoring, etc.

Company and Community Support

Finally, when investing in an open source technology primarily developed by a company, you are implicitly also investing in that company’s ability to serve you.

With that in mind, let’s note the differences between Timescale and InfluxData, the companies behind TimescaleDB and InfluxDB, when it comes to company size and maturity, starting with funding.

In January of this year, Timescale announced it raised $16M (combined Series A and Seed financing). Meanwhile, this past February, InfluxData announced it closed a $35M Series C round of financing, increasing their total amount raised to $59.9M.

These levels of fundraising directly correlate with each organization’s respective history. TimescaleDB was officially launched on April 4, 2017 (a year and four months from time of publishing this post). InfluxDB was initially released back in September 2013 (around five years from this publishing date).

These varying fundraising amounts and histories also correspond to a large difference in the two companies’ technical and product approaches.

InfluxData needed to raise a large amount of money and build a large team to develop all of the internals required to build a production-ready database. In contrast, because TimescaleDB is developed on top of PostgreSQL, its engineering team needed to spend less effort creating basic database building blocks. Instead, Timescale has been able to focus on more advanced features directly related to time-series workloads, as well as user support, despite having the smaller engineering team.

This difference can also be seen in how much less time it took TimescaleDB to reach production-level maturity than InfluxDB (and perhaps even greater reliability by some metrics).

Additionally, sometimes database support comes not from the company, but from the community. InfluxData is building their community from scratch, while Timescale is able to inherit and build on PostgreSQL’s community.

And the PostgreSQL community is quite large. One can see its size by comparing the 77x difference in StackOverflow articles for “PostgreSQL” (88,245 as of this publication) versus those for “InfluxDB” (1,141). Because TimescaleDB operates just like PostgreSQL, many of these PostgreSQL questions are relevant to TimescaleDB. So if you are new to TimescaleDB (or PostgreSQL), there are many resources available to help get you started. Alternatively, if you are already a PostgreSQL expert, you will already know how to use TimescaleDB.

At the end of the day, both Timescale and InfluxData are stable companies dedicated to their users.

Summary

One of the worst mistakes a business can make is investing in a technology that will limit it in the future, let alone be the wrong fit today. That’s why we encourage you to take a step back and analyze your stack before you find your database infrastructure crumbling to the ground.

In this post, we performed a detailed comparison of TimescaleDB and InfluxDB. We don’t claim to be InfluxDB experts, so we’re open to any suggestions on how to improve this comparison. In general, we aim to be as transparent as possible about our data models, methodologies, and analysis, and we welcome feedback. We also encourage readers to raise any concerns about the information we’ve presented in order to help us with benchmarking in the future.

We recognize that TimescaleDB isn’t the only time-series solution and there are situations where it might not be the best answer. And we strive to be upfront in admitting where an alternate solution may be preferable. But we’re always interested in holistically evaluating our solution against that of others, and we’ll continue to share our insights with the greater community.

Like this post? Please recommend and/or share.

Want to learn more? Join our Slack community, follow us here on Medium, check out our GitHub, and sign up for the community mailing list.

Mike Freedman
Timescale

Co-founder/CTO of @timescaledb. Professor of Computer Science, Princeton University.