Why you, as a developer, should care about Azure Cosmos DB

Syed Hassaan Ahmed
Jul 30, 2017 · 5 min read

I recently had the opportunity to speak at Big Data, Helsinki, organized by Dataconomy and Microsoft Flux (if you’re ever in Helsinki, it pays to visit them and check out the amazing community they’ve built in such a short time). I deliberately chose to tell the story of Cosmos DB because it’s relatively fresh in the market and rich with capabilities that excite data folks. Plenty of questions were asked at the end of the talk, so I decided: why not summarize all that and turn it into my first tech blog post :)

#BigDataHelsinki Microsoft Flux

What is Cosmos DB?

Modern applications (Web, Mobile, Gaming, IoT, AI, ML, <insert_more_buzzwords_here>) are increasingly focused on reaching a geographically broad audience. These global ambitions bring a multitude of technical challenges, such as responsiveness, high availability, and elasticity of compute and storage. Most cloud providers are well equipped with tools that solve the application side of these challenges, yet today’s popular databases (relational and NoSQL alike) were never designed to run in the cloud!

Cosmos DB is a born-in-the-cloud, globally distributed, multi-model, fully managed database service that provides elastic scale-out (throughput and storage independently!), guaranteed low latency at the 99th percentile, multiple well-defined consistency models, SSD-backed storage and comprehensive SLAs (four 9s).

These are big words! But to give you a bit of context: Cosmos DB started out in 2010, internally known as Project Florence, and was designed with feedback from demanding customers such as Bing, Office 365 and Skype. So even though it’s a relatively new public offering, it’s been battle-tested internally at petabyte scale. To support this argument, let’s dive further into some of the above promises.


Global Distribution

Cosmos DB is one of the foundation services in Azure, which means every time a new region lights up (and soon it will be the case in South Africa), it will automatically be there! Not to forget that Azure has more regions globally than AWS and Google!

It’s easy to mistake this capability for disaster recovery, but Geo-DR is a strict subset of the overall picture. Client SDKs accept an ordered list of preferred regions to read from, with no need to change the connection string in the app config. Furthermore, not only does Cosmos DB support automatic failover, it even lets you manually simulate a regional failover!
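To make the "ordered list of preferred regions" idea concrete, here is a minimal sketch in plain Python. It is not SDK code: `pick_read_region`, the region names and the `available` set are all hypothetical, and the real SDKs handle discovery and failover for you. It only illustrates the selection rule, which is to read from the first preferred region that is currently reachable.

```python
from typing import Iterable, Optional, Set


def pick_read_region(preferred: Iterable[str], available: Set[str]) -> Optional[str]:
    """Return the first region in preference order that is currently available.

    Hypothetical helper for illustration; not part of any Cosmos DB SDK.
    """
    for region in preferred:
        if region in available:
            return region
    return None  # no preferred region is reachable


# Assumed example regions, ordered from most to least preferred.
preferred_regions = ["North Europe", "West Europe", "East US"]

# Normal operation: the first choice is reachable.
normal = pick_read_region(preferred_regions, {"North Europe", "West Europe", "East US"})
print(normal)

# Simulated regional outage: reads fall through to the next preference.
during_outage = pick_read_region(preferred_regions, {"West Europe", "East US"})
print(during_outage)
```

The point of keeping the preference list in the client is that the app config stays the same before, during and after a failover.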


Multi-model Multi-API

A positive trend that came with microservices is polyglot persistence. Developers increasingly pick the type of data store on a per-need basis (e.g. one service stores JSON documents, another uses key-value). Cosmos DB gives developers the flexibility to choose from a variety of familiar APIs: SQL, JavaScript, MongoDB, Gremlin (graph), Azure Table Storage and, very recently, Cassandra (CQL). This way developers can bring existing apps to the cloud and keep developing them with the skill sets they already have.

This is usually as simple as spinning up a new Cosmos DB instance in the Azure portal and replacing the connection string in your existing app (though not always, since some APIs are currently in preview)! To support all of this natively, the Cosmos DB engine uses Atom Record Sequence (ARS) as its core type system.


Well-defined Consistency models

When it comes to consistency, most databases limit us to a binary choice: strong consistency, i.e. linearizability (typically RDBMS), or eventual consistency, i.e. a bit of anarchy (typically NoSQL). Cosmos DB capitalizes on the PACELC theorem and makes sure you don’t have to take the red or the blue pill by offering intermediate choices with useful trade-offs: Bounded Staleness, Session (most widely used) and Consistent Prefix. In fact, Microsoft’s telemetry data shows that more than 90% of customers are NOT using strong or eventual consistency.
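To give a feel for what Session consistency buys you, here is a toy model in plain Python. It is a sketch under loud assumptions: `Replica`, `Session` and `session_read` are hypothetical names, the "token" is just a write sequence number, and this is not how Cosmos DB is implemented internally. The idea it illustrates is read-your-own-writes: a client only accepts a read from a replica that has caught up to the last write the client itself has seen.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Replica:
    """Toy replica: a key-value store plus the sequence number of the last applied write."""
    data: Dict[str, str] = field(default_factory=dict)
    applied: int = 0

    def apply(self, seq: int, key: str, value: str) -> None:
        self.data[key] = value
        self.applied = seq


@dataclass
class Session:
    """Client-side state: the highest write sequence number this client has observed."""
    token: int = 0


def session_read(replicas: List[Replica], session: Session, key: str) -> Optional[str]:
    """Read from the first replica that has caught up to the session token."""
    for replica in replicas:
        if replica.applied >= session.token:
            # Remember how far we have read, so later reads never go backwards.
            session.token = replica.applied
            return replica.data.get(key)
    raise RuntimeError("no replica has caught up to this session yet")


primary, lagging = Replica(), Replica()
session = Session()

# The client writes through the primary and keeps the returned sequence number.
primary.apply(1, "cart", "1 item")
session.token = 1

# The lagging replica is listed first but has not replicated the write yet,
# so the session read skips it and is served by the primary.
session_value = session_read([lagging, primary], session, "cart")
print(session_value)

# A purely eventual read could land on the lagging replica and miss the write.
eventual_value = lagging.data.get("cart")
print(eventual_value)
```

Other clients without this session token may still see the older state for a while, which is exactly the trade-off that makes Session cheaper than strong consistency.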


Q&A

Q: I wanna try it out without paying for it!

A: Develop locally with the high-fidelity emulator. There is even a Docker for Windows image!

Q: How do I migrate from XYZ data store?

A: On top of supporting standard tools (e.g. mongoimport, mongorestore), Cosmos DB provides a visual migration tool. Have JSON/CSV files? Relational data in SQL Server? Data in another cloud (e.g. AWS DynamoDB)? Graphs in Neo4j? Bring it on!

Q: How do I monitor requests, usage and storage?

A: Through the metrics blade in the Azure portal, which lets you compare actual values with the guaranteed SLA side by side. Would you like to do it programmatically? Use the Azure management REST APIs!

Q: What is this RU you speak of?

A: RU (Request Unit), aka the Cosmos DB bitcoin, is the measure of throughput (e.g. 1 RU = a GET of a 1 KB document). Every operation (reads, writes, SQL queries and stored procedure executions) has a deterministic RU value that’s based on the throughput required to complete the operation. Want to estimate how many RUs your application will consume? Try out this online capacity planner. (Thomas Weiss has covered RUs in a much more comprehensive blog post here)
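For a back-of-the-envelope feel before you open the capacity planner, the arithmetic can be sketched like this. The only number taken from above is 1 RU for a GET of a 1 KB document. The function name is hypothetical, and the per-write cost of 5 RU used in the example is a placeholder assumption, not an official figure; measure the real charge for your own documents.

```python
def estimate_rus_per_second(reads_per_sec: float,
                            writes_per_sec: float,
                            write_cost_ru: float,
                            read_cost_ru: float = 1.0) -> float:
    """Rough throughput estimate: operations per second times RU cost per operation.

    read_cost_ru defaults to 1 RU (a GET of a 1 KB document, as stated above).
    write_cost_ru has no default on purpose: it must come from your own measurements.
    """
    return reads_per_sec * read_cost_ru + writes_per_sec * write_cost_ru


# Assumed workload: 500 point reads and 100 writes per second on 1 KB documents,
# with a placeholder write cost of 5 RU per write.
needed = estimate_rus_per_second(reads_per_sec=500, writes_per_sec=100, write_cost_ru=5.0)
print(needed)  # 500 * 1.0 + 100 * 5.0 = 1000.0
```

Queries and stored procedures are left out here because their cost depends heavily on what they do.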

Q: What’s the difference between Azure Table Storage and the Table APIs provided by Cosmos DB?

A: Table Storage is aimed at high capacity in a single region with storage-optimized pricing, while Cosmos DB Tables aim for single-digit millisecond latency, global distribution and SLA-backed predictable performance with automatic indexing, hence a pricing model focused on throughput.

Q: There has to be something this database can’t do, right?

A: Yes! Even though it provides ACID guarantees within the transaction scope of a single partition, it’s not an RDBMS. If you need relational capabilities, Azure offers SQL Server, MySQL, PostgreSQL and MariaDB as fully managed services.
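What "the transaction scope of a single partition" means in practice can be sketched with a small check in plain Python. Everything here is hypothetical (the helper names, the `customerId` partition key, the sample documents) and no Cosmos DB API is involved. It just shows the rule of thumb: a batch of documents can share one transaction only if they all carry the same partition key value.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List


def split_by_partition(items: Iterable[dict], partition_key: str) -> Dict[Hashable, List[dict]]:
    """Group documents by the value of their partition key field."""
    groups: Dict[Hashable, List[dict]] = defaultdict(list)
    for item in items:
        groups[item[partition_key]].append(item)
    return dict(groups)


def fits_one_transaction(items: Iterable[dict], partition_key: str) -> bool:
    """True when every document falls into the same partition key value."""
    return len(split_by_partition(items, partition_key)) <= 1


# Assumed sample documents, partitioned by a hypothetical "customerId" field.
same_customer = [
    {"id": "order-1", "customerId": "alice", "total": 30},
    {"id": "order-2", "customerId": "alice", "total": 12},
]
mixed_customers = same_customer + [{"id": "order-3", "customerId": "bob", "total": 7}]

print(fits_one_transaction(same_customer, "customerId"))    # one partition key value
print(fits_one_transaction(mixed_customers, "customerId"))  # spans two key values
```

If your workload regularly needs atomic updates across many keys, that is a hint that either the partition key or the choice of database deserves a second look.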

Q: It’s too damn expensive!!!

A: With Cosmos DB you don’t have to rent VMs, deploy software, manage updates or monitor databases. You don’t need to reserve read/write capacities or provision CPU, memory and IOPS. You pay only for what you use. Still not convinced? This Total Cost of (non) Ownership whitepaper explains it in depth.


Call to action

Interested in Cosmos DB? Go to cosmosdb.com

Want to accelerate real-time big data analytics? There is a Spark connector at your disposal.

Want to see what’s coming next? Check out the Azure roadmap.

Tried it out and have feedback for the team? Give a shout-out to askcosmosdb@microsoft.com

Syed Hassaan Ahmed

Written by

Senior Software Engineer @microsoft I enjoy Data Engineering on @azure. Opinions are my own.
