DynamoDB vs. Hadoop vs. MongoDB

Are all NoSQL systems the same?

SeattleDataGuy
Oct 6 · 8 min read
Photo by Campaign Creators on Unsplash

The best database for your current business needs is usually dependent on the skill set of your dev team and the applications in place already.

Understanding which database system will best fit your company’s current and future needs is an important step. Databases play a crucial role in all industries and organizations.

Thus, picking the system that is the best fit both from a requirements stand-point as well as price-point can be the difference between a failed project and a successful strategy implementation.

With the ever-expanding landscape of ways your company can store data, we wanted to compare some of the more modern database systems companies are using.

Understanding what DynamoDB, Hadoop, and MongoDB offer will help you make a better decision for your business model. All of these systems are not necessarily interchangeable, and in some cases are more like comparing apples and oranges. However, because they all generally fall under the NoSQL umbrella, they often get clustered together.

So, let’s start with an introduction for each of these systems, followed by comparing them.


What Is DynamoDB?

Amazon DynamoDB (from AWS Database Blog)

Amazon DynamoDB (from AWS database blog)

Created by Amazon, DynamoDB is an exclusive NoSQL database service obtainable as part of the portfolio on Amazon Web Services (AWS). The term originates from Dynamo, a highly accessible key-value store established in response to Amazon’s e-commerce holiday outages in 2004.

At first, only a few teams adopted Dynamo within Amazon due to its high operational complication and trade-offs required between data consistency, performance, query flexibility, and reliability.

Also, during this period, Amazon developers preferred SimpleDB, its primary NoSQL database, which relieved users of database administration tasks. But SimpleDB faced several limitations which eventually limited its use.

Launched in 2012, DynamoDB is a database service on AWS created to tackle the barriers of both Dynamo and SimpleDB.

What Is Hadoop?

Apache Hadoop is a framework that allows for distributed processing of large data sets through computer clusters via simple programming models. Hadoop is designed to expand from single servers to multiple machines, with individual devices contributing local computation and storage.

Instead of relying on hardware to distribute high availability, Hadoop itself is designed to detect and handle failures at the application layer.

What is Hadoop (from IBM big data and Analytics Hub)

An in-depth look shows even more magic as Hadoop is practically modular. This concept implies you can exchange almost any parts for various software tools. This process enables an incredibly flexible architecture, that is also effective and robust.


What Is MongoDB?

MongoDB is a non-tabular and open database created by MongoDB Inc. The originators initially focused on creating a platform that uses completely open-source parts, but with the struggle to get an existing database to meet their requirements for building services in the cloud, led them to start creating a personal database system.

MongoDB (from MongoDB Sharded Cluster)

After realizing the possibilities of creating database software, the team shifted focus to creating MongoDB. Released in 2009, MongoDB is intended to create a technological foundation that enables development teams through distributed systems design, document data models, and unified experience.

In 2016, MongoDB announced its fully managed cloud database service, MongoDB Atlas. MongoDB Atlas provides genuine MongoDB which allows users to get rid of specific operational tasks.

Now, the differences.


Ease of Use, Setup, Admin

DynamoDB

There is no need to bother about operational concerns or additional hardware provisions. This approach makes DynamoDB very easy to get started.

Hadoop

Of course, this means you need to be comfortable with command-line, as well as understanding how to set up hardware. Due to the complexity, there have been multiple companies, such as Cloudera, that help you manage Hadoop with less heavy lifting.

If done well, using a third-party could save you hundreds of thousands in personal costs (because hiring a Hadoop engineer is often upwards of 150k for one).

MongoDB


Quality of Support

DynamoDB

The DynamoDB community offers sample applications, drivers, extensions, and tools. In addition, since DynamoDB is part of AWS, depending on the size of your business, you might get further support directly from Amazon.

Hadoop

Personally, we do feel it can be one of the more difficult systems to get support for if you are only referring to the original software.

However, there are so many third-parties that have stepped in to abstract you away from this, we think most large organizations are fine to consider Hadoop as a data storage system.

MongoDB

The MongoDB community also provides information about events, MongoDB University, user groups, and webinars.


Database Structure

DynamoDB

  • The table involves a collection of items, and the individual item is an assembly of attributes.
  • Also, DynamoDB employs primary keys in exclusively identifying the individual item in a table.
  • The use of secondary indexes offers more flexibility in querying.

MongoDB

The collection of documents in MongoDB does not entail predefined columns and structures that can differ for various documents. Several features of MongoDB in relational databases include:

  • Easy-to-read query language.
  • Strong consistency.

As it’s schema-free, MongoDB permits the creation of documents without the need to create the document structures first.

A key contrast with Relational Database Management Systems (RDBMS) includes:

Table | Column | Value | Records

When compared to MongoDB, it includes:

Collection | Key | Value | Document

This approach implies collections and tables are similar for MongoDB and RDBMS respectively. Also, Documents bear resemblance to Records.

Hadoop

All data in Hadoop is stored as a file system, and other techs like Hive and Impala add schema to objects which enables viewing of the underlying data in table format.

If you are managing Hadoop in itself from the original software, this can become very complex because the filetypes you pick and encode play a huge role in everything, from speed to space. It can also be very difficult to reverse specific decisions.


Right for Your Business

DynamoDB

Bear in mind; you may not have access to embedded data structures like you do on MongoDB.

Hadoop

Hadoop can also play useful roles in building future enterprise data hubs. It can be difficult to manage (depending on how you decide to manage it, with or without a third-party) but it also provides a lot of advantages.

MongoDB

It also plays a great role in web development because it can make passing document style data easy from the back end to front end. This makes it an easy option for companies that create content management systems.


Performance Issues

DynamoDB

  • DynamoDB’s pricing model is very expensive.
  • Low latency reads.
  • Geo-distribution issues.
  • Not so easy to set up a CI/CD.
  • Identification of the exact key that leads to partition is complicated.
  • Durable consistency is not widely available.
  • No ACID transactions and secondary indexes.

Hadoop

  • DataNode and NameNode slowdowns.
  • Map reduce data locality.
  • TaskTracker performance and effects on shuffle time.

MongoDB

  • It is vital to design indexes in conjunction with access patterns and schema.
  • Problems with large objects, and unusually large arrays.
  • Settings for security and durability remains a concern.
  • There is no query optimizer.

Besides these differences, it is always interesting to see what tools are floating around to help further support each of these systems.

So, let’s take a look at a few:

Rockset

That’s the big benefit of Rockset. Using this tool, your team doesn’t need to be familiar with another query language.

NoSQLBooster

So, not only does it make managing your databases easier (think using SQL Server Management Studio) but it also can make it easier for analysts to run queries to answer business questions.

Sqoop


Conclusion

The key points highlighted above are intended to help you make better decisions about these database systems. Depending on your organizational size, adopting any of these database systems offers highly diverse data types, efficient application management, and more.

Better Programming

Advice for programmers.

SeattleDataGuy

Written by

#Data #Engineer, Strategy Development Consultant and All Around Data Guy #deeplearning #machinelearning #datascience #tech #management http://bit.ly/2uKsTVw

Better Programming

Advice for programmers.

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade