Lindsey Patterson
4 min read · Jan 12, 2016

Comparing Top 5 Big Data Platforms for Your Business Use

Image by Wikipedia

Businesses collect huge quantities and varieties of data. Traditional relational databases do not scale well for storing and analyzing very large data sets. Apache Hadoop is an open-source data storage and processing technology that handles very large data sets at a relatively low cost, because it uses a highly distributed architecture for both storage and processing. Apache Hadoop is free to use, but it is very difficult to set up OLAP on Hadoop, as it is very general and not customized for a specific job or industry. To implement Hadoop in your organization without jumping through a lot of hoops, you will need a commercial platform. There are several commercial Hadoop platforms on the market, and each has its advantages and disadvantages.
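To make the idea of distributed processing a little more concrete, here is a minimal sketch of the map, shuffle, and reduce steps behind Hadoop's MapReduce model, written as plain single-machine Python. It does not use Hadoop itself; the function names and the sample text are assumptions chosen only for illustration, and each string in `blocks` stands in for a chunk of a file that a real cluster would keep on a different node.

```python
from collections import defaultdict


def map_phase(block):
    """Emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(blocks):
    # In a real cluster every block would be mapped on the node that stores it;
    # here the blocks are simply processed one after another.
    mapped = [pair for block in blocks for pair in map_phase(block)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical sample data: two "blocks" of a larger file.
blocks = ["big data big clusters", "data at scale"]
counts = word_count(blocks)
print(counts)
```

Because each block is mapped independently, the same logic can be spread over many machines, which is roughly what lets this style of processing scale to very large data sets.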

So here is a comparison of the top five big data platforms in the market. You must evaluate them carefully and choose the one that meets your needs.

Amazon Web Services (AWS)
Amazon has offered Hadoop in the cloud since Hadoop's early days, and it has a very good success rate in Hadoop implementations. AWS offers the Elastic MapReduce service, an easy-to-use big data platform built on the HDFS (Hadoop Distributed File System) architecture. Elastic MapReduce has one of the highest shares of the global commercial Hadoop platform market. It is well suited to organizations looking for a Hadoop platform hosted in the public cloud. The advantage is that they do not have to manage thousands of servers directly, but instead rent this infrastructure from Amazon.

Amazon also offers DynamoDB, a NoSQL database that Amazon itself uses to run its huge Amazon.com website. Another offering is the Redshift data warehousing service, which is based on the ParAccel database management system. Redshift is very cost effective, with costs as low as $1,000 per terabyte per year.
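As a rough budgeting aid, the per-terabyte figure above can be turned into a quick estimate. This is only a sketch: the function name is hypothetical, the default rate is the article's "as low as" number rather than a quoted price, and real bills depend on node types, reservations, and usage.

```python
def annual_storage_cost(terabytes, rate_per_tb_year=1000.0):
    """Estimate a yearly cost in dollars from a flat per-terabyte rate.

    The default rate is the article's best-case figure, so the result
    should be read as a lower bound, not a quote.
    """
    if terabytes < 0:
        raise ValueError("terabytes must be non-negative")
    return terabytes * rate_per_tb_year


# Hypothetical example: a 50 TB warehouse at the best-case rate.
cost_50tb = annual_storage_cost(50)
print(f"Estimated lower bound: ${cost_50tb:,.0f} per year")
```

Passing a different `rate_per_tb_year` makes it easy to compare the best case against a more conservative assumption.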

Hortonworks
The Hortonworks platform is built entirely from open-source code from the Apache Software Foundation. The company is focused on delivering all its innovations through the open-source data platform and on helping organizations adopt Hadoop. It offers professional services to enterprises setting up a Hadoop platform based on the open-source framework. Its professional services revenue has been rising quite rapidly, and it has been able to attract new customers.
One of its offerings is Apache Ambari, a Hadoop cluster management console used for provisioning, managing, and monitoring Hadoop clusters. Hortonworks has corporate partnerships with companies like Microsoft, Red Hat, SAP, and Teradata.

Cloudera
The Cloudera Hadoop platform has been one of the top Hadoop vendors for businesses since 2008. It currently has the highest share of the Hadoop market, at 53%. It has a high success rate in its implementations and has many big customers. It was founded by a group of engineers from Yahoo, Google, and Facebook. It has corporate partnerships with organizations like Oracle, IBM, HP, NetApp, and MongoDB. Its implementations make use of its Impala analytics engine. In addition to providing engineering solutions, Cloudera also provides customer support and training.

MapR
MapR has 11% of the Hadoop market and, along with Hortonworks and Cloudera, is one of the big three in the Hadoop market. The MapR platform's strengths are enterprise-grade reliability, data protection, ease of integration with existing environments, and support for real-time operations.

IBM InfoSphere BigInsights
IBM InfoSphere BigInsights is a Hadoop platform designed by IBM to provide enterprise-level features. BigSheets and BigInsights are provided as a service by IBM through its SmartCloud architecture.

As we have seen, there are a number of big data platforms to choose from when implementing Hadoop in your organization. If you do not want to maintain thousands of servers, then a platform like Amazon Web Services, which provides infrastructure in the cloud, might be suitable for you. Some of these vendors have corporate partnerships with other big corporations, and you can even opt for a hybrid approach by mixing and matching offerings from different vendors. For example, Cloudera has corporate partnerships with corporations like Oracle, IBM, HP, NetApp, and MongoDB. Similarly, Hortonworks has corporate partnerships with companies like Microsoft, Red Hat, SAP, and Teradata. The advantage of these arrangements is that these companies offer broad-based data management systems that businesses can use as a viable means of business insurance.