Companies Shaping the Future of Big Data

You’ve likely heard about big data. It’s been a thing for a while now. If you haven’t, it’s high time to slide out from under that rock because “Big Data 2.0” is already here.

Big data guides decisions across every industry, even those far from the realm of Silicon Valley. It helps UPS optimize its operations, Amazon sell more stuff, doctors fight cancer and the US Government track down terrorists. Large tech companies like Tableau have been built around helping companies better understand data through visualization. And the innovations have only just begun. Whether the goal is to “save the shire” or just improve the bottom line, firms in industries from financial services to consumer products have been able to make better decisions using data. Companies like Cloudera and Hortonworks have built their business models around helping these firms employ open source frameworks and make sense of firehoses of data.

What is big data?

Big data means petabytes or even exabytes of data. It is commonly defined by its high volume, high velocity and high variety. Now that big data is well established and widely used, new companies are emerging to help businesses keep deriving competitive advantages from it. Before moving on to what’s next, let’s review what is already here.

To Cloudera and beyond

According to Cloudera, its services “[give] you the power to ask bigger questions — and that makes anything possible.” This bold statement epitomizes big data and the value it has brought thus far to the business world and beyond. Currently ranked #22 on Fortune’s Unicorn List, Cloudera has a valuation of around $4.1 billion. According to Magister Advisors, Cloudera has made some “fundamental [innovations]” in order to reach that mark, while some other tech unicorns* may not have created as much real value. Cloudera has made a significant impact on big data and the tech world at large, but there’s much more to be done. Which companies are next? What will compel other businesses to purchase their services?

(Con)fluent in Kafka

Before continuing on to specific companies, let’s talk about Kafka, an emerging big data technology. Kafka is open source data processing software developed by engineers at LinkedIn to help manage immense data streams in real time. According to Forbes, Kafka is “now used by thousands of companies and tens of thousands of users, with downloads up 400% in the first six months of 2015.” Kafka was built to solve a big existing problem at a single company, and now Twitter uses it to manage tweet analytics, Netflix for movie recommendations and Uber for surge pricing.

In a way, Kafka is an integral part of big data 2.0: “a single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients.” Critically, the Kafka “log is always persistent” and very cheap. The engineers from LinkedIn decided to start a company around Kafka, “[intending] to build a business selling management tools and services that make it easier to run Kafka.” That company is Confluent, and its revenue will approach $10 million next year without a sales team (à la our shire-saving friends).
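For the curious, the idea of a persistent log that many clients read from can be pictured with a toy model. The Python below is only a rough sketch under loud assumptions: it is not Kafka's API or implementation, the AppendOnlyLog class, its methods and the event names are all made up for illustration, and it keeps records in memory rather than on disk. What it does show is the basic shape: records are only ever appended, each gets a position (an offset), and reading does not remove anything, so different readers can work through the same log at their own pace.

```python
class AppendOnlyLog:
    """Toy in-memory stand-in for one log (hypothetical, not Kafka's API)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Add a record to the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at offset.

        Reading never deletes records, so any reader can replay the log.
        """
        return self._records[offset:offset + max_records]


log = AppendOnlyLog()
for event in ["ride_requested", "ride_accepted", "ride_completed"]:
    log.append(event)

# Two independent readers keep their own positions in the same log.
analytics_offset = 0  # this reader starts from the beginning
billing_offset = 2    # this reader has already handled the first two records

analytics_batch = log.read(analytics_offset)
billing_batch = log.read(billing_offset)
```

Because each reader only remembers an offset, adding another reader costs the log almost nothing, which is one way to think about how a single log can serve many clients at once.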

Companies shaping the future of big data

Kafka is groundbreaking and Magister would likely classify it as a meaningful innovation. But what else is out there? Here are six more (plus a bonus) tech companies currently shaping the future of big data:

  • Flatiron Health — organizes the world’s oncology information and makes it useful for patients, physicians, life science companies and researchers
  • Scaled Inference — machine learning-as-a-service “where machines are given the data and the problem” (big data meets machine learning)
  • Tamr — machine-learning algorithms and software that organize data sets, especially when the data comes from numerous sources and has not been organized in any way; learns what humans need from the data and gets better at finding that specific information
  • Looker — by connecting directly to their data source, businesses can create and enhance their data model in seconds or minutes, enabling data analysts to go “from fire-fighters to story tellers”
  • ThoughtSpot — their Relational Search Appliance combines data from on-premises, cloud and desktop data sources and enables enterprise users to access it through a simple search interface while providing enterprise-class scalability, security and manageability for IT
  • Interana — their events-based analytical software works with clickstream data and other “events-based” information to help users answer questions about how customers behave and how products are used

You can find deep insights about innovative big data companies on the 500 Miles mobile app. Insights about these firms (and over 800 more) include the number of new hires and investment rounds in the past year, headcount growth, talent quality, H-1B hiring, salaries offered, number of employee hires and departures, and gender diversity ratios. But where do these insights come from?

At 500 Miles, we are a big data and machine learning company ourselves. We apply these technologies to processing and visualizing data related to high-growth tech employers*. By democratizing the relevant data for easy consumption, we hope to help more people join career-launching companies. Go check out our Big Data Stack, a list of companies shaking up the world of big data.

Download: iOS | Android

*unicorns — I think you know what this means. If not, click here.

*For more about big data at 500 Miles, visit our SlideShare and jump to slide 19. You can find that here.

*many thanks to CRN and Datamation for their lists of innovative big data startups: link to CRN’s list and link to Datamation’s list

*header image courtesy of TechRepublic