Ten Big Data Technologies Informing The Future
As more organizations invest in big data, the ecosystem of tools and providers around it is expanding fast. While most of the coverage focuses on analytics, effective big data systems in fact require a plethora of infrastructural, operational and analytical technologies. Learn about the technologies informing that future here.
So what are the hottest technologies across the whole piece?
According to Forrester’s survey of 63 big data vendors, among the big data technologies currently generating significant interest among customers are:
Data Preparation
Getting data into a state where it can be usefully analyzed has traditionally been a wearying task. “Data preparation software eases the burden of sourcing, shaping, cleansing and sharing diverse and messy data sets to accelerate data’s usefulness for analytics,” explains Forrester. Tools from vendors such as Cambridge Semantics, Paxata, SAS, Teradata and Trifacta make it far easier for non-IT staff such as data scientists and business analysts to conduct analyses as and when they see fit. It allows them to do their own ETL (extract-transform-load) without needing to call on technical IT folk.
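To make the idea concrete, here is a minimal, hand-rolled sketch of the kind of shaping and cleansing these tools automate. The field names and messy inputs are hypothetical, and real preparation tools handle this at far greater scale and variety.

```python
# Sketch: the kind of cleansing a data-preparation tool automates,
# done here by hand on a couple of messy records (all names hypothetical).

def clean_record(raw):
    """Normalize one messy source record into an analysis-ready dict."""
    return {
        "name": raw.get("name", "").strip().title(),
        # Tolerate numbers arriving as strings like " 36 " or missing entirely
        "age": int(str(raw.get("age", "0")).strip() or 0),
        "email": raw.get("email", "").strip().lower() or None,
    }

messy = [
    {"name": "  ada lovelace ", "age": " 36 ", "email": "ADA@EXAMPLE.COM"},
    {"name": "alan turing", "age": 41},  # email missing entirely
]

cleaned = [clean_record(r) for r in messy]
print(cleaned[0]["name"])   # "Ada Lovelace"
print(cleaned[1]["email"])  # None
```

The point of the dedicated tools is that an analyst gets this result through a visual interface rather than by writing such code.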
Data Integration
These are tools from companies such as Pentaho, Talend, SAP, SAS and IBM that allow firms to orchestrate, integrate and federate data across their big data platforms including Hadoop, Hive, Spark, MapReduce, MongoDB, etc. They are an essential component in enabling a business to carry out such functions as real-time data integration, analytics and business intelligence, enterprise search, as well as many others.
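At its simplest, federation means presenting data from disparate stores as one view. A toy sketch, with hypothetical stand-in data structures in place of real database clients:

```python
# Sketch: federating customer records from two hypothetical stores --
# a "document store" list and a "warehouse" row list -- into one view,
# joined on a shared customer id.

documents = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
warehouse_rows = [(1, 120_000), (2, 95_000)]  # (id, annual_spend)

def federate(docs, rows):
    """Overlay warehouse figures onto document records by id."""
    spend = {cid: amount for cid, amount in rows}
    return [{**d, "annual_spend": spend.get(d["id"])} for d in docs]

view = federate(documents, warehouse_rows)
print(view[0])  # {'id': 1, 'name': 'Acme', 'annual_spend': 120000}
```

Integration platforms do this across live, heterogeneous systems rather than in-memory lists, but the join-and-overlay idea is the same.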
In-Memory Databases
Real-time big data processing requires low latency. If you’re trying to provide such features as ‘on-the-fly’ customer personalization, you need to be able to access and manipulate vast quantities of data at speed. Traditional mechanical storage solutions like disk and tape just aren’t fast enough. Often, the only way to do it is by distributing data across DRAM or solid-state drives. Vendors of such in-memory databases and tools include GigaSpaces, VoltDB, Microsoft, IBM and Databricks.
NoSQL Databases
While relational databases still dominate the enterprise database market, NoSQL alternatives — many of them open source — are growing fast thanks to their suitability for use in big data applications. They are essentially grouped into three types: key-value databases, which are great for speedy, low-latency access; document databases, which can query structured and variable data; and graph databases, which are optimized for uncovering relationships in data. Those to check out include Apache Cassandra, Couchbase, MongoDB and Neo4j.
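The three models can be illustrated with plain data structures. This is a sketch only — the dicts below stand in for real database clients, and the data is made up:

```python
# Sketch: the same kind of fact under the three NoSQL models described
# above (stand-in dicts and lists, not real database clients).

# Key-value: one opaque value per key -- fast lookups, no querying inside.
kv = {"user:42": '{"name": "Ada", "city": "London"}'}

# Document: structured values you can query by field.
docs = [{"_id": 42, "name": "Ada", "city": "London"}]
londoners = [d["name"] for d in docs if d["city"] == "London"]

# Graph: entities plus explicit relationships, optimized for traversal.
edges = {"Ada": ["Babbage"], "Babbage": ["Herschel"]}

def friends_of_friends(person):
    """Two-hop traversal over the edge map."""
    return [fof for f in edges.get(person, []) for fof in edges.get(f, [])]

print(londoners)                  # ['Ada']
print(friends_of_friends("Ada"))  # ['Herschel']
```

The choice between the three comes down to the dominant access pattern: point lookups, field queries, or relationship traversal.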
Predictive Analytics
Predictive analytics tools identify meaningful, previously undetected patterns in big datasets. Whether you want to use big data to make useful predictions about how customers are likely to behave, to optimize your operational processes, to manage risks, or anything else, predictive analytics tools will increasingly be able to help. They are becoming even more useful as libraries of machine learning algorithms grow in size and scope. Vendors include Angoss, RapidMiner, Revolution Analytics, Salford Systems and X15.
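The core loop of predictive analytics — fit a model to historical data, then score new cases — can be sketched in a few lines. This uses ordinary least squares on a tiny made-up dataset; real tools bring far richer algorithm libraries to bear:

```python
# Sketch: fit a model to history, then predict. Ordinary least squares
# on hypothetical data (marketing spend vs. units sold).

def fit_line(xs, ys):
    """Return slope and intercept minimizing squared error."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

spend = [1.0, 2.0, 3.0, 4.0]   # hypothetical history
sold = [10.0, 20.0, 30.0, 40.0]

m, b = fit_line(spend, sold)
print(m * 5.0 + b)  # 50.0 -- predicted units at a spend of 5.0
```

Commercial platforms automate model selection, validation and deployment on top of this basic fit-then-score pattern.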
Stream Analytics
As more businesses attempt to extract meaning from multiple streams of data in real time — for example, when building Internet of Things (IoT), smart city or real-time customer interaction applications — they will need a way to filter, aggregate and analyze multiple streams of data at speed. Tools that do this are called stream analytics. This is another area likely to grow significantly in future. Tools and vendors include Apache Spark Streaming, Apache Storm, DataTorrent, Striim and Vitria.
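A basic building block of stream analytics is the sliding-window aggregation: summarize only the most recent events as new ones arrive. A minimal stdlib sketch (the "sensor" and window size are hypothetical):

```python
# Sketch: a sliding-window average, the basic unit of stream aggregation.
from collections import deque

class SlidingAverage:
    """Average of the most recent `size` readings from a stream."""

    def __init__(self, size):
        self.window = deque(maxlen=size)  # old readings fall off the end

    def push(self, value):
        self.window.append(value)
        return sum(self.window) / len(self.window)

sensor = SlidingAverage(size=3)
for reading in [10, 20, 30, 100]:
    latest = sensor.push(reading)
print(latest)  # 50.0 -- average of the last three readings (20, 30, 100)
```

Stream-processing frameworks apply the same idea across millions of events per second, with windows defined by time as well as by count.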
Data Governance And Auditing
The unstructured, distributed nature of much big data requires new approaches to governance. While many companies have side-stepped governance issues in a bid to get their big data systems up and running, governance is likely to become far more prominent going forward, as businesses work to mitigate the growing risk of financial and reputational damage from breaches of sensitive data. “Governance and auditing tools observe real-time activities to detect potential security breaches and unusual patterns of user access, based on defined policies,” explains Forrester’s report. Vendors in this space include Adaptive Insights, Centrify, Imperva, Varonis and Vormetric.
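The "defined policies" Forrester mentions can be as simple as thresholds and allow-lists checked against each access event. A toy sketch — the policy values, roles and log format here are entirely hypothetical:

```python
# Sketch: policy-based detection of unusual access, in the spirit of
# the governance tools above (policy and log format are hypothetical).

POLICY = {
    "max_records_per_hour": 500,
    "allowed_roles": {"analyst", "admin"},
}

def audit(event):
    """Return a list of policy violations for one access-log event."""
    violations = []
    if event["role"] not in POLICY["allowed_roles"]:
        violations.append("unauthorized role")
    if event["records_read"] > POLICY["max_records_per_hour"]:
        violations.append("excessive volume")
    return violations

print(audit({"role": "intern", "records_read": 10_000}))
# ['unauthorized role', 'excessive volume']
```

Real products learn baselines of normal behavior rather than relying solely on static thresholds, but the policy-check core is the same.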
Data Modelling And Metadata Management
Many businesses want to use more traditional ways to understand their big data, which typically means performing SQL-like queries. Data modeling and metadata management tools make this much easier by automatically detecting and overlaying a schema onto unstructured, distributed big data when ingesting, writing or processing it. Tools include Apache HCatalog, Hive, Cloudera Navigator, IBM Data Explorer and FICO.
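Schema detection itself can be sketched simply: scan schemaless records and record which types appear under each field name. A toy version of what these tools do automatically at scale (the event records are made up):

```python
# Sketch: inferring a schema from schemaless records, a toy version of
# the automatic detection the modeling/metadata tools above perform.

def infer_schema(records):
    """Map each field name to the set of Python type names seen for it."""
    schema = {}
    for rec in records:
        for field, value in rec.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema

events = [
    {"user": "ada", "clicks": 3},
    {"user": "alan", "clicks": 5, "referrer": "search"},
]
print(infer_schema(events))
# {'user': {'str'}, 'clicks': {'int'}, 'referrer': {'str'}}
```

Once such a schema is overlaid, SQL-style engines can treat the raw records as queryable tables.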
Data Encryption And Anonymization
The sensitive nature of some big data sets, and the increasing ease with which hackers are able to bypass traditional security systems, mean there has to be a shift in focus to protecting that sensitive data where it resides, by anonymizing and/or encrypting it. Tools for doing so will become increasingly critical for security and compliance in a big data world. Among the vendors producing tools to help are Dataguise, SafeNet (Gemalto), Vormetric and Zettaset.
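One common anonymization technique is to replace a sensitive field with a keyed one-way hash, so records remain joinable without exposing the raw value. A deliberately simplified stdlib sketch — in practice, salt management and the choice of scheme need far more care:

```python
# Sketch: pseudonymizing a sensitive field with a salted one-way hash.
# (Salt handling here is deliberately simplified for illustration.)
import hashlib

SALT = b"replace-with-a-secret-salt"  # hypothetical; manage real salts securely

def anonymize(email):
    """Deterministic one-way pseudonym for an email address."""
    return hashlib.sha256(SALT + email.encode()).hexdigest()[:12]

token = anonymize("ada@example.com")
print(len(token))                             # 12
print(anonymize("ada@example.com") == token)  # True -- same input, same token
```

Because the mapping is deterministic, two datasets anonymized with the same salt can still be joined on the token, which is exactly the property these tools trade on.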
Big Data As A Service
Given the complexity and cost of setting up and running an effective big data infrastructure, many (particularly smaller) companies will increasingly choose to use cloud-based big-data-as-a-service technology. These are still fairly early days for BDaaS, but as more vendors shape their offerings and make them available cost-effectively in the cloud, this could become a key growth area. Those already operating in this space include 1010Data, Bigstep, Cazena and Qubole.
Originally published at blog.100tb.com.