Dmitry Anoshin
Published in Rock Your Data
8 min read · Feb 17, 2020

“The point of Big Data and analytics is not just to manage more data but to generate insights. Data discovery involves the use of analytic techniques for rapid, ad hoc exploration of multiple types of data. While skills and clear business goals are as important as technology, a data discovery platform should be part of any organization’s array of tools.” by Forbes Insights

We live in the age of information technology. We are surrounded by electronic devices that generate a great deal of data. For example, you can surf the internet, visit a couple of news portals, order new Air Max sneakers from a web store, write a couple of messages to a friend and chat on Facebook. Every one of these actions produces data. Multiply those actions by the number of people who have access to the internet or simply use a mobile phone, and we get really BIG DATA. Of course, you may ask: how big is it? I suppose that nowadays it starts at terabytes or even petabytes. And volume is not the only issue; we also struggle with the variety of data. As a result, it is not enough to analyze only structured data. We also have to dive deep into unstructured data, such as the machine data generated by all kinds of devices.

Nowadays, we need a new core competence: “dealing with big data”. This vast volume of data cannot just be stored; it needs to be analyzed and mined for information that management can use to make the right business decisions. That makes the business more competitive and efficient.

Unfortunately, in a modern organization there are still many manual steps between you and the data you need to answer a business question. You have to ask the IT team for help or wait until new data becomes available in the Enterprise Data Warehouse. In addition, you are working with an inflexible BI tool that can do little more than refresh a report or export it to Excel. You definitely need a new approach, one that gives you a competitive advantage, dramatically reduces errors and accelerates business decisions.

So, we can highlight some of the key points for this kind of analytics:

  • Integrating data from heterogeneous systems
  • Giving more access to data
  • Using sophisticated analytics
  • Reducing manual coding
  • Simplifying processes
  • Reducing time to prepare data
  • Focusing on self-service
  • Providing powerful computing resources

And we can continue this list with many other bullet points.

If you are a fan of traditional BI tools (later in this chapter, we will compare BI and data discovery tools), you may think that all of this is almost impossible. Yes, you are right: with those tools it is impossible. That’s why we need to change the rules of the game. As the business world changes, you must change as well.

Maybe you have already guessed what I mean; if not, I can help. In this book, I will focus on a new, more flexible and powerful approach to data analytics. It is called Data Discovery. Of course, we need the right vehicle to overcome all the challenges of the modern world. That’s why we have chosen SAP Lumira, one of the most powerful data discovery tools on the market today. But before diving deep into this amazing tool, let’s consider some of the challenges on our path to data discovery, as well as its advantages.

Data Discovery Challenges

Let’s imagine that you have several terabytes of data. Unfortunately, it is raw, unstructured data. To get business insight from it, you have to spend a lot of time preparing and cleaning it. In addition, you are restricted by the capabilities of your machine. That’s why a good data discovery tool usually combines software and hardware. As a result, it gives you the power you need for exploratory data analysis.

Discovery Analytics

Let’s imagine that all of this big data is stored in a data lake or a NoSQL data store. You have to be at least a good programmer to do analytics on it. Here we find another benefit of a good data discovery tool: it puts powerful tools in the hands of business users who are not so technical and may not even know SQL.

Data Discovery vs Business Intelligence

You may be confused about data discovery and business intelligence technologies. They seem very close to each other, and it may even seem that BI tools can do everything data discovery can do. So why do we need separate data discovery tools such as Tableau, Datameer, Platfora or Power BI?

To better understand the difference between the two technologies, look at the table below:

Let’s consider the pros and cons of Data Discovery:

Pros:

  • Rapidly analyze data with a short shelf life
  • Ideal for small teams
  • Best for tactical analysis
  • Great for answering one-off questions quickly

Cons:

  • Difficult to use at the scale of an enterprise organization
  • Difficult for junior users
  • A lack of scalability

As a result, it is clear that BI and Data Discovery each handle their own tasks and complement each other.

Most organizations have a Data Warehouse (DWH). It was designed to support daily operations and help make business decisions. But sometimes organizations meet new challenges. For example, a retail company wants to improve customer experience and decides to work closely with its customer database. Analysts try to segment customers into cohorts and analyze customer behavior. They need to handle all of the customer data, which is quite big. In addition, they can use external data to learn more about customers. If they use the corporate BI tool, every interaction, such as adding a new field or filter, can take 10–30 minutes. That is unacceptable in modern business. Analysts want answers to their business questions immediately. And they prefer to visualize data because, as you know, people take in visual information much more readily than text. With a data discovery tool, analysts are also independent of IT: they can connect to any data source in the organization and test their craziest hypotheses.
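
To make the cohort idea concrete, here is a minimal sketch in plain Python. It is not how any particular data discovery tool works internally; the order records, field layout and function names are all hypothetical and exist only for illustration. Each customer is assigned to the month of their first order, and revenue is then summed per cohort.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order records: (customer_id, order_date, amount).
ORDERS = [
    ("c1", date(2020, 1, 5), 120.0),
    ("c2", date(2020, 1, 20), 40.0),
    ("c1", date(2020, 2, 3), 60.0),
    ("c3", date(2020, 2, 14), 75.0),
    ("c2", date(2020, 3, 1), 30.0),
]


def assign_cohorts(orders):
    """Map each customer to the (year, month) of their first order."""
    first_order = {}
    for customer_id, order_date, _amount in orders:
        if customer_id not in first_order or order_date < first_order[customer_id]:
            first_order[customer_id] = order_date
    return {cid: (d.year, d.month) for cid, d in first_order.items()}


def revenue_by_cohort(orders):
    """Sum order amounts per first-order cohort."""
    cohorts = assign_cohorts(orders)
    totals = defaultdict(float)
    for customer_id, _order_date, amount in orders:
        totals[cohorts[customer_id]] += amount
    return dict(totals)


if __name__ == "__main__":
    for cohort, total in sorted(revenue_by_cohort(ORDERS).items()):
        print(cohort, total)
```

A data discovery tool hides this kind of grouping behind drag-and-drop, which is why an analyst can try a different cohort definition in seconds instead of waiting for a report change.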

There are hundreds of examples, where BI and DWH are weak and Data Discovery is strong.

Data Discovery for your Business

Let’s discuss the possible users of a data discovery tool and the advantages they get. Here’s a small schema of possible data flows in an organization:

There are many systems that generate data. Organizations try to capture all of that data and put it in a Data Warehouse. In addition, for big data volumes, they can use a Data Lake, because it offers cheap storage and can serve as a staging area for raw data.

On top of the DWH, we have Enterprise Business Intelligence tools such as SAP BusinessObjects, MicroStrategy, Oracle BI or Cognos BI. They use data from the DWH and make it possible to build reports and visualize data. But it takes a long time to create the semantic layer these BI tools rely on, such as a universe, schema or repository. In addition, it requires technical skills. As a result, there is no agility, and the tools are not user-friendly for nontechnical business users.

Fortunately, we have data discovery tools, which can easily complement our existing BI tool. Let’s see how a data discovery tool can affect the data flow in the organization:

A data discovery tool has many advantages for the whole organization. It connects easily to various data sources and doesn’t require technical skills from business users. Users can combine and merge various data sources and visualize big data sets. They can easily find insights in the data and use them in the decision-making process.
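
As a rough illustration of what “combine and merge various data sources” means under the hood, the sketch below joins a CSV export with a JSON feed on a shared key, using only the Python standard library. The two sources, their fields and the `merge_sources` helper are assumptions made up for this example; a real tool would do the equivalent through its user interface.

```python
import csv
import io
import json

# Hypothetical CRM export (CSV) and web analytics feed (JSON).
CRM_CSV = """customer_id,name,segment
c1,Alice,retail
c2,Bob,wholesale
c3,Carol,retail
"""

WEB_JSON = '[{"customer_id": "c1", "visits": 12}, {"customer_id": "c3", "visits": 4}]'


def merge_sources(crm_csv, web_json):
    """Left-join CRM rows with web visit counts on customer_id."""
    # Index the JSON feed by the join key for constant-time lookups.
    visits = {row["customer_id"]: row["visits"] for row in json.loads(web_json)}
    merged = []
    for row in csv.DictReader(io.StringIO(crm_csv)):
        # Customers missing from the web feed get a visit count of 0.
        merged.append({**row, "visits": visits.get(row["customer_id"], 0)})
    return merged


if __name__ == "__main__":
    for record in merge_sources(CRM_CSV, WEB_JSON):
        print(record)
```

The left join keeps every CRM customer even when the second source has no matching row, which is usually what an analyst wants when enriching a customer list with external data.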

In addition, let’s look at the benefits of using a data discovery tool for various roles in the organization:

  • Business Management: better information for the decision process. Greater accuracy, richer detail, enhanced ability to identify factors affecting business outcomes, and the connections among them.
  • IT managers: eliminate the stream of individual support requests for analytics. Free IT staff for other responsibilities. Improve internal customer satisfaction.
  • IT staff: no longer have to handle a flood of requests for data extracts.
  • Distinguished Analysts: greater productivity, fewer errors, and more time to spend on the most interesting aspects of data analysis. Better and easier data access, better integration among data sources, an opportunity to probe data in greater detail. A broader range of analytic methods.
  • Other data analysis roles: access to a greater variety and depth of data and to more analysis methods.
  • Everybody else: better information to support daily decision-making at all levels of the organization.

We could continue this list with other roles in your organization and work out what benefits each of them can get.

Data Discovery Best Practices

I work with data discovery quite often, and I want to highlight some useful best practices:

  • Agility and rapid cycle iteration: this lets you explore any data set rapidly and test new hypotheses very fast. If a hypothesis fails, you can quickly start on a new one.
  • Begin with the end in mind: even if you don’t know exactly what you are looking for in a particular data set, you can start exploring the data and find business insight. But it is highly recommended that you understand the business process first.
  • Take advantage of data insights: while exploring data you can find new, valuable information. For example, you notice a burst of high-volume sales of a new product. You can immediately dive deeper and answer the question of who is buying this product. You can keep diving deeper to understand more and more. Finally, you can create a new marketing segment based on these insights.
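
The drill-down described in the last practice can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sales rows, the product name, the segment labels and the “more than twice the average of earlier days” burst rule are all hypothetical, not a method prescribed by any tool.

```python
from collections import Counter

# Hypothetical sales rows: (day, product, customer_segment, units).
SALES = [
    (1, "widget", "students", 3),
    (2, "widget", "students", 4),
    (3, "widget", "families", 3),
    (4, "widget", "students", 15),
    (4, "widget", "families", 2),
]


def daily_units(sales, product):
    """Total units sold per day for one product."""
    totals = Counter()
    for day, prod, _segment, units in sales:
        if prod == product:
            totals[day] += units
    return dict(totals)


def burst_days(daily, factor=2.0):
    """Days whose units exceed `factor` times the mean of all earlier days."""
    days = sorted(daily)
    bursts = []
    for i, day in enumerate(days[1:], start=1):
        baseline = sum(daily[d] for d in days[:i]) / i
        if daily[day] > factor * baseline:
            bursts.append(day)
    return bursts


def buyers_on(sales, product, day):
    """Drill down: units per customer segment for one product on one day."""
    mix = Counter()
    for d, prod, segment, units in sales:
        if prod == product and d == day:
            mix[segment] += units
    return dict(mix)


if __name__ == "__main__":
    daily = daily_units(SALES, "widget")
    for day in burst_days(daily):
        print(day, buyers_on(SALES, "widget", day))
```

The first step spots the burst, and the second answers “who is buying this product” for that day; the dominant segment is a natural candidate for a new marketing segment.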

Summary

We can clearly see the difference between traditional (legacy) BI platforms and modern data discovery tools. There is a strong trend: organizations understand the value of Data Discovery, Visual Analytics and Self-Service, and they are moving towards modern analytics data platforms that allow them to deliver faster insights and scale analytics efforts across the organization.

Rock Your Data is a consulting and technology firm with a focus on secure and scalable cloud analytics solutions using top-tier cloud vendors.

Our mission is to provide high-quality analytic solutions for Canadian companies and help them drive business with informed decisions by leveraging powerful and modern cloud capabilities.
