State of the (M)art — Big Data & Analytics

By Open Factory Editor, published in Open Factory, Feb 17, 2019

It is now widely understood that data is a valuable asset that lets businesses both inform and validate their processes. Those processes differ vastly across business sizes and industries, so it should come as no surprise that enterprise data architectures differ just as much.

It is also evident that the marketing hype in the data industry confuses businesses more than it helps them define and design their data journey. It is therefore imperative for a business to have a data strategy in place before putting money into expensive data solutions.

Data architecture is the starting point to realize your data strategy and involves multiple internal stakeholders defining everything from the data governance policies to the IT infrastructure for various data operations.

Data architecture in the context of big data is defined by, among other things, the nature of the data processing systems (online, nearline and offline) and the nature of the data itself (volume, velocity, variety and temperature).

The data strategy of a business can be driven by one or more business functions: marketing, sales, operations, engineering and so on. A business whose primary objective is building data products, such as search engines or ad networks, will also have a very different data architecture from other data-driven businesses.

With the emerging business need for data analytics, it is not a coincidence that the data market has ever-expanding categories of tools — relational databases, NoSQL databases, data warehouses, data lakes, data ingestion frameworks, batch processing engines, stream processing engines, unified data processing engines, in-memory data grids (IMDG), etc.

Lambda architecture for Big Data processing. Illustration by Microsoft Azure.

Furthermore, the big data tools can be combined using a growing number of data processing architectures — Lambda and Kappa, among others.
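As a rough illustration of the Lambda pattern, the sketch below splits a page-view count into a batch view, a speed view and a serving-layer query that merges the two. It is a toy under stated assumptions, not a reference implementation: the `Event` class, the function names and the cutoff value are all hypothetical, and in-memory counters stand in for a real batch engine and a real stream processor.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A hypothetical page-view event; the field names are illustrative."""
    page: str
    timestamp: int  # seconds since an arbitrary start


def build_batch_view(master_dataset, batch_cutoff):
    """Batch layer: recompute counts from every event before the cutoff."""
    return Counter(e.page for e in master_dataset if e.timestamp < batch_cutoff)


def build_speed_view(recent_events, batch_cutoff):
    """Speed layer: count only events the last batch run has not covered."""
    return Counter(e.page for e in recent_events if e.timestamp >= batch_cutoff)


def query(page, batch_view, speed_view):
    """Serving layer: merge the batch and speed views to answer a query."""
    return batch_view[page] + speed_view[page]


# Made-up sample data: three events before the batch cutoff, two after it.
events = [
    Event("home", 10),
    Event("pricing", 20),
    Event("home", 30),
    Event("home", 110),
    Event("pricing", 120),
]
batch_cutoff = 100

batch_view = build_batch_view(events, batch_cutoff)
speed_view = build_speed_view(events, batch_cutoff)
print(query("home", batch_view, speed_view))
```

A Kappa-style design would drop the separate batch layer and rebuild any view by replaying the event stream through the same streaming code path.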

The emergence of big data has also created the need for new industry roles, namely data scientists and data engineers, in addition to the existing data roles of data analysts, data warehouse admins and data architects.

It is important to note here that big data and data analytics are not the same thing (as is also evident from the title of this post). Every business has different data needs, and not every business generates big data. The nature of a business dictates its data strategy, which in turn dictates its data architecture and the tools and human skills required to realize that strategy.

It is also useful to note that most big data tools are creations of the big web companies: the biggest search engines (Google and Yahoo) and the biggest social networks (Facebook, Twitter and LinkedIn), where the data is massive and largely semi-structured or unstructured, both common characteristics of big data. These tools may or may not make sense for your business, depending on its unique data needs. A traditional enterprise data warehouse (EDW) holding comparatively smaller amounts of structured data may serve your business better than an investment in big data solutions. Similarly, spinning up data analyst roles may make more sense for your business than spinning up data scientist roles.

A Big Data solution architecture on Google Cloud Platform. Illustration by Google Cloud.

Data analytics is one of the best-served product categories among cloud providers. Consequently, every major cloud provider offers its own headline product in each category of data tools mentioned earlier in this post. This ready availability of cloud data tools means that businesses can easily outsource most data operations (and the associated data roles, such as data engineers and data warehouse admins) to cloud providers. For most businesses, data operations per se are a non-differentiating business function, so it makes sense to outsource that expertise to cloud providers and move straight on to data processing to gain valuable insights from data and drive business growth.

A Big Data solution architecture on AWS. Illustration by Amazon Web Services.

In conclusion, it is vital for businesses to have a well-defined data strategy that captures their unique data needs and subsequently guides their data architecture and their investment in data solutions and data personnel. A useful comparison here is between the traditional big-box stores (using Walmart’s Beer and Diapers parable as the context) and the recently introduced automated self-checkout stores (think Amazon Go). Both are data-driven businesses, but their data needs are very different. The nature of the data (volume, velocity, variety) generated by an automated self-checkout store is very different from that of a big-box store, so it requires very different tools and human skills to perform efficient and effective data analytics.

Open Factory provides Big Data consultancy at various forums. Drop us a message on our website if you need help with your data strategy.
