PART 2: The Future of Business Intelligence and Data at Dyninno Group


Anastasija Grimailova, Data Platform Subdivision Manager, Dyninno Latvia


Welcome back to our series on data engineering and analysis at Dyninno Group. In the first instalment, we covered the fundamental aspects of data-driven decision-making and the significance of data governance and quality. Now we turn our focus towards the transformation and future prospects of data analysis, evaluating the impact of business intelligence systems and the next steps in our data journey.

As mentioned before, Dyninno Group has transitioned to using external Business Intelligence (BI) platforms for enhanced capacity and efficiency. This shift included partnering with the Looker platform and migrating data storage and processing to Amazon Web Services (AWS). This move marked a significant evolution in our data handling capabilities, aligning with our expanding data requirements.

Pros and Cons: Evaluating BI Systems

The pros and cons of a BI system largely depend on the volume of users and data, the diversity of data sources, and the size of the engineering team tasked with improving, fixing, and maintaining the system.

For the purpose of this discussion, let’s consider a middle-ground scenario: data volumes that stop short of true big data, and an engineering team not large enough to dedicate itself to developing a single platform. At this scale, it’s often more advantageous to adopt a pre-built platform tailored to the organization’s data usage needs.

There are numerous BI platforms available, each with its own strengths, weaknesses, and potential workarounds. I have experience with seven different platforms from various providers, in addition to some exposure to a self-built BI system. These platforms can handle thousands of rows, create interactive graphs, perform aggregations, and can be configured for individual or department-wide use with features like folders, links, alerts, schedules, and publishing options.

Pros of a Self-Built BI System

• Cost: Since it’s built on open source, there are no fees to third parties, just the human hours for building and maintenance.

• Customization: It can be tailored from the start to meet specific business needs and data usage, with workarounds for open-source limitations.

• Independence: No need to wait for external system updates for new functionalities or bug fixes — the team can handle these internally.

• Familiarity: The team has a thorough understanding of the system, which can lead to better development and user experiences.

Cons of a Self-Built BI System

• Responsibility: The team must manage all maintenance and security, which requires significant development and testing.

• Resource Intensity: Building from scratch could necessitate a larger team.

• Dependence: End users depend on the team’s internal release cycle, so bugs they report may be resolved slowly when the team is stretched.

• Scalability: Ensuring the system can grow with increasing data volumes and evolving business needs is a major challenge.

Infrastructure Setup and Future-Proofing

Our Data Warehouse (1) primarily operates on AWS, though some legacy PHP fetches and wrappers are still in use. Looker is currently our sole BI tool and meets most of our needs, albeit with some workarounds for specific requirements. Besides Looker, a couple of our businesses run data ingestion pipelines directly within AWS, while Entertech and Multipass use Looker for self-service BI.

Looking back, given the team’s scope, knowledge, and data usage at the time, our custom-made solution was appropriate for our needs.

A key lesson is the importance of anticipating future data growth. We’re now focused on making our system more dynamic and more proactive about alerting, understanding that data volumes will only increase. This involves preparing retention policies and choosing appropriate storage tiers for different data sets and time periods, balancing data usage needs against cost.
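To make this concrete, here is a minimal sketch of what such a retention policy can look like as an S3 lifecycle rule, configured via boto3. The bucket name, prefix, and day thresholds are hypothetical, chosen purely for illustration; they are not our production settings.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical tiered retention: raw event data moves to cheaper storage
# classes as it ages and is deleted once the retention period expires.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-lake-raw",  # illustrative name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "raw-events-retention",
                "Filter": {"Prefix": "events/raw/"},
                "Status": "Enabled",
                "Transitions": [
                    # Rarely queried after 3 months: move to Infrequent Access.
                    {"Days": 90, "StorageClass": "STANDARD_IA"},
                    # Archive after a year: move to Glacier.
                    {"Days": 365, "StorageClass": "GLACIER"},
                ],
                # Hard retention limit: expire objects after three years.
                "Expiration": {"Days": 1095},
            }
        ]
    },
)
```

The trade-off is exactly the one described above: hotter storage classes cost more but answer queries immediately, while archival tiers are cheap but slow to retrieve, so the thresholds should follow how each data set is actually used.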

Goals of the New Data Platform Project

Our current data infrastructure still includes components that have not been fully migrated to the AWS cloud due to dependencies with other projects. Looking ahead, our objective is not just to migrate these “legacy” parts to the cloud, but also to prepare for the next evolutionary steps of our data platform.

Our team’s evolution has been marked by several stages. Stage I involved establishing the team and creating a data warehouse. In Stage II, we developed a self-written data processing and BI platform. Stage III saw partial migration to AWS Cloud and the adoption of Looker as our main BI platform.

This stage enabled a self-service approach to data contributions to our data lake (2) and self-service data analysis in Looker, where the team creates the models, but users build their own Looks, dashboards, and analyses.

Currently, we are in Stage IV, which necessitates changes due to the increasing volume of analytics data and its usage. The uncontrolled nature of self-service has raised concerns about data integrity. There’s a growing demand for ad-hoc complex analytic queries, advanced analytics with artificial intelligence and machine learning (AI/ML), and other business needs, all of which call for improvements and adaptations.

To accomplish this stage, we have already audited our current architecture and prepared a roadmap for the next steps. The plan is to move “legacy” data ingestion patterns to the cloud and add layers to our data lake, changing the architecture pattern. Our current setup resembles a Data Mesh architecture but lacks controls, ownership, quality checks, and data governance. We aim to enhance our data pipelines with these components and start building a Data Hub (3) to help more of our businesses make data-driven decisions.
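As a rough illustration of what “adding layers with quality checks” means in practice, the sketch below shows a quality gate a data set must pass before being promoted from a raw layer to a curated one. The column names, thresholds, and file path are hypothetical, and the checks are deliberately minimal, assuming pandas; real pipelines would add ownership and governance metadata on top.

```python
import pandas as pd

# Hypothetical gate: data moves from the raw layer to the curated layer
# only if it passes basic completeness and integrity checks.
REQUIRED_COLUMNS = {"booking_id", "created_at", "amount"}
MAX_NULL_RATIO = 0.01  # tolerate at most 1% missing values per column

def passes_quality_gate(df: pd.DataFrame) -> tuple[bool, list[str]]:
    """Return (passed, violations) for a raw-layer data frame."""
    violations = []

    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        violations.append(f"missing columns: {sorted(missing)}")

    for col in REQUIRED_COLUMNS & set(df.columns):
        null_ratio = df[col].isna().mean()
        if null_ratio > MAX_NULL_RATIO:
            violations.append(f"{col}: {null_ratio:.1%} nulls")

    if "booking_id" in df.columns and df["booking_id"].duplicated().any():
        violations.append("duplicate booking_id values")

    return (not violations, violations)

# Illustrative usage: block promotion and surface the reasons on failure.
passed, violations = passes_quality_gate(pd.read_parquet("raw/bookings.parquet"))
if not passed:
    print("Promotion blocked:", violations)
```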

Migration Plans and Advanced Analytics

For our next steps, we envision adding more layers and components to ensure quality checks, data completeness, and accuracy, thereby supporting business needs.

We’re also exploring advanced analytics (AI/ML). Our data science team currently works with raw data stored in AWS S3 buckets (4). As part of our evolution, we plan to build an add-on to our Data Hub, enabling global and cross-functional AI/ML capabilities. Harmonized data brings several benefits:

• Interoperability: Harmonized data can be seamlessly integrated and shared across departments, teams, and systems, making it easier for stakeholders to collaborate and use the data effectively, which improves efficiency and productivity.

• Consistency: A single, consistent view of the data reduces the risk of conflicting or contradictory information, making it easier to trust and rely on the data for decision-making.

• Quality: Standardizing and cleansing data during harmonization surfaces and resolves inconsistencies, errors, and duplicates, improving the data’s reliability for analytical and operational purposes.

• Cost savings: Reducing data redundancy, minimizing manual reconciliation, and streamlining data management free resources for value-added activities rather than cleanup.

• Risk mitigation: Together, these benefits reduce the risk of making decisions based on unreliable or incomplete information.
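As a small illustration of what harmonization looks like at the code level, the sketch below maps records from two invented source systems onto one shared schema and de-duplicates on a common key. The systems, field names, and formats are hypothetical; the point is only the pattern of normalizing divergent shapes into a single, consistent one.

```python
from datetime import datetime

# Hypothetical: two source systems describe the same customer entity with
# different field names and date formats. Harmonization maps both onto one
# schema so downstream BI and ML consumers query a single, consistent shape.

def from_crm(record: dict) -> dict:
    return {
        "customer_id": str(record["CustomerID"]),
        "email": record["Email"].strip().lower(),
        "signup_date": datetime.strptime(record["SignUp"], "%d/%m/%Y").date().isoformat(),
        "source": "crm",
    }

def from_booking_system(record: dict) -> dict:
    return {
        "customer_id": str(record["cust_id"]),
        "email": record["contact_email"].strip().lower(),
        "signup_date": record["created"][:10],  # assumed already ISO-8601
        "source": "bookings",
    }

def harmonize(crm_rows: list[dict], booking_rows: list[dict]) -> list[dict]:
    merged = [from_crm(r) for r in crm_rows] + [from_booking_system(r) for r in booking_rows]
    # De-duplicate on the shared key, keeping the first occurrence.
    seen, unique = set(), []
    for row in merged:
        if row["customer_id"] not in seen:
            seen.add(row["customer_id"])
            unique.append(row)
    return unique
```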

If you haven’t read PART 1 yet, we highly recommend it: it covers the foundational aspects of Dyninno’s data-driven approach and sets the stage for the innovations and insights discussed in this segment.

(1) A data warehouse is a big storage space for a company’s data. It keeps different types of data from various sources in one place. This data is organized and stored so it can be easily used for analysis and making decisions. Think of it as a large library where data is kept in an orderly way, making it easier for people in the company to find and use the information they need.

(2) Data Lake: A storage repository that holds large amounts of structured and unstructured data in one physical location, like object storage in the Cloud. It employs ELT (Extract, Load, Transform) processes, loading data as is and transforming it as needed. Its primary purpose is data exploration, where users determine suitable queries and scripts for working with the data.

(3) Data Hub: A gateway for distributing required data, aimed at harmonizing various data types so they can be queried and utilized by multiple systems, analytical interfaces, and machine learning/AI models.

(4) AWS S3 buckets are storage spaces in Amazon’s cloud storage service (Amazon S3). They are like online folders where you can store and organize your digital files. Each bucket has a unique name, and you can use them to keep your data organized and accessible over the internet.

Dyninno is a group of companies providing products and services in the travel, finance, entertainment, and technology sectors in 50+ countries.