Diving into data modeling with a Software Engineer mindset

Marc Soler Colomer
Bynder Tech
Published in
8 min readJun 17, 2024
Photo by Evaldas Grižas on Unsplash

When you hear Software Engineering, a thousand different concepts may come to your mind, but I’d bet a substantial amount of money that almost all of you have agreed on one: Object-Oriented Programming. It’s one of the first paradigms you encounter when you start taking your first steps in Software, and one that would probably be with you for the rest of your life.

Throughout my career, I’ve been involved mainly in Software Engineering projects, but lately, due to being part of the Analytics team in Bynder, I’ve delved a bit into the world of data. And it turns out there’s also room to OOP there. This may not sound surprising to you, as it’s fairly easy to apply to ETL jobs, but what if I told you that you can apply some of the principles of OOP when modeling in a BI tool?

In this article I’m going to explain how to implement the Inheritance principle with one of the most promising BI tools on the market, Looker.

If you’ve never heard of Looker

Looker is a business intelligence and data visualization platform that helps organizations explore, analyze, and share insights from their data. It was acquired by Google in 2019 and it uses their own modeling language, LookML.

The role of Looker in Bynder

At Bynder, we adopted Looker at the end of 2023. It was the result of an intense project that we carried out in collaboration with our Google Partner, Devoteam G Cloud. Since then, it has become our primary BI tool for showcasing insights to our clients and enhancing their data-driven decision-making process.

The Analytics team leverages Looker to model and organize all the data we gather from our clients. After modeling the data, we craft and embed dashboards and visualizations directly into the application.

Dashboard example for the Lowest-tier Analytics Module in Bynder

Steps to create a visualization and its scalability

The process to create these kinds of visualizations generally involves three steps:

#1 Create Looker Views
Views act as structured representations of your database tables (or derived tables) in the Looker environment, offering the flexibility to extend or refine them as needed adding new dimensions or measures. After connecting to your data source, Looker would help you to create the Views automatically.


# This file is auto-generated by Looker from a table called ASSET_VIEWS
# with all these columns

view: asset_views {
sql_table_name: "ASSET_VIEWS" ;;

dimension: account_id {
type: string
sql: ${TABLE}."ACCOUNT_ID" ;;
}
dimension: user_id {
type: string
sql: ${TABLE}."USER_ID" ;;
}
dimension: asset_id {
type: string
sql: ${TABLE}."ASSET_ID" ;;
}
dimension_group: event {
type: time
timeframes: [raw, date, week, month, quarter, year]
convert_tz: no
datatype: date
sql: ${TABLE}."EVENT_DATE" ;;
}
dimension: event_count {
type: number
sql: ${TABLE}."EVENT_COUNT" ;;
}
}

#2 Create Looker Explores
You can think of the Explores as a starting point for a query or, in SQL terms, as the FROM in a SQL statement. They leverage Views and can be joined with other Views. Once an Explore is created, it can be used to create visualizations with the data of all the views defined within it.

explore: asset_views {
view_name: asset_views # Not necessary if explore name matches a view name
join: assets {
type: left_outer
sql_on: ${asset_views.asset_id} = ${assets.asset_id};;
relationship: many_to_one
}
}

#3 Create Visualizations
After creating the “Asset Views” view and explore, it would require little effort to end up with simple visualizations like this. Since we joined “assets” in our explore, not only do we have access to the event data, but we can also see all the information on the asset level.

Total Asset Views and Asset Views over time visualizations.
Asset level drill-down of Asset Views.

Bynder is a leader in the Digital Asset Management industry, and as you can imagine our clients constantly deal with assets, generating a large number of events: Views, Downloads, Shares, Creations, Deletions, and more. So, we can expect a View and Explore for every asset event that we are tracking.

Additionally, the number of joins could easily increase as we plan to offer more detailed information. In the example below, I added a new one: users.

# In this example, we enhanced the asset_views explore and we added 2 new ones

explore: asset_views {
join: assets {
type: left_outer
sql_on: ${asset_views.asset_id} = ${assets.asset_id};;
relationship: many_to_one
}
join: users {
type: left_outer
sql_on: ${asset_views.user_id} = ${users.user_id};;
relationship: many_to_one
}

explore: asset_downloads {
join: assets {
type: left_outer
sql_on: ${asset_downloads.asset_id} = ${assets.asset_id};;
relationship: many_to_one
}
join: users {
type: left_outer
sql_on: ${asset_downloads.user_id} = ${asset_downloads.user_id};;
relationship: many_to_one
}

explore: asset_shares {
join: assets {
type: left_outer
sql_on: ${asset_shares.asset_id} = ${assets.asset_id};;
relationship: many_to_one
}
join: users {
type: left_outer
sql_on: ${asset_shares.user_id} = ${users.user_id};;
relationship: many_to_one
}

What’s the problem and how can Inheritance help us

If, at some point, we need to add or edit a join in the Explores or add a new dimension to the Views while ensuring coherence for all event types, we’d need to do it manually for every asset event. This could become a tedious task and prone to mistakes.

As a Software Engineer, I couldn’t resist trying to improve our team’s experience in dealing with this modeling challenge. I saw this as the perfect opportunity to apply Software paradigms in a completely new domain for me, without knowing if it was even possible.

As previously explained, we are primarily interested in leveraging one of the core principles of Object-Oriented Programming: Inheritance. Our goal is to reuse code from other Views/Explores while maintaining a unified hierarchy. This property of OOP facilitates more thorough data analysis, reduces development time, and ensures a higher level of accuracy.

The main idea would be defining a base View with all the dimensions needed, and then simply extend it for the rest of events, allowing us just to change the main view and propagate that change for all assets events. Just like we would do it in Java.

// OOP Inheritance in Java

public abstract class AssetEvents {
private String accountId;
private String userId;
private String assetId;
private Date eventDate;
private int eventCount;
}

public class AssetViews extends AssetEvents {}

public class AssetDownloads extends AssetEvents {}

public class AssetShares extends AssetEvents {}
# OOP Inherintance in a Looker View

view: asset_events {
extension: required

dimension: account_id {
type: string
sql: ${TABLE}."ACCOUNT_ID" ;;
}
dimension: user_id {
type: string
sql: ${TABLE}."USER_ID" ;;
}
dimension: asset_id {
type: string
sql: ${TABLE}."ASSET_ID" ;;
}
dimension_group: event {
type: time
timeframes: [raw, date, week, month, quarter, year]
convert_tz: no
datatype: date
sql: ${TABLE}."EVENT_DATE" ;;
}
dimension: event_count {
type: number
sql: ${TABLE}."EVENT_COUNT" ;;
}
}


view: asset_views {
extends: [asset_event]
}

view: asset_downloads {
extends: [asset_event]
}

view: asset_shares {
extends: [asset_event]
}

We can see that both approaches are very similar, making applying inheritance in LookML quite intuitive. It’s important to note we are using the extension parameter with the value: required, to indicate that this view is abstract. It cannot be used unless it is extended.

Once we have the View ready, let’s move to the Explore section, which is slightly more complicated.

# OOP Inheritance in a Looker Explore

explore: asset_events {
extension: required
view_name: asset_events
join: assets {
type: left_outer
sql_on: ${asset_event.asset_id} = ${assets.asset_id};;
relationship: many_to_one
}
join: users {
type: left_outer
sql_on: ${asset_event.user_accounts_key} = ${users.account_user_key};;
relationship: many_to_one
}
}

explore: asset_views {
view_label: "Asset Views"
from: asset_views
extends: [asset_events]
}

explore: asset_downloads {
view_label: "Asset Downloads"
from: asset_downloads
extends: [asset_events]
}

explore: asset_shares {
view_label: "Asset Shares"
from: asset_shares
extends: [asset_events]
}

Things worth mentioning here:

  1. We are using the extension: required to make the Explore abstract.
  2. If you are extending an Explore, it’s mandatory to use the view_name parameter in the base Explore. Simply matching the Explore name to a View is not sufficient in this case.
  3. I strongly recommend labeling your extension Explores. While it won’t have any effect if you develop dashboards using LookML, it would make creating dashboards via the Looker UI more clear.
  4. The parameter from is key in this approach. The underlying behavior of using this parameter, is that the generated SQL aliases the original table name, like this: FROM asset_views AS asset_events. This allows us to reuse the logic from the joins without having to change the SQL clause. Notice also, that measures created on specific extension events would need to be referenced as “asset_events.measure_name”.
  5. In this example, we are dealing with just a single level of hierarchy. However, by applying the same strategy, we can extend it to handle other kinds of events that could share common logic with these asset events, e.g. user events. For instance, we could create a higher-level class, called “dam_events”, from which both “user_events” and “asset_events” can inherit.

We’re done! This is how we apply Inheritance in Looker, at the Views and Explores level.

If you are a Looker engineer, I encourage you to review your models and think about whether you can take advantage of this OOP principle to simplify your code and improve your experience.

Photo by Scott Graham on Unsplash

Final thoughts

In my opinion, the BI industry lacks sufficient focus on providing tools for development best practices and source control. This oversight forces developers to engage in repetitive processes and tasks, increasing the risk to their businesses. Looker seems to have made a strong commitment to empower engineers when it comes to developing in their environment and that’s why we’ve chosen it in Bynder.

The support of the extends parameter, which you can find in Views, Explores and Dashboards, as well as its integration with Github, makes the experience of using their product for Software Engineers much more pleasant and enjoyable. Props to Looker for this; we are looking forward to continue discovering their product and all the possibilities they offer!

--

--