Alvin + Mode: Data Lineage and Usage for your Reports & Charts

Published in

Alvin

7 min readFeb 1, 2022

As soon as I joined Alvin as a Data Engineer I was given the task of building an integration with Mode, a collaborative data platform which we’re seeing get some rapid adoption. It was the first time I worked with core software engineering and it has been a really nice journey so far with lots of learning, accountability and building stuff.

As a former civil engineer, building things is something I have always enjoyed. Now instead of construction, I am helping building a really amazing data product with an awesome and talented team at Alvin 🥳

It is not news that the data space has been getting a lot of attention (and tools!) in recent years. At Alvin, we are building a product that helps improve the quality and usability of data, offering features such as impact analysis (what happens if I delete this column?), data lineage, problem tracing and usage analytics, all within a lightweight data catalog.

To achieve this, we are integrating with the most prominent data tools. We currently have integrations with data stores such as RedShift, BigQuery and Snowflake, BI tools such as Looker and Tableau, data transformation frameworks such as DBT, and data orchestration platforms such as Airflow.

To prioritize our integrations, we always listen to the data community, and this time they shouted: Mode!

How did we integrated with Mode

As described on the Mode website : ”Mode is a modern analytics and BI solution that combines SQL, Python, R and visual analysis to answer questions faster”

Mode has 3 metadata sources which are:

- Core API (API reference in their developer website)

- Discovery API (batch API, only for enterprise plans)

- Discovery Database (provided via Secure Data Sharing by Snowflake).

To make sure the integration works, regardless of Mode plan (at the time of writing at least!), I decided to use the Core API.

We used Amundsen databuilder extractors as starting point to create our own metadata extractors, and we took inspiration from their ETL framework to load that data into our backend.

*A job with a task which comprises of the extractor, transformers and loader.*

Although Amundsen has some ready to use Mode extractors they do not fit our use case, so I created extractors based on their RestApiExtractor that utilises RestApiQuery to extract data.

The RestApiQuery basically works by chaining a sequence of API calls (chaining RestApiQuery objects) as we hardly get any of the data we want by making the request to a single endpoint.

To tackle this, RestApiQuery response payload is parsed using JSON expressions. This seems a bit like magic but it basically just means that we can extract values from certain paths in the response object easily. A simple example is if you have an object like {"name":"alvin"} you ask for “name” and you get “alvin” back. The extracted fields can then be used to construct the next API call/endpoint for the next RestApiQuery object — for instance supplying parameters for pagination.

For instance, let’s say we want to extract reports metadata that exists inside spaces. This would require us to retrieve data from https://app.mode.com/api/spaces/<space_token>/reports/. So, prior to getting the reports, we would have to first get the list of the available spaces in the workspace/organisation to plug the space_token in this URL.

That’s what the ModeDashboardExtractor does, which is an extractor from Amundsen. So, first it gets the spaces query building a ModePaginatedRestApiQuery object which is just the RestApiQuery with some pagination logic added to it.

From the API response payload it parses the JSON to get the token (which is like the space id), name and description and builds a python dictionary object for each record in spaces.

These records are then passed to the parent RestApiQuery which is the reports query.

And then we have the the space_token (which they call dashboard_group_id) available to call the reports endpoint. For now, we don’t need any more API calls to get the data we want so with this final RestApiQuery object we build the RestAPIExtractor to then start extracting the records from Mode.

Besides building the extractors, I also built our own transformers and mappers to create our data entities model objects to load to our database (we have our customised Loader class for that!).

One thing to note here is that Mode does not provide granular usage data (i.e. individual report views) through their API, but just the accumulated view count. This isn’t much good to us as we need to know when the views took place. For example, a report could have 500 views, but if they all took place more than12 months ago, then it doesn’t have much business value anymore. I solved this by calculating the daily view count in our backend.

The main objective of this integration was to get data from Mode reports and charts, including the queries from reports so we can generate the lineage, which in turn powers our impact analysis feature.

✅ Milestone achieved: our plug and play Mode integration is up and running!

How does it look in Alvin?

Connecting Mode in Alvin is easy peasy lemon squeezy (just like all of our integrations!). Here are the quick steps to follow:

1. Get Mode API token

Get your Mode API token:

It’s important to note that the only data we access is read only metadata from Mode resources.

2. Add Mode as a platform

After that, go to your Admin panel in Alvin, click on ADD PLATFORMand choose Mode.

Then fill out the following fields:

- Name: unique name for your connection

- Display Name: name to display on the Alvin UI

- Access Token and Password: Retrieved from API Tokens in your Mode Account

- Workspace Username: username of the workspace (previously referred to as Organization). You can retrieve this information directly from your url https://app.mode.com/<workspace_username>/spaces

After filling the credentials, hit the sync button. In a few minutes you’ll be able to navigate through your Mode reports and charts 🙌

And with that we are done! Now let’s check out a some examples of what all that metadata powers in the Alvin UI:

Upstream lineage: Find out which tables and columns are being used in your reports and charts.

Downstream lineage: View all reports and charts that use a certain table or column in your data warehouse.

Impact analysis (I love this one!): Find out which reports/charts will be affected by your changes (e.g. dropping a table or column).

Powerful search: Discover your reports and charts and check usage over time. See what is popular and investigate those that have gone stale.

Wrapping Up

So that wraps up how I got on integrating Mode with Alvin. If you fancy trying it out for yourself, you can sign up to the beta here - I’d love your feedback!

So that’s one down, many, many more to go 😅 Feel free to let me know what you think our next integration should be in the comments or via Linkedin. If you want to know more about Alvin and our mission, check out our other posts here. Otherwise, stay tuned for my next one!

Want to give Alvin a try ?

Alvin is helping data teams of all sizes meet the challenges of data quality, reliability, cost and governance. Get started for free here and unleash the power of highly accurate lineage and usage data.