Data Engineering Tools: My Analytics Engineering Internship Journey at Dwelo

Iyeraksha
Dwelo Research and Development
7 min read · Sep 11, 2020

Do you like asking questions? Then welcome to the world of analytics.

Before joining Dwelo as an intern, I had only known one famous BI tool: Tableau. But to become a data engineer or a data analyst, one must be proficient with new tools as well. So, thanks to my team for letting me experiment with all the new visualization tools, namely Periscope and Sigma Computing, and for introducing me to DBT (data build tool) models. And I would say I'm quite fortunate to have worked with the stack proposed by Fishtown Analytics.

What is Periscope?

Periscope is software for visualizing data using SQL, Python, and R in one place. It allows business users to make decisions based on company metrics, and it was fascinating to analyze all of Dwelo's metrics as an IoT company. The team favored SQL for visualizing data, so to use this software well, one must know SQL at an intermediate level. It also allows you to create a view of the data and visualize it by simply dragging and dropping fields.

More details: https://dtdocs.sisense.com/topic/sql

Yet, one question: why Periscope when there are so many other analytical tools?

Well, the answer is that every stakeholder can build their own dashboard and deliver the deep insights that drive the business forward. During my introductory days, I got an opportunity to explore and build my own dashboard with the existing data models, which contained tables like Communities, Devices, and Units. I could immediately analyze some interesting questions, such as…

What are the battery level readings per device?

So, I just wrote a simple join query with aggregate functions.
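The original query isn't reproduced here, but a minimal sketch of that kind of join-plus-aggregate query, assuming hypothetical table and column names rather than Dwelo's actual schema, looks something like this:

    -- Hypothetical schema: count thermostats at each battery level.
    SELECT
        r.battery_level,
        COUNT(DISTINCT d.device_id) AS device_count
    FROM devices AS d
    JOIN device_readings AS r
        ON r.device_id = d.device_id
    WHERE d.device_type = 'thermostat'
    GROUP BY r.battery_level
    ORDER BY r.battery_level DESC;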

That query gave me insights about the devices, such as: around 22k thermostats report a battery level of 100, which means roughly 73% of thermostats are running at full power.

If you're curious about Dwelo, its central goal is to make living and working in rental housing more magical, efficient, and automated. Dwelo installs Z-Wave devices in apartments, along with thermostats that can communicate with Google Cloud, and perimeter doors that can be controlled by an access system like pdk.io. Every apartment gets a hub and device installation so tenants can control their lights, and property managers can monitor units and learn things like whether the painters left the air conditioner running in a vacant unit.

For more details on our sensor architecture check out: https://medium.com/dwelo-r-d/what-are-you-syncing-about-16c9151e9ff

This let me pull out some interesting insights and analyze questions like:

What is the total number of active hubs in the last month?

What is the average number of users created monthly?

What is the average number of hubs created monthly?
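As a flavor of what these look like in SQL, here is a rough sketch of the monthly-average question, with hypothetical table and column names rather than the exact production query:

    -- Hypothetical: average number of users created per month.
    SELECT AVG(monthly_count) AS avg_users_per_month
    FROM (
        SELECT
            DATE_TRUNC('month', created_at) AS month,
            COUNT(*) AS monthly_count
        FROM users
        GROUP BY 1
    ) AS monthly_users;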

Periscope certainly centers on an understanding of SQL, the database, and the questions you want to examine. But imagine a scenario where someone is not comfortable working with SQL yet wants to analyze the data in a simpler way. One solution is to improve your SQL skills and continue in the same tool. Or you can experiment with another tool such as Sigma Computing, a worksheet-based analytics tool inspired by Excel.

What is Sigma Computing?

Sigma provides business intelligence and analytics services by letting data teams at every level analyze information without code and make data-driven decisions quickly.

We explored the whole tool while migrating the existing dashboard over from Periscope. We first replicated a chart about community usage, which consisted of columns such as the unique ID of the community, hub metadata, occupancy information, and hub connectivity.

To migrate this chart to Sigma Computing, we had to understand the SQL logic behind it, so we broke down the long query powering the chart part by part and added the required tables in Sigma, such as Community, Units, and Hubs.

In Periscope, joining two tables means writing a basic join query; Sigma gives you a simpler alternative: just click the link option and it will connect the two tables. It also makes the concept of primary and foreign keys easier to understand, since linking is done by matching key columns in the UI.
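For comparison, here is the kind of explicit join Periscope would need you to write by hand; the key columns are hypothetical, but Community, Units, and Hubs are the tables named above:

    -- Hypothetical keys: link communities -> units -> hubs.
    SELECT
        c.community_id,
        u.unit_id,
        h.hub_id
    FROM communities AS c
    JOIN units AS u
        ON u.community_id = c.community_id
    JOIN hubs AS h
        ON h.unit_id = u.unit_id;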

How do WHERE, GROUP BY, and aggregate functions work in Sigma?

Sigma has built-in operations like group-by and aggregate functions such as Count, Sum, Min, and Max, which are easy to activate with a mouse click, while in Periscope the user must know how to write a GROUP BY clause and when to use it. Sigma also has a filter feature that behaves like a WHERE clause, allowing you to include and exclude values through the UI.
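To make the comparison concrete, here is a sketch of the SQL Periscope would need for what Sigma does with a few clicks (hypothetical table and column names again):

    -- Hypothetical: count and min/max battery level per device type,
    -- filtered to active devices. In Sigma this is a filter plus a
    -- click-based group-by; in Periscope you write it out.
    SELECT
        device_type,
        COUNT(*) AS device_count,
        MIN(battery_level) AS min_battery,
        MAX(battery_level) AS max_battery
    FROM devices
    WHERE status = 'active'
    GROUP BY device_type;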

What if Sigma detects a JSON column?

Well, that is the best part of Sigma! It makes this straightforward by including an 'Extract' feature for pulling apart semi-structured data. When Sigma recognizes a JSON or Variant column type, 'Extract Columns' becomes an option in the column menu. If your data is semi-structured and you do not see the 'Extract Columns' option, you can use the Json or Variant type functions to change how Sigma interprets the column.
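What the feature automates is similar to extracting JSON fields yourself in warehouse SQL. In Redshift, for instance, you could pull a field out manually (hypothetical column and key names):

    -- Hypothetical: extract one field from a JSON metadata column in Redshift.
    SELECT
        device_id,
        JSON_EXTRACT_PATH_TEXT(metadata, 'firmware_version') AS firmware_version
    FROM devices;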

How is the Sigma dashboard different from Periscope's?

Sigma lets you effortlessly share the analysis you have built in your worksheets through a custom dashboard.

Challenges faced?

  • A large or complex data set can slow the worksheet down.
  • Compared with Tableau, the number of available chart types is extremely limited.

Why did we ultimately go with Sigma Computing?

We chose this tool so that every employee, including those who are not technically strong, can visualize and effectively use the data. It is also easy to use, with drag-and-drop features. The spreadsheet-style interface and formulas are an easy transition for business users who are not used to working with complex BI platforms. They can play with any data source the Sigma administrator grants them permission to access and create their own data models. It is now great to see how other employees at Dwelo are able to analyze their data with just some preexisting Excel knowledge.

The complete tutorial can be found at: https://help.sigmacomputing.com/hc/en-us/categories/360003351314-Tutorials

Our next project was to analyze and visualize data on user behavior in Sigma, but while exploring we realized we would need some additional data models. It was a good opportunity for us to learn more about DBT models.

What is a DBT model?

DBT is a command-line tool that lets data engineers transform the data in their warehouses very efficiently by simply writing select statements. It performs the T in ETL: the transformation of the data. Dwelo uses data warehouse tools like Redshift, BigQuery, and Stitch, hence DBT was a natural fit.

The only task of DBT is to take your code, compile it to SQL, and run it against the database. We got to grips with this concept while creating a new model for user behavior.
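A DBT model is just a select statement saved in a .sql file; DBT wraps it in the DDL needed to build the object in the warehouse. A minimal sketch, with a hypothetical model name and source table rather than our actual user-behavior model:

    -- models/stg_users.sql (hypothetical)
    -- DBT compiles this into CREATE VIEW/TABLE stg_users AS (...) and runs it.
    SELECT
        id AS user_id,
        email,
        created_at
    FROM raw.users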

Why only one select statement?

DBT creates one model per .sql file, so each model boils down to one select statement. Models are materialized either as views, tables, or incremental models. For our models we preferred tables: they are slower to transform but faster to query afterward. The objective of the staging layer is to hold staging models: they take raw data and clean and prepare it for further analysis. So we essentially made staging tables for the core data.
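Materialization is configured at the top of the model file (or project-wide in dbt_project.yml). A sketch of a staging model built as a table, with hypothetical names:

    -- models/staging/stg_devices.sql (hypothetical)
    {{ config(materialized='table') }}

    SELECT
        id AS device_id,
        unit_id,
        device_type,
        battery_level
    FROM raw.devices
    WHERE deleted_at IS NULL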

A dim_table is a short table where each row is a person, place, or thing. This is different from a fct_table, which is a long list of events: dimension tables are small and mutable. I created a dim_table for our data using the ref function, which builds a dependency graph between the models.
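A sketch of what that looks like; the model names are hypothetical, and {{ ref() }} is what tells DBT that this model depends on the staging model:

    -- models/dim_users.sql (hypothetical)
    -- ref() adds an edge from stg_users to dim_users in the dependency graph.
    SELECT
        user_id,
        email,
        created_at
    FROM {{ ref('stg_users') }}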

How do we run our dbt commands?

We preferred using Git to push out our changes, running DBT with Docker on Ubuntu. For every new change in DBT, we simply pass the dbt run command with the model name.
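That looks something like the following; the model name is hypothetical, and the + suffix is DBT's selector for a model plus everything downstream of it:

    # Build a single model by name.
    dbt run --models dim_users

    # Build that model and everything that depends on it.
    dbt run --models dim_users+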

Once we created the tables, it was important for us to declare certain columns as unique and not null in order to avoid duplicate and redundant data. For that purpose, we created test cases, and DBT checks whether each test fails or passes. I simply ran dbt test with the model name. This helped me deliver a solid model.
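In DBT, these schema tests are declared in a YAML file that sits alongside the models; a minimal sketch with hypothetical names:

    # models/schema.yml (hypothetical)
    version: 2
    models:
      - name: dim_users
        columns:
          - name: user_id
            tests:
              - unique
              - not_null

Running dbt test --models dim_users then checks both constraints and reports a pass or fail for each.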

For more information on dbt you can refer to: https://docs.getdbt.com/docs/building-a-dbt-project/tests

To become a data engineer or scientist, one must love information and stay curious about the data. As Peter Sondergaard says, "Information is the oil of the 21st century, and analytics is the combustion engine." So keep learning!

References:

https://blog.getdbt.com/track-data-changes-with-dbt-snapshots/

https://discourse.getdbt.com/t/how-we-structure-our-dbt-projects/355

https://blog.getdbt.com/what--exactly--is-dbt-/

https://www.datawarehouse4u.info/reviews/business-intelligence/periscope-data
