CodyGuard: How We Maintain a Standardized Coding Approach for Looker and LookML at Trendyol

Data Guards
Published in Trendyol Tech
6 min read · Aug 1, 2022

Written by Cansu Sarı & Berrak Perk

Unlike many other BI tools, Looker requires writing code in a language called LookML to develop data sources, rather than offering a simple drag-and-drop approach.

When Trendyol decided to switch from Tableau to Looker, the need to migrate hundreds of data sources and thousands of reports emerged, with the help of around 40 people, which meant far too many lines of code to be written.

Since 40 people with different personalities can come up with various styles and ways of coding, we defined standards to apply in order to increase the readability, effectiveness, and performance of our code, as if it were written by one hand.

While defining the standards, we examined best practices while considering the needs of the company and its customers. However, both writing code according to the standards and reviewing whether the code is aligned with them require a lot of effort.

Therefore, we came up with the idea of automating the review process as much as we could without losing flexibility. This was the stepping stone for the birth of CodyGuard, a linter tool written in Python that scans our code and performs static code analysis.

Project Steps

The project consists of many stages that involve many iterations. For example, after coding is completed, testing may reveal irregular cases that require an exception to be written, or even the removal of a function from the automation process as a whole.

Figure 1: Project Steps

1 - Choose What To Automate

As the first step, we needed to choose the standards that do not involve human judgment, that is, those for which we could build an algorithm to check compliance.

As an example, one of our standards is that “Each object should have a definition”, which can easily be checked by CodyGuard rather than by hand. However, some standards require human opinion. To illustrate, we should avoid many-to-many relationships between our objects for performance reasons, but one may be used when it is unavoidable. This check needs to be done manually as it is open-ended. As a result, we first picked the standards to be automated.
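The “definition” standard above maps naturally onto a function. A minimal sketch, assuming objects are parsed into Python dictionaries keyed by LookML parameter names (the function name and message wording are illustrative, not CodyGuard’s actual code):

```python
def check_has_description(obj: dict, obj_type: str) -> list[str]:
    """Flag objects whose description parameter is missing or empty."""
    if not obj.get("description", "").strip():
        return [f"Error: {obj_type} '{obj.get('name')}' has no description"]
    return []
```

Calling `check_has_description({"name": "orders"}, "view")` yields one error message, while an object with a non-empty description yields an empty list.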

2 - Divide Into Objects

In order to write our standards as functions and give them the required input, we grouped the standards according to the input objects, which were:

  • Views: An object similar to a table in the database
  • Fields: An object similar to a field of a table
  • Explores: A data source that can be used for reporting

Even within these, we needed some subdivisions due to differences in requirements, parsing methods, etc. For views, the subdivisions are:

  • Base views: Direct views of the table
  • Extended views: An approach to manage different domains using the same tables with different calculations
  • Derived tables: Used for custom SQL queries or level-of-detail calculations

Figure 2: Objects

3 - Group By Functions

To reduce repetition in our control code, we grouped the standards into functions. For example, the existence of the description, label, and access grant parameters of an explore can be checked together, as below:

Example 1: explore_parameters
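The original Example 1 is an image; a minimal sketch of such a grouped check, assuming explores are parsed into dictionaries keyed by LookML parameter names (the parameter list and messages are illustrative):

```python
REQUIRED_EXPLORE_PARAMS = ("description", "label", "required_access_grants")

def check_explore_parameters(explore: dict) -> list[str]:
    """Check that an explore defines a description, a label and access grants."""
    return [
        f"Error: explore '{explore.get('name')}' is missing '{param}'"
        for param in REQUIRED_EXPLORE_PARAMS
        if param not in explore
    ]
```

An explore missing all three parameters produces three error messages; a fully parameterized one produces none.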

We also decided on the type of each function (Error / Warning) and grouped them accordingly. To illustrate, “A primary key should be defined for a derived table object” can be an error:

Example 2: view_pk
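Example 2 is also an image in the source; a sketch of this error-level check, assuming the parser represents a primary key as `primary_key: yes` on a field dictionary:

```python
def check_view_pk(view: dict) -> list[str]:
    """Error if a derived-table view has no field marked as primary key."""
    if "derived_table" not in view:
        return []  # the rule applies to derived tables only
    fields = view.get("dimensions", []) + view.get("measures", [])
    if any(f.get("primary_key") == "yes" for f in fields):
        return []
    return [f"Error: derived table '{view.get('name')}' has no primary key"]
```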

“Numeric fields should have a value format parameter” can be a warning:

Example 3: field_value_format
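Example 3 is an image as well; a sketch of the warning-level counterpart, assuming LookML’s numeric field types and its `value_format` / `value_format_name` parameters (the type list here is illustrative):

```python
NUMERIC_TYPES = {"number", "sum", "average", "count", "count_distinct"}

def check_field_value_format(field: dict) -> list[str]:
    """Warn when a numeric field has no value format parameter."""
    is_numeric = field.get("type") in NUMERIC_TYPES
    has_format = "value_format" in field or "value_format_name" in field
    if is_numeric and not has_format:
        return [f"Warning: field '{field.get('name')}' has no value format"]
    return []
```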

Whether a check is an error or a warning depends on its importance, its impact, and the effort required to fix it.

4 - Divide into Phases

As this project was maintained in parallel with the migration itself, we divided it into three phases according to importance and ease, and released them accordingly.

5 - Code the Functions

As a result of our research, we decided to use the lkml library in Python to parse our LookML code.

Although some of the functions can be built upon the parsed objects, we also needed to analyze some using the raw text. For example, the parser returns the dimension and measure objects in separate lists, so when a check needs all of them together, such as verifying that the objects are sorted in alphabetical order, we need to work on the raw text.

Example 4: view_alphabetical
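Example 4 in the source is an image; a raw-text sketch of such a check (the regular expression is a simplification of real LookML syntax, not CodyGuard’s actual pattern):

```python
import re

# Matches lines such as "dimension: order_id {" or "measure: total {"
FIELD_NAME = re.compile(r"^\s*(?:dimension|measure)\s*:\s*(\w+)", re.MULTILINE)

def check_view_alphabetical(raw_lookml: str) -> list[str]:
    """Warn when dimensions and measures are not declared in alphabetical
    order. Runs on the raw text because the parser splits the two lists."""
    names = FIELD_NAME.findall(raw_lookml)
    if names != sorted(names):
        return ["Warning: fields are not in alphabetical order"]
    return []
```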

During the development of our product, we used Colab as our initial test environment, using text inputs.

Our structure consists of two classes: CodyGuard itself, and CodyGuardFunc, which contains the generic functions used inside CodyGuard. One example is the titlecase function. We needed to customize the definition of title case according to Trendyol’s use cases, considering abbreviations like “GMV” and “L1M” as well as connector words, which can be either lowercase or uppercase depending on context:

Class of Generic Functions
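The class itself is shown as an image in the source; the titlecase customization it describes could be sketched as follows (the abbreviation and connector lists are illustrative, not Trendyol’s full sets):

```python
ABBREVIATIONS = {"GMV", "L1M"}                        # always fully uppercase
CONNECTORS = {"of", "and", "or", "per", "the", "in"}  # lowercase unless first word

def trendyol_titlecase(label: str) -> str:
    """Title-case a label while preserving known abbreviations and
    keeping connector words lowercase when not at the start."""
    words = []
    for i, word in enumerate(label.split()):
        if word.upper() in ABBREVIATIONS:
            words.append(word.upper())
        elif word.lower() in CONNECTORS and i > 0:
            words.append(word.lower())
        else:
            words.append(word.capitalize())
    return " ".join(words)
```

For example, `trendyol_titlecase("gmv per order")` returns `"GMV per Order"`, while a connector at the start of the label is still capitalized.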

All functions are connected to one main function, check_file(), and the dependencies between the functions are hierarchical.

Main Function: check_file

Depending on the file’s type, check_file() calls the check_view() or check_explore() function, and these call the necessary ones as shown in the figure.

Figure 3: Function Hierarchy
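The hierarchy in Figure 3 could be dispatched roughly like this sketch, where check_view() and check_explore() are reduced to trivial placeholders and an empty string signals a clean file (the real CodyGuard internals differ):

```python
def check_view(view: dict) -> list[str]:
    # placeholder for the real view checks
    return [] if "label" in view else [f"Error: view '{view.get('name')}' has no label"]

def check_explore(explore: dict) -> list[str]:
    # placeholder for the real explore checks
    return [] if "label" in explore else [f"Error: explore '{explore.get('name')}' has no label"]

def check_file(parsed: dict) -> str:
    """Dispatch to view or explore checks depending on the file's content
    and collect every message into a single result string."""
    val_result: list[str] = []
    for view in parsed.get("views", []):
        val_result += check_view(view)
    for explore in parsed.get("explores", []):
        val_result += check_explore(explore)
    return "\n".join(val_result)  # empty string when the file is clean
```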

The results from the functions are collected and returned in a val_result string object. If the file in question does not have any issues, that is, all the val_result objects called for that file type return empty, the output will be empty as well. If not, the errors and warnings are grouped and shown under the corresponding titles.

6 - Test

Our testing involved two different environments:

  • Colab environment: Manually providing example text inputs, manipulating usages, introducing syntax errors, simulating misunderstandings of the standards and other possible errors, etc.
  • Git environment: Reading the existing files on a test branch and running against live code to examine performance and accuracy.

7 - Integrate

To manage our collaborative work environment in LookML, we use the branch approach of Git. Each of our coworkers has a personal branch, and there are some shared branches as well.

When users make a development in their branch, they commit the changes and open a merge request; after it is reviewed, it is deployed to the main branch. Any commits made while a merge request is open are also included in that merge request.

We integrated our code so that it is triggered when a commit is made to an open merge request, for the changed files only. This way, users can open a merge request and check for CodyGuard’s errors and warnings before it is reviewed by another user. The outputs are shown under the pipeline stage within the Git environment as follows:

Figure 4: Output of CodyGuard

After fixing these, they can commit the changes, which will be included in the same merge request, and the review can take place after the code passes CodyGuard.

8 - Feedback

After going live with Phase 1 and presenting it to our colleagues, we started gathering feedback from them. By evaluating and implementing the feedback received, we are continuously improving our tool.

What is Next?

Our next steps will be:

  • Other phases: We will improve our tool with the changes and functions assigned to the next phases.
  • Adding new functions: As we are newly migrating from Tableau to Looker, new standards may be added, resulting in new functions to be taken into account.
  • Visuals: Currently, we see our results through the pipeline stage in our Git environment. We can work on the visuals of the outputs to make them more appealing.
  • Feedback: By asking for feedback on our tool, we can keep improving our product.

You can find the repository of CodyGuard here. Please feel free to contact us if you have any questions or suggestions. We hope this article helps if you need a similar approach!
