Good practices for implementing Data Collection — Special Data Collection Month

DP6 Team
Published in DP6 US
6 min read · Apr 24, 2020

And so, the date arrives. The long-awaited date, when there will be a deluge of sales. You spent the last six months planning for this day. You prepared your stock, guaranteed the infrastructure for your website and app to support a huge volume of access, and designed a wonderful marketing campaign in order to further leverage the sales of your business.

The day begins and the campaign is in full swing, but when you access your Analytics tool, everything goes dark. The real-time report indicates that there are a total of ZERO users on your website and application. Your phone starts ringing, emails start popping up in your inbox: “Where’s my data?”, “What happened to the Analytics?”

After a short investigation, you discover that the Data Collection for your campaign’s landing pages has been poorly performed, and that no data is being collected. The IT area is brought in on an emergency basis, and, after several meetings and hours of discussion, they identify the reason for the problem and publish a correction. However, it is too late.

The day is over, and all the work done in the past six months has been severely compromised. Sales were made, but were not counted in the Analytics tool. Due to the lack of data, it was not possible to optimize marketing campaigns in real time during the day, a fact that resulted in a much lower volume of sales than expected. All the intelligence that could have been extracted from the data, that could have helped to enrich the current campaign and future campaigns to generate even more sales, was wasted.

This story, although fictitious, still represents the reality of some companies in our country, especially when faced with some seasonal dates, such as Black Friday. At a time when much is said about the importance of being a data-driven organization, basing all your decisions on data, it is contradictory to see the negligence of some companies when dealing with data collection activities on their websites and applications.

Data collection is not just a piece of code installed on your website or application that collects data and sends it to some repository. Data collection represents VISION. When it is well structured, it can empower organizations, allowing them to leave behind the generic vision of their digital efforts, where the results of all actions carried out in the online world are grouped together. Instead, we have a granular and strategic view of each user, understanding the profile and timing of each one of them on their journey to purchase a product or service, and understanding how each channel operates within that journey. This way it’s possible to design accurate and targeted communication for each user, reducing the friction of the message and increasing the chances of doing business.

With a view to improving the current scenario, DP6 is starting a series of publications about Data Collection on its blog. During the month of April, we will publish a series of articles to try to untangle the subject, including: how to conduct cohesive data collection, the best practices when structuring data, and the tools available on the market.

Today we start by discussing the pillars of good data collection. These will serve as a guide for all types of collection and are the basis of good practice.

The main problem we find with data collection today is the mentality that having the data is safer than not having it, that is, collecting everything even when the data is of no use in a given situation.

It may seem like a harmless idea, but it reflects a cultural problem that can be very harmful to the quality of the data collected. That is why we need to reinforce good practices.

Good Practice

  • Standardization of the collected data:

Standardization of data is essential for the functioning of the entire analysis process. With standardized collection, less effort is required from the Engineering team to make the data usable for analysis. Without standardization, the data would not be able to talk to each other, which would require ETL (Extract, Transform, Load) processes to force the integration. Although ETL is often necessary regardless of the standard, reducing its use should always be an objective, to avoid unnecessary costs (whether direct monetary costs or the team's time) and possible delays due to dependence on development, which tends to be complex.

Example:
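As an illustration, consider a small helper that enforces a single naming standard (here, lowercase snake_case) before anything reaches the collection layer. The `trackEvent` helper and the local `dataLayer` array are hypothetical stand-ins for whatever your tooling uses, not part of any specific tool's API:

```javascript
// A local array standing in for something like GTM's window.dataLayer.
const dataLayer = [];

// Normalize any event name to lowercase snake_case so that every team
// produces the same, comparable value.
function standardizeName(name) {
  return name
    .trim()
    .replace(/[\s\-]+/g, '_') // spaces and hyphens become underscores
    .replace(/[^\w]/g, '')    // drop any other punctuation
    .toLowerCase();
}

// Every event passes through the same standardization before being pushed.
function trackEvent(rawName, params) {
  dataLayer.push({ event: standardizeName(rawName), ...params });
}

// Three teams sending the "same" event in three different styles
// all end up as one comparable value: "add_to_cart".
trackEvent('Add To Cart', { value: 59.9 });
trackEvent('add-to-cart', { value: 19.9 });
trackEvent('ADD_TO_CART', { value: 9.9 });
```

Because the normalization lives in one place, no downstream ETL is needed to reconcile the three variants: they are already the same event by the time they are stored.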

  • Each one in its own box:

One of the most important concepts to follow in this step is that of Tidy Data, one of whose pillars states that each variable must have its own column within a database. This point is directly related to the standardization of the data that is collected, as it is common to want to cram several variables into a single dimension to follow some defined pattern. Instead, you should create new dimensions for these variables, which can then be segmented in a simple way, without needing to process the data with regular expressions.

Example:
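A minimal sketch of the idea, with hypothetical field names: on one side, several product attributes packed into a single dimension; on the other, the tidy version, where each variable gets its own dimension and can be segmented directly:

```javascript
// Anti-pattern: three variables packed into a single dimension.
// Every analysis now depends on splitting the string apart first.
const hitPacked = { product_info: 'shirt|blue|M' };

// Tidy version: one variable per dimension (field names are illustrative).
// Each attribute can be filtered or segmented directly, with no parsing.
const hitTidy = {
  product_name: 'shirt',
  product_color: 'blue',
  product_size: 'M',
};

// The extra work the packed layout forces on every consumer of the data:
const [name, color, size] = hitPacked.product_info.split('|');
```

The packed layout works until a value contains the separator, or until someone reorders the fields; the tidy layout has no such failure modes.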

  • Standardization of the tag manager:

In the area of Web Analytics it is common to find professionals who are unhappy with the lack of standardization within an organization's tag managers. Failing to adopt a standard makes the tool time-consuming to use, due to the difficulty of understanding and locating implementations, which can lead to extreme situations such as duplicate implementations.

This is due to a failure in the development or maintenance process. For example, imagine that you were hired to maintain a pre-configured Google Tag Manager (GTM) container, and on your first day one of the main tags stops working. You, without any real understanding of the new container, decide to make a new implementation to speed up delivery and reduce the impact. The collection starts working again, but at the cost of breaking the GTM standard, in addition to other points that may have an influence later on.

The ideal is to keep your tag manager organized and standardized. A tag adjustment can take 10 minutes or 1 day, depending on how standardized the environment is, directly impacting the return and the efficiency of the collection area.
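One lightweight way to keep a container standardized is to agree on a naming convention and periodically audit tag names against it. The pattern below (`<Tool> - <Type> - <description>`) is an illustrative assumption, not a GTM feature; adapt it to whatever convention your team adopts:

```javascript
// Hypothetical convention: "<Tool> - <Type> - <description>",
// e.g. "GA4 - Event - add_to_cart". Adjust the allowed types to taste.
const TAG_NAME_PATTERN = /^[A-Za-z0-9]+ - (Event|Pageview|Config) - [a-z0-9_]+$/;

function isStandardTagName(name) {
  return TAG_NAME_PATTERN.test(name);
}

// Auditing a list of tag names flags anything off-standard:
const tagNames = [
  'GA4 - Event - add_to_cart',
  'GA4 - Pageview - all_pages',
  'my new tag (copy 2)', // the kind of name that accumulates without a standard
];
const offenders = tagNames.filter((n) => !isStandardTagName(n));
```

Running a check like this as part of routine container maintenance makes it obvious when an emergency fix has broken the standard, before the duplicates pile up.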

  • Necessary, and only the necessary. The extraordinary is too much:

This is a sin committed by everyone in the field at least once in their career: it is very easy to be dazzled by the analysis options and want to collect everything. Or perhaps you do it out of fear that some area will request an analysis that does not have enough data to support it. With experience, however, comes the understanding that we do not use even a small portion of all the data collected. Not only that, but maintaining all that data, in addition to standardizing it, becomes unsustainable.

The best way to solve this problem is with dialogue. In other words, talk to the area that is asking you to collect data, and question why the data is needed and how it will be applied. Always ensure that the data collection follows the organization's objectives and the KPIs needed to verify its success. In this way, your work will be transparent and the collection process will be clearer to others.

We hope the pillars presented above help you in your search for the evolution of your data collection, but we're not stopping there. This month we will have four other posts about collection:

  • Data collection via selectors (8th April)
  • Data collection with page standardization (15th April)
  • Data collection with data layer (22nd April)
  • Collection tools (29th April)

Profile of the author: Ariel Fahel | A graduate in Business Administration from Fundação Armando Alvares Penteado, he is currently studying for a Master’s in Business Administration at Fundação Getúlio Vargas. He has been working at DP6 for almost three years as a Data Analyst, and in his spare time he likes to play with JavaScript, Python, and SQL, and to study new tools.

Profile of the author: Bruno Munhoz | A graduate of Digital Games from SENAC, he has been working at DP6 with data collection and engineering since 2016. A specialist in data collection for analytics, he is developing in data engineering and occasionally dabbles in analysis.
