Data Management Series — 2. Data Collection

DP6 Team
Published in DP6 US · Mar 14, 2023

Data collection is the process of capturing data by defining the scope of what is to be analyzed. The collection process aims to measure user interaction, with the rules defined according to the needs of each project. This provides the necessary information to quantify experiences, develop insights, and direct actions based on the consolidated values of each metric.

For data collection to answer business questions accurately, you need to plan carefully and map the project’s needs before implementing the final solution. We recommend diagnosing the problems and needs of each project, and aligning all the points discussed with the technical team responsible for building the data collection on the platform, as well as with the team that will perform the analysis.

This mapping is necessary not just to ensure that all needs are met, but also to validate the technical possibility of carrying out all items considered important for the project as a whole. An unstructured collection can result in incorrect data being made available, and the unnecessary waste of time and resources associated with doing the work again.

The most important elements involved in planning your collection (people, processes and technologies) are covered in the sections below.

People

It is important that all people involved in the collection process are aligned with the objectives, challenges, needs and limitations of the project. For this, you should create a process schedule, with the participation of the people responsible for the technical implementation, as well as those responsible for the development and consumption of insights from the collected metrics. An overall managerial vision is also required, to understand how the data is being used for the project.

Processes

The main processes involved in implementing a structured data collection are: understanding needs, technical and operational limitations, deadlines, and the tools used, as well as building a realistic schedule to cover each of these topics. These steps are extremely important if you want to avoid needless rework, or ending up with a collection that does not meet your business needs.

Technologies

Data layer

The data layer is an intermediate object between the back end and the front end, whose main objective is to make important navigation data available in a simplified way. Data layers are commonly used in digital marketing to create an environment that can be consumed by tag managers, media pixels, testing and personalization tools, etc. In the post All about Data Layers, we talk in more depth about how to plan and structure a data layer for a website data collection solution.
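As a minimal sketch of the idea: in a real page the data layer is an array declared before the tag manager snippet loads, typically populated by the back end with page context. The property names below (pageCategory, userType, transactionTotal) are illustrative, not a required schema, and a plain array stands in for `window.dataLayer` so the example is self-contained.

```javascript
// In a browser this would be: window.dataLayer = window.dataLayer || [];
// declared before the tag manager snippet. A plain array is used here
// so the sketch runs anywhere.
const dataLayer = [];

// The back end renders page context into the data layer, so front-end
// tools can read it without scraping the DOM. Property names are
// illustrative only.
dataLayer.push({
  pageCategory: 'checkout',
  userType: 'returning',
  transactionTotal: 149.9,
});

// A tag manager (or any script) can then read the consolidated state:
const state = Object.assign({}, ...dataLayer);
console.log(state.pageCategory); // logs "checkout"
```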

Main collection tools

Implementing data collection in an environment is a technical exercise that requires some programming knowledge. It is possible to implement data collection scripts directly on the desired website or application using programming languages, but this approach ties the collection tightly to the website code. Every change in the collection then requires altering the website source code, which means a developer, or a professional comfortable with programming, is needed for each change.

Google Tag Manager (GTM)

To make the data collection process less technical and less tightly coupled to the website, Google created the free Google Tag Manager (GTM) tool. Once you install GTM on the site, you can create, change, and configure the entire data collection without changing the site’s source code. But what is the real advantage of this? GTM decouples the code present on the site from the collection code. Changes and new implementations in data collection no longer require changes to the site’s source code, so the marketing or analytics team can modify and maintain the collection independently of the developers responsible for running the site, making the process faster and more agile.
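The decoupling can be sketched in a few lines. The site's only job is to push events onto the data layer; a container (here simulated by a hypothetical trigger table, which real GTM configures in its web interface) decides which tags fire. The trigger and tag names below are invented for illustration.

```javascript
// Sketch of GTM-style decoupling: the page pushes events, the
// "container" decides which tags fire. Trigger/tag names are
// hypothetical; real GTM configures this in its interface.
const dataLayer = [];
const firedTags = [];

// Stand-in for the container: watch pushes, fire matching tags.
const triggers = { newsletter_signup: 'GA4 - Sign-up Event Tag' };
const originalPush = dataLayer.push.bind(dataLayer);
dataLayer.push = (entry) => {
  if (entry.event && triggers[entry.event]) {
    firedTags.push(triggers[entry.event]);
  }
  return originalPush(entry);
};

// All the site itself ever does — no tag logic lives in page code:
dataLayer.push({ event: 'newsletter_signup', formId: 'footer-form' });

console.log(firedTags); // [ 'GA4 - Sign-up Event Tag' ]
```

Because the tag configuration lives entirely in the container, the marketing team can change `triggers` (in GTM terms, the published container version) without any deploy of the site itself.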

Another differentiator of Google Tag Manager is its visual interface. GTM provides a series of standard elements that configure the collection by filling in fields, with no coding required. One example is the collection configuration for Google Analytics, where GTM offers a visual template that lets the user fill in only the desired fields, without interacting with any programming language. This greatly reduces the complexity of implementing the collection, reducing the need for a specialized professional.

An example of tag implementation using GTM

The tool lets you visualize the tags implemented for collection on the site, manipulate and inspect the data layer, and it also offers a Debug view for implementation tests.

Tealium iQ

Tealium iQ is a paid tool similar to Google Tag Manager. It allows data collection to be configured and manipulated via the tool itself, decoupling data collection from the implementation and maintenance of the site. Tealium iQ also allows the versioning of the implemented data collection and has a timeline view of changes made to implementations.

The tool also reduces the need for coding, providing an interface that allows data collection by filling in templates to facilitate the process.

Main Analysis Tools

Once the technical data collection process has been successfully carried out, in accordance with the main issues and needs raised by the business, it is necessary to use analysis tools to display reports and dashboards, and extract insights from the collected data.

Regardless of the tool chosen for data visualization and analysis, it is important to note that the data describes the user’s journey on your website, and its use is recommended for the personalization and optimization of the journey.

Google Analytics

Google Analytics is Google’s free analytics platform. By default, it collects some standard metrics, such as page views, language, browser, device (desktop or mobile) etc. However, the main functionality of the platform lies in the consumption of the collected data.

The tool allows users to assemble their own charts and tables via an interface that enables cross-referencing and visualizing the collected data, which makes exploration and value extraction more straightforward.

The new version of Google Analytics, Google Analytics 4 (GA4), makes the tool easier for business users, with a series of standard reports and predefined insights available in the tool itself. It displays insights such as “increase in engagement rate” or “events on the site by users of a given campaign” by default. This reduces the effort required for data analysis, allowing you to answer business questions more quickly.
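GA4's collection model is event-based, and the standard gtag.js pattern shows how events reach it: `gtag()` simply queues its arguments onto the data layer, where the GA4 library picks them up. The measurement ID below is a placeholder, and the `sign_up` event and its `method` parameter are just one example of a recommended-style event.

```javascript
// The gtag.js queuing pattern: gtag() pushes its arguments onto the
// data layer for the GA4 library to consume. "G-XXXXXXX" is a
// placeholder measurement ID, not a real property.
const dataLayer = [];
function gtag() { dataLayer.push(arguments); }

gtag('js', new Date());
gtag('config', 'G-XXXXXXX');

// A custom event with parameters, like the insights GA4 surfaces:
gtag('event', 'sign_up', { method: 'email' });
```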

One of the highlights of Google’s tool is its ease of integration with the company’s own services. With just a few clicks, Google Analytics connects with Google Ads, making it easy to create audiences for ad serving. Another option is the connection to BigQuery, Google Cloud’s data warehouse service, which lets you export Google Analytics data for long-term storage and for more advanced processing and analysis.

Adobe Analytics

An alternative to Google Analytics is Adobe’s paid tool, Adobe Analytics. Like its competitor, Adobe Analytics allows the creation of reports and graphs in its own interface, as well as the selection and filtering of the collected data. It also shares some features with Google Analytics 4: it can detect anomalies, such as sudden drops in user events, and issues alerts when they occur. In addition, built-in Machine Learning for predictive analysis reduces the need for a data science professional to build simple predictions about user journeys and events on the monitored site.

Like Google Analytics, Adobe Analytics also enables media attribution analysis. This allows its users to credit, according to a previously defined attribution model, the media that played the biggest part in bringing users to their website, which enables more effective reinvestment in media ads.

Business impact

The data collection process spans planning, with a survey of business needs, the mapping of important events, the implementation of the collection, and the analysis and extraction of insights. It is a multidisciplinary process involving business, technical and analytical areas.

Bearing in mind that the process is long and requires professionals from several different areas, the main question is whether the cost of structuring a data collection is proportional to the value it generates. There is no standard answer to this question, as it all depends on the business and the needs raised.

Data collection allows the company to understand its users’ behavior on the site from the point of view of the user’s journey. We are not talking about tools that identify the elements that are clicked or viewed the most; there are UX tools that provide a broader view of those aspects. When it comes to the user journey, the focus is on understanding the user’s macro behavior within the environment, e.g. which pages are accessed, whether certain pages cause users to leave the site, the navigation flow (which pages users visit the most) before converting, etc.

Understanding the user’s journey is important for optimizing the environment. For example, take an e-commerce site where the checkout flow is divided into five steps, i.e. the user needs to go through five screens (including filling in personal data, delivery and payment) before finalizing the purchase. After analyzing the collected data, you notice that users tend to leave during the third step. In this case, what can you do to prevent users from abandoning the checkout flow? One option is to test reducing the funnel to three screens, using the number of steps users complete before giving up the purchase as your new limit.
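The funnel analysis described above can be sketched with a few lines over hypothetical collected data, where each record is the furthest checkout step a user reached. The user IDs and numbers are invented to reproduce the scenario in the text, a drop-off concentrated after the third step.

```javascript
// Hypothetical collected data: furthest checkout step per user.
const furthestStep = { u1: 5, u2: 3, u3: 3, u4: 2, u5: 5, u6: 3 };

// How many users reached each of the 5 steps.
const reached = [1, 2, 3, 4, 5].map(
  (step) => Object.values(furthestStep).filter((s) => s >= step).length
);

// Drop-off rate between consecutive steps.
const dropOff = reached.slice(1).map((n, i) => 1 - n / reached[i]);

console.log(reached);  // [ 6, 6, 5, 2, 2 ]
console.log(dropOff);  // biggest drop is after step 3 (60%)
```

In a real project the `furthestStep` map would come from the analytics tool's step events, but the arithmetic (users reaching each step, then the drop between steps) is the same.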

This solution is only possible because of the collected data. Once collection is in place, you can visualize the content and pages consumed the most on your website, as well as its exit flow, and use these views for optimization.

Data collection also enables more advanced marketing strategies. You can create specific groups of users for remarketing based on the screens they accessed. For example, you can create a group of users who abandoned the cart at the time of purchase and send them a discount coupon so that they resume their journey. You can also create groups based on events within the site and on page consumption behavior.
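The abandoned-cart audience described above is just a set operation over collected events. A sketch with a hypothetical event log (user IDs and event names are invented; in practice the audience would be built in the analytics or ads platform):

```javascript
// Hypothetical collected event log.
const events = [
  { user: 'u1', event: 'add_to_cart' },
  { user: 'u1', event: 'purchase' },
  { user: 'u2', event: 'add_to_cart' },
  { user: 'u3', event: 'page_view' },
  { user: 'u4', event: 'add_to_cart' },
];

// Users who performed a given event.
const byEvent = (name) =>
  new Set(events.filter((e) => e.event === name).map((e) => e.user));

// Audience: added to cart but never purchased.
const added = byEvent('add_to_cart');
const purchased = byEvent('purchase');
const abandonedCart = [...added].filter((u) => !purchased.has(u));

console.log(abandonedCart); // [ 'u2', 'u4' ]
```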

Finally, analytics tools let you identify the campaigns and ads that bring the most users to your website and the campaigns that generate the most conversions. This allows you to invest in more strategic media and campaigns, such as awareness campaigns that bring in a larger number of users, or campaigns that reach users who are more likely to convert.
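The campaign comparison described above boils down to computing and ranking a conversion rate per campaign. A sketch with hypothetical numbers (campaign names, sessions and conversions are invented for illustration):

```javascript
// Hypothetical per-campaign totals from the analytics tool.
const campaigns = [
  { name: 'awareness_video', sessions: 5000, conversions: 25 },
  { name: 'search_brand',    sessions: 800,  conversions: 64 },
  { name: 'retargeting',     sessions: 300,  conversions: 36 },
];

// Compute conversion rate and rank campaigns by it.
const ranked = campaigns
  .map((c) => ({ ...c, cvr: c.conversions / c.sessions }))
  .sort((a, b) => b.cvr - a.cvr);

console.log(ranked[0].name); // retargeting — highest conversion rate
```

Note the two views the article mentions are different rankings of the same data: sorting by `sessions` surfaces awareness campaigns, while sorting by `cvr` surfaces the campaigns most likely to convert.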

Although it is a long, multidisciplinary process, data collection and its subsequent analysis allows you to understand the users’ journey on the site, and this in turn enables optimizations and customizations. In addition, analysis of media performance, focused on the user’s journey within the environment, allows the reallocation of funds to campaigns that bring users who convert more. You can also use the data for more advanced marketing strategies, such as remarketing targeted at a specific user segmentation.

We have produced a complete e-book on the subject. Download here and learn more!

Profile of the author: Laura Kiehl | Passionate about the world of technology and digital marketing, she works at DP6 as a data engineer, dealing with the collection, cross-referencing and structuring of data.

Profile of the author: Lucas Tonetto Firmo | A Computer Engineer graduate of Universidade São Judas Tadeu with an MBA in AI and Big Data from USP, Lucas is passionate about Technology and its ability to transform society’s way of life. He worked for two years developing websites and web applications and is currently a Data Engineer at DP6.
