Challenges of building a Data & Analytics area in a fast-growing company


Written by Murilo Nigris — Data & Analytics Lead at AMARO

1) Introduction

I joined AMARO at the end of 2016 with the challenge of building the company’s Data & Analytics architecture, team and strategy from the ground up.

To provide context, AMARO is a direct-to-consumer omnichannel fashion brand, selling at amaro.com and through its concept stores, called Guide Shops. The company sells originally designed women’s clothing, accessories, and footwear, and integrates technology everywhere, from product design to home delivery. At the Guide Shops, customers can try products on, get help from fashion consultants and complete the purchase online on one of the iMacs available. The items bought are then delivered straight to their homes, since the Guide Shops don’t carry any stock.

At that time, despite AMARO’s heavy use of technology to improve its customer experience and processes, there was no centralized data infrastructure for the internal teams to draw from. All teams were downloading reports and spreadsheets from the systems they interacted with, and with AMARO’s accelerated growth, this was not scalable. That’s when the Data & Analytics area was created.

In this post I want to share what allowed us, in one year, to move from manual exports to a centralized data infrastructure where most business teams can consult data and create and share their own analyses in the cloud. I will talk about the main contributing factors in the areas of data strategy, architecture definition and project execution.

2) How did we begin — understanding the data strategy

To build AMARO’s data infrastructure, we first had to understand and map the challenges and data sources of a direct-to-consumer fashion brand, spanning everything from product design and manufacturing to logistics and omnichannel sales, in order to define the data strategy.

Defining a data strategy is usually one of the most challenging and important steps for companies, since it guides the beginning of the initiative and consequently sets the tone for the period that follows. Luckily, when I joined AMARO, the founders had already defined the main KPIs and business processes to be tackled, at least from a high-level business perspective.

After the data strategy, the second biggest challenge is to define the scope of the Data team. Will it focus on providing analyses for the other areas of the company, or on building a solid infrastructure that allows for self-service BI in the future?

Executing ad-hoc queries vs building a self-service infrastructure

Fortunately, at AMARO we decided on the latter. Thus, in the first three months (or until the two main cubes were created), we would not take ad-hoc requests, so we could focus on the basics. This decision paid off by allowing us to move from a data team that spends all day running ad-hoc queries directly against the production database to one that puts all its effort into evolving and building the stack.

3) How did we begin — understanding the BI SaaS and frameworks market

After defining the data strategy, in agreement with the stakeholders, it was time to set our priorities and choose our tools and frameworks. We conducted detailed research of the Business Intelligence market, assessing a great variety of tools (data warehouse, ETL, data pipeline, data visualization) in order to find the combination that would meet our priorities:

  • Fast and simple to build on;
  • Low maintenance, allowing us to focus on building what matters to us;
  • Scale with AMARO’s growth;
  • Affordable cost;
  • Bet on the right player — I believe this is one of the most important criteria, considering the strong commitment to those tools. Their speed of evolution, for instance, may be more important than their current capabilities, and will make a big difference in the long term.

After all this analysis, we ended up choosing an ELT approach instead of the traditional ETL. In short, the reasons were:

1) our data sources were already fairly well organized;

2) we had evaluated and liked Looker, which is a good fit for ELT.

This approach would therefore be more agile and would allow us to evolve our definitions more easily than traditional ETL. For the initial stack, we chose Stitchdata as the data pipeline tool to execute the ‘EL’, collecting data from the e-commerce platform (MySQL) and the ERP (SQL Server) and loading it into Redshift, our choice of data warehouse. Then, using Looker, we modeled all the facts and dimensions and used its persistence features (PDTs and NDTs) to write back to Redshift; those persisted tables are the source for all dashboards and analyses.
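To illustrate the ‘T after L’ idea: in our setup Looker’s persistent derived tables generate and run this kind of SQL for us, but the sketch below shows, as a hypothetical stand-alone script, what transforming inside the warehouse looks like once the raw tables have already been loaded. All schema, table and column names and connection details are invented for the example.

```python
# A minimal, hypothetical sketch of the "T" in ELT: the transformation runs
# inside Redshift, on raw tables that the pipeline tool has already loaded.
# In practice Looker's PDTs persist these derived tables for us; this script,
# the schema/table/column names and the connection details are all illustrative.
import psycopg2

TRANSFORM_SQL = """
CREATE TABLE analytics.fact_sales AS
SELECT
    o.order_id,
    o.created_at::date          AS order_date,
    i.sku,
    i.quantity,
    i.quantity * i.unit_price   AS gross_revenue
FROM raw_ecommerce.orders o                  -- loaded as-is from MySQL
JOIN raw_ecommerce.order_items i
  ON i.order_id = o.order_id
WHERE o.status = 'approved';
"""

conn = psycopg2.connect(
    host="example-cluster.redshift.amazonaws.com",  # placeholder endpoint
    port=5439,
    dbname="warehouse",
    user="analytics",
    password="***",
)
with conn, conn.cursor() as cur:
    cur.execute("DROP TABLE IF EXISTS analytics.fact_sales;")
    cur.execute(TRANSFORM_SQL)  # the heavy lifting happens inside the warehouse
conn.close()
```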

4) Starting the project of building the initial Data Models

For AMARO, it was important to release features quickly in order to receive feedback, but at the same time to avoid taking on new requests on different topics (as internal users would get excited about the tool’s capabilities). After all, executing ad-hoc requests is not only about the time you spend on them, but also about the break in focus and continuity, which makes a huge difference when there is only one person working on the project.

Single-tasking vs Multitasking

There are many places you could start from, but by starting with sales and inventory, where most of the KPIs were already defined, we were able to quickly build a solid base and test the complete stack. Starting with ill-defined metrics would definitely have delayed the project, as decisions would almost certainly have gone back and forth. In addition, it is crucial to focus on fixing critical bugs, as those will undermine you and everything you want to build in the future.

Before April we had already replicated the most important Excel reports in Looker, and many users were consulting and exploring the data without manual downloads and without requesting SQL queries from the data team. The feedback was very positive.

4.1) Choosing new Data Models to add after the basics

After the conclusion of the basic data models, a large number of requests started to come in. In order to understand how to prioritize them, we met with all Department Heads to collect demands and organize possible projects.

The requirements were very broad, including incremental features on existing data models, new data models that would reuse ‘facts’ or ‘dimensions’ we had already built, entirely new models from completely different areas, and hundreds of visualizations and dashboards to be built. Our choice was to focus on new data models and visualizations. New models would add value to the teams without requiring much work or different types of maintenance, and building visualizations was necessary to keep the quick feedback loop on the data.

At this point, when we better understood our short- and mid-term demands and the BI stack had shown it would easily pay for itself, we decided to start hiring. In short, we opted for an analyst with visualization skills and business understanding, and an intern with a technical background and SQL knowledge to help us build our data models.

It was a very interesting period because, even with a lean team, we made fast progress and received good feedback from stakeholders. From my point of view, this was due to choosing the right projects to focus on, generating more value with less effort.

4.2) What to focus on after a rollout of the Data infrastructure

After the conclusion of the projects mentioned above, during the last quarter of 2017, we turned our attention back to the demands in our backlog. With increased awareness of how our work could help their daily activities, the other teams came to us with more demands. During this period, it felt like we were not progressing, because no new big projects were being taken on. However, everything done in this period was crucial to make sure we could evolve. As with any software implementation, nothing runs as smoothly as planned, so it’s necessary to reserve some time for the other teams to ‘consolidate’ what was built. For that to happen, you have to provide fast and accurate support.

One of the good decisions we made in this quarter was to focus on a) fixing bugs and b) adding small incremental features that, individually, were not time-consuming for us, but that for the end user were the difference between happily using Looker and grudgingly using Looker in their daily activities. It was a period when we made our data models more solid, so the Data team could start new projects without constantly having to come back for adjustments and delaying new work.

Minor adjustments and small bug fixes improve the perceived value of the BI

Another important consideration here is that we’re not talking about infrastructure maintenance, because the stack we chose required almost none; most of the adjustments and bugs mentioned above were in the data models themselves, driven by visibility into new KPIs and new ways of analyzing the business in greater depth. Here, the choice of an ELT approach also proved successful in our case, because it was a lot easier to adjust and add business rules after the data was loaded than it would have been to change a traditional ETL process.
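To make that concrete, continuing the hypothetical fact_sales sketch from earlier: adding a new business rule after go-live is just an edit to the warehouse-side SQL, re-run against data already sitting in Redshift; the extraction and loading layer doesn’t change at all. The sales-channel column below is invented purely for illustration.

```python
# Hypothetical evolution of the earlier fact_sales transform: a new business rule
# (classifying orders by sales channel) is added by editing only the SQL that runs
# inside the warehouse. The raw tables stay exactly as the pipeline loaded them.
TRANSFORM_SQL_V2 = """
CREATE TABLE analytics.fact_sales AS
SELECT
    o.order_id,
    o.created_at::date          AS order_date,
    CASE WHEN o.origin = 'guide_shop'            -- invented column and rule
         THEN 'Guide Shop' ELSE 'Online'
    END                         AS sales_channel,
    i.sku,
    i.quantity,
    i.quantity * i.unit_price   AS gross_revenue
FROM raw_ecommerce.orders o
JOIN raw_ecommerce.order_items i
  ON i.order_id = o.order_id
WHERE o.status = 'approved';
"""
```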

5) CONCLUSION

By the end of 2017 we had met the most crucial demands and made them solid, so 2018 would be a period to focus on new projects and ideas. With that in mind, we hired a Data Scientist and a Data Engineer, who had the right skills to:

1) Work on Marketing, Customer Experience and company-level projects, bringing in several new data sources and building new models in Looker, making it quite complete;

2) Start rebuilding our current data platform, moving to a Data Lake structure in S3, using Spark to process the data and Airflow to orchestrate it (a rough sketch of this orchestration pattern follows this list);

3) Start tackling challenging Data Science projects.
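We were only starting that migration at the time of writing, so the sketch below is not our production pipeline; it is just a rough illustration of an Airflow DAG (import paths as in Airflow 1.x) that schedules a dump to S3 followed by a Spark job. The DAG id, scripts, bucket names and schedule are all placeholders.

```python
# Rough, hypothetical illustration of Airflow orchestrating a daily S3 + Spark flow.
# Import paths follow Airflow 1.x; every name, path and schedule here is a placeholder.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=10),
    "start_date": datetime(2018, 1, 1),
}

dag = DAG(
    dag_id="daily_lake_refresh",          # placeholder DAG name
    default_args=default_args,
    schedule_interval="0 6 * * *",        # once a day, after the nightly loads
    catchup=False,
)

# Land raw extracts from the source systems into the data lake (placeholder script).
dump_to_lake = BashOperator(
    task_id="dump_sources_to_s3",
    bash_command="python /opt/jobs/dump_sources.py --date {{ ds }}",
    dag=dag,
)

# Process the raw files with Spark and write curated tables back to S3.
spark_transform = BashOperator(
    task_id="spark_build_curated_tables",
    bash_command=(
        "spark-submit --master yarn /opt/jobs/build_curated_tables.py "
        "--input s3://example-data-lake/raw/{{ ds }} "
        "--output s3://example-data-lake/curated/{{ ds }}"
    ),
    dag=dag,
)

dump_to_lake >> spark_transform
```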

Now that we’re at a point where we have a more solid base and several teams are supplied with data, we are currently working on:

1) Planning our new data strategy

2) Focusing on Infra & Scalability

3) Preparing for internationalization

4) Tackling interesting Data Engineering & Data Science challenges that the fashion industry can offer.

I believe we have come a long way in the past year and a half, but this was just the base that will allow us to grow much faster from here. We have very challenging goals for the future and fantastic learning opportunities in working with new technologies and open problems from the fashion industry.

I’m very happy to share our journey with you and hope you’ve enjoyed it. Thank you for reading!

I would also love to hear your feedback and thoughts, as this is only our view of our own journey and not necessarily a one-size-fits-all solution. Leave your comments below :)
