How to Deliver Data ASAP Without Drowning in Anxiety and Frustration (Part I)

J Rachmanas
Wix Engineering
7 min read · May 3, 2023

The challenges and learnings of a data engineer at a big data company

Introduction

Hey there, folks! It’s your pal Julius, the Tech Lead at WIX Data Guild. Let me tell you, it’s been a wild ride to get to where I am today. I’ve gone from being a full stack engineer working with Microsoft and Oracle products in the public sector, to being a BI developer working with an open-source stack and doing infrastructure and data engineering tasks, to finally landing at WIX and working with data. And now, I’ve even become a tech lead — talk about a promotion!

Life at a Big Data Company

Working at a big data company like WIX means that there are tons of departments and moving parts to keep track of. Sometimes, it feels like I’m in charge of everything and know everything, but let’s be real — that’s just not possible. And trust me when I say, it can lead to some pretty hilarious situations.

As a member of the Wix team, I can confirm that we are a big data company, and as data specialists, our emphasis lies in understanding user engagement. Aggregating that engagement data lets us continually enhance the experience for everyone using Wix. We handle massive volumes of data daily, processing upwards of tens of terabytes. We put a lot of effort into making sense of that data, and we are always looking for ways to make our processes more effective. Several teams are responsible for ensuring that our data collection, processing, and analysis systems run smoothly and efficiently. It’s a challenging job, but we all enjoy what we do and take pride in our work.

Dealing with Data Delivery Challenges

I firmly believe that it’s possible to deliver data quickly in a big data company without any anxiety, by tackling the things that cause it. Believe me when I tell you, I’ve become a pro at it — going from napkin sketches for delivering simple tasks, to actually enjoying the process and saving some time. But enough about me, let’s get to the fun stuff.

In the following sections, you will uncover a fascinating amalgamation of distinct situations that I have encountered throughout my career, all centered around the common theme of delivering data as quickly as possible. As you delve deeper into the article, you’ll find that these scenarios illustrate the challenges I’ve faced in dealing with inefficient data extraction processes, compromised code quality, lack of proper documentation, and issues with code reusability. By sharing my experiences and insights, I hope to shed light on these critical aspects of software development and data management that, when overlooked, can lead to long-term setbacks and frustrations.

The “Simple” Task that Wasn’t

It all started on a random Monday, with a task to extract data from two types of events stored in log tables on S3 in columnar format — one storing data on app users who opened the app, and the other on those who actually used it. Additionally, I needed to retrieve supplementary information from MongoDB to augment the dataset and make it more useful for the intended analysis. Easy peasy, right? I sat down and right away wrote some code to extract the data from S3, feeling pretty proud of myself. But then, disaster struck — the service that queries and processes data from S3 was down! And since the service wasn’t under my control, it took an entire day for it to get back to normal.
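Stripped of the drama, the job boiled down to: read two event tables, join them per user, and enrich the result from MongoDB. Here’s a toy sketch of that join-and-enrich logic in plain Python — the field names (`user_id`, `plan`, etc.) are made up for illustration, and in reality this ran as a Spark job over Parquet files on S3:

```python
# Toy stand-ins for the two S3 event tables (really Parquet read via Spark).
opened_app = [
    {"user_id": 1, "opened_at": "2023-05-01"},
    {"user_id": 2, "opened_at": "2023-05-01"},
]
used_app = [
    {"user_id": 1, "action": "edited_site"},
]

# Stand-in for the supplementary MongoDB collection.
mongo_users = {1: {"plan": "premium"}, 2: {"plan": "free"}}

def build_dataset(opened, used, users):
    """Join 'opened' and 'used' events on user_id, then enrich from users."""
    used_by_id = {e["user_id"]: e for e in used}
    rows = []
    for event in opened:
        uid = event["user_id"]
        rows.append({
            **event,
            "used_app": uid in used_by_id,  # did they actually use the app?
            **users.get(uid, {}),           # MongoDB enrichment
        })
    return rows

dataset = build_dataset(opened_app, used_app, mongo_users)
```

The shape of the problem is simple — it was everything around it (services, security, reviews, approvals) that made it anything but.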

Communication Breakdowns and Roadblocks

The next day, at our regular stand-up, I found myself telling everyone that I was still working on this “simple” task. Little did I know what was in store for me next. As I tried to extract data from MongoDB, I realized that I couldn’t select only the related pieces of data — I had to extract everything, and it was huge and mostly unnecessary. So, being the smartass that I am, I went to talk to the professionals in charge of Mongo. But, of course, I came unprepared and it took forever to explain the problem I was trying to solve. Communication… duh. Eventually, after an intense scientific investigation and research, a solution with Mongo was found. Woohoo!
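The fix we eventually landed on was the textbook one: ask Mongo to filter and project server-side instead of dragging whole documents across the wire. A hedged sketch of what such a query document looks like — the collection and field names here are hypothetical, and with pymongo the actual call would be roughly `db.users.find(query, projection)`:

```python
# Hypothetical filter: only documents for one app, from a given date onward.
query = {"app_id": "my-app", "event_date": {"$gte": "2023-05-01"}}

# Projection: 1 = include the field, 0 = exclude it.
# Only these fields leave the server, instead of entire documents.
projection = {"_id": 0, "user_id": 1, "plan": 1, "country": 1}

# With pymongo this would run as (not executed here):
#   docs = db.users.find(query, projection)
# rather than db.users.find({}), which pulls everything.

included = [field for field, flag in projection.items() if flag == 1]
```

Had I come to the Mongo folks with something this concrete — which fields I actually needed and which filter I wanted applied — the conversation would have taken minutes instead of forever.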

The Never-Ending Loop of Fixes and Approvals

At our regular stand-up the next morning, I found myself yet again telling everyone that I was still working on this “simple” task. I tried to push it to production, but the build failed because I used a third-party library related to S3 that wasn’t allowed in production for security reasons. Ugh!!! So, I went back to the drawing board and eventually “invented” a solution from Stack Overflow. I pushed it to production, waited for verification and approval, and finally, the code worked!

But, of course, that wasn’t the end of it. The very next morning, the data analyst told me that the data wasn’t in the format they were expecting. Back to the code I went, making the necessary changes and pushing it to production yet again. And then, just when I thought things couldn’t get any worse, the analyst decided that a few more columns were necessary. Can you guess what I did next? That’s right — I went back to the code, added the columns, pushed it to production, waited for verification and approval, and then found out that I had corrupted the initial dataset. Yikes!

I announced to the analyst that the data was ready for checking, and that accomplishment filled me with a sense of pride and satisfaction. But, of course, things didn’t go as planned (I won’t even get into the details): I had to go back to the code, fix my “features” (as if they were intentional!), and push to production an uncountable number of times. I waited and waited and waited for verifications and approvals — it felt like I was stuck in a never-ending loop!

Facing the Dreaded Code Review

As if my stress levels weren’t high enough, another approver of my code stepped in. Ah, the dreaded code review phase! It’s like standing in front of a firing squad and hoping they miss. And I’ll be honest, this time I was the one shooting myself in the foot. My code was a mess — it was like trying to find a needle in a haystack, except the needle was buried under a pile of spaghetti code. I mean, I was using a single Python script to extract data with Spark from S3, without any comments or structure. It was like Frankenstein’s monster of code, cobbled together from whatever snippets I could find. Let’s just say, it didn’t go over well in the code review. Thanks a lot, guys! Of course, I had to go back to the code yet again, fix it, push it to production, and wait some more.
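One concrete takeaway from that review: even a one-off extraction script is easier to defend when it’s split into small, named, documented steps rather than one top-to-bottom blob. A minimal sketch of that structure — the function names and toy data are mine, not the actual job:

```python
def extract(raw_events):
    """Keep only well-formed events (stand-in for the Spark/S3 read)."""
    return [e for e in raw_events if "user_id" in e]

def transform(events):
    """Shape events into the columns the analyst asked for."""
    return [{"user_id": e["user_id"], "source": e.get("source", "unknown")}
            for e in events]

def load(rows):
    """Hand the rows off (stand-in for writing the production dataset)."""
    return rows

def run(raw_events):
    """Single documented entry point: extract -> transform -> load."""
    return load(transform(extract(raw_events)))

result = run([{"user_id": 7, "source": "mobile"}, {"bad": "event"}])
```

Each step can then be reviewed, tested, and changed on its own — which matters a lot when, as you’ll see below, you have to revisit the code months later.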

Lessons Learned and Moving Forward

You could sum up my never-ending experience as a Daft Punk song:

Fix it, push it, wait for approval, present it,

Fail it, fix it, push it, wait for approval,

Present it, fail it, fix it, push it,

Wait for approval, present it, fail it, fix it.

Finally, three whole months had passed, and a BA asked me to enrich the dataset. Did I remember what I did before? Nope, not even close. I had to re-read the entire code, understand where, why, and how I did certain things, and even run the tests manually before showing it to the analyst. And that, my friends, took another whole week.

So, what have I learned from all of this? Well, for starters, never underestimate the power of a “simple” task — they can turn into never-ending nightmares. And secondly, it’s important to stop and reevaluate your processes. Sure, my anxiety levels rose at a steady pace throughout this whole ordeal, but in the end I came out on top, having learned lots of new ways to deal with these issues.

I hope you’ve enjoyed hearing about my misadventures in the world of big data. Remember, when it comes to data engineering, sometimes you win, sometimes you learn — and sometimes, you just have to laugh at yourself and keep pushing forward.

Stay tuned for the next part of my story, where I’ll share the approach I moved to and how much time it saved me while avoiding all these pitfalls.

