Things I Wish I Knew When I Started Working in Data

Lessons from my first few months in Data

Kemp Po
Data @ First Circle
6 min read · May 2, 2020


It’s been almost a year since I graduated from university with a Computer Science degree and a Specialization in Data Science and Analytics. I’m also coming up on my first year with First Circle as a Jr. Data Analyst, and I’d like to take you through some key things I’ve learned.

First, what does it even mean to be a Data Analyst at First Circle, a Fintech company that specialises in working capital financing for B2B companies?

On some days, being a Data Analyst means creating exploratory analyses or exploring new tools and models. Other days, we work closer to the business, enabling other people in the company to make data-driven decisions. There are also days where we integrate new data sources or augment the data technology stack, which you can read about in my teammate’s post here. Throughout my first year, these lessons have proven continuously helpful as I develop as a data person.

Lesson 1: The First and Primary Goal is Always to Drive Business Value

A person on the data team is a problem solver, whether in the technology space or in the business. Business outcomes are always our north star, and data is our toolset, whether it comes in the form of data science, analysis, or engineering.

The graph we generated using the PageRank algorithm

One of the key projects I worked on when I joined was leveraging a graph database to start introducing network effects concepts to the business. The impact of this, however, was not immediately recognizable.
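For a flavor of that work, here is a minimal sketch of the idea using networkx. This is my own illustration with made-up company names, not our actual graph database setup:

```python
import networkx as nx

# Rank companies in a hypothetical B2B payment network with PageRank,
# so the most "central" businesses in the network surface first.
edges = [  # hypothetical (payer, payee) relationships
    ("Acme", "BuildCo"),
    ("Acme", "CemexPH"),
    ("BuildCo", "CemexPH"),
    ("DeliverIt", "Acme"),
]
graph = nx.DiGraph(edges)

scores = nx.pagerank(graph, alpha=0.85)  # standard damping factor
for company, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{company}: {score:.3f}")
```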

That experience led me to realize that the business has to guide what we do and what we make. Whether it’s increasing efficiency, acquiring customers, or even reporting, every project you do must be tied to a business goal and must be prioritized accordingly.

Before you even begin writing that first line of code, you must always understand the potential impact of the project and be able to justify its value.

One of my first pieces of major work as a data analyst fell flat. I was the new guy on the team and we needed a new dashboard. I jumped straight into building it, taking the requested metrics at face value and never running them by the intended users. Wrong move. What I should have done instead was co-author the dashboard with its direct stakeholders.

This could have been avoided by applying a practice we have in the data team at First Circle: the Request for Comments, or RFC. In the technology community, an RFC is a document that outlines specifications, concepts, and procedures.

Lesson 2: Concise Communication is Extremely Important


A lot of people with more experience have already said this, but I can’t stress it enough. Without proper communication, all those metrics, models, or systems you’ve created will never be seen beyond your computer. No one in the company will adopt anything you’ve made if you don’t communicate it properly.

The Request for Comments process mentioned earlier helps here too. Immediate stakeholders should be involved throughout the drafting of the RFC document, at least in defining the problem and shaping the solution idea and requirements. After the document is drafted, it goes through a peer-review process within the team, where we decide whether we have sufficient tools and information to begin a build. When the team reaches a consensus that the build is required or will drive business value, we send the document out to the rest of the company for any further comments.

The process helps us get alignment and buy-in from the project’s different stakeholders by involving them in the entire process, from ideation and scoping to the actual build and implementation of the system or change.
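For illustration, here is a bare-bones outline of what one of these documents might contain. The exact sections are my own sketch, not a fixed company template:

```
Title, authors, date
1. Problem statement: what business problem is this solving, and for whom?
2. Proposed solution: the idea, requirements, and scope
3. Alternatives considered
4. Expected impact: how we will know it worked
5. Open questions: for peer review and company-wide comments
```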

But also remember, you have to be concise. No one will understand or even bother to read long documents that could be just a couple of paragraphs. Remember who you’re speaking to and communicate in their language.

Lesson 3: There is a Time and Place for Everything


Honestly, I see this one a lot and personally do it too. Many data people fall into the routine of jumping straight into the thick of creating models and analyses or what have you. There’s no point in building models or extremely complex systems if the organization and business are not ready for them.

Creating these systems and models takes time and work. There’s no worse feeling than seeing something you spent an entire month working on fall through and never see the light of day.

Thankfully, the RFC process saves the day here again. During our RFC process, we normally take time to understand the current situation and lay out the business needs before we even come up with solutions.

Another thing I keep forgetting is that sometimes all the business needs is something simple. Not every problem needs a machine learning algorithm or a completely new system to solve it. You don’t cut your steak with a sword. Sometimes simple heuristics or systems are all you need to bring immediate impact to the business.
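As a toy example of what “simple” can look like, a rule-of-thumb check like the one below (the field names and thresholds are hypothetical, purely for illustration) can ship in a day and deliver value while a model is still being scoped:

```python
# Flag applications for manual review with simple rules instead of a model.
# The field names and thresholds here are hypothetical.
def flag_for_review(application: dict) -> bool:
    """Rule of thumb: very young businesses or outsized requests get a second look."""
    return (
        application["years_in_business"] < 1
        or application["requested_amount"] > 10 * application["monthly_revenue"]
    )

applications = [
    {"years_in_business": 0.5, "requested_amount": 50_000, "monthly_revenue": 20_000},
    {"years_in_business": 4.0, "requested_amount": 30_000, "monthly_revenue": 25_000},
]
print([flag_for_review(a) for a in applications])  # [True, False]
```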

Lesson 4: Sanity Check. Seriously

Even the simplest of queries can yield results that just don’t make sense because of some faulty logic or a bad join. Whenever possible, benchmark your numbers and calculations against a gold standard, such as published reports or accepted internal metrics.
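As a minimal sketch with made-up numbers, a benchmark check like this can catch a join that silently duplicates rows before anyone sees the wrong figure:

```python
import pandas as pd

# Benchmark a computed figure against an accepted one before trusting the query.
ACCEPTED_TOTAL = 1_250_000  # hypothetical figure from a finance report

loans = pd.DataFrame(
    {"loan_id": [1, 2, 3], "disbursed_amount": [400_000, 450_000, 410_000]}
)
computed_total = loans["disbursed_amount"].sum()

# A join that silently duplicates rows would inflate this total,
# so a simple tolerance check catches the faulty logic early.
tolerance = 0.01 * ACCEPTED_TOTAL
if abs(computed_total - ACCEPTED_TOTAL) > tolerance:
    raise ValueError(
        f"Sanity check failed: computed {computed_total:,}, expected ~{ACCEPTED_TOTAL:,}"
    )
print("Sanity check passed")
```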

We always have to thoroughly consider caveats and issues in the data at hand. Understanding our data better helps us get better insights and thus drive more value. Always remember: garbage in, garbage out. If the data going in is unclean, the results coming out of our processing will be unclean too. We should never patch data in between steps; instead, we fix the problem at its root.
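To make that concrete, here is a small, hypothetical sketch of validating records at ingestion, so problems are rejected and reported at the source instead of patched in every downstream query:

```python
# Validate records at ingestion; the schema and rules here are hypothetical.
def validate_row(row: dict) -> list[str]:
    """Return the data-quality problems found in one incoming record."""
    problems = []
    if not row.get("customer_id"):
        problems.append("missing customer_id")
    if row.get("amount", 0) < 0:
        problems.append("negative amount")
    return problems

rows = [
    {"customer_id": "C-001", "amount": 1_500},
    {"customer_id": None, "amount": -20},
]

# Reject bad rows here and report them back to the owning system,
# instead of patching nulls and filters into every downstream query.
clean = [r for r in rows if not validate_row(r)]
rejected = [(r, validate_row(r)) for r in rows if validate_row(r)]
print(f"{len(clean)} clean, {len(rejected)} rejected")
```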

A failed sanity check doesn’t necessarily mean our data is bad; it could also mean we’re making wrong assumptions about the data or misunderstanding it entirely.

Lesson 5: Documentation is Tedious, but it is Necessary


As any developer, or really anyone who has written code, knows, writing documentation is extremely tedious. But documentation is what drives collaboration. Without it, no one else can build on your work, or it will take them far longer to understand what you’ve created. It creates a mess, and it just isn’t fun working with people who don’t write documentation.

This holds for everything we do in data. We need to document how we extracted the data and how it has been transformed. We need to document how other people can use that data for downstream purposes. And our analyses and models need to be clear about how they were created and how their conclusions were reached.

We have to make our work easily reproducible and understandable. Good documentation ensures that the entire process and the data can be understood, especially in the future.
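As a small illustration (the table and column names are hypothetical), even a docstring that records the source, the transformations, and the downstream use goes a long way:

```python
import pandas as pd

def monthly_disbursements(loans: pd.DataFrame) -> pd.DataFrame:
    """Aggregate disbursed loan amounts by calendar month.

    Source: a (hypothetical) `loans` table, one row per disbursed loan.
    Transformations:
      * rows with status "cancelled" are excluded;
      * `disbursed_at` is truncated to the start of its month;
      * `amount` is summed per month.
    Downstream use: feeds the monthly portfolio dashboard.
    """
    active = loans[loans["status"] != "cancelled"].copy()
    active["month"] = active["disbursed_at"].dt.to_period("M").dt.start_time
    return active.groupby("month", as_index=False)["amount"].sum()
```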

There’s a lot more than just these five things, but I think they’re some of the most important lessons I’ve learned so far. Many of them center on working well in a team of data people. After all, no one is a data unicorn; it takes a complete team.

If you want to ask questions or are just interested in discussing and exploring these ideas further, don’t hesitate to send me an email at kempspo@gmail.com or connect with me on LinkedIn. I always enjoy hearing other people’s perspectives.
