Proven tips for acing your Data Analyst job application process!

mahatma waskitadi
6 min read · Oct 11, 2023


Over the past couple of months, Vokraf.com announced a Data Analyst role and opened the gateway for talented individuals to join us. Out of the 200+ applications we were excited to receive, none made it through the selection process. Let’s unpack why.

Here’s a peek at how our selection process unfolds:

  1. A Take-Home Case Study
  2. An Interview with HR
  3. First User Interview
  4. Second User Interview — Final Round
  5. Job Offer Extended

The majority of applicants stumbled at the very first hurdle: the take-home case study. Here, in essence, is a breakdown of why their journey ended at this early stage:

1. The data were not completely cleaned.

Ever wondered why data cleaning is so essential? It’s because it forms the bedrock of correct conclusions. Imagine steering a business down the wrong path due to inaccurate data analysis. It can trigger ineffective marketing campaigns, failed product launches, overspending on incentives, premature product termination, and even losses running into billions of dollars. The domino effect of inaccurate data is profound and far-reaching.

Modern-day leaders are increasingly relying on data to navigate complex business terrains. They feed on data before attempting to unravel knotty business issues. The most successful ones boast an impressive blend of gut-feeling and data-backed decisions. When presented with data, they won’t merely gulp it down. They’ll assess it critically, match it with their intuition, and if there’s any discrepancy, they won’t hesitate to raise an eyebrow. The data might be sent back for verification. Imagine their dismay when a revised analysis showcases different figures. Your credibility as a data analyst might take a hit. So, double-check, triple-check, ensure data accuracy at all costs.

Sounds easy, right? And yet, many applicants stumble on this seemingly straightforward task. Effective data cleaning involves numerous checks: timestamp formats, uniform categorization, null elimination, statistical outlier removal, and data representativeness. A surprising number of applicants tick off a few of these boxes and skip the rest, thinking their job is done. Beware: partial data cleaning can be as damaging as no cleaning at all.

It’s no secret that data can sometimes be tricky to handle due to inconsistent formats. For instance, you might come across dates that mix “:” and “/”. In the realm of e-commerce, an SKU type like a smartphone should belong to exactly one category, for example under gadget accessories. Bananas are consistently pigeonholed as fruits or food and beverages. In the real world, categorization errors happen often, usually caused by system bugs, so keeping an eagle eye on this matter is really important.
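Here is a minimal pandas sketch of both checks. The DataFrame, the column names (order_date, sku, category), and the mixed “/” and “-” separators are all hypothetical, purely for illustration:

```python
import pandas as pd

# Hypothetical raw export: mixed date separators and inconsistent SKU categories
df = pd.DataFrame({
    "order_date": ["2023/01/15", "2023-01-16", "2023/01/17"],
    "sku": ["PHONE-X1", "PHONE-X1", "BANANA-01"],
    "category": ["Gadget Accessories", "Electronics", "Fruits"],
})

# Normalize the separator, then parse everything with a single explicit format;
# anything that still fails becomes NaT so it can be inspected rather than ignored
df["order_date"] = pd.to_datetime(
    df["order_date"].str.replace("/", "-", regex=False),
    format="%Y-%m-%d",
    errors="coerce",
)

# One SKU should map to exactly one category: flag any SKU that violates this
category_counts = df.groupby("sku")["category"].nunique()
inconsistent = category_counts[category_counts > 1]
print("SKUs with conflicting categories:\n", inconsistent)
```

Coercing unparseable dates to NaT instead of silently dropping them keeps the problem visible until you decide how to handle it.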

And let’s not forget about our old friend: null values. These can creep in unexpectedly, perhaps in the form of empty transaction amounts or missing timestamps. It’s crucial to identify and weed out these nulls promptly. Last but not least, there are those pesky statistical outliers, and the standard way to flag them is the interquartile range (IQR) rule.

The rule for a low outlier is that a data point has to be less than Q1 - 1.5xIQR; that is, it falls more than 1.5 times the interquartile range below the first quartile. The rule for a high outlier is that a data point has to be greater than Q3 + 1.5xIQR; that is, it falls more than 1.5 times the interquartile range above the third quartile.
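In pandas, the rule translates almost line for line. A minimal sketch with made-up transaction amounts, dropping nulls first per the cleaning step above:

```python
import pandas as pd

# Hypothetical transaction amounts, including a null and one extreme value
amounts = pd.Series([120, 135, 150, 128, None, 142, 9_999], dtype="float64")

amounts = amounts.dropna()  # remove the nulls flagged in the previous step

q1 = amounts.quantile(0.25)
q3 = amounts.quantile(0.75)
iqr = q3 - q1

lower = q1 - 1.5 * iqr  # anything below this is a low outlier
upper = q3 + 1.5 * iqr  # anything above this is a high outlier

cleaned = amounts[amounts.between(lower, upper)]
print(f"Kept {len(cleaned)} of {len(amounts)} values; bounds [{lower:.1f}, {upper:.1f}]")
```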

Lastly, let’s discuss data representativeness. Picture this: you’re running an online webinar business whose sessions are capped at 150 attendees. In that case, your data should never show more than 150 participants per session. It’s a simple consistency check that underscores the validity of your data!
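A representativeness check like this can be a few lines. Everything here (the sessions DataFrame, the 150-attendee capacity) is hypothetical, mirroring the webinar example above:

```python
import pandas as pd

# Hypothetical per-session attendance; the platform's known capacity is 150
sessions = pd.DataFrame({
    "session_id": ["W-01", "W-02", "W-03"],
    "attendees": [148, 150, 173],  # 173 exceeds capacity, so something is off
})

CAPACITY = 150
suspicious = sessions[sessions["attendees"] > CAPACITY]
if not suspicious.empty:
    print("Sessions exceeding known capacity (investigate before analyzing):")
    print(suspicious)
```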

2. Business insights weren’t derived from time-series data.

The fundamental element of insightful data analysis is identifying both proportions and trends. Achieving a comprehensive understanding of your data requires both these elements; missing either can eliminate its value as an insight. Let’s break down why. We leverage data in business to drive strategic initiatives. Before picking an initiative to focus on, it’s crucial to ensure its relevance and further scope of improvement. Mere data about completed transactions may not present a clear picture without context; however, if you peg this against the total market size, it provides a proxy of scale and an initial understanding of the business’s performance.

But the journey to generating meaningful insights doesn’t stop there. A critical element that often goes unnoticed is time series analysis. Many candidates suggested initiatives based on pie charts alone, which is sub-optimal given that the raw data contains timestamps. Time relevance plays a crucial role in understanding whether an identified trend or proportion is still significant.

Let’s understand this with a simple example of market share. If your data shows that market share was 5% two years ago and climbed to an impressive 80% last year, it’s a reason to pop the champagne. But the scenario could be drastically different if those numbers were reversed: that would be a significant concern for the business. And if there’s no noticeable change in the figures, say 50% two years ago and the same last year, management could perceive this as a challenge and contemplate improving it. This situation isn’t necessarily bad, but there’s clearly room for improvement. Your insight should always reflect this degree of time relevance and offer a true snapshot of your business environment.
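To make the contrast concrete, here is a sketch with fabricated daily sales figures. A pie chart of the totals would show one static share, while a monthly resample over the same timestamps exposes the trend:

```python
import pandas as pd

# Hypothetical daily sales: our transactions grow while the rest of the market stays flat
df = pd.DataFrame({
    "timestamp": pd.date_range("2023-01-01", periods=180, freq="D"),
    "our_sales": range(100, 280),       # steadily growing (illustrative)
    "rest_of_market": [400] * 180,      # flat competitor volume (illustrative)
})

# A single pie chart collapses all of this into one proportion;
# resampling by month preserves the trend the timestamps carry
monthly = df.set_index("timestamp").resample("MS").sum()
monthly["market_share"] = monthly["our_sales"] / (
    monthly["our_sales"] + monthly["rest_of_market"]
)
print(monthly["market_share"].round(3))
```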

3. Jumping to conclusions too soon.

In the data-centric world of business, diving deeper into the numbers leads to more effective and impactful decisions. Simply put, you need to look beneath the surface. For instance, if a spike in sports-category transactions surfaces in an e-commerce landscape, it doesn’t necessarily mean that every item within the category is performing well. Similarly, an increase in transaction failures doesn’t imply all users are experiencing issues. In reality, a small percentage often influences the overall metrics significantly. The real challenge? Spotting that all-important 20% that triggers 80% of the outcome, a concept you probably know as the Pareto Principle (often visualized with a Pareto chart). This ability to pinpoint and concentrate on the vital 20% should be part of every Data Analyst’s toolkit.
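As a rough illustration, here is how that 80/20 split might be surfaced in pandas; the user IDs and failure counts are invented for the example:

```python
import pandas as pd

# Hypothetical failure counts per user: a handful of users dominate
counts = pd.Series(
    {"u1": 50, "u2": 30, "u3": 8, "u4": 5, "u5": 3, "u6": 2, "u7": 1, "u8": 1},
    name="failures",
).sort_values(ascending=False)

cumulative_share = counts.cumsum() / counts.sum()

# The "vital few": users whose cumulative share stays within 80% of all failures
vital_few = cumulative_share[cumulative_share <= 0.80].index.tolist()
print("Users driving ~80% of failures:", vital_few)       # ['u1', 'u2']
print(f"That is {len(vital_few)} of {len(counts)} users")  # 2 of 8
```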

Missing the all-crucial 20% and defaulting to generic behavior can lead the business team down a risky path. They might hatch broad action plans that affect all users, potentially causing more issues. It could also lead to budget overruns by providing solutions to users who don’t really need them. So never underestimate the power of targeted problem-solving based on accurate data analysis.

4. Missing prioritization and estimated potential improvements.

Consider this: once you’ve highlighted the significant 20% of issues, you’ve already grasped their effect on the overall metrics. Let’s visualize a scenario. Suppose attendees complain about not gaining prompt access to the main webinar room, and your analysis shows that 90% of these complaints come from attendees who waited more than two minutes. Your metric for improvement becomes crystal clear: for all webinars, attendees should not be kept waiting for more than two minutes. By adhering to this, you could potentially slash the number of complaints from, say, 100 down to around 10.
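Here is a sketch of how that estimate might be derived, again with fabricated complaint data (the wait_seconds column and the 120-second threshold are assumptions for illustration):

```python
import pandas as pd

# Hypothetical complaint log with each attendee's waiting time in seconds
complaints = pd.DataFrame({
    "complaint_id": range(1, 101),
    "wait_seconds": [100] * 10 + [200] * 30 + [300] * 40 + [600] * 20,
})

THRESHOLD = 120  # the proposed two-minute service level

over = complaints["wait_seconds"] > THRESHOLD
print(f"{over.mean():.0%} of complaints come from waits over {THRESHOLD}s")
# If every wait were brought under two minutes, roughly this many complaints
# would be addressed, leaving only the unrelated remainder
print(f"Potentially resolved: {over.sum()} of {len(complaints)} complaints")
```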

The step-by-step guide for the analysis isn’t explicitly spelled out in the take-home test, and that’s by design: an experienced Data Analyst is expected to know instinctively how to proceed. It’s somewhat disconcerting to observe the current acceptance rate, as it hints at the average caliber of professionals in the job market, a quality bar that has yet to reach the expected standard. The insights seem to elude them somehow. However, I am hopeful this trend will reverse over time, as more individuals tackle these issues and invest effort into raising the quality of our learning ecosystem. In my view, a more comprehensive approach is required to tackle this problem effectively.

Anyhow, I sincerely hope this concise guide helps you land your dream job. Best of luck! Oh, and by the way, Vokraf is currently looking for talented Data Analysts like you. Don’t forget to check out the available vacancies here: https://vokraf.com/career/all-jobs. We’re excited to see what you can bring to our team!

Have a good day, Cheers!
