Tips: My Journey as a Data Analyst

Trisna Yulia Junita
Published in Tokopedia Data
6 min read · Oct 31, 2018

On my first day as a data analyst, I had little idea what my daily tasks would be or how to actually do my job. The job description did tell me about a data analyst's responsibilities. However, it gave only the big picture: I should collect, manipulate, analyze, and interpret data to produce actionable insights. As someone who was new to data analytics, I found that too abstract to grasp.

As a beginner who specifically wanted to start her data analytics career in a tech company, I had lots of questions in mind. How do I collect the data? Is coding needed? Which tools should I use? What are the right statistical methods? Can someone like me, who did not major in statistics, learn all of this? In short, I was confused about how to start my journey in data analysis, yet eager to learn more.

If you are new to data analytics, feel the way I did, and want to learn more about the basics, whether for fun or for a career change, here are five tips that might give you a good head start on your journey.

If you are interested in data analytics, you will need to learn a few programming languages such as SQL and R. I promise they are fun and worth your time.

1. Learn SQL (Structured Query Language) Programming


So, why should you use SQL? Isn’t Excel the perfect tool for analyzing and visualizing data? Well, as a data analyst you are expected to collect data directly from a database, and no one hand-picks it for you. It is your job to decide what data you need.

The other reason is volume: you need not only current data, you might also end up analyzing a full year of sales. Let’s say you are analyzing 3,000,000 rows of sales transactions. Excel limits you to 1,048,576 rows per worksheet, which means you would need three sheets, each of which has to be maintained manually. Rather than dealing with that hassle, learning SQL is the more pleasant option. Here is why.

SQL lets you access and manipulate databases. It is great for automating the kinds of aggregations that you might normally do manually in an Excel pivot table (sums, counts, minimums, maximums, and so on), not to mention its ability to handle much larger datasets and multiple tables at the same time. The language itself is very straightforward; once you have mastered the primary commands, you just need to SELECT columns FROM a table, JOIN it with another table, and only grab the data WHERE your condition applies. SQL will do all the heavy lifting for you.
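To make the SELECT, JOIN, and WHERE idea concrete, here is a minimal sketch that runs a query through Python's built-in sqlite3 module. The tables, column names, and values are made up for illustration; a real company database will look different.

```python
import sqlite3

# In-memory database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER,
                        sale_date TEXT, amount REAL);
    INSERT INTO products VALUES (1, 'books'), (2, 'toys');
    INSERT INTO sales VALUES
        (1, 1, '2018-01-05', 10.0),
        (2, 1, '2018-02-11', 15.0),
        (3, 2, '2018-02-20', 40.0),
        (4, 2, '2017-12-30', 99.0);
""")

# SELECT columns FROM a table, JOIN another table, keep rows WHERE the
# condition applies, and aggregate per category like a pivot table would.
query = """
    SELECT p.category,
           COUNT(*)      AS n_sales,
           SUM(s.amount) AS total,
           MIN(s.amount) AS smallest,
           MAX(s.amount) AS largest
    FROM sales AS s
    JOIN products AS p ON p.product_id = s.product_id
    WHERE s.sale_date >= '2018-01-01'
    GROUP BY p.category
    ORDER BY p.category
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The WHERE clause drops the 2017 sale before the aggregation happens, so each category row summarizes only the 2018 transactions.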

So, here are a few resources that got me started:

https://community.modeanalytics.com/sql/tutorial/introduction-to-sql/

https://www.w3schools.com/sql/default.asp

2. Learn Basic Statistics

Once you have collected the data, you will have a lot of numbers that mean nothing until you carry out some statistical analysis to make sense of them and draw inferences. By basic statistics, I mean the following:


Summarizing Data: Grouping and Visualizing

The first and most basic thing you can do with your data is to group it by the dimensions you are interested in. For example, suppose you want to see how the business performs on a daily, monthly, or yearly basis. Then the thing to do is to group the data by day, month, or year. To make your grouping easy to read and understand, it is a good idea to visualize it as a graph, such as a bar, line, or pie chart.
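As a small illustration of grouping and then visualizing, the sketch below sums made-up daily sales by month and prints a rough text bar chart. The records and the helper name total_by_month are hypothetical; in practice you would more likely do this in SQL, Excel, or a BI tool.

```python
from collections import defaultdict
from datetime import date

# Made-up daily sales records: (date, amount).
daily_sales = [
    (date(2018, 1, 5), 10.0),
    (date(2018, 1, 20), 30.0),
    (date(2018, 2, 11), 15.0),
    (date(2018, 2, 20), 40.0),
    (date(2018, 3, 2), 25.0),
]

def total_by_month(records):
    """Group records by year-month and sum the amounts in each group."""
    totals = defaultdict(float)
    for day, amount in records:
        totals[day.strftime("%Y-%m")] += amount
    return dict(sorted(totals.items()))

monthly = total_by_month(daily_sales)

# Rough text bar chart: one '#' for every 5 units of sales.
for month, total in monthly.items():
    print(f"{month} {'#' * int(total // 5)} {total:.0f}")
```

Changing the strftime pattern to "%Y" or "%Y-%m-%d" gives the yearly or daily grouping instead.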

Calculating Average: Mean, Median, Mode

If you do not know the difference between mean, median, and mode, do not panic. They are very easy to calculate. You can google them and find a lot of information, including example code. The important thing is not calculating them but knowing when to use which. The mean is a good choice for normally distributed data. But if outliers skew your data, the median can be the better choice.
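Here is a quick sketch of how a single outlier affects the three averages, using Python's standard statistics module. The order values are invented for the example.

```python
import statistics

# Made-up order values; the last one is an outlier.
order_values = [20, 22, 25, 25, 28, 30, 1000]

mean_value = statistics.mean(order_values)      # pulled up by the outlier
median_value = statistics.median(order_values)  # middle value after sorting
mode_value = statistics.mode(order_values)      # most frequent value

print(f"mean   = {mean_value:.2f}")
print(f"median = {median_value}")
print(f"mode   = {mode_value}")
```

The one large order drags the mean far above what a typical order looks like, while the median and mode stay at 25.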

Calculating Dispersion: Range, Variance, and Standard Deviation

More elaborate statistical measures include the range, variance, and standard deviation, which describe how spread out the data is. To be honest, we do not use them often, because means and medians usually do the job. But we keep these more advanced tools for more complicated problems, especially when looking for more complex patterns.
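For reference, the three dispersion measures can be sketched with the same statistics module. The numbers are made up, and this example uses the population versions (pvariance, pstdev); statistics.variance and statistics.stdev are the sample versions.

```python
import statistics

# Made-up data with a mean of exactly 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(values) - min(values)       # distance between the extremes
pop_variance = statistics.pvariance(values)   # mean squared distance from the mean
pop_std_dev = statistics.pstdev(values)       # square root of the variance

print(f"range              = {value_range}")
print(f"variance           = {pop_variance}")
print(f"standard deviation = {pop_std_dev}")
```

The standard deviation is usually the easiest of the three to interpret, because it is in the same unit as the data itself.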

There are many online courses where you can learn basic statistics for free. Here are some of them:

https://www.khanacademy.org/math/statistics-probability/summarizing-quantitative-data

https://www.coursera.org/learn/basic-statistics (7-day trial)

https://www.edx.org/course/introduction-statistics-descriptive-uc-berkeleyx-stat2-1x

https://www.udacity.com/course/st095


3. Learn R Programming


R is a language and environment for statistical computing and graphics. At first, it might be frustrating to learn R, but once you become fluent and understand its admittedly complex coding structure, you will see that it is one of the most advanced languages for complex statistical tasks. You can also use R for visualization, but I usually just use it for complex data manipulation and statistical calculation.

Here are some good sources for learning R:

https://www.datacamp.com/courses/free-introduction-to-r

http://tryr.codeschool.com/

http://www.tutorialspoint.com/execute_r_online.php


4. Learn Visualization Skills and BI Tools

Once you have skills in data collection, data manipulation, and statistical analysis, you will also need to learn data visualization. Why? Because it may reveal surprising things about the data that are not visible from the numbers alone. I once ran a correlation analysis that told me my variables had no correlation, but once I visualized the data, it clearly showed a relationship. Moreover, visualizing your data helps others grasp your analysis and make decisions based on it. So, no matter how well you think you know your data, visualizing it might reveal something surprising.
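One way this can happen: the usual (Pearson) correlation coefficient only measures linear association, so a curved relationship can score zero. The sketch below is a constructed example, not the data from my analysis; the helper pearson_r is written out by hand for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# y depends entirely on x, but along a U-shaped curve rather than a line.
xs = list(range(-3, 4))
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"correlation = {r:.2f}")

# Rough text plot: the U shape is obvious once you look at it.
for x, y in zip(xs, ys):
    print(f"{x:>2} | {'*' * y}")
```

The coefficient comes out as zero even though y is fully determined by x, which is exactly the kind of pattern a chart shows and a single number hides.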

You can start practicing data visualization using the chart features in Excel or Google Sheets. Once you have some experience with data visualization, you can start learning BI tools such as Tableau and Power BI. Here are some recommended sources for learning data visualization:

http://www.storytellingwithdata.com/

http://www.vizwiz.com/

http://www.visualisingdata.com/

5. Practice and Play With Data

There is no better way to get good at data analytics than practicing what you have read or learned. So, my last tip is to find a dataset and start applying what you have learned. You can find free datasets for practice online. I usually grab data from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php) and Kaggle (https://www.kaggle.com/). If you get stuck, you can always search for help and solutions on Google. As you work with more data, you will come to see yourself as a proficient data analyst.


I hope these few tips gave you a good overview of how to get started as a data analyst. If you have questions or suggestions, feel free to leave a comment below this article.
