Please read this carefully. Everyone has their own perspective on these things; I may be wrong, and in the end the choice is yours to make.
Let’s start by discussing some harsh but true facts.
Kaggle projects are less likely to be considered during resume evaluation.
Why? Because everyone is copying; uniqueness has gone. I've seen many people doing EDA (plotting charts, cleaning data, converting types, etc.) without actually understanding what conclusions can be drawn from the data. EDA is also about answering questions and discovering hidden patterns in the data.
Example: you have to create an ad campaign for houses, and your job is to target different audiences based on their interests. You might start by plotting age against price, or age against number of rooms. Why? It helps you target a specific audience. How? Assume these houses come in only two options, 3BHK and 4BHK. People with higher incomes are more likely to buy one of these than people with lower incomes. Knowing this lets you target only the relevant audience and reduce the cost of the ad campaign.
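To make that concrete, here is a minimal sketch of the kind of question-driven EDA described above. The dataset, income bands, and column names are all invented for illustration:

```python
import pandas as pd

# Hypothetical buyer data for the house ad-campaign example
# (every value here is made up for illustration).
buyers = pd.DataFrame({
    "age": [28, 34, 41, 52, 29, 47, 38, 56],
    "annual_income_lakh": [8, 15, 22, 30, 9, 26, 18, 35],
    "house_type": ["3BHK", "3BHK", "4BHK", "4BHK",
                   "3BHK", "4BHK", "3BHK", "4BHK"],
})

# Bucket buyers by income, then ask a concrete question:
# which house type does each income band actually prefer?
buyers["income_band"] = pd.cut(
    buyers["annual_income_lakh"],
    bins=[0, 12, 24, 40],
    labels=["low", "mid", "high"],
)
preference = (
    buyers.groupby("income_band", observed=True)["house_type"]
    .agg(lambda s: s.mode().iloc[0])
)
print(preference)
```

The point is that the groupby answers a campaign question directly (whom to show the 4BHK ad to), instead of producing plots with no conclusion attached.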
Show your Project on GitHub
If you have built a project, please put it on GitHub. Most of the time, resumes are not shortlisted simply because of this. Also, create a README for every project. It doesn't have to be fancy; a little about the project is enough.
If you need project ideas, look here-
Do real stuff
When you move from Kaggle datasets to real-world data, you'll understand why data scientists always say that more than 70% of their time goes into data cleaning. It's a fact. Most of the data on Kaggle is already preprocessed. If you want to learn, go out and scrape data yourself, or find a real machine learning use case.
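As a small taste of that cleaning work, here is a sketch of typical clean-up steps on an invented messy frame (the columns and values are made up for illustration, standing in for scraped data):

```python
import pandas as pd

# A tiny messy frame standing in for real scraped data:
# inconsistent casing, stray whitespace, "n/a" strings, missing rows.
raw = pd.DataFrame({
    "city": ["Delhi", "delhi ", "Mumbai", "Mumbai", None],
    "temp_c": ["31.2", "31.2", "n/a", "29.8", "30.1"],
})

clean = raw.copy()
# Normalise text: strip whitespace and fix casing.
clean["city"] = clean["city"].str.strip().str.title()
# Coerce numeric strings; bad tokens like "n/a" become NaN.
clean["temp_c"] = pd.to_numeric(clean["temp_c"], errors="coerce")
# Drop rows that are unusable or redundant after normalisation.
clean = clean.dropna().drop_duplicates()
print(clean)
```

Five raw rows shrink to two usable ones here; on real scraped data, steps like these (plus joins, deduplication, and unit fixes) are where most of the time goes.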
Example: your boss asks what is affecting the weather most. An increase in the human population? An increase in the number of cars? A decrease in the number of trees, or some other factor? This is a good problem to start with. You'll have to scrape data from multiple websites, preprocess it, clean it, and more. Try to develop it into an end-to-end project by creating a REST API using Flask.
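The serving side of such an end-to-end project could be sketched like this. The `/predict` route, the input fields, and the placeholder coefficients are all assumptions for illustration; in a real project the function would be a trained model loaded from disk:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for a trained model loaded from disk (e.g. via joblib);
# these coefficients are invented purely for illustration.
def predict_temp(cars_millions: float, trees_millions: float) -> float:
    return 25.0 + 0.8 * cars_millions - 0.5 * trees_millions

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"cars_millions": 10, "trees_millions": 4}.
    payload = request.get_json(force=True)
    temp = predict_temp(payload["cars_millions"], payload["trees_millions"])
    return jsonify({"predicted_temp_c": round(temp, 2)})

# To serve locally, uncomment:
# app.run(port=5000)
```

Wrapping the model behind a route like this is what turns a notebook experiment into something a boss (or another service) can actually query.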
If you want to learn how to deploy ML models on AWS or GCP, follow this-
Read blogs and articles
This is my favourite thing: reading articles. Once you get into the habit of reading, you'll learn a lot. You'll always be in touch with current happenings in the industry, new methods, algorithms, and more.
How do I get into Data Science as a Fresher?
- Start by learning Image Processing or Natural Language Processing. Both have very good scope, but pick one and go in depth on it rather than trying to learn both at the same time. Then apply for internships that offer a PPO (Pre-Placement Offer). There is too much competition for internships, so knowing one topic in great detail will always give you an edge. Most startups are working on NLP and Image Processing.
- Enrol in certification courses on Coursera, Udemy, or elsewhere. As a fresher, it will give you an edge. I usually prefer Coursera courses.
- Don’t put your MOOC projects on your resume. These include cat vs. dog, Iris, IMDB movie reviews, MNIST, etc. These projects have become very common; whenever you build a project, make sure it has a real use case.
- Try to adopt the best Python practices that are widely used in the industry. Why? It shows that you can write production-level Python code for deployment; otherwise your work is going to stay stuck in Jupyter notebooks. What are the best Python practices? Refer to this-
Best Python practices for Data Scientists
Let’s see some industry standards for production-level code.
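As one illustration of the kind of practice meant here (type hints, docstrings, input validation, and logging instead of bare prints), here is a small sketch; the `normalize` function itself is just an invented example, not taken from the linked article:

```python
import logging
from typing import Sequence

# Module-level logger instead of print() calls: production code
# needs output you can route, filter, and timestamp.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def normalize(values: Sequence[float]) -> list[float]:
    """Scale values linearly to the [0, 1] range.

    Raises:
        ValueError: if the input is empty or constant.
    """
    # Validate inputs early with clear errors (guard clauses),
    # rather than letting a ZeroDivisionError surface later.
    if not values:
        raise ValueError("values must be non-empty")
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("values must not be constant")
    logger.info("normalizing %d values", len(values))
    return [(v - lo) / (hi - lo) for v in values]

print(normalize([2.0, 4.0, 6.0]))
```

Type hints and docstrings make the function usable by teammates and tooling alike, which is exactly the gap between notebook code and deployable code.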
- Don’t get demotivated or depressed if you don’t understand certain things at first. As a beginner, no one expects you to know the complete mathematics behind every algorithm, except perhaps linear regression.
Data Science is a very challenging field, and you will have to work very hard in the beginning. When you start developing real-world applications, you'll see there are many more challenges no one tells you about, like deploying your model in a scalable way, regular retraining, and a lot more.