Lessons learned from my first Data Science Job
I got my first data science job in June 2021, after spending about two and a half years preparing for data science interviews by taking MOOCs (mostly Udacity and Udemy courses), reading several books on the subject, and taking tips from people who were succeeding in the field (luckily, I got the best advice). As described in this story, the job is at a healthcare startup named Bright Photomedicine, which is based in Sao Paulo, Brazil.
I have now spent around a year and a half working with real medical data, mostly on improving the medical treatment of patients. In this post, I want to share the lessons I learned on the way from aspiring Data Scientist to full-fledged Data Scientist.
Lesson 1: Real Data is really messy
One thing I noticed is that real data is really dirty, much more so than the datasets presented in MOOCs or even in machine learning competitions. I think data science courses focus too much on the modeling part of a project and underestimate exploratory analysis. This should change, since there have been many efforts to automate the modeling part of data projects through AutoML frameworks such as H2O and auto-sklearn.
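To make "dirty" a bit more concrete, here is a minimal sketch of a first data-quality audit with pandas. The table, its column names, and the `audit` helper are all hypothetical and invented for illustration; they are not taken from any real project. The sketch only counts missing values, duplicated rows, and numbers that were stored as text.

```python
import numpy as np
import pandas as pd

# Hypothetical toy records, invented for illustration only.
raw = pd.DataFrame({
    "patient_id": [1, 2, 2, 3, 4],
    "age": [34, 51, 51, -1, np.nan],        # -1 and NaN are suspicious
    "dose": ["10", "12.5", "12.5", "n/a", "8"],  # numbers stored as text
})


def audit(df):
    """Summarise basic data-quality indicators per column (hypothetical helper)."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(),
    })


report = audit(raw)

# Fully duplicated rows (every column identical to an earlier row).
n_duplicates = int(raw.duplicated().sum())

# Coerce the text column to numbers and count the values that fail to parse.
dose_numeric = pd.to_numeric(raw["dose"], errors="coerce")
n_unparseable = int(dose_numeric.isna().sum() - raw["dose"].isna().sum())

print(report)
print(f"duplicated rows: {n_duplicates}, unparseable doses: {n_unparseable}")
```

A summary like this does not catch everything (the age of -1, for example, is only found by checking value ranges), but it is a cheap starting point before any modeling.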
Lesson 2: Real interdisciplinary work
As part of a startup built on scientific knowledge and research, I have worked with people from a great variety of majors. In contrast to academia, I have therefore been acquiring very diverse knowledge of Physics, Medicine, and Artificial Intelligence by working with physicians, physicists, biomedical scientists, engineers, physiotherapists, and salespeople with different backgrounds. One thing I learned is that those who want to succeed in data science must be able to deal with people of different backgrounds and very different personalities.
Lesson 3: Teamwork really matters
This diversity of profiles makes it practically impossible to apply the lessons learned in data science courses and books without being a good team player. The business problem is very hard to grasp at the beginning, and good communication is essential to connect your hard skills to it.
Lesson 4: Communication and presentation skills are mandatory
It is not enough to get correct results from the data modeling or data analysis; they must also be presented in the right way for the respective audience. Your audience is what really matters. For instance, for the R & D team, it is really important to dive into the details of the models, the quantitative analysis, and so on. For a salesperson, however, you should focus on visual analysis, using intuitive figures and explaining the results in layman's terms. This sounds logical, but it is hard to put into practice, and it comes with experience.
Lesson 5: Data Exploration is the most important part
As stated in other posts, machine learning projects typically follow the rule “Garbage in, garbage out”: if you give the model “bad” data, it will output bad predictions. The best way to circumvent this problem is to do a really thorough exploratory data analysis and data cleaning, for which understanding the business problem is crucial.
However, when dealing with real data, the cleaning process should be done carefully, because sometimes most of the data is “lost”: it cannot be used in the predictive modeling, since it would worsen the predictions.
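One way to keep an eye on how much data gets “lost” is to log the row count after every cleaning step. The sketch below does this with pandas on a made-up table; the column names and the three cleaning rules are assumptions chosen purely for illustration, not the rules used in any real project.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements, invented for illustration only.
df = pd.DataFrame({
    "patient_id": [1, 2, 2, 3, 4, 5],
    "age": [34, 51, 51, -1, np.nan, 47],
    "pain_score": [3.0, 7.0, 7.0, 5.0, 2.0, np.nan],
})

# Each step is a (label, function) pair; the rules themselves are assumptions.
steps = [
    ("drop duplicates", lambda d: d.drop_duplicates()),
    ("drop missing values", lambda d: d.dropna(subset=["age", "pain_score"])),
    ("keep plausible ages", lambda d: d[d["age"].between(0, 120)]),
]

# Record how many rows survive each step, starting from the raw table.
log = [("raw", len(df))]
clean = df
for label, step in steps:
    clean = step(clean)
    log.append((label, len(clean)))

for label, n_rows in log:
    print(f"{label:<22}{n_rows:>3} rows ({n_rows / len(df):.0%} of raw)")
```

Seeing the numbers step by step makes it easier to discuss with the domain experts whether a rule is too aggressive before the data reaches a model.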
Lesson 6: Industry vs Academia
As I have a strong academic background, I needed an adaptation period to learn how industry works. The real goal of a startup is to deliver high-quality products in the short, medium, and long term, and at Bright Photomedicine research is crucial to convince clients that the treatment really works. This kind of scientific work is more than just publishing papers with “strong” results: you have to show that your scientific work makes a profit somehow. This is in strong contrast to purely academic work, where the aim is the impact factors and citations acquired along your scientific journey.
These are the lessons I learned during my first data science job, and I am sharing them with you. If you liked this post, please give it some claps. Constructive criticism is welcome. You can add me on LinkedIn here.
Thanks for reading! Hope you enjoyed it :)