What did a Senior Data Scientist learn in 6 months to avoid becoming outdated?

Filipe Pacheco
Mar 12, 2024

Hello Medium Readers,

The time has come for me to write this particular post. If you've been following my posts, you may know that I'm a Senior Data Scientist and have worked as a Data Scientist for the past 3 years, though not always as a senior :). My journey into studying and experimenting with ML began long ago, in 2015, during my undergraduate degree in Mechatronics Engineering. Even so, I still consider myself somewhat of an outsider in the IT world. I believe that Data Science, in the real world, is closer to IT than to Engineering.

In my first post on Medium, I discussed how quickly Data Scientists can become outdated, given the constant stream of new resources and techniques released every month. At the start of my second post, I outlined three paths I intended to pursue to avoid early obsolescence as a Data Scientist:

  • Learn about LLMs (Large Language Models)
  • Upskill in ML on AWS
  • Become a MultiCloud Practitioner

Now, six months later, I'll share my perspective on each of these three paths.

Why and How Is Data Science More Than Just Machine Learning? (Ciência e Dados, cienciaedados.com)

Learn about LLMs (Large Language Models)

This was the first path I pursued. I ran some personal experiments with Hugging Face models and explored RAG (Retrieval-Augmented Generation) applications with LangChain. I did this instead of fine-tuning a model, which I had already done at work, or building one from scratch. I also completed a Databricks Academy course to deepen my understanding of the theory, and I took hands-on classes to apply LLM techniques.

After this period, I still believe that LLMs have immense potential to impact everyday life, particularly by boosting productivity across many fields. However, I don't believe they will fundamentally change how the world operates, as many had predicted.

Most of the applications I’ve seen over the last 6 months aim to improve productivity. Thus far, I haven’t encountered a single job that has been completely replaced by this technology. Of course, I’m not ruling out the possibility of this happening in the future, but I believe there’s still a long way to go.

Looking ahead, before 2030, I can envision scenarios where I talk with my car about its maintenance needs or the best configuration for a planned trip. I can also picture asking my washing machine for the best settings for damp towels. In both cases, the conversation would possibly be powered by an LLM.

These examples lead me to believe that models will need to get smaller, both to specialize them and to reduce their energy consumption. This matters particularly for integration into vehicles and household appliances.

After a full year of using ChatGPT, the best-known LLM, I've come to understand that LLMs have an extensive memory but cannot generate genuinely new information from scratch. Even so, when used as assistants or for specific tasks, they can be incredibly helpful.

Upskill in ML on AWS

This was my second path. Day to day, I use the Databricks platform for data science workloads, and it has served me very well. However, I realized that my knowledge of AWS was lacking. AWS is the leading cloud service provider globally and the main one used at my company, so I recognized the need to expand my expertise in this area.

This realization led me to write this post, where I shared my one-month experience using SageMaker Studio, the AWS service most analogous to the Databricks platform. I also understood that I could benefit further by broadening my AWS knowledge beyond simply using it to develop ML workflows and models.

So I enrolled in the AWS Skill Builder program and completed my first AWS Cloud Quest, a gamified hands-on experience that I discussed in detail here. I haven't moved on to other trainings yet, because I was focused on expanding my DevOps knowledge, which was the theme of my recent posts on Medium.

However, as I mentioned in my last post, it's now time to return to ML training. I look forward to sharing my experience using AWS Cloud Quest to expand my ML knowledge.

While I haven’t developed extensive skills in ML on AWS, only covering the basics so far, I firmly believe that having experience with one of the three largest Cloud Service Providers is essential for staying relevant as a Data Scientist.

Become a MultiCloud Practitioner

I must admit that I haven’t yet become a MultiCloud Practitioner, as originally intended, but I have valid reasons for this decision. Firstly, I want to emphasize the importance of having a plan while remaining open to improvisation. When I embarked on my journey to learn AWS with TheCloudBootCamp, I enrolled in a comprehensive Cloud course covering AWS, Azure, GCP, and OCP. Initially, my plan included completing lessons for all these platforms. However, I eventually chose to change course.

Sometimes, readjusting your route becomes necessary. During this period, I was promoted to a Senior position within my team. Through several 1:1 conversations, I became convinced that specializing in AWS was the most strategic move at this point. This decision was driven by AWS's importance to my current company and by its dominant market share.

Last November, I earned my first AWS Certification at the Foundational level (AWS Certified Cloud Practitioner), which I discussed in more detail in this post. Along the way, I worked through numerous practice exercises, many of them in the DevOps space. These experiences led me to write 8 posts here on Medium covering today's key DevOps topics, and you can find links to them below.

For me, this AWS-focused strategy proved effective. However, what works for me might not be the best approach for you. That's why I highly recommend holding 1:1 meetings with people in more senior positions within your company. Keeping up with hiring trends in the market and identifying in-demand skills can also be invaluable. Staying connected with colleagues in similar roles is essential as well. Personally, I'm always active in ML-related LinkedIn groups, and I find them to be valuable resources. You can find the links below.

Conclusion

In conclusion, my past 6 months as a Senior Data Scientist have been marked by a concerted effort to remain agile and relevant in an ever-evolving landscape. Learning about Large Language Models (LLMs) and upskilling in machine learning (ML) on AWS have been instrumental in broadening my skill set and understanding.

While I may not have fully realized my goal of becoming a MultiCloud Practitioner, I’ve embraced the necessity of adaptability and strategic decision-making. This journey has reinforced the importance of continuous learning, staying informed about industry trends, and fostering connections within the community.

Moving forward, I’m committed to leveraging these principles to navigate the dynamic intersection of data science and cloud computing, ensuring continued growth and success in this rapidly changing field.

Filipe Pacheco

Senior Data Scientist | AI, ML & LLM Developer | MLOps | Databricks & AWS Practitioner