My technical training in Data Science

Vagner Zeizer C. Paes
Published in Geek Culture
7 min read · Dec 31, 2022

In my first job, at a startup named Bright Photomedicine, I was granted a FAPESP scholarship, along with job benefits, to work as a full-time Data Scientist. In an earlier story (here), I talked about my journey before getting this job and breaking into the field of data science. In this story, I will share the lessons I learned and the challenges I faced during this almost two-year period.

Let us begin with an overview of the startup I am working at.

Customized Photobiomodulation Therapy

Figure 1: Bright Photomedicine (source)

Acute and chronic pain is a health problem affecting hundreds of millions of people worldwide. However, alleviating their pain and improving their quality of life is expensive and can come with several side effects, depending on the medication or surgical procedure involved. So, with the aim of alleviating pain through a more affordable approach, Bright Photomedicine is developing a light-based therapy named Customized Photobiomodulation, in which the light beam energy applied to the patient is customized according to the patient's phenotypical features, such as Body Mass Index, age, gender, level of pain, and skin phototype (the patient's skin type in terms of melanin concentration), among others. In this therapy, light stimulates the production of analgesic and anti-inflammatory substances in the patient's body, reducing the intensity of the pain. The advantage of this treatment is that it is non-invasive and has no side effects. More details of the therapy can be found on the startup’s website, along with references to scientific studies corroborating the efficiency and efficacy of the treatment.

The question here is: can data science and data analysis help optimize Customized Photobiomodulation Therapy? To answer it, I will briefly discuss what I have been doing as a Data Scientist, both in industry research and with real patient data.

Industry Research

As an academic dropout (see my Lattes here), I know how to write scientific articles and present results in a way that can be published in reputable scientific journals. This is useful in my daily work because the startup I am working at is based on science, so validating our treatments with the scientific method is very important. In this section, I will discuss some results that we have published concerning data from patients in clinical trials undergoing Customized Photobiomodulation Therapy.

In the article provided in this link, published together with NotreDame Intermedica, we investigated how pain evolved in patients with low back pain undergoing Customized Photobiomodulation Therapy. In this case report, 6 patients were followed, and all of them showed a tendency toward significant pain relief after ten sessions of treatment. An improvement in quality of life (assessed through quality-of-life questionnaires) was also observed, even four weeks after the end of the treatment (follow-up). We also noticed a reduction in visits to the care unit and in surgical indications at the four-week follow-up compared to the beginning of the treatment.

Figure 2: Chatbot-user interaction. Source

In another article, which will soon be published online and was presented at a conference (CBIS, the Brazilian Conference of Informatics in Healthcare, here), we analyzed data from patients who were surveyed by a chatbot on a daily basis for 4 months, along with the data recorded by the therapist during the treatment sessions. The goal of the work was to find out whether the chatbot is a useful tool to monitor the pain of patients undergoing Customized Photobiomodulation Therapy for low back pain. Basically, the chatbot sent every patient a message asking about their pain level, every day at the same time, for four months (see Figure 2). We found a strong correlation between the pain levels collected by the chatbot and those recorded by the therapist, and the response delay was short: the majority of the responses were recorded within roughly 30 minutes after the chatbot sent the message. These results are promising, and more studies are needed to fully validate the chatbot as a general tool for monitoring pain.
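To make the two quantities above concrete, here is a minimal sketch of how a chatbot/therapist agreement and a response delay could be computed with pandas. Everything in it is an assumption for illustration: the data is synthetic, the column names (sent_at, answered_at, pain_chatbot, pain_therapist) and the function chatbot_summary are hypothetical, and the Spearman rank correlation is my choice here, not necessarily the statistic used in the article.

```python
import pandas as pd

# Synthetic, made-up records for illustration only (NOT real patient data).
sent_at = pd.to_datetime(
    ["2022-06-01 09:00", "2022-06-02 09:00", "2022-06-01 09:00",
     "2022-06-02 09:00", "2022-06-01 09:00", "2022-06-02 09:00"]
)
records = pd.DataFrame(
    {
        "patient_id": [1, 1, 2, 2, 3, 3],
        "sent_at": sent_at,
        # Hypothetical answer delays of 5, 12, 45, 20, 8 and 90 minutes.
        "answered_at": sent_at + pd.to_timedelta([5, 12, 45, 20, 8, 90], unit="min"),
        "pain_chatbot": [7, 5, 6, 4, 8, 6],     # 0-10 score reported to the chatbot
        "pain_therapist": [7, 6, 6, 3, 8, 7],   # 0-10 score recorded in the session
    }
)


def chatbot_summary(df: pd.DataFrame) -> dict:
    """Summarize agreement between the two pain sources and the answer delay."""
    # Delay between the chatbot message and the patient's answer, in minutes.
    delay_min = (df["answered_at"] - df["sent_at"]).dt.total_seconds() / 60
    # Spearman correlation computed as the Pearson correlation of the ranks,
    # a reasonable choice for ordinal pain scores.
    rho = df["pain_chatbot"].rank().corr(df["pain_therapist"].rank())
    return {
        "spearman_rho": rho,
        "median_delay_min": delay_min.median(),
        "share_within_30_min": (delay_min <= 30).mean(),
    }


summary = chatbot_summary(records)
print(summary)
```

On real data, each chatbot answer would first have to be matched to the therapist record of the same patient and day, which this sketch takes as given.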

Data from Real Patients

Patients who undergo the treatment do not, in general, strictly follow the protocol they were given to get the best out of the therapy. So, data exploration and machine learning approaches are useful for finding patterns in the data that indicate whether a patient is getting better or worse over the course of treatment. Through machine learning, we can find out which patient features contribute most to the outcome. We can also estimate the optimal frequency of attendance at the clinic for achieving significant pain relief after the treatment sessions. More strategies based on data exploration are being implemented to identify the characteristics of the patients who are succeeding in the treatment. More articles concerning real-life patient data will be published in the future, and they will be a powerful way to show the strength of Customized Photobiomodulation Therapy in reducing pain.
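As an illustration of the feature-importance idea above, the sketch below fits a random forest on synthetic data and ranks the features by permutation importance using scikit-learn. It is not the startup's actual pipeline: the feature names (age, bmi, baseline_pain, sessions_per_week), the simulated outcome, and the choice of model are all assumptions made for this example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic cohort for illustration only (NOT real patient data).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame(
    {
        "age": rng.integers(20, 80, n),
        "bmi": rng.normal(27, 4, n),
        "baseline_pain": rng.integers(3, 11, n),      # 3-10 pain score
        "sessions_per_week": rng.integers(1, 4, n),   # 1-3 clinic visits
    }
)
# Simulated pain reduction: by construction it depends only on attendance
# and baseline pain, plus noise. Age and BMI are deliberately irrelevant.
y = 1.5 * X["sessions_per_week"] + 0.3 * X["baseline_pain"] + rng.normal(0, 0.5, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Permutation importance: how much the held-out score drops when one
# feature column is shuffled, averaged over several repeats.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)
importance = pd.Series(result.importances_mean, index=X.columns)
importance = importance.sort_values(ascending=False)
print(importance)
```

Permutation importance is measured on held-out data, so it reflects what the model actually uses to predict, not just what it memorized during training. With real patients, correlated features and small samples can blur this ranking, so it should be read as a starting point for discussion rather than a causal statement.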

Now, I will discuss the main technical and soft skills that I developed during this technical training in data science.

Most Developed Technical Skills

  • Statistics: this is the cornerstone of everything I have been doing. A deep statistical analysis is needed in many cases, whenever the data allows it;
  • Python for Data Science: mastering pandas, seaborn, plotly, and other libraries was essential. Analyzing real data is much harder than working with textbook datasets, and these libraries make it manageable;
  • Machine Learning Concepts: it is good to have a deep knowledge of how machine learning models work when fitting data, as well as to understand the role of hyperparameters when looking for interpretable models;
  • Data Visualization: producing insightful graphics is a kind of art. The way we present data can make it much more interpretable;
  • Data Exploration: getting insights from the data is very important, and deep data exploration goes hand in hand with understanding the business problem. You have to understand the data very well in order to provide insightful plots and delve deeper into it;
  • Data Cleaning: it is an ongoing process that always needs to be updated, and it is much harder than a simple inspection of the data would suggest.

Most Developed Soft Skills

  • Presentation: this is the soft skill that I developed most during the technical training. I improved the way I present results, but the most remarkable change was learning to use basic tools, such as PowerPoint, in a smart way. That may sound simple, but I come from academia, where I was used to making LaTeX-based presentations, and I had a kind of prejudice against Microsoft-based tools. The truth is that in the job market nobody cares which tool or technology you use; what matters most is that the work is delivered fast and with high quality;
  • Communication: communicating with people from very different backgrounds can be challenging, and the audience you are talking to must always be taken into consideration;
  • Teamwork: working in a smart way and making others understand what you did is also a challenge, because explaining hard concepts and analyses in layman’s terms is kind of an art;
  • Critical thinking: breaking into a new field is challenging, and delivering high-quality results is demanding work that needs to be discussed very well before being implemented in the real world. Real data is much more complex than the data we are used to finding when learning data science, and business understanding is crucial in order to get the best out of the data. Therefore, developing a high-level critical thinking strategy paves the way for delivering high-quality results.

I wrote a story about the soft skills that a Data Scientist must have, which can be found here.

Specializations taken

Work is just 8 hours a day, but your career must be developed alongside your job. Regardless of whether your employer is asking for a specific set of skills, you must think in the long term, take charge of your own path, and plan the kind of professional you want to be in the future. That is why I have read several books and enrolled in several courses/MOOCs during my technical training in data science. A full list of the books and courses/MOOCs that I recommend can be found here.

That is all, folks. I am sure that this period with the data science scholarship was worth it.

If you enjoyed this story, please give it some claps.

You can add me on LinkedIn here.

Thanks for reading.
