Why a physics degree prepares you well to become a Data Scientist

Alessandro Guarnieri
Published in CodeX
4 min read · Aug 27, 2021

Everybody knows what physics is and what it means to take a degree in this subject: a lot of maths, formulas, cerebral exercises, abstruse concepts, and complex laboratory experiments.

Of course, there is a whole world behind physics, and the skills needed to complete a physics degree depend on the university itself. But one of the most important lessons I learned from graduating in physics is how to tackle unknown problems scientifically, and that “It’s never over until it’s over”.

The Science part of “Data science”

Most people nowadays are talking about “data”: how to leverage them, how to secure them, how to get something out of them. What about the “science” part?

Throughout my years as a physics student, what fascinated me the most were the laboratory experiments, where I could see and touch the physics concepts I was studying in the books. In every experiment, we were usually asked to solve a problem by applying the concepts learned during the lessons.

Experiments: Data Science and Physics

In many simple laboratory experiments, we started from a physics law or concept and tried to set up an experiment creatively to test whether that law held. We were not told specifically which instrument to use or what the best way to proceed was, but the starting point (the physics law) and the end goal (showing whether it is valid within error margins) were clear.
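To make "valid within error margins" concrete, here is a minimal sketch of that kind of check. Everything in it is an assumption chosen for illustration: the pendulum setup, the measured numbers, the reference value of 9.81 m/s², and the two-sigma threshold. It is not a record of an actual lab session.

```python
import math


def agrees_within_error(measured, sigma, expected, n_sigma=2.0):
    """True if `measured` lies within `n_sigma` standard uncertainties of `expected`."""
    return abs(measured - expected) <= n_sigma * sigma


# Hypothetical measurements for a simple pendulum (illustrative values only).
length_m, sigma_length = 1.000, 0.002   # pendulum length and its uncertainty
period_s, sigma_period = 2.007, 0.005   # oscillation period and its uncertainty

# The law under test: T = 2*pi*sqrt(L/g), rearranged to g = 4*pi^2 * L / T^2.
g = 4 * math.pi ** 2 * length_m / period_s ** 2

# Standard propagation of independent uncertainties for g = const * L / T^2.
sigma_g = g * math.sqrt(
    (sigma_length / length_m) ** 2 + (2 * sigma_period / period_s) ** 2
)

G_REFERENCE = 9.81  # assumed reference value in m/s^2
consistent = agrees_within_error(g, sigma_g, G_REFERENCE)
print(f"g = {g:.3f} +/- {sigma_g:.3f} m/s^2, consistent with reference: {consistent}")
```

The point is less the arithmetic than the shape of the question: a result is only meaningful together with its uncertainty, and "the law holds" really means "the measurement is compatible with the law at the chosen confidence level".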

So the first thing I find physics has in common with Data Science is the importance of defining your starting point and your endpoint: the business problem to solve and what its solution will bring to your business.

Secondly, the most exciting part of the entire journey, from the first year to the final dissertation at the end of the master's degree, was the thesis period. During those months it was finally possible to perform real experiments and to work extensively on testing a hypothesis.

In my case, as I was interested in biophysics and nanotechnology, my goal during the master's thesis was to determine whether a certain dye was suitable for studying the behavior of calcium protein pumps at the single-molecule level. As I said before, the problem and the goal were very clear from the beginning, but what I want to highlight now is the iterative process I had to follow. First, I had to decide the experimental conditions, which means deciding what I wanted to measure and then controlling anything that could affect the measurement. After this, I had to prepare the instruments, perform the experiment, collect the results, and analyze them. After the analysis, I had to summarize what I had found and decide on the next step.

Data Science also requires experiments, and even more important are the iterations needed to reach a target value of an evaluation metric. These iterations depend on the specific problem being analyzed, but they generally consist of changes to the features or the model hyperparameters based on the results of the previous experiment. This iterative process is well described by the CRISP-DM approach.
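As a rough sketch of what such an iteration loop can look like, the toy example below changes one hyperparameter, re-evaluates, records the result, and stops once a target metric is reached. The synthetic data, the small nearest-neighbour model, the candidate values of k, and the target error are all assumptions made up for illustration. A real project would also adapt the next experiment to the previous result instead of walking through a fixed list.

```python
import random


def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k nearest training points (1-D inputs)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mean_abs_error(train, valid, k):
    """Evaluation metric: mean absolute error on the validation set."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in valid) / len(valid)


# Synthetic data (assumption): y = x^2 plus Gaussian noise.
random.seed(0)
data = [(i / 10, (i / 10) ** 2 + random.gauss(0, 0.5)) for i in range(100)]
random.shuffle(data)
train, valid = data[:70], data[70:]

TARGET_MAE = 0.5              # hypothetical goal agreed before experimenting
CANDIDATE_K = [1, 3, 5, 9, 15]  # hyperparameter values to try, in order

history = []                  # one (k, error) record per experiment
for k in CANDIDATE_K:
    mae = mean_abs_error(train, valid, k)
    history.append((k, mae))
    print(f"experiment with k={k}: validation MAE = {mae:.3f}")
    if mae <= TARGET_MAE:     # goal reached, stop iterating
        break

best_k, best_mae = min(history, key=lambda record: record[1])
print(f"best so far: k={best_k} (MAE {best_mae:.3f})")
```

Keeping the `history` list plays the role of the lab notebook: each run is recorded, so the decision about the next step is based on evidence and not on memory.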

Common background

This one is more obvious but no less important: the common background knowledge needed to understand and apply physics or data science concepts.

Studying physics means studying nature’s laws, trying to describe, understand, and to a certain extent predict the behavior of natural phenomena. The language we use to read natural phenomena is, guess what, mathematics! All physics laws are permeated with mathematical concepts, so you first need to know calculus, linear algebra, probability, statistics, and informatics to understand physics well.

It is no surprise that all this knowledge is considered the foundation of any Data Science course. This is because machine learning models, pipelines, recommendation systems, and other Data Science topics are tackled with tools built on mathematical and statistical concepts.

The pleasure of discovery

Another important aspect that physics and data science share is that both try to understand or predict something that is not yet known, and this definitely satisfies curious minds.

Differences

Having pointed out what are, in my opinion, some of the common points between physics and data science, I want to highlight one difference: their goal. Physics is a hard science and its goal is to expand human knowledge of natural phenomena, from the micro to the macro scale. Data science, on the other hand, aims to solve business problems and make business processes more productive and efficient, whether it is applied to marketing or sales.

Call to action

I want to end this short overview with two questions: what will the applications of data science be in the future? Will they be related only to business, or could they also help solve other kinds of problems, perhaps ones that benefit our society?

I think that motivation is what keeps you interested in your job, so I always ask myself these questions, trying to reflect on what kind of impact any application of data science could have in the future. I would definitely like to know what you think about it!

https://www.linkedin.com/in/alessandro-guarnieri-ag/
