What is Data Science & Tips Of The Trade

Nick Gardner
My Journey Towards Data Science
6 min read · Jun 6, 2022
From A Data Scientist In The Semiconductor Industry: https://www.linkedin.com/in/nick-drew-gardner/

Intention

In writing this article, I intend to help others become Data Scientists and to offer existing professionals some additional thoughts as they continue their journey. Hopefully, my experience thus far inspires others to pursue their passion in the field. If one person is able to use one sentence of this article, I will consider it a resounding success.

I do not discuss different algorithms and their use cases here, as I intend to cover those topics in depth in subsequent articles. Here I focus purely on what Data Science is and on some tips to help Data Science professionals with their overall professional strategy.

What is Data Science Anyway?

Figure: The Data Science Venn Diagram

So what actually is Data Science? Well, Data Science is a broad domain and in truth is a collection of pre-existing disciplines. As Wikipedia suggests:

“Data science is a ‘concept to unify statistics, data analysis, informatics, and their related methods’ in order to ‘understand and analyse actual phenomena’ with data. It uses techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, information science, and domain knowledge.” Source

And hence a Data Scientist:

“Is someone who creates programming code and combines it with statistical knowledge to create insights from data.” Source

Although this definition appears useful, I consider it incomplete and unsatisfactory. It omits the importance of project management, specifically risk management and stakeholder management. This skill set is seldom highlighted as a requirement, yet in my opinion it is of vital importance. Let me ask you:

→ How can a Data Scientist deliver value to their customer if the Data Scientist does not know how to organise a project to realize that value? A house cannot be built by randomly pouring cement at any stage, can it?

→ How can a Data Scientist manage risks such as regulatory risks (like GDPR) without analysing each risk with care and diligence in its context?

→ How can the sponsor of the project remain engaged with the progress of the project, engaged with the positives and negatives of the project, feed into directing the future position of the project, and be assured the eventual product will yield a return on investment (ROI) or losses will be curtailed early?

The simple answer is that, without foundational project, stakeholder, and risk management skills, a Data Scientist cannot manage projects efficiently and effectively without hand-holding from those who have these skills. This is probably, at least in my view, a significant contributor to why:

85% of all AI investments fail

Source

Typically, MLOps and/or a Data Science process can help bridge this gap. I leave the details of this topic for another article. However, if you are interested, please check out the following for pretty neat explanations:

  1. https://ml-ops.org/ and
  2. https://towardsdatascience.com/the-data-science-process-a19eb7ebc41b

So Is Data Science Just A Bag Of Tricks?

So is Data Science just a loose combination of the aforementioned skills? The answer, at least in my view, is no, absolutely not. Let me explain by referencing a wonderful yet generally underappreciated book written by Rachel Schutt (Managing Director at BlackRock AI Labs) and Cathy O’Neil (Harvard PhD and quant for the hedge fund D.E).

The book, Doing Data Science: Straight Talk from the Frontline, recommends that a Data Scientist should understand the following:

“ 1) Computer Science

2) Mathematics and Statistics

3) Machine Learning

4) Domain Expertise relative to the data topic

5) Communication and presentation skills

6) Data visualisation skills”

This list is extremely similar to the Wikipedia definition. However, it does not mean that one can loosely apply these topics with only a superficial understanding. That will not work in the ‘real world’. In my view,

Data Science is about pooling these skills together with peers to find cohesion such that valuable information can be realized from the data.

I appreciate that, to the fledgling Data Scientist, the enormity of the domain and the task of learning it for practical application are likely daunting. To this I say, first and foremost:

You must be willing to undertake a lifelong journey of learning. You will never be done, but you will likely be effective if you try.

If you can stomach the inexorable fact that a Data Scientist must, in the long run, learn continuously to remain effective in an ever-changing environment, then the following may provide solace. As O’Neil and Schutt discuss within Doing Data Science: Straight Talk from the Frontline,

“Nobody is an expert in everything, which is why it makes more sense to create teams of people who have different profiles and different expertise so as a team they can specialize in all those things.”

Source: Doing Data Science: Straight Talk from the Frontline

This advice certainly runs contrary to the many unrealistic job specifications I read, which expect everything. I personally take this as indicative of the current maturity of the profession. Regardless, my advice is:

Not to expect to be an expert in everything. Utilize your expertise as well as the expertise of those around you to maximize value.

O’Neil and Schutt illustrate this nicely below, showing that the composite of skills, the team as a whole, is greater than the parts alone.

Source: Doing Data Science: Straight Talk from the Frontline : Figures 1–3, Chapter 1: Introduction: What is Data Science?

Finally, some closing thoughts that may help:

Although you may have knowledge in many domains and/or depth in a few, know that applying textbook answers to problems is unlikely to bring you success in the real world. Creativity is required.

One must engage with stakeholders and the data via exploratory means to identify truly valuable projects that are not solved trivially.

One must invest time in understanding the problem at hand and the data, through cleansing and analysis, before applying ‘fancy models’, so as to increase the likelihood of delivering value.

One must be a problem solver who is more than just technically proficient at the one thing they are ‘comfortable doing’.

One must cultivate ‘soft skills’ as well as technical ability.

One must be ethical in their execution.

One must critically ask questions.

After all, as O’Neil and Schutt quite rightly suggest:

“Ideally, the next generation of data scientists-in-training are seeking to do more than become technically proficient and land a comfy salary in a nice city, although those things would be nice. Ideally, we’d like data scientists to merit the word “scientist”, so they act as someone who tests hypotheses and welcomes challenges and alternative theories. That means shooting holes in our own ideas, accepting challenges, and devising tests as scientists rather than defending our models using rhetoric or politics. It means cultivating good habits and remaining open to continuous learning. It means learning a variety of hard skills including software engineering, statistics, machine learning, visualization, communication, and math.”

Paraphrased from pages 352, 353, and 354

Closing Remarks

I hope these thoughts are helpful to you on your Data Science journey. If it were easy, the reward would not be so great, so I wish you every success. If you have any suggestions for improving this article, or would be interested in further topics, please contact me via the comments below or via LinkedIn.

Recommended Reads That Might Be Useful

The following books should give you a good grounding for proceeding as a Data Scientist. Happy reading!

Doing Data Science: Straight Talk from the Frontline

Data Science for Business

Hands-On Machine Learning with Scikit-Learn & TensorFlow

Machine Learning A Probabilistic Perspective

Mastering The Requirements Process

Neural Networks and Learning Machines
