A TALE OF THREE DATA SCIENTISTS
“I like a good story and I also like staring at the sea — do I have to choose between the two?” ― David Byrne, How Music Works
If you are a data scientist, the answer is no! You can have both, as long as your data-driven stories are built on solid research ground. A data scientist uses techniques and tools from many disciplines (computer science, statistics, math, information theory) to find meaningful interpretations of data. This interdisciplinary nature of data science attracts people from all sorts of backgrounds: economists, physicists, mathematicians, computer scientists, engineers, statisticians and many more.
Today we are interviewing three budding data scientists about their data science journeys. All of them have successfully gone through the phase of settling in, and they have faced the same questions and hurdles that many newcomers to the field are going to meet. We hope this fusion of opinions will give aspiring data scientists a clear picture of the current state of data science in the industry.
Meet our data scientists:
Passion is the name of the game: Data scientists are some of the most passionate people out there. For all of our data scientists, moving into this field was a conscious decision. We wanted to know why.
What excites you most about data science/machine learning?
“I read somewhere about how being a data scientist is like being a detective. I guess it’s exciting to put on a Sherlock Holmes hat and investigate the data: understand how it works, understand the business and build something that works.” -Shruthi
“What excites me most is that this field has so many new avenues to explore. The techniques and theories used in this field are definitely not out of this world. These techniques, at least the basics, have been out there for a long time. We just didn’t use them in a very useful way. With huge amounts of data and increasingly cheap computing power, we are in a position to explore so many things that we never could in the past.” -Shariful
“Being able to make use of data that has been generated by, or can easily be recorded from, business processes and workflows, and leveraging it to strategize and improve the business through data-driven decisions and actions, is cool and exciting.” -Vishal
Learning to learn: As with most things in life, getting started often poses the hardest challenge. All our data scientists took the route of completing a graduate degree to get into the industry. They all felt that their graduate degree was instrumental to their success.
How did your graduate degree help you become a data scientist? How relevant is the course structure to the current industry?
“I had been working on ML and NLP use cases even before joining this program. But the primary motivation for me to pursue the course was to be able to expand the applicability of the solutions to data at scale. The curriculum is very well structured, covering a good mix of the components of this vast domain.” -Vishal
“I would specifically mention the co-op program. The courses are also very up to date and industry focused.” -Shariful
“I really liked the course structure of the SFU Big Data Program. The 12-credit Big Data Lab 1 & 2 made the whole course worth it. The course is very relevant to the current industry. I am using the things I learnt there, so it’s been a really good investment.” -Shruthi
What happens in a co-op stays in...: Getting some industry experience in the form of a co-op or internship remains an effective way to get into the industry. All of our data scientists had 4–8 months of co-op work experience before starting their current jobs.
Could you give an overview of your co-op experience?
“I did my co-op at Xerus Medical Inc. Being a small company, Xerus demanded other work that does not come with a data scientist's job description. I did some data visualization using D3.js, as well as data preparation and cleaning, besides working on a machine learning project. Although sometimes I really wanted to focus on data science and machine learning specific projects, in retrospect I think this helped me gain experience of the full software development life cycle.” -Shariful
“I liked working for Community Sift/Two Hat Security as my internship. They have so many advanced products that label data based on the offensive content present in it. They kick-started a project to study how offensive language on Reddit impacts its users: do people tend to quit if other people start using offensive language in a community? I was a part of it and I had a great co-op experience.” -Shruthi
Skills Needed: The field of data science is ever evolving. The tools and techniques you learn today might not be relevant tomorrow. The field requires a strong commitment to learning.
Which skills do you think one needs most to succeed in this field?
“Truckloads of confidence. Machine learning/AI are still buzzwords and a lot of people don’t fully understand them yet. So, you need to have enough confidence to say ‘Hey, I can program a NASA satellite if you want me to’ (confidence is not my forte). The other skills you can always pick up by reading and practicing; the technological requirements vary for each job.” -Shruthi
“You need to keep learning. This field is moving very fast. It’s not only about learning a few algorithms or libraries. To succeed you must keep yourself familiar with the new and exciting stuff coming out every day.
In terms of soft skills, we must learn how to collaborate. Data scientists need to collaborate with data engineers and software engineers as well as with the business people.” -Shariful
“Firstly, one has to be good at dealing with data. One needs to be curious and enthusiastic enough to keep revisiting data from a fresh point of view to spawn new ideas and more questions about the data. In addition to being able to perform analysis, asking the right questions, understanding stakeholders’ needs, formulating solutions and translating the results of the analyses into business impact for executives are equally important.” -Vishal
Languages and tools: We asked our data scientists about their choice of programming languages and tools.
Which big data/machine learning tools do you use every day?
“Kafka, Hive, Spark, Cassandra, Druid. In addition to that, I also explore other tools to optimize the existing architecture.” -Vishal
“Python and SQL are sufficient for my job.” -Shruthi
“In everyday work I use Python-based machine learning tools. For most of my work I use scikit-learn for machine learning, pandas and numpy for data preparation, and matplotlib and seaborn for visualization. Occasionally I use Keras.” -Shariful
The work they do: We asked them about the kinds of projects they work on day to day.
Could you tell us about a data science/big data project you have worked on or are working on recently?
“One of the recent projects I did was related to topic modelling using a regulatory risk data set to identify potential recurring themes across various business units.” -Shruthi
“I have just finished working on a project for predicting booking cancellations for Left Travel. Cancellation of bookings has a huge impact on our revenue.” -Shariful
“Prioritization of vulnerability remediation efforts: using severity, exploitability and business criticality related features to prioritize remediation of vulnerabilities discovered across organisation-wide assets.” -Vishal
Everyone has an opinion: Industry is often different from what people expect.
What misconceptions do people have about working in the field?
“I think sometimes people focus too much on tools. They ask questions like ‘Do you use Spark?’, ‘Do you know TensorFlow?’, ‘Why don’t you use GPUs?’ and so on. To me it’s not about using such and such tools. It’s more about the problem you are trying to solve. Being an expert in using a particular tool definitely helps, but the tools will and should change based on the project.” -Shariful
“I had this crazy idea that I would go into a job, the problem statement would be handed to me, and I would use machine learning to solve it. But that’s not the case in many places, as people are still not sure about data science. A lot of times, you will have to come up with the business case yourself and educate people about what you can do. Sometimes people come up to me and say ‘You are a data scientist, right? Can you automate this report for me?’ and I am like ‘But this has nothing to do with data science. This is just simple automation.’” -Shruthi
“Not sure about the misconceptions. But regarding the practical industry, all that matters is how you can add value to the business. At university, everything is streamlined and well structured. We learn in a controlled environment with a predefined expected outcome alongside a point of reference for guidance. But in the workplace, it’s not the same. I believe getting exposure to practical industry in the form of a co-op is very important.” -Vishal
Avoiding known pitfalls: We asked them how they would approach their studies if they had the chance to do it all again.
If you could do it again (learning machine learning/data science), where would you start?
“I would do more projects. Specifically, I would participate in a bunch of Kaggle competitions. I would read more papers. I would focus less on tools, because you can learn a new tool whenever necessary if you already have the basics figured out. I would also focus on the software engineering side. How to scale and deploy machine learning models for a production environment is a very important skill to learn.” -Shariful
“I came here as an international student. Doing three courses in my first term was probably a little too much. I would have liked to take things a little slower. I am not the kind of person who does a lot of online training/certifications. I needed the discipline and structure that comes with being in a graduate program.” -Shruthi
A window into the future: We asked our data scientists about the future of data science.
Where do you think the field is going?
“I think it is going in two major directions. The first one is developing new and exciting technologies and really pushing the boundary to explore the extreme possibilities. We see a huge number of AI start-ups in so many different fields.
The next direction is the application part of it. All the companies that have a huge amount of data want to do something with it. These companies are not necessarily tech companies. They are not after building anything exciting and they do not care much about the technology; they are more after using the existing tools and data to create value for the business. To my understanding, both of these branches will keep growing in the near future. However, I am more excited about the first direction.” -Shariful