“There are Always Some Hidden Qualities in Big Data that No One is Aware of.” — An Interview with Siddharth Baronia from SAP and Our 2nd Cohort
We are honored to interview Siddharth Baronia from the second cohort of SFU’s Professional Master’s Program in Big Data, who currently works at SAP as a data scientist on the SAP Analytics Cloud team. We enjoyed talking about the current state and future of the data science industry, and gained some insight into what it is like to work as a data scientist at an international business solutions technology firm.
Q: Hi, Siddharth. We are so glad to meet you. Would you like to begin with an introduction of yourself?
S: Sure. I studied Electrical Engineering as an undergraduate at the University of Windsor and graduated in 2012. Then I worked for Cisco Systems for 3 years as a software developer. It was during that time that the hype around big data and data science began, especially in North America. A lot of companies and professionals followed the trend, so the demand for data scientists kept growing across all sectors and industries. I was extremely curious about it, so I decided to shift my focus from being a general software engineer to being a specialized data scientist.
SFU was one of the early movers in Canada to start a professional master’s program in Big Data. When I found out about it, I just wanted to give it a try. I felt lucky to be accepted, and I was glad to have the opportunity to develop my skills further. At Cisco I was a low-level software engineer, so the program let me shift my focus from low-level programming, which is mainly about operating systems and routers, to the field of cloud and data science. Getting a formal education at SFU helped me move into this profession, and it was my first step into the data science job market. I think applying to the Professional Master’s Program in Big Data at SFU was a good start. SFU taught me the core of data science practice. The education is pretty good overall, in my opinion.
Q. There are so many big data and data science programs in North America and all around the world. What were your specific reasons for choosing SFU’s big data program?
S: As I mentioned earlier, SFU’s big data program was one of the very first of its kind in Canada, if not the first. I was not interested in moving out of Canada, so my goal was to study at a Canadian university. Also, I wanted to move from the east coast to the west coast. Things just worked out pretty well for me, as I could come to Vancouver to study at SFU.
To be honest, you don’t need a formal education to study data science, as there are so many resources available online. If you are self-motivated and want to understand this field, you can take micro-degrees and nano-degrees on Coursera, Udemy, and other online educational websites. You can just read and learn on your own if you want to know a bit. But I think SFU was the only one offering formal education in data science back then, and the reputation of its Computer Science department was really good. There are SFU graduates working in Silicon Valley as well. That is why I gave it a try, and I reckon that it was a good move.
Q. Can you introduce your role at SAP?
S: When I was an intern at SAP, I was a software developer. We have a product called SAP Analytics Cloud, which is basically a software-as-a-service (SaaS) product. It provides cloud prediction, visualization, charting, and monitoring of data, among other features. When I got hired as an intern, I was working on cloud infrastructure in the DevOps team. I created new build pipelines and other services, infrastructure and microservices. I then moved to a lower level. I was responsible for KPIs (key performance indicators), and that’s where I started working with data.
We started collecting data from different infrastructures and services, with different components and features. They included adverts, clients, licensing, and that’s where I got into the data side. I performed ETL and data collection and started creating big charts.
Then, I took a break for around one month after finishing my internship. I came back as a full-time employee in 2017 and started working on cloud infrastructure. My responsibilities were all about developing services, monitoring pipelines and tools for SAP Analytics Cloud and other products. Recently, I was on a fellowship for 3 months, working as a data scientist and machine learning engineer in one of the teams. My job was to solve the problem of classifying emails into the proper categories. So, this is another data science and machine learning experience that I have.
Q. Does SAP focus aggressively on machine learning?
S: In the team I work in, the developers build services that involve predictive features. Our teams don’t design or research new core machine learning algorithms; instead, they use the ML algorithms developed by other teams at SAP. We leverage those machine learning services in our products.
Q. Would you mind sharing what you think are the advantages and potential of studying Big Data? This question may sound general but at the same time, it is important for those who want to enter the big data field.
S: Although big data and data science have recently become more and more popular in industry and the commercial world, machine learning and data science research has been going on in labs for decades. For instance, MapReduce was published in a research paper something like 20 years back. So the underlying technology for extracting meaningful information from data began a long time ago. Certain experts just grouped related technologies under the names big data and data science in recent years. In fact, the value of data has been known for quite some time. I remember reading an article titled “Data is the New Oil”.
Intel CEO Says Data is the New Oil
Brian Krzanich believes that big data will dramatically change the world.
The more data you have, the more you can retrospectively think about how your products, services, customers, and the company are going.
You can have more insight into whether the company is going in the right direction. We also need all the data to be stored.
Let me give you a very simple example. The only reason we can predict the weather of the following week or make a forecast for the next 14 days is that we have historical data collected since the last century. We have information to inform us about the pattern of the climate and the temperature. By using algorithms, theories, and technologies, the scientists can predict the weather of the next morning pretty accurately to inform the public to get prepared ahead. Everything is about extracting insight from data. If you are lucky enough to have historical data available and you have the capacity to process and to work on it, you can guide yourself in the right direction by finding insight from the data.
There are always some hidden qualities in big data that no one is aware of.
Only when someone starts to dig deeper and analyze the data are answers gradually revealed. You cannot draw conclusions just by reading the database with your eyes. You need to systematically aggregate, join, and format the right data and tables. What one data scientist finds useful for his/her scenario will differ from what another finds useful in another situation. Every data scientist needs to develop his/her own analyses and theories. You give your own meaning to the very same data that another data analyst interprets differently.
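As a small illustration of the joining and aggregating Siddharth describes, here is a minimal sketch using Python’s built-in `sqlite3` module. The `customers` and `orders` tables, their columns, and all values are hypothetical, invented only for this example; they are not from SAP or from the interview.

```python
import sqlite3

# Hypothetical tables and values, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'West'), (2, 'West'), (3, 'East');
    INSERT INTO orders VALUES (1, 120.0), (1, 80.0), (2, 50.0), (3, 200.0);
""")


def revenue_by_region(connection):
    """Join orders to customers, then aggregate order amounts per region."""
    return connection.execute("""
        SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS revenue
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        GROUP BY c.region
        ORDER BY c.region
    """).fetchall()


summary = revenue_by_region(conn)
for region, n_orders, revenue in summary:
    print(region, n_orders, revenue)
```

Reading the raw rows tells you little; the per-region totals only appear once the two tables are joined on the customer ID and grouped.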
You are also prepared to understand what tools and technologies are available and at what costs. You can use your personal computer to start running data mining and machine learning in the cloud or with your own CPUs or GPUs.
Data are available everywhere, so you can just do data mining anywhere.
The university teaches you by guiding you in the right direction and helping you get prepared through assignments and projects. After formal education, it is up to us to develop further and make use of what we know in different situations. For example, we can apply machine learning techniques to personalized medicine, or build a recommendation system like those of Netflix and Amazon with collaborative filtering. We can do all the fancy things when we have a history of user behavior.
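To make the idea of collaborative filtering concrete, here is a minimal sketch of one simple variant, user-based filtering with cosine similarity, in plain Python. The users, titles, and ratings are hypothetical, and this is not a description of how Netflix or Amazon actually build their systems.

```python
from math import sqrt

# Hypothetical ratings (user -> {title: rating}), invented for illustration.
ratings = {
    "ana":  {"Drama A": 5, "Comedy B": 1, "Thriller C": 4},
    "ben":  {"Drama A": 4, "Comedy B": 2, "Thriller C": 5, "Drama D": 5},
    "cara": {"Drama A": 1, "Comedy B": 5, "Drama D": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict(user, item, data):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in data.items():
        if other == user or item not in their:
            continue
        weight = cosine_similarity(data[user], their)
        num += weight * their[item]
        den += weight
    return num / den if den else None


# Estimate how "ana" might rate a title she has not rated yet.
score = predict("ana", "Drama D", ratings)
print(round(score, 2))
```

Because ana’s past ratings resemble ben’s far more than cara’s, ben’s high rating for the unseen title dominates the estimate. That is the “history of user behavior” doing the work.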
I should tell you that I always follow the Netflix data science blog. From time to time, the data science team posts about the algorithms and concepts they use behind Netflix. For example, they tailor the cover art of movies to each user’s taste. They even predicted the success and popularity of House of Cards before it was released. The power of data science and big data helped them predict how customers would respond to a TV series.
Q. Do you have any recommended websites where we can read articles about data science, or to further develop our skills in our leisure time?
S: You can always find the technical blog of the data science teams behind popular online services such as Facebook and Quora. You can also participate in Kaggle competitions and earn rewards by solving data science related problems across all the fields you can think of.
Q. As you have been working in the industry for quite a few years, what changes have you noticed as data science and analytics have become more and more popular across all business sectors? How do organizations and companies divide the work of data science, engineering and analytics?
S: In the SAP Vancouver office here, we don’t have a dedicated team called the Data Science team.
My team is in charge of monitoring infrastructure, in some ways similar to what data engineers or data scientists do.
There is data and metadata coming from everywhere. Every team, such as the customer success team and the product management team, to name a few, collects all the data it has, creates lots of different data pipelines, and tries to find relationships and make discoveries by exploring the data. So even though there is no single team doing data science work, all the teams in some sense perform data science practice within their own areas.
Actually, if you come to our SAP office here in Vancouver, you can see multiple visualizations and charts made by other teams shown on screens all around the office. You can get a peek at how customers behave and how many firms and teams are involved, from all sorts of angles. But as you mentioned earlier, I know that other firms are moving toward having an established, dedicated data science or data engineering team. They can retrospectively understand how the company’s products are performing and see the big picture behind them. I have to note that this is a key point.
Q. Do you have any long-term goals in your career, given that you have been working at SAP for two and a half years?
S: In terms of my long-term goals, I really want to shift towards product management while still exploring data engineering and data science. I want to try the next version of the products or come up with the features that customers and people need, right? So, I want to move in that direction, the direction of being a product manager.
Thank you for reading our article. We hope that it was worth your time. Interviewed and written by Yunhye Joo and Mak Hoi Victor Hau.