Faces of data science 4

Casey Doyle
Data Science at Microsoft
15 min read · Aug 24, 2020

For the fourth in our “Faces of Data Science” series, I’ve interviewed three colleagues who are all new to data science or to Microsoft: Raghu Aditya Kiran Chavali, Xin Xin, and Joshua Chang. All are members of the Customer Growth Analytics team in Microsoft’s Cloud+AI division.

Raghu Aditya Kiran Chavali

Data Scientist

What’s your educational background, Raghu? I do not have a traditional data science or computer science educational background like most people in this field. In India, where I’m from, it’s often an expectation that students become lawyers or doctors or engineers. Based on my family history I pursued mechanical engineering for my undergraduate and graduate study. It was in Carnegie Mellon’s graduate mechanical engineering program that I encountered machine learning. During an introductory course I felt a calling that it was the field I’d been looking for. So, I pursued a research track that allowed me to take courses from different fields. They were primarily in computer science, and with them I was able to experience machine learning and deep learning inference models. Along the way I taught myself to code because as a mechanical engineer I did not have that in my background. My research project focused primarily on using artificial intelligence for legged-robot locomotion, which led to a first-author publication at the American Control Conference in 2019. While I was ramping up I effectively had no personal or social life, but I learned that I could identify a field of interest and use my passion to put in the effort to build up my skills to the point someone else could see the value in what I could provide. From time to time I still come across concepts I’m unfamiliar with that would have been taught in a traditional computer science curriculum. But based on my educational background I’m used to going online and figuring out whatever I’m not familiar with so that my work is effective.

What did you do prior to coming to Microsoft? Prior to Microsoft, I had a six-month internship at Obsidian Security, a cybersecurity startup focused on improving the security posture of various cloud services. I was exposed to the various components that constitute the cloud. I was able to work on end-to-end problems that involved things like creating extractors to get data from APIs, building and training models to do the analysis, integrating various pipelines, and visualizing human-readable strings in the end product. It was a great learning experience.

Now that you’re at Microsoft, what is the focus of your current work? I’m working on the Azure customer lifetime value forecasting model. The model forecasts monthly customer consumption of our Azure products and services and produces twelve-month forecasts. It’s a complex model with many components, and after familiarizing myself with it I’m now looking after the model in production and making sure it runs smoothly. As issues are identified I fix them and redeploy. Additionally, I am working on a model to help our sales team identify customers for special offers and promotions. I recently took part in the Microsoft Hackathon, working on using hand gestures in Teams. My team is also focused on continuous learning, which is important because no one in data science can be an expert in everything given all the areas it comprises such as data analytics, data visualization, data modeling, data engineering, machine learning, database querying, and so on. My team encourages its members to take Microsoft-sponsored courses from Coursera, collaborate to partake in Kaggle competitions or other experimental projects, and share what we’ve learned.

What are some of the challenges faced by a new team member that a longer tenured team member might not realize? A few things come to mind. First, although new hires typically have the knowledge to do the job, we might struggle with troubleshooting. It takes time to think through the business logic and gain the perspective of the bigger picture that comes more easily to those with more experience. The second challenge for me is keeping up with the numerous acronyms used within Microsoft, which can create a communication gap between those who are new and those who are more experienced. The third is identifying who to collaborate with. As a newcomer it can be hard for me to identify someone who might be able to help with a particular problem I’m facing, especially while we are all working from home at the present time. The last point is gaining familiarity with Microsoft Azure and its services. Having that knowledge makes a lot of difference in the task at hand. So, for example, if the same task is given to me and to someone with more experience, I will focus more on the code and on creating the model to make sure it works, whereas someone who’s more experienced might leverage a Microsoft Azure service to create an end-to-end automated framework, which is going to be more scalable, robust, and efficient. This is something I’m actively working toward learning as it’s one of the biggest factors in developing efficiency, and I know it will come with time.

How did you prepare for your interview with Microsoft? Part of it was time investment. After working a full day at my internship I would work from about 6:00 p.m. to 2:00 a.m. preparing for interviews. As part of that I would go through different Medium blogs, looking at a variety of data science concepts and working to improve my coding skills. What helped me most was looking at the job description and analyzing the different aspects involved to get an idea of the sort of questions I was likely to be asked so that I could learn as much as possible about those concepts. I highly recommend Medium in this regard. I would also spend time going through my previous coursework from Carnegie Mellon and undertake small projects of my own to make sure I could link different concepts together, including understanding the intuition behind a problem rather than just the problem itself. So, if it’s code, understanding how the code works or why it is written a certain way rather than spending time trying to memorize it.

Knowing what you know today, is there anything you would have done differently to prepare for the interview? Microsoft has a specific way of conducting interviews that I really enjoy. The questions are typically open ended and test for knowledge. They provide an opportunity for the candidate to stitch different concepts together to present a coherent story. One of the interview questions I was asked was to assume for a moment I was running a car dealership. Customers are going to give a rating of one to five and maybe a few comments, and as a manager I would have to build a report at the end of the month about improvements to make, so how would I go about it? This open-ended question allowed me to integrate all the concepts I had learned so far, building connections between them with validation points along the way. Others have asked me how to prepare to interview for Microsoft, and I tell them you can’t really prepare for a Microsoft interview. What you can do is just learn as much as you can and then get eight hours of sleep before the interview. I think most people will notice that most questions are about setting you up for success. It’s an opportunity for you to show how you link all the different things you’ve learned and integrate them by describing multiple approaches. I also tell those who ask for this advice to keep reading Medium and other data science blogs. I have created my own interview prep material that I have posted on LinkedIn for everyone to view, and I’ve heard from a couple of people who’ve used it as prep material for their own interviews.

Xin Xin

Data & Applied Scientist

What’s your educational background, Xin? I did my bachelor’s degree in China. It was in applied mathematics in engineering, also called mechanics engineering and often confused with mechanical engineering. Mechanics is really applied mathematics. It’s not actually that mechanical, although the programs share some of the same mechanical classes. After that experience I found I was more interested in studying people. I wanted to study social science but I didn’t have the relevant background. So, I went to Texas A&M, where in the educational psychology department they have a program called research, measurement, and statistics, where I earned my master’s degree. After that I thought to myself that because I’m new to the behavioral statistics field, I want to learn more, and so I completed a Ph.D. at the University of North Texas in educational statistics.

That’s an impressive educational background. What did you do prior to coming to Microsoft? After graduation I wasn’t quite sure what to do next. Most of my peers wanted to become an assistant professor or something like that with their Ph.D., but I no longer wanted to stay in higher education, and so I spent some months at Loyola University in their bioinformatics center, where they process genomic data. They wanted to have survey data on patients, and so I helped them with that. Then I spent a few months in the Bay Area at a startup company. It was small, perhaps a couple dozen people. I was the only data person in the R&D department and that was my first taste of the industry. After that I went on to a full-time job with Expedia Group, which brought me to the Seattle area. I worked there for a little over two years before coming to Microsoft.

What is the focus of your current work? I’m working on commerce, specifically on scenarios with payment data and chargebacks. Chargebacks are specific scenarios in which someone has a dispute over a payment they made with their bank, or perhaps about someone who stole their credit card and made fraudulent charges. The bank may pass the chargeback on to us, and then we may dispute it by presenting evidence that the customer actually did purchase the product or that the charge was not due to fraud so that we can reclaim some of the funds. My first project was around reviewing a model we set up in this area a couple years ago, adding some additional metrics and seeing whether there are additional opportunities to build it out further. Another project I’m working on is around dunning, which is the process of continuing to try to collect funds owed after a customer’s payment is declined. We want to understand how a customer’s usage changes once they are in dunning to help us understand measures we can take that don’t affect legitimate customer usage while curbing abuses and fraud.

What are some challenges faced by a new team member that a longer-tenured team member might not realize? I think everyone realizes this is a special time to be joining a new team because of the pandemic. Instead of asking a quick question in the office, it might be necessary to schedule a meeting, because it might be hard to know when you can interrupt someone online, and things can be a bit more awkward. I want to find an effective way to express myself and share the burden with my teammates but without becoming bothersome to others. It is not easy to realize what I have missed from time to time.

How did you prepare for your interview with Microsoft? Before joining the team, I wasn’t sure what sort of work I would be assigned to, and so my preparation was really just to be myself and efficiently describe my previous projects at Expedia or during my Ph.D. or even my master’s so that the interviewers could understand whether I would be a good fit. In that way it wasn’t about preparing something specifically around Azure or a specific model used by the team. It was more about expressing myself in an efficient way so the interviewers could see it and we could both determine whether there’s a good match. I was looking for work involving people-generated data instead of the type of machine-generated data I worked with during my bachelor’s degree. Specifically, customer actions at different parts of the funnel representing their purchase behavior is what interested me. So, during interviews I was focused less on the domain of the work and more on the team’s structure: Are they trying to evolve, to implement something fast? Do they have support for their work from other teams so they can make a difference? Those were the types of things I was looking for. During the interview I expressed these interests with the interviewers so that as they asked me about specific modeling or statistics experience they would also know about my interests, which I think was helpful in matching up with each other.

What would you do differently to prepare for your interview now that you’ve gone through it? I can only think of minor things to change, such as perhaps the order of stories I presented to highlight some different things. But that’s in hindsight, after understanding that some of what the interviewers were looking for was more important than I had anticipated. There’s nothing like a mistake I want to correct. With some other interviews with other companies I’d had, there might have been things I’d underestimated or could have prepared for more, but with Microsoft the whole interviewing process was very pleasant, and everyone treated each other with respect. It was a very healthy conversation with my interviewers: when I asked them for the full story about the team they gave it to me, and when they asked me about the line of work I was doing, I gave them the full story. So overall I think it was a pretty good conversation, and now here I am.

Joshua Chang

Data & Applied Scientist II

What’s your educational background, Josh? I started off as an undergraduate at the University of Washington and transferred to Cornell to finish my degree, majoring in math and economics. At that time I was aiming to be an actuary, so I went into the master’s program in actuarial science at Columbia University directly out of undergrad. I also recently completed my MBA at the University of Washington’s Foster School of Business.

What did you do prior to coming to Microsoft? After grad school at Columbia, I worked at Liberty Mutual as an actuary for three and a half years. Actuarial science is basically a very insurance-specific branch of analytics, and there’s certainly a fair amount of math and statistics involved. Ultimately though, I didn’t want to work in the insurance industry for my entire career. Luckily, I had the chance to work with data scientists during my time at Liberty Mutual, and their work seemed broader and more interesting to me in the long run while also building on my existing skill set. Once I decided to make the jump, I left my job and attended a three-month immersive data science boot camp offered by Galvanize. This enabled me to transition into my first data science job at Nordstrom, where I was a member of the corporate analytics team, an internal consulting group of sorts. I then worked at Expedia for two years on the metasearch marketing team, where our primary output consisted of production-level bidding algorithms that optimized our advertising spend on meta channels. I worked in both data science and product management capacities at Expedia, but I eventually wanted to focus just on data science again. This largely informed my decision to move to Microsoft, where I’m currently employed.

What is the focus of your current work? One of the things I enjoy most about data science is that the problems it can be applied to are really varied. Right now I’m working primarily in the support space, building on efforts such as at-risk case recovery, a project that aims to predict which open tickets are at risk of returning a low satisfaction score. This model is a key component of a larger Azure initiative around improving customer support. I’m also currently involved in the intelligent support rollout initiative, in which we pre-emptively reach out to people who we feel may benefit from additional support. Within this project, I’m working on fine-tuning the event thresholds that indicate a customer in need of support. Once identified, we can then point them to relevant documentation or let them know who to contact to move forward. I really appreciate the customer-centric aspect of support work, as it’s good to know I’m having a downstream impact. There are a lot of data science focus areas that are more internal or company facing, and I’ve found that I prefer to work on products that influence the customer experience.

What are some challenges faced by a new team member that a longer-tenured team member might not realize? There’s a pretty steep learning curve here with regard to knowing who you need to know. There are many subject matter experts on various topics and stakeholders who must be consulted to accomplish your projects, and different organizations within the company are often working on similar efforts in parallel, all of which can make things challenging at times. Longer tenured members of my team may take for granted who to consult and where to go for a particular resource. I often go through a two-stage referral process to get the answers I need, wherein I first ask one of my teammates who else I should speak to. For example, one of my early projects was around deriving sentiment for customer support emails. I did a lot of work around cleaning email text data and applying our internal cognitive services API to calculate a sentiment score, which was then utilized as an additional signal in our at-risk case model. As I progressed, I was pulled into many conversations and learned of scientists in other organizations working on similar sentiment-related efforts, maybe even using the same API for a slightly different use case. So, just getting all the people who need to be involved in any one conversation in a room together can be a challenge. Working at home during the pandemic definitely adds to the challenge because it’s much more difficult to get a natural feel for who to talk to when we’re not all on campus.

How did you prepare for your interview with Microsoft? To prepare for data scientist interviews, I generally brush up on coding in Python and SQL by running through practice problems. I personally use a site called Codewars for coding challenges, which is helpful to get me into the interview mindset. I also try to find out who I’ll be interviewing with ahead of time so I can prepare two or three questions for each of them, usually along the lines of what they’re working on and what their role is within the team, or about their previous professional experience and their career journey to this point. I think a lot of what interviewers look for in candidates is the ability to communicate effectively and clearly explain your thought process, so I always rehearse my elevator pitch and my answers to common interview questions out loud. Finally, I brush up on machine learning concepts. I have a “cheat sheet” of five or six pages from my time in school on data science fundamentals, which I review to refresh my understanding of the theoretical basis and math behind subjects I’m familiar with but have not had the chance to apply regularly on the job, such as recommender systems.

What would you do differently to prepare for your interview now that you’ve gone through it? Something interesting to me about the Microsoft interview is that, in comparison to other interviews I’ve gone through, it was not nearly as technical as I expected. The only portion where we did coding was in the initial phone screen before the full interview loop. We had a one-hour conversation with an online coding pad covering SQL and some mathematical concepts. I admit that I blanked on the entire notion of self joins, so looking back I wish I’d had the presence of mind to anticipate that coming up, but that’s pretty specific, a sort of SQL trick, so to speak. Once we got to the onsite interview stage, I was mostly presented with case study type questions, where I would talk about how I would approach the problem, the metrics I would look at, and what my recommendations would be based on certain hypothetical results. Also, how I would consider correlation versus causation, that kind of thing. There were also a couple of behavioral questions (not many) and ones about my previous experience. One thing I feel I could have improved upon going into my Microsoft interview was my domain-specific knowledge on cloud infrastructure and Azure in particular. I had thought that I was already familiar enough with the cloud due to my previous work using related products such as compute, storage, and notebook environments, but the cloud is an extremely broad category and it’s easy to just scratch the surface. Because I was interviewing for a role within the Azure organization, the case questions themselves all centered around cloud-specific terminology and issues. The questions I’ve been asked in interviews with other companies have generally been more theoretical, and even those focused on the product at hand were fairly standard. But the cloud is unlike most other products in the way it’s consumed and measured, so understanding the difference among concepts such as infrastructure, platform, and software as a service is very helpful.
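
For readers who have not met the self join Joshua mentions, here is a minimal sketch of the idea: a table is joined to itself under two aliases so that one row can be matched with another row of the same table. The `employees` table, its columns, and the sample rows are hypothetical and chosen only for illustration. They are not the question asked in the interview. The example uses Python's built-in `sqlite3` module so that it runs without any setup.

```python
import sqlite3


def employees_with_managers(conn):
    """Pair each employee with their manager by joining the table to itself."""
    query = """
        SELECT e.name AS employee, m.name AS manager
        FROM employees AS e              -- alias e: the employee row
        JOIN employees AS m              -- alias m: the same table, read as managers
          ON e.manager_id = m.id
        ORDER BY e.id
    """
    return conn.execute(query).fetchall()


# Hypothetical data: each row stores the id of that person's manager.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ana", None), (2, "Ben", 1), (3, "Cy", 1), (4, "Di", 2)],
)

pairs = employees_with_managers(conn)
print(pairs)
```

Because this is an inner join, the row with no manager (`manager_id` is `NULL`) does not appear in the result. A `LEFT JOIN` would keep it, with an empty manager column.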

Our organization is delighted to welcome Raghu, Xin, and Joshua among the dozen or so people new to Microsoft who joined our team during this past year. The perspectives and backgrounds they bring help our organization push the limits on the most challenging problems in data science today, and help us find the answers.

Casey Doyle is on LinkedIn.


Principal Data Scientist of a data storytelling program fostering thought leadership in information design and data visualization inside and outside Microsoft.