Practicum Project — a perfect segue into real-world analytics

Learn from this journey of becoming a promising analytics professional!

Paridhi Agal
7 min read · Apr 15, 2020

I have been sheltering in place for over a month and a half now, reading about the COVID-19 pandemic and the unprecedented challenges it has posed on every possible front, social and economic alike. Amidst all the chaos, I have been trying to channel my energy so that I come out of this crisis stronger. One of the key tasks keeping me occupied is the last leg of my Practicum Project, and I feel thankful, because this project is helping me emerge as a stronger analytics professional after my MS in Business Analytics.

Channeling Energy! Becoming Stronger!

In this blog I want to share one of the most enriching experiences I have had in the past year: a 10-month practicum project with a SaaS company that provides operational, security, and business intelligence for applications at cloud scale. My team worked on a stimulating project that entails applying machine learning and advanced data science to drive customer retention, churn, and expansion decisions at this cloud-based enterprise machine data management and analytics-as-a-service vendor. The analysis combines usage data from the company's petabyte-scale platform with its CRM data to identify strategies for customer retention and opportunities for market expansion.

I knew practicum projects are an experiential learning opportunity that helps students develop and enhance skills, gain insights, and gather knowledge; they are a perfect bridge between an academic and a professional career. True to its purpose as part of the curriculum, this project prepared me well by bringing me closer to my career goals. It helped me develop my functional knowledge, soft skills, and people competency, and it gave me a platform to apply the learning from courses such as data management, machine learning, data visualization, data design, and advanced statistics right away in solving real-world problems. Working on a data- and decision-centric analytics project enhanced my technical as well as business understanding, as we combined big data (e.g., usage logs) with domain knowledge (SaaS metrics) to shape our insights.

I hear and I forget. I see and I remember. I do and I understand — Confucius

A key takeaway for me from the hands-on experience of working on a data science project from inception to execution is the importance of following a data analysis process model for any machine learning project. One popular model we used to plan, organize, and execute our data analysis project was CRISP-DM. The model has six stages, as shown in the flow diagram.

CRISP-DM 6 Step Model

Learning the Nuances of every stage

Every stage had its own challenges and lessons. I am glad I now know the fine details of each phase that cannot be missed if a data science project is to succeed. Let me take you through the nuances of every stage, share the highlights of our process, and pass along what I picked up on the journey: a combination of what worked for us and what we learnt from our mistakes. To ensure smooth progress of a project, keep the following things in mind:

1. Appreciate the importance of business understanding

While working on the customer retention and revenue expansion problem, it was very important to understand what success looked like for our MSBA Industry Partner. We understood the industry, the competitors, the product, the ways of working, the terminology, the popular SaaS metrics and even got trained on the platform to get a comprehensive understanding of the business landscape. This in turn helped us clearly agree on the objective and project plan.

This involved detailed interviews, dyads, and triads with the stakeholders, leadership, and end users of our deliverable, especially account executives and customer success managers, to understand the current processes and issues. Knowing the stakeholders helped us create personas for the users, customers, and stakeholders, covering their goals, needs, desires, attitudes, and actions. These personas kept the users at the center of our thought process while designing and tailoring the deliverable. This phase overlaps with the Empathize and Define phases of Stanford's Design Thinking process.

Design Thinking process

2. Prioritize understanding the data

We spent the initial few weeks thoroughly understanding the data elements, as the data came from two disparate sources. It was helpful to get demos from experts to make sense of the variables in the usage logs and the CRM data. We documented our understanding in a data dictionary that can be referenced in future runs of similar projects. We also verified data quality; since CRM data is often populated manually, it is better to get samples verified. This helped us find missing values, errors, and relationships between certain fields, and understand the data's limitations and constraints.
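To make this concrete, here is a minimal sketch of the kind of data-quality checks this stage involves, using pandas. The column names (account_id, industry, arr_usd) are illustrative assumptions, not the project's actual schema:

```python
import pandas as pd

# Hypothetical CRM extract; the columns are illustrative,
# not the actual fields from the project.
crm = pd.DataFrame({
    "account_id": ["A1", "A2", "A3", "A4"],
    "industry":   ["Retail", None, "Finance", "Retail"],
    "arr_usd":    [120000, 85000, None, 40000],
})

# Summary statistics and missing-value counts, the kind of checks
# we recorded alongside the data dictionary.
print(crm.describe(include="all"))
print(crm.isna().sum())

# Flag rows for manual verification, since CRM fields are often
# populated by hand.
print(crm[crm.isna().any(axis=1)])
```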

3. Never underestimate Data Preparation

In this stage I could totally relate to the statement that "data preparation accounts for about 80% of the work of data scientists," as mentioned in this news article from Forbes. We cross-verified the data quality and summary statistics from time to time to comment on the credibility of certain fields and to identify inconsistencies and missing values. After validating the sanity of multiple versions of the data, we proceeded with cleaning, wrangling, and exploring ways to impute the missing values. We used Tableau and Python for exploratory data analysis and to build data pipelines, and we documented the reasons for including or dropping specific fields, the rationale behind merging different tables, and the rules we used to transform the data. Throughout, we kept in mind that the data pipeline we created in Python needed to be reusable for new data, and that the dashboards we designed needed to be reproducible.
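As an illustration of reusable preparation logic, below is a small sketch of a cleaning-and-imputation step written as a function so it can be rerun on new extracts. The schema and imputation rules here are hypothetical, not the ones we actually used:

```python
import pandas as pd

def prepare_usage_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean a raw usage extract so the pipeline can be rerun on new data.
    Column names are placeholders, not the project's actual schema."""
    df = raw.drop_duplicates(subset=["account_id", "month"]).copy()
    df["gb_ingested"] = df["gb_ingested"].fillna(0)  # no log row means no usage
    # Impute missing user counts with each account's own median
    df["active_users"] = df.groupby("account_id")["active_users"].transform(
        lambda s: s.fillna(s.median())
    )
    return df

raw = pd.DataFrame({
    "account_id":   ["A1", "A1", "A1", "A2"],
    "month":        ["2020-01", "2020-01", "2020-02", "2020-01"],
    "gb_ingested":  [10.0, 10.0, None, 5.0],
    "active_users": [12.0, 12.0, None, 7.0],
})
print(prepare_usage_data(raw))
```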

4. Select the appropriate machine learning models

While we explored various modelling techniques that could solve the business problem, one of the important tasks was feature engineering: creating features by weaving the data together with business intuition and domain knowledge. We chose to spend a lot of time on exploratory data analysis, visualizing trends and insights in Tableau dashboards. We did not want to restrict ourselves to descriptive and diagnostic analytics, so we went further into predictive and prescriptive analytics. We explored various classification models such as logistic regression, random forest, and XGBoost; to balance interpretability with performance, we decided to refine our more interpretable logistic regression model for churn prediction.
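To give a flavor of what feature engineering from usage data can look like, here is a toy sketch that turns hypothetical monthly usage columns (queries, seats_used, seats_paid) into account-level features such as a usage trend and seat utilization. The actual project features were derived from the real platform logs and CRM fields:

```python
import pandas as pd

# Hypothetical monthly usage table; the features below only illustrate
# weaving usage data with SaaS intuition, not the project's actual features.
usage = pd.DataFrame({
    "account_id": ["A1"] * 3 + ["A2"] * 3,
    "month":      ["2020-01", "2020-02", "2020-03"] * 2,
    "queries":    [100, 80, 60, 50, 70, 90],
    "seats_used": [10, 9, 8, 5, 6, 8],
    "seats_paid": [12, 12, 12, 10, 10, 10],
})

features = (
    usage.sort_values("month")
         .groupby("account_id")
         .agg(
             # A falling query count over the window hints at churn risk
             query_trend=("queries", lambda s: s.iloc[-1] - s.iloc[0]),
             avg_seats_used=("seats_used", "mean"),
             avg_seats_paid=("seats_paid", "mean"),
         )
)
features["seat_utilization"] = features["avg_seats_used"] / features["avg_seats_paid"]
print(features)
```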

Machine Learning Techniques

We used linear regression to predict incremental revenue, logistic regression to predict the probability of churn, and the K-means clustering algorithm to segment customers. It was a value-adding experience to build, and then reject, models that did not add much value, such as time series analysis and regularization, due to data constraints or inconsistency with the business.
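Here is a minimal sketch of two of these models on synthetic data, using scikit-learn: a logistic regression that outputs churn probabilities, and a K-means segmentation on standardized features. The data and feature names are made up; a linear regression for incremental revenue would follow the same pattern:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the engineered account features; the real
# project used features derived from usage logs and CRM data.
X = rng.normal(size=(200, 3))
churned = (X[:, 0] + rng.normal(scale=0.5, size=200) > 1).astype(int)

# Interpretable churn model: coefficients map directly to feature effects.
churn_model = LogisticRegression().fit(X, churned)
print("churn probabilities:", churn_model.predict_proba(X[:5])[:, 1].round(2))

# K-means segmentation on standardized features.
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X)
)
print("segment sizes:", np.bincount(segments))
```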

5. Defining the right evaluation criteria for the model

As we don’t want to predict that customer will stay when they will churn in reality, the false positives need to be minimized in our case. We want to improve our precision as much as possible. This phase involved using the right criteria to evaluate our models. We strive to continuously improve our model performance and accuracy keeping in mind that the model needs to perform well with out of sample data.

6. Embedding the solution in the business processes and ensuring smooth handover

We presented our findings and recommendations to the leadership. After seeking feedback, we refined certain parameters and variables and then worked on the deployment plan. You get a sense of achievement when you finally deploy your solution. For us, this involved studying current practices, tying our metric to improvements in some of those processes, and introducing new ones. Beyond that, we explored other business actions to sharpen the business strategy: we designed and proposed product-feature-utilization metrics to predict customer behavior, and we validated some metrics already being tracked to define customer health. A key task here was preparing instruction manuals for our code and dashboards, along with a monitoring and maintenance plan, to ensure continuity of our recommended models.

7. Acknowledging the Limitations

During the entire process we did not get bogged down by the limitations and strove to make the best of the available data. The challenges we faced were:

a. Limited availability of historical data

b. Lack of time to wait for new data around customers' decisions to churn or upgrade, as the organization has multi-year subscriptions

But we kept our eye on the prize and identified opportunities to still deliver high quality, reliable results.

Going beyond!

Beyond that, the project helped me hone my technical skills as well as my teamwork skills. Working in a diverse team was like working in a real corporate setup, where we acknowledged as well as questioned each other's opinions and results to continuously polish the outcome. Throughout the project, the key tasks that honed my soft skills were collaborating with stakeholders to make the project a success, managing and aligning stakeholders effectively, and translating technical details into engaging business stories.

As Sydney J. Harris says, "The whole purpose of education is to turn mirrors into windows." This practicum project helped me overcome barriers, gain an expansive view of the analytics world, and appreciate the complexity of data science projects. I hope this blog gave you a detailed understanding of what goes into every stage of a data analytics process model. These learnings helped me evolve; I hope they added value to you too.


Paridhi Agal

Pursuing Master of Science in Business Analytics from UC Davis. A Computer Engineer with an MBA.