Ways to mitigate or account for the effect of COVID-19 in data.

Photo by Dimitri: Unsplash


If you’d like to learn more about the possible repercussions of COVID-19 on ML pipelines, you can read about them here. The first part of this series, which covered data-centric methods, can be found here. It would be helpful to go through both before jumping ahead.

Replacing data with Original Predictions

In the first part of this series, we discussed replacing data impacted by COVID-19 data…

Ways to mitigate or account for the effect of COVID-19 in data.

Photo by Kevin: Unsplash

After attempting to answer the “Why” question about the impact of COVID-19 on Data Science, it makes sense to follow up with the “How to Handle” question. This series will aim to provide some potential answers to the “How” with input from Data Science leaders in the industry.

Not sure what the “Why” is? You probably didn’t read the prelude to this series. Please read it here to understand the possible effects of COVID-19 on Data Science methods, before going any further.

So far, this series covers three kinds of approaches: dropping data, replacing data, and feature engineering. …

Multiple stakeholders mean end-user preferences are not the sole target

Image from Unsplash

In this age of online frenzy, everyone has certainly come across recommendation systems. Dating apps are not the only use case for such systems. They are used in a variety of domains to help users find items or information relevant to their personal tastes. You might be inclined to think that the main goal of these systems is to best match the interests of the user, i.e., you. Though that is important, users are certainly not the only ones looking to benefit from these systems. In other words, there are multiple stakeholders. I’ll explain this using more generic examples.

Multi-Stakeholder Recommendation

A multi-stakeholder…

The effects of coronavirus will ripple through data science projects

Photo by Aaron: Unsplash

“Your model is as good as your data” is the most basic postulate in data science. Good data equals a good model! The coronavirus has impacted millions of lives around the globe, wreaked havoc on the airline industry and shattered equity markets globally. Depending on how quickly it is brought under control, the coronavirus will undoubtedly continue to affect the daily lives of many. Like everyone else, Data Scientists will also be affected. And no, I am not talking about work from home.

In economics, the term “Response Lag” is used to denote the time it takes for corrective measures to…

Using residual plots to validate your regression models


One of the most important parts of any Data Science/ML project is model validation. It is as important as any of your previous work up to that point. It is that one last hurdle before the Hurrah!

For regression, there are numerous methods to evaluate the goodness of fit, i.e., how well the model fits the data. R² values are just one such measure, but they do not always make us feel confident about our model.
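As a rough illustration of why R² alone can mislead, here is a minimal sketch that fits a straight line to deliberately curved data. The data, variable names, and the choice of a quadratic relationship are assumptions made for this example and do not come from the article. The fit reports a high R², yet the residuals follow a clear pattern (positive at both ends, negative in the middle) instead of scattering randomly around zero.

```python
from statistics import mean

# Hypothetical data: a purely quadratic relationship, chosen for illustration.
x = [i * 0.2 for i in range(51)]  # 0.0, 0.2, ..., 10.0
y = [xi ** 2 for xi in x]

# Ordinary least-squares fit of a straight line: y ~ slope * x + intercept.
x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residual = observed value minus fitted value.
fitted = [slope * xi + intercept for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# R^2 = 1 - (residual sum of squares / total sum of squares).
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot

print(f"R^2 = {r2:.3f}")
print("residual at start, middle, end:",
      round(residuals[0], 2), round(residuals[25], 2), round(residuals[-1], 2))
```

Plotting `residuals` against `x` (for example with a scatter plot) would show a U-shaped curve, which suggests that the linear model is missing structure in the data even though R² looks respectable.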

Image from Unsplash

To Err is Human, To Err Randomly is Statistically Divine

And that is where Residual plots come in. Let’s talk…

Costly mistakes that can render your machine learning based time series forecasting model inaccurate in production

Image from Tralfaz

However, lately, a lot of research has cropped up that shows promise in using the shinier ML models such as Neural Networks

Uber demand prediction using streaming data (Trajectory Streams)

Photo by Zhipeng Ya on Unsplash

Outlier detection is an interesting data mining task that is used quite extensively to detect anomalies in data. Outliers are points that exhibit significantly different properties than the majority of the points. To this end, outlier detection has very interesting applications such as credit card fraud detection (suspicious transactions), traffic management (drunk/rash driving) or Network Intrusions (hacks) etc. Due to the time-critical nature of these applications, there is a need for scalable outlier detection techniques.

In this project, we will aim to detect outliers in a Taxi Dataset (Beijing), using a technique that relies only on spatio-temporal characteristics…

Usman Gohar

Data Scientist | Speaker | Open-source contributor https://www.linkedin.com/in/usman-gohar/
