66DaysOfData Challenge - Data Science Interview Questions - Day 26
Greetings everyone!👋
Sorry for the delay with this article. I’ve been really busy preparing for my upcoming master’s degree in data science, and I will be sharing everything I learn here. Alright, enough talk. Let me introduce myself, and then we’ll get to today’s interview question.
My name is Matt, and I used to be a teacher with no technical background. However, I decided to transition into a career as a data engineer. Now, I work as a Satellite Data Engineer at FCU (Feng Chia University). I am excited to share my data science experience with you here.
For those who are not familiar with this field, here’s a quick overview.
I recently came across this video about how to build a habit of learning data science. I was inspired by its author, Ken Jee, and by Tim Webster, the author of "5 Tips to Make Data Engineering a Marathon, Not a Sprint". After completing my MSc in Data Science at Sussex, I plan to look for a data engineer job in the UK. To achieve that goal, I’ve decided to take on the 66DaysOfData challenge. I aim to post one random data science interview question from Stratascratch every day. All the questions come from big tech companies such as the FAANG group. I plan to complete the 66DaysOfData challenge first and then see how far it takes me. I also plan to use an AI tool to speed up my learning on specific topics. To ensure accuracy, I will fact-check all of the information, and I welcome any feedback regarding inaccuracies. I am eager to engage in discussion and learn more.
Right! Before diving into the Day 26 question, ensure you’ve done Day 25.
LET’S DIVE IN!
Company: Meta/Facebook
Question type: Statistics
Question level: Easy
Job Title: ML Engineer / Data Scientist
Question:
In what cases does the coefficient of determination take negative values?
The coefficient of determination, often written as R², is a statistical measure of the proportion of the variance in the dependent variable that is predictable from the independent variable(s).
For an ordinary least squares model with an intercept, evaluated on the data it was fitted to, it ranges from 0 to 1, where:
- An R² of 0 indicates that the model does not explain any of the variation in the dependent variable around its mean.
- An R² of 1 indicates that the model explains all the variation in the dependent variable around its mean.
However, in some cases, R² can be negative. R² is computed as 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals of the model and SS_tot is the sum of squared deviations of the dependent variable from its mean, so it drops below 0 whenever SS_res is larger than SS_tot. A negative R² is counter-intuitive, but it simply means that the chosen model fits the data worse than a horizontal line at the mean of the dependent variable. This is a sign of a very poor fit.
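To make the formula concrete, here is a minimal sketch in plain Python. The function name `r_squared` and the toy numbers are my own, chosen only for illustration. It compares a perfect prediction, a prediction that is always the mean, and a prediction that is worse than the mean.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Sum of squared residuals of the predictions
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # Total sum of squares around the mean (the "horizontal line" baseline)
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot


# Toy data, made up for this illustration
y = [1.0, 2.0, 3.0, 4.0, 5.0]

perfect = r_squared(y, y)                           # 1.0
baseline = r_squared(y, [3.0] * 5)                  # 0.0, always predicts the mean
bad = r_squared(y, [5.0, 4.0, 3.0, 2.0, 1.0])       # -3.0, worse than the mean

print(perfect, baseline, bad)
```

In the last case SS_res is 40 while SS_tot is only 10, so R² = 1 - 40/10 = -3.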
Here’s when you might see a negative R²:
- Wrong Model: The model being used to predict the response is not appropriate for the data. A common example is a regression forced through the origin (no intercept) when the data have a clear offset.
- Overfitting: Adding too many unnecessary terms or predictors to a model can lead to overfitting, which can result in a negative R² for out-of-sample data (data that was not used in fitting the model).
- Extrapolation: If you use the model to make predictions outside of the range of the data on which the model was trained, it may produce unreasonable results and lead to a negative R² when evaluated on such data.
- Measurement Errors: Large errors or outliers in the dependent variable can disproportionately affect the model’s fit and, particularly when the model is evaluated on data it was not fitted to, result in a negative R².
- Nonlinearities: Using a linear regression model for data that exhibits strong nonlinear patterns might result in a fit so poor that R² turns negative, especially on new data.
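As a rough illustration of the extrapolation and nonlinearity cases above, the sketch below fits a straight line to a small quadratic data set and then scores it on points outside the training range. The data, the helper names (`fit_line`, `predict`, `r_squared`), and the choice of y = x² are all assumptions made for this example, not part of the interview question.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(x, intercept, slope):
    return [intercept + slope * xi for xi in x]


# Hypothetical nonlinear data: y = x^2
x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [xi ** 2 for xi in x_train]
x_new = [4.0, 5.0, 6.0, 7.0]   # outside the training range
y_new = [xi ** 2 for xi in x_new]

intercept, slope = fit_line(x_train, y_train)

# In-sample score: about 0.92, the line looks like a decent fit
r2_train = r_squared(y_train, predict(x_train, intercept, slope))
# Score on the extrapolated points: about -1.21, worse than predicting their mean
r2_new = r_squared(y_new, predict(x_new, intercept, slope))

print(r2_train, r2_new)
```

The same line that looks fine on the training data scores below zero on the new points, because its errors there are larger than the errors of simply predicting the mean of those points.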
Keep in mind that a negative R² cannot be interpreted as a percentage of variance explained. If you find yourself with such a value, it indicates a problem somewhere in the modeling process, so it’s best to re-evaluate the model thoroughly.
Feel free to drop me a question or comment below.
Cheers, happy learning. I will see you tomorrow.
The data journey is not a sprint but a marathon.
Medium: MattYuChang
LinkedIn: matt-chang
Facebook: Taichung English Meetup
(I created this group four years ago for people who want to hone their English skills. Events are held regularly by our awesome hosts every week. Follow the FB group link for more information!)