3 ways to deal with SettingWithCopyWarning in Pandas

Padhma Muniraj
Published in Analytics Vidhya
Dec 22, 2021

It doesn’t pay to ignore warnings. Even when they don’t make sense. — Debra Doyle

One of the things I was taught while learning to code was not to be bothered by warnings. “Focus on fixing major bugs and errors; warnings aren’t a big deal.” I realized what terrible advice this was when I started working on real-world problems. Sometimes warnings can cost you more than you think. One such warning is the SettingWithCopyWarning in Pandas.

No matter how long you’ve worked with pandas, sooner or later you’re bound to encounter the SettingWithCopyWarning. If you’re trying to wrap your head around what it is and why it keeps showing up even when you “do get the output you expected”, then this article is for you.

In order to explain the logic behind the warning, I have used the Car Sales dataset from Kaggle. This dataset contains information about different types of cars.

Here is a glimpse of the data and the structure of the dataset.

# car_sales is the dataframe containing the dataset
car_sales.info()
car sales dataset info

Let’s assume we have received an update that the Fuel_capacity of all Porsche cars has been increased from 17.0 to 18.0, and we have been asked to make the change. Let’s go ahead and update the values.

car_sales[car_sales['Manufacturer'] == 'Porsche']['Fuel_capacity'] = 18.0

Uh-oh! We have triggered the famous SettingWithCopy warning.

If we take a look at the dataframe now, we can see the values are not updated.

car_sales[car_sales['Manufacturer'] == 'Porsche']
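
To try this without downloading the dataset, here is a minimal sketch on a tiny stand-in dataframe. The name car_sales_demo and its three rows are made up for illustration and are not part of the Kaggle data. It records whatever warning pandas raises (the exact warning class depends on your pandas version) and then checks the values.

```python
import warnings
import pandas as pd

# Hypothetical miniature stand-in for the car_sales dataset (made-up rows)
car_sales_demo = pd.DataFrame({
    "Manufacturer": ["Porsche", "Porsche", "Toyota"],
    "Model": ["Boxter", "Carrera Coupe", "Corolla"],
    "Fuel_capacity": [17.0, 17.0, 13.2],
})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Chained assignment: the value is set on a temporary object
    car_sales_demo[car_sales_demo["Manufacturer"] == "Porsche"]["Fuel_capacity"] = 18.0

# Names of the warnings that were raised (varies with the pandas version)
print([w.category.__name__ for w in caught])

is_porsche = car_sales_demo["Manufacturer"] == "Porsche"
porsche_capacity = car_sales_demo.loc[is_porsche, "Fuel_capacity"].tolist()
print(porsche_capacity)  # [17.0, 17.0] -> the original was not updated
```

The assignment ran without an error, yet the original values are unchanged.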

We have to understand that SettingWithCopyWarning is a warning and not an error. An error breaks your code and prevents you from moving on without fixing it. A warning, on the other hand, lets the code run and produce output while indicating that something may be wrong with it.

In this case, we might sometimes get the output we intended and be tempted to ignore the warning. But we should never ignore it, because it means the operation may not have worked as we expected, and it can cause unexpected issues later.

Jeff Reback, one of the core developers of pandas, has also explained why you should never ignore this warning.

In order to understand how to fix this warning and what to do when we face it, it is imperative to know the difference between Views and Copies in Pandas and how they work.

Views Vs Copies

In the code above where we try to return all the Porsche cars from the data, the result we receive may either be a view or a copy of the dataframe.

A view (or a shallow copy) is a subset of the original object which doesn’t have its own memory and address space. It is just a projection of the object we are trying to access.

View of a dataframe

A copy (or a deep copy) is a duplicate of the original object which has its own memory and address space. It is a separate entity that is thrown away in Pandas once we are done operating on it.

Copy of a dataframe

One of the main differences between views and copies is that modifying a view modifies the original dataframe and vice versa, whereas modifying a copy doesn’t affect the original dataframe.

Let’s say we change Sales_in_thousands for the Boxter model to 9.35.

Modifying a view

You can see above, modifying a view modifies the original dataframe as well.

Modifying a copy

In contrast, modifying a copy leaves the original dataframe unchanged.

Pandas inherited this behavior of views and copies from the underlying NumPy arrays. A NumPy array holds a single datatype, so whether a view or a copy is returned can be predicted. While Pandas is built on NumPy, a dataframe can hold several datatypes, and Pandas follows a complex set of rules to optimize space and decide whether to return a view or a copy. Because of that, whenever we index a dataframe, there is no reliable way to predict whether a view or a copy is returned. To quote the pandas documentation,

Outside of simple cases, it’s very hard to predict whether it will return a view or a copy (it depends on the memory layout of the array, about which pandas makes no guarantees) ………. That’s what SettingWithCopy is warning you about!
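
The predictable NumPy behavior mentioned above can be seen in a short sketch. This uses plain NumPy rather than the car sales data, and the variable names are only illustrative: basic slicing returns a view, while indexing with a list of integers returns a copy.

```python
import numpy as np

arr = np.arange(5)          # [0, 1, 2, 3, 4]
view = arr[1:4]             # basic slicing returns a view
fancy = arr[[1, 2, 3]]      # integer-list ("fancy") indexing returns a copy

view[0] = 99                # writes through to arr
fancy[1] = -1               # stays local to the copy

print(arr.tolist())                   # [0, 99, 2, 3, 4]
print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, fancy))   # False
```

With a dataframe, the equivalent question does not have such a simple answer, which is the point the documentation is making.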

To check whether a view or a copy is returned, you can use the internal attributes _is_view or _is_copy. _is_view returns a boolean and _is_copy returns a reference to the original dataframe or None.

Let’s look at three of the most common causes of this warning and how to handle them.

1. Chained Assignment

One of the most common reasons Pandas generates this warning is when it detects chained assignment or chained indexing.

There are two things we do with a Pandas dataframe. We either

  • Set — assign a value to something
  • Or, Get — access values from something

A chained assignment is when we try to assign (set) something by using more than one indexing operation.

Recall the example below which we used previously.

car_sales[car_sales['Manufacturer'] == 'Porsche']['Fuel_capacity'] = 18.0

Here, two indexing operations are combined to set a value. First, we access (get) all the ‘Porsche’ cars from the dataframe, then we try to assign (set) a new value to ‘Fuel_capacity’ on the result.

We want to modify the original dataframe, but this operation may instead create a copy and modify that. This is what the warning is telling us: ‘A value is trying to be set on a copy of a slice from a DataFrame’.

We discussed above that Pandas can return either a view or a copy when we access (get) a subset of a dataframe.
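
One way to picture the two steps is to write them out separately. The sketch below uses a made-up three-row dataframe (demo and porsche_subset are illustrative names, not from the dataset) and silences warnings only inside the demo block so the output stays readable.

```python
import warnings
import pandas as pd

# Hypothetical toy data standing in for car_sales
demo = pd.DataFrame({
    "Manufacturer": ["Porsche", "Porsche", "Toyota"],
    "Fuel_capacity": [17.0, 17.0, 13.2],
})

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence warnings for this demo only
    # Step 1 (get): boolean-mask indexing hands back a new object
    porsche_subset = demo[demo["Manufacturer"] == "Porsche"]
    # Step 2 (set): the assignment targets that intermediate object
    porsche_subset["Fuel_capacity"] = 18.0

print(porsche_subset["Fuel_capacity"].tolist())  # [18.0, 18.0]
print(demo["Fuel_capacity"].tolist())            # [17.0, 17.0, 13.2]
```

The intermediate object received the new value, while the dataframe we actually cared about did not.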

Let’s see if the operation we are trying to perform is on a view or a copy.

car_sales[car_sales['Manufacturer'] == 'Porsche']['Fuel_capacity']._is_view

# output
True
car_sales[car_sales['Manufacturer'] == 'Porsche']['Fuel_capacity']._is_copy

#output
<weakref at 0x7fe118b59b80; to 'DataFrame' at 0x7fe1187a14f0>

_is_view has returned True, meaning it’s a view, while _is_copy has returned a weakref, meaning it’s a copy. Hence, the output of the ‘get’ operation is ambiguous. It can be either in the end. This is why ignoring SettingWithCopyWarning is a bad idea. It can eventually break something in your code when you least expect it.

The problem of chained assignment can be tackled easily by combining the back-to-back indexing operations into a single operation using .loc.

car_sales.loc[car_sales.Manufacturer == 'Porsche', 'Fuel_capacity'] = 18.0
car_sales[car_sales.Manufacturer == 'Porsche']['Fuel_capacity']

#output
124 18.0
125 18.0
126 18.0
Name: Fuel_capacity, dtype: float64
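
The same fix can be checked on a small stand-in dataframe. This is a minimal sketch with made-up rows, and demo and is_porsche are illustrative names. Because the row selection and the column selection happen in one .loc call, the value is set directly on the original.

```python
import pandas as pd

# Hypothetical toy data standing in for car_sales
demo = pd.DataFrame({
    "Manufacturer": ["Porsche", "Porsche", "Toyota"],
    "Fuel_capacity": [17.0, 17.0, 13.2],
})

# Row mask and column label go into a single indexing operation
is_porsche = demo["Manufacturer"] == "Porsche"
demo.loc[is_porsche, "Fuel_capacity"] = 18.0

print(demo["Fuel_capacity"].tolist())  # [18.0, 18.0, 13.2]
```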

2. Hidden Chaining

The second most common trigger for this warning is hidden chaining. It can be tricky to track down the source of this problem, as the two indexing steps may be far apart in your codebase.

Let’s look at a scenario for Hidden Chaining. We’ll go ahead and create a new dataframe containing all the ‘Chevrolet’ cars while bearing in mind to use .loc from our previous lesson.

chevrolet_cars = car_sales.loc[car_sales.Manufacturer == 'Chevrolet']
chevrolet_cars

We do some other operations for some time and play around with our code.

chevrolet_cars['Model'].value_counts()
....
# few lines of code
chevrolet_cars['Sales_in_thousands'].std()
....
chevrolet_cars['__year_resale_value'].max()
....
# few lines of code
chevrolet_cars.loc[20,'Price_in_thousands'] = 17.638

Boom! This warning again!!

There was no chained assignment in that last line of code but it still went ahead and triggered that warning. Let’s look at the values in our dataframe.

It has updated our value. So, should we go ahead and ignore the warning this time? Probably not.

There is no obvious chained assignment in this code. In reality, chaining can occur on one line or across multiple lines of code. When we created the chevrolet_cars dataframe, we used a get operation, so there is no guarantee whether it returned a view or a copy. As a result, we might be modifying the original dataframe as well.

Identifying this problem can be very tedious in real codebases spanning thousands of lines, but it is very simple to tackle. When you want to create a new dataframe, explicitly make a copy using the .copy() method. This makes it clear to Pandas that we are operating on a new dataframe.

chevrolet_cars = car_sales.loc[car_sales.Manufacturer == 'Chevrolet'].copy()
chevrolet_cars.loc[20,'Price_in_thousands'] = 17.638
chevrolet_cars.loc[20, 'Price_in_thousands']

#output
17.638
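
To confirm that an explicit copy is independent of its source, here is a small sketch. The dataframe, its index labels and its prices are made up for illustration. After .copy(), the update reaches the new dataframe only.

```python
import pandas as pd

# Hypothetical toy data; index labels chosen so that row 20 exists
demo = pd.DataFrame(
    {
        "Manufacturer": ["Chevrolet", "Chevrolet", "Toyota"],
        "Price_in_thousands": [13.26, 16.535, 13.108],
    },
    index=[19, 20, 21],
)

# Explicit copy: chevrolet_demo no longer refers back to demo
chevrolet_demo = demo.loc[demo["Manufacturer"] == "Chevrolet"].copy()
chevrolet_demo.loc[20, "Price_in_thousands"] = 17.638

print(chevrolet_demo.loc[20, "Price_in_thousands"])  # 17.638
print(demo.loc[20, "Price_in_thousands"])            # 16.535
```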

3. False Positives

A false positive is a case where the warning is triggered even though nothing is actually wrong. It is safe to ignore the warning in this case. Many of the scenarios that caused false-positive warnings have been fixed in Pandas over the years. They are discussed in the Pandas documentation if you want to take a look.

Let’s say we want only the cars with Vehicle_type as ‘Passenger’ and we would like to create a new column which will be a boolean indicating whether the car is available or not.

car_sales = car_sales[car_sales['Vehicle_type'] == 'Passenger']
car_sales['In Stock'] = True

#output
:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

If you look at the dataframe, you will see that the new column has been added. In this case, we are not bothered if it overwrites the original dataframe.
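
If you would rather not see the warning at all, one alternative is to make the filtered result an explicit copy before adding the column. This is a sketch on made-up rows, not the article's original code, and demo and passenger_cars are illustrative names.

```python
import pandas as pd

# Hypothetical toy data standing in for car_sales
demo = pd.DataFrame({
    "Model": ["Corolla", "Camry", "Tacoma"],
    "Vehicle_type": ["Passenger", "Passenger", "Car"],
})

# Filter and copy in one step, then add the boolean column
passenger_cars = demo[demo["Vehicle_type"] == "Passenger"].copy()
passenger_cars["In Stock"] = True

print(passenger_cars["In Stock"].tolist())  # [True, True]
print("In Stock" in demo.columns)           # False
```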

We can go ahead and suppress the warning by changing the default behavior through the mode.chained_assignment option, which accepts the following values:

  • pd.set_option('mode.chained_assignment', None): suppresses the warning
  • pd.set_option('mode.chained_assignment', 'warn'): issues a warning (the default)
  • pd.set_option('mode.chained_assignment', 'raise'): raises an exception

Changing the behavior is not recommended unless you know what you are doing.

End notes

Most occurrences of the SettingWithCopyWarning can be avoided by communicating your intent to Pandas clearly. When you want to modify the original dataframe, use .loc; when you want a copy, ask for one explicitly with .copy(). This will not only prevent future warnings and errors, it will also make your codebase easier to maintain.

You can take a look at GitHub issues #5390 and #5597 for background discussion.

Thank you for reading all the way down here! Let me know in the comments if you have any feedback, criticism, or concerns. Have a good day!
