SettingWithCopyWarning in pandas: a short story

lu edward
Jan 8, 2019


If you are new to pandas, you can be very confused when you first see this warning:

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame 

The warning comes from what pandas calls chained assignment: you chain two indexing operations together while trying to set a value. pandas cannot guarantee whether the first indexing operation returns a view or a copy, so the assignment may change the original DataFrame, or it may only change a temporary copy that is then thrown away.

The solution is simple: combine the chained operations into a single indexing operation using the .loc/.iloc indexers in pandas.

Explicit Chaining

Let me give you an example. Suppose a DataFrame df has two columns, "Volume" and "Price".

You want to set the Price to 200 dollars for every row where Volume > 100. If you write it like this:

 df[df.Volume > 100]["Price"] = 200

you will get a SettingWithCopyWarning.
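To see why the warning matters, here is a minimal sketch with made-up sample data (the values are my own, only the column names come from the example above). Boolean indexing returns a copy, so the chained assignment writes to a temporary object and the original df stays untouched. The exact warning class can differ between pandas versions, so the sketch simply silences warnings around the offending line.

```python
import warnings

import pandas as pd

# Hypothetical sample data; only the column names come from the article.
df = pd.DataFrame({"Volume": [50, 150, 300], "Price": [10, 20, 30]})

with warnings.catch_warnings():
    # Silence the chained-assignment warning for this demonstration only.
    warnings.simplefilter("ignore")
    # Chained assignment: df[mask] builds a temporary copy,
    # and ["Price"] = 200 writes to that copy, not to df.
    df[df["Volume"] > 100]["Price"] = 200

prices_after_chained = df["Price"].tolist()
print(prices_after_chained)  # the original prices are unchanged
```

The assignment runs without an error, which is exactly why it is easy to miss that nothing was changed.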

But if you write it as one single indexing operation, like below, pandas will not raise the warning. Note that it has to be .loc, because "Price" is a column label; .iloc only accepts integer positions.

 df.loc[df.Volume > 100, "Price"] = 200
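As a small runnable sketch of the single-step version, again with hypothetical sample values of my own, the row mask and the column label go into one .loc call, so the assignment reaches df itself:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the article.
df = pd.DataFrame({"Volume": [50, 150, 300], "Price": [10, 20, 30]})

# One indexing operation: boolean row mask plus column label.
df.loc[df["Volume"] > 100, "Price"] = 200

prices = df["Price"].tolist()
print(prices)  # [10, 200, 200]
```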

Implicit Chaining

This happens when you implicitly chain two indexing operations while trying to set a value.

Imagine you want to analyze a subset of the df DataFrame above. You are only interested in the rows where the Price is higher than 200 dollars.

df_high_price = df[df.Price > 200]

Then you might go on to do some analysis on the df_high_price DataFrame. After some exploration, you find that some values in the Volume column are negative, which is not possible. You decide that all negative values in the Volume column should just be set to 0. You learned the lesson about chained assignment from above, so you write the code like this:

 df_high_price.loc[df_high_price.Volume < 0, "Volume"] = 0

Yet the above code still raises a SettingWithCopyWarning. But why? How can this happen?

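The origin of the problem is that the chained assignment is spread across two separate lines. The first line takes a subset of df, and the second line assigns into that subset, so pandas cannot be sure whether you meant to modify df or only df_high_price.

df_high_price = df[df.Price > 200]
df_high_price.loc[df_high_price.Volume < 0, "Volume"] = 0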

The solution is to explicitly make a copy when you create df_high_price, and then set the values on that copy.

df_high_price = df[df.Price > 200].copy()
df_high_price.loc[df_high_price.Volume < 0, "Volume"] = 0
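Here is a minimal sketch of the copy approach with hypothetical sample data (the values are made up for illustration). It shows that the change lands on the copy only, while the original df keeps its negative volumes:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the article.
df = pd.DataFrame({"Volume": [-5, 120, -1], "Price": [250, 300, 50]})

# Explicit copy: df_high_price is now independent of df.
df_high_price = df[df["Price"] > 200].copy()
df_high_price.loc[df_high_price["Volume"] < 0, "Volume"] = 0

subset_volumes = df_high_price["Volume"].tolist()
original_volumes = df["Volume"].tolist()
print(subset_volumes)    # [0, 120]
print(original_volumes)  # [-5, 120, -1]
```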

What if you want to make the change to the original DataFrame? You build the two boolean conditions first, combine them, and then pass the combined condition to a single .loc call on df.

high_price_index = (df.Price > 200)
negative_volume_index = (df.Volume < 0)
combined_index = high_price_index & negative_volume_index
df.loc[combined_index, "Volume"] = 0
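The same idea as a runnable sketch, with hypothetical sample data of my own. Only the row that is both high-priced and negative in volume gets changed, and the change is made on df itself:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the article.
df = pd.DataFrame({"Volume": [-5, 120, -1], "Price": [250, 300, 50]})

high_price = df["Price"] > 200      # rows 0 and 1
negative_volume = df["Volume"] < 0  # rows 0 and 2
combined = high_price & negative_volume  # only row 0

# Single .loc call on the original DataFrame.
df.loc[combined, "Volume"] = 0

volumes = df["Volume"].tolist()
print(volumes)  # [0, 120, -1]
```

The last row keeps its negative volume because its price is not above 200.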

Another approach is the query method, which can make row selection much more readable and shorter to write. If you use pandas a lot for data analysis, you should definitely check it out.
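As a rough sketch of how query could fit into the example, assuming the same hypothetical sample data as before: query returns a new DataFrame, so it is a selection tool rather than an assignment tool. One way to still write back to the original, assuming the index labels are unique, is to take the index of the query result and pass it to .loc.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the article.
df = pd.DataFrame({"Volume": [-5, 120, -1], "Price": [250, 300, 50]})

# Selection with a readable string expression; the result is a new DataFrame.
high_price_rows = df.query("Price > 200")
selected_labels = high_price_rows.index.tolist()

# To modify df itself, reuse the matching row labels in one .loc call.
# This assumes the index labels are unique.
bad_rows = df.query("Price > 200 and Volume < 0").index
df.loc[bad_rows, "Volume"] = 0

volumes_after_query = df["Volume"].tolist()
print(selected_labels)      # [0, 1]
print(volumes_after_query)  # [0, 120, -1]
```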

Conclusion

All of the above is my understanding of the SettingWithCopyWarning and how to avoid it.

If you want a more in-depth discussion of it, you can go to this blog for more details.

TL;DR

Avoid chained indexing, whether it is implicit or explicit.
