Scatter Plots: What Are They and Why Your Business Needs Them

Mokkup.ai · Published in Microsoft Power BI · 5 min read · May 24, 2024

Scatter plots depict the relationship between two variables. They are especially useful when both variables are numeric rather than categorical.

Where Are They Used?

Variable Dependency

A scatter plot shows how one variable changes as the other changes. In practice, one of the two variables is usually independent and is plotted on the X-axis. The variable on the Y-axis is dependent, meaning it is affected by changes in the independent variable.

Relationships Between the Variables

A scatter graph represents the relationship between variables more faithfully than a line diagram, because each point stands as a separate, distinct observation. In a line diagram, the connecting line implies values even between the plotted points, whether or not any data exists there.

A scatter plot draws no such connection. You can therefore read the relationship across the whole range of the plot without assuming anything about what happens between the observed points.

Identifying Relationships Across the Whole Graph

Each point on the plot represents one data point. This makes it easy to see the relationship across all the data points without establishing causality between any of them. For the same reason, scatter graphs are used to tell whether the relationship between dependent and independent variables is positive or negative, strong or weak, and linear or non-linear.

Types of Scatter Plots

Points on the graph group into different patterns for different data sets, so each plot tells a different story. Users therefore generally divide the data points into groups or trends based on how the points cluster together. These groupings have an additional benefit: they reveal where points are not grouping even though they would be expected to.

The identified gaps help reveal outliers and show where enterprise resources may need to be redirected. This is useful when developing customer personas, segmenting people, or dividing the target base into geographic or demographic segments.

Scatter plots have three attributes, and every scatter plot shows some combination of them:

  • Strong or Weak
  • Linear or Non-Linear
  • Positive or Negative

So, a scatter plot might be strong, linear, and positive, or strong, non-linear, and negative.

Strong or Weak

Strong grouping describes a scatter graph in which the points are closely clustered, indicating a strong pattern across the data. This can suggest that the organization is focusing its efforts in the right direction, because the data points are closely related. Weak grouping, on the other hand, describes a scatter plot in which the data points are spread all over the place. Neither graph is right or wrong. However, weak grouping suggests that organizations need to cover a wider range of possibilities when acting on insights from the graph. Moderate groupings also exist, but because ‘moderate’ is hard to pin down, the label is generally avoided and such groupings are classified as either strong or weak.

Linear or Non-Linear

Linear relationships show the grouping progressing in a straight line, either upward or downward. Any other shape of grouping signifies a non-linear relationship. Once again, there isn’t a right or wrong graph. Some organizations might look for non-linear relationships between the variables, while others might look for linear ones.

Positive or Negative

Positive grouping shows the points moving upward from left to right. Negative grouping shows the points moving downward, toward the X-axis, from left to right. A positive trend represents a positive relationship, in which the two variables increase together or decrease together. A negative trend represents a negative relationship, in which an increase in one variable is accompanied by a decrease in the other. Strength has a middle ground of moderate groupings, but direction has no in-between value. A trend is either positive or negative, and a plot with no upward or downward trend indicates no correlation at all.

Common Errors When Graphing Scatter Plots

Overplotting

When there are many data points to plot, overplotting becomes likely. In scatter plots with strong grouping, too many data points make the plot too dense to give any significant insights. The dots cannot practically be made small enough to keep every point visible, so analysis of an overplotted chart becomes unreliable.

One common way to tackle this is to plot a sample of the data points. The pattern in a reasonably sized random sample should normally be representative of the full plot. This won’t hold for every data set, for example “forest cover across the world in sq. km.”, but it works for data sets that track a progression over time.

Heat maps are an alternative way to counter this issue. Instead of drawing every point, they color each region of the chart by how many points fall into it.
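The numbers behind such a heat map are just point counts per grid cell. The sketch below bins points into a square grid using only the standard library; rendering the colors is left to any plotting tool. The `bin_counts` helper, the 10×10 grid, and the sample points are hypothetical.

```python
from collections import Counter

def bin_counts(points, x_range, y_range, bins=10):
    """Count points per cell of a bins x bins grid: the values a heat map colors."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    x_step = (x_max - x_min) / bins
    y_step = (y_max - y_min) / bins
    counts = Counter()
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # skip points outside the plotted area
        # min() keeps points on the upper edge inside the last cell.
        col = min(int((x - x_min) / x_step), bins - 1)
        row = min(int((y - y_min) / y_step), bins - 1)
        counts[(row, col)] += 1
    return counts

points = [(0.5, 0.5), (0.6, 0.4), (0.55, 0.45), (9.5, 9.5), (10.0, 10.0)]
grid = bin_counts(points, (0, 10), (0, 10), bins=10)
print(grid[(0, 0)], grid[(9, 9)])
```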

Mistaking Correlation for Causality

This isn’t an issue with the plotting itself, but it is an inherent pitfall in analyzing a scatter plot. Just because two variables follow a pattern together does not mean one causes the other. A correlation between variables does not reveal the cause of the pattern.

For example, over time, both the number of people flying in airplanes and worker productivity have increased. Plotted against each other on a scatter graph, the two show a linear, positive relationship. However, the underlying causes of the two trends might be entirely different. Reading this correlation as a causal link could lead to poor decisions.

Dependency-Independency Relationship Between the Variables

It is easy to confuse dependent and independent variables when plotting a scatter graph. Sometimes the variables might not even be connected. For example, you might track the amount of green space in a city against the number of violent crimes committed, and you would likely find some pattern. However, that pattern is meaningless, because the two variables are not connected unless the green spaces are being used as hideouts for violent crime.

Suppose enterprises assume that one variable is dependent and the other independent without examining the variables properly. The resulting analysis could then be meaningless and say nothing useful about the business.

Conclusion

Scatter charts are valuable tools for effective data visualization. However, because many types of graphs look broadly similar, enterprises tend to use them interchangeably for rudimentary purposes. Used correctly, scatter plots can give valuable insights where other types of graphs fail.

Mokkup.ai is a dashboard wireframing tool that helps you create mock dashboard wireframes in less than 30 minutes.