# Improving the scatter plot

The scatter plot is ubiquitous, and deservedly so. But a simple scatter plot has far too little information. The enhanced scatter plot I showed in the first figure adds several things:

- A regression line: that’s the straight line.
- A loess smoothed line: that’s the wavy line, a nonparametric smoother of the data. The fact that it doesn’t deviate much from the straight line indicates that linear regression is fairly appropriate here.
- Labels for each point. Here, the points are states and the variables are unemployment and infant mortality.
- Kernel density plots for each variable, a good univariate graphic.
- A prediction ellipse (at α = .05), letting you see that DC and Mississippi are bivariate outliers.

I created the fancy version in SAS. My code was:

```sas
PROC IMPORT OUT= WORK.UnempIM
    DATAFILE= "C:\personal\Graphics\UnEmpChildMort.csv"
    DBMS=CSV REPLACE;
    GETNAMES=YES;
    DATAROW=2;
RUN;
```

which gets the data.
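Before building the graph, it can help to confirm that the import produced the variables the later code refers to. The following is only a sketch: it assumes the CSV header row yields variables named `stateab`, `unemployment`, and `infantmortality` (the names used in the template code), which may not match the actual file.

```sas
/* Sketch: list the variables PROC IMPORT created, in file order */
proc contents data = WORK.UnempIM varnum;
run;

/* Print the first five rows of the three variables the graph uses.
   The variable names are assumed from the template code. */
proc print data = WORK.UnempIM (obs = 5);
    var stateab unemployment infantmortality;
run;
```

If `PROC PRINT` reports that a variable is not found, the names in the template need to be changed to match the imported ones.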

```sas
proc template;
  define statgraph scatdens2;
    begingraph; *BEGIN DEFINING THE GRAPH;
      entrytitle "Scatter plot with density plots";
      *CREATE A TITLE;
      layout lattice/columns = 2 rows = 2
                     columnweights = (.8 .2) rowweights = (.8 .2)
                     columndatarange = union rowdatarange = union;
        *LAYOUT LATTICE...SETS UP A GRID OF GRAPHS;
        *COLUMNWEIGHTS AND ROWWEIGHTS SETS
         THE RELATIVE SIZE OF THE INDIVIDUAL
         COLUMNS AND ROWS;
        columnaxes;
          columnaxis /label = 'Unemployment (%)'
                      griddisplay = on;
          columnaxis /label = '' griddisplay = on;
        endcolumnaxes;
        *COLUMNAXES SETS PARTICULAR
         CHARACTERISTICS OF COLUMNS;
        *THE SECOND ONE HAS NO LABEL (NONE WOULD FIT);
        rowaxes;
          rowaxis /label = 'Infant Mortality (per XXX)'
                   griddisplay = on;
          rowaxis /label = '' griddisplay = on;
        endrowaxes;
        layout overlay; *STARTS THE ACTUAL GRAPHING OF DOTS ETC;
          scatterplot x = unemployment y = infantmortality/datalabel = stateab;
          *GRAPHS THE DOTS;
          loessplot x = unemployment y = infantmortality;
          loessplot x = unemployment
                    y = infantmortality/smooth = 1;
          ellipse x = unemployment y = infantmortality
                  /type = predicted;
          entry "Prediction ellipse (" {unicode alpha}"=.05)"/autoalign = auto textattrs = (color = red);
        endlayout;
        densityplot infantmortality/orient = horizontal kernel();
        densityplot unemployment / kernel ();
      endlayout;
    endgraph;
  end;
run;
```

which sets up a template written in the Graph Template Language (the text after each * is commentary on what that part of the code does). For more, see my paper on scatter plots.

and

```sas
options nodate nonumber ;
title;
title2;

ods pdf file = "c:\personal\presentations\SASGF14\scatterdens.pdf";

proc sgrender data = UnempIM template = scatdens2;
  *NOW WE RENDER THE TEMPLATE WE CREATED;
run;

ods pdf close;
```

which makes a plot.
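For a quicker look at just the main panel (points, labels, straight line, loess line, and ellipse) without the marginal density plots, something like the following `PROC SGPLOT` step may be enough. This is a sketch rather than the code behind the figure: it assumes the same `UnempIM` data set and variable names, and it uses the `REG` statement for the straight line, which is a substitution and not what the template above does.

```sas
/* Sketch: the main panel only, drawn with PROC SGPLOT instead of a GTL template */
proc sgplot data = UnempIM;
    /* Points, labeled with the state abbreviation */
    scatter x = unemployment y = infantmortality / datalabel = stateab;
    /* Ordinary least-squares line; NOMARKERS avoids drawing the points twice */
    reg     x = unemployment y = infantmortality / nomarkers;
    /* Loess smoother with the default smoothing parameter */
    loess   x = unemployment y = infantmortality / nomarkers;
    /* 95% prediction ellipse */
    ellipse x = unemployment y = infantmortality / type = predicted alpha = 0.05;
run;
```

The trade-off is that `PROC SGPLOT` draws a single cell, so the side-by-side density plots still need a lattice layout like the one in the template.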

Of course, you might want other information in some cases. If either variable were discrete, you might add histograms instead of density plots. You might want to color code the points (e.g., I could have color coded by region of the country). You might want several different loess lines or, if there were discontinuities, you might want to use wavelets to generate the line.
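As a rough sketch of three of these variations (histograms in the margins, color coding, and more than one loess line), the template could be adapted along the following lines. The template name `scathist`, the smoothing values 0.3 and 0.7, and especially the variable `region` are hypothetical: the data set used above is not known to contain a region variable, so one would have to be added before this would run.

```sas
/* Sketch only: REGION is a hypothetical variable, not known to be in UnempIM */
proc template;
  define statgraph scathist;
    begingraph;
      entrytitle "Scatter plot with histograms";
      layout lattice / columns = 2 rows = 2
                       columnweights = (.8 .2) rowweights = (.8 .2)
                       columndatarange = union rowdatarange = union;
        layout overlay;
          /* GROUP= colors the markers by the assumed region variable */
          scatterplot x = unemployment y = infantmortality /
                      group = region datalabel = stateab name = "pts";
          /* Two loess fits with different (arbitrary) smoothing parameters */
          loessplot x = unemployment y = infantmortality /
                    smooth = 0.3 name = "wiggly"
                    legendlabel = "Loess, smooth = 0.3"
                    lineattrs = (pattern = dash);
          loessplot x = unemployment y = infantmortality /
                    smooth = 0.7 name = "smooth"
                    legendlabel = "Loess, smooth = 0.7";
          discretelegend "pts" "wiggly" "smooth";
        endlayout;
        /* Histograms replace the kernel density plots in the margins */
        histogram infantmortality / orient = horizontal;
        histogram unemployment;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data = UnempIM template = scathist;
run;
```

Wavelet smoothing is left out here because it is not a plot statement in the Graph Template Language; the fitted values would have to be computed separately and overlaid as a series.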