Investigating Ocean Temperatures with Mathematica
Mathematica is a wide-ranging program designed for a huge variety of technical computing tasks. While many of its functions can be replicated using the open-source scientific Python stack, Mathematica’s tight integration and consistency across domains provide an elegant platform for investigative data analysis. To demonstrate its flexibility, let’s take a look at some ocean temperature data provided online by NOAA.
Importing Data
NOAA publishes historical ocean temperature data for the past year here: https://www.nodc.noaa.gov/dsdt/cwtg/all_meanT.html. This is a webpage that includes a number of tables for different coastal regions throughout the United States. Let’s import it and parse the table data:
oceanTemperatureURL =
  "https://www.nodc.noaa.gov/dsdt/cwtg/all_meanT.html";
data = Import[oceanTemperatureURL, "FullData"];
The Import function grabs all list and table elements from the webpage, so we need to find the tables we are interested in. Each of our tables contains a header row whose first cell has the label “Location”, so we’ll use that label to locate them. Since Position[data, "Location"] returns the position of the cell itself, we need to go up a few levels to grab the enclosing table; that’s what the Drop function does.
regionTables =
Extract[data, Drop[#, -3] & /@ Position[data, "Location"]];
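To see why dropping the last three indices lands on the enclosing table, here is a minimal sketch on a made-up nested list. The name toy and its contents are hypothetical, and the nesting of the real page may differ:

```wolfram
(* toy is hypothetical: some page text plus two "tables",
   each of which wraps its rows in one extra list *)
toy = {
  "page text",
  {{{"Location", "JAN"}, {"Montauk NY", 40}}},
  {{{"Location", "JAN"}, {"Seattle WA", 47}}}
};

(* Each position is the full index path down to a "Location" cell *)
positions = Position[toy, "Location"]
(* {{2, 1, 1, 1}, {3, 1, 1, 1}} *)

(* Dropping the last three indices leaves the path to each table *)
tables = Extract[toy, Drop[#, -3] & /@ positions]

(* The header row of the first table *)
tables[[1, 1, 1]]
(* {"Location", "JAN"} *)
```

In this toy layout the three dropped indices are the row group, the row, and the column, which is why the header row is then reachable at [[1, 1, 1]].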
regionTables is a list of tables, with each table still including its header. We only need one header for our final Dataset:
header = regionTables[[1, 1, 1]];
We then merge all of the tables, after dropping the header row from each:
dataset = Dataset[Flatten[Drop[#, 1] & /@ regionTables, 1]]
Finally we can rename the columns in our Dataset with the header row we extracted before:
dataset = dataset[All, AssociationThread[header -> Range[Length[header]]]]
We now have a Dataset object with named columns.
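The renaming step can be tried in isolation. This is a small sketch with made-up rows and column names (rows, names, and columnMap are hypothetical), following the same pattern as the call above:

```wolfram
(* rows and names are made-up values for illustration *)
rows = {{"Montauk NY", 40}, {"Seattle WA", 47}};
names = {"Location", "JAN"};

(* Each new column name points at the position it is read from *)
columnMap = AssociationThread[names -> Range[Length[names]]]
(* <|"Location" -> 1, "JAN" -> 2|> *)

(* Applying the map to every row gives rows keyed by column name *)
named = Dataset[rows][All, columnMap];
Normal[named]
```

Normal converts the result back to an ordinary list of associations, which is handy for checking that each value ended up under the intended key.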
Processing and Cleaning the Data
Our next step is to process the data. Currently the Location column is just a text string containing the city and state. We’ll use the built-in CityData to convert each location to latitude-longitude coordinates. CityData expects its input as a list of the form {"city", "state"}, so we need to split our location strings to match. Unfortunately, the locations do not have consistent comma placement: both “Montauk, NY” and “Kings Point NY” appear, so we cannot simply split on the comma. Instead, we remove all commas from the string, then split on the space before the state, which we identify with a regular expression containing a lookahead. I’ve combined these operations into a function ParseCity:
ParseCity[cityString_] :=
StringSplit[
StringReplace[cityString, "," -> ""],
RegularExpression[" (?=\\w{2}\\z)"]
]
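A quick check on the two inconsistent formats mentioned above shows how the lookahead behaves. The definition is repeated only so that the snippet runs on its own:

```wolfram
(* Same definition as above, repeated so this snippet is self-contained *)
ParseCity[cityString_] :=
  StringSplit[
    StringReplace[cityString, "," -> ""],
    RegularExpression[" (?=\\w{2}\\z)"]
  ]

ParseCity["Montauk, NY"]     (* {"Montauk", "NY"} *)
ParseCity["Kings Point NY"]  (* {"Kings Point", "NY"} *)
```

Because the pattern only matches a space that is followed by exactly two word characters at the end of the string, multi-word city names stay in one piece. A string with no trailing two-letter state is returned as a single-element list.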
Now we use ParseCity and CityData to add a new column to our dataset:
dataset =
dataset[All, <|#,
"Coordinates" ->
CityData[ParseCity[#Location], "Coordinates"]|> &];
Our final operation is to remove missing data. Some of the months have missing recordings for certain stations, and a few of the cities could not be found using CityData. We drop every row that contains a missing value:
dataset = DeleteMissing[dataset, 1, 1];
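The two numeric arguments are easy to misread, so here is a sketch on made-up rows. The name rows and its station values are hypothetical, and plain lists stand in for the Dataset:

```wolfram
(* rows is made-up data: {location, temperature, coordinates} *)
rows = {
  {"Station A", 40, {41.0, -71.9}},
  {"Station B", Missing["NotAvailable"], {47.6, -122.3}},
  {"Station C", 55, Missing["NotAvailable"]}
};

(* First 1: operate on whole rows (level 1).
   Second 1: look for Missing among each row's direct entries (depth 1). *)
cleaned = DeleteMissing[rows, 1, 1]
```

Under these assumptions only the Station A row should remain, since the other two each contain a Missing entry among their direct elements.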
We’ll separate this dataset into stations for both the East and West Coasts:
westCoastData = dataset[Select[-126 < #Coordinates[[2]] < -117 &]];
eastCoastData = dataset[Select[-81.5 < #Coordinates[[2]] < -66 &]];
usData = Join[eastCoastData, westCoastData];
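The longitude windows can be checked on a few stations with approximately known coordinates. This is an illustrative sketch; the list stations is hypothetical and uses plain lists rather than the Dataset:

```wolfram
(* stations is hypothetical: {name, {latitude, longitude}} with approximate values *)
stations = {
  {"Seattle", {47.6, -122.3}},
  {"Montauk", {41.0, -71.9}},
  {"Galveston", {29.3, -94.8}}
};

(* Longitude is the second coordinate; the windows match the filters above *)
west = Select[stations, -126 < #[[2, 2]] < -117 &]
(* {{"Seattle", {47.6, -122.3}}} *)

east = Select[stations, -81.5 < #[[2, 2]] < -66 &]
(* {{"Montauk", {41., -71.9}}} *)
```

A station whose longitude falls between the two windows, such as the Galveston entry here, lands in neither group and so is left out of usData as well.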
Visualizations
Now that we have our data nicely imported, formatted, and cleaned, we can create some visualizations!
First, let’s plot water temperature as a function of latitude for both the East and West Coasts. We’ll wrap our ListPlot in a Manipulate to easily change time periods:
monthKeys =
  Normal@Select[Keys@First@westCoastData,
    StringLength[First[StringSplit[#]]] === 3 &];

Manipulate[
  ListPlot[
    {
      westCoastData[All, {#Coordinates[[1]], #[month]} &],
      eastCoastData[All, {#Coordinates[[1]], #[month]} &]
    },
    (* FrameLabel is only drawn when the frame itself is shown *)
    Frame -> True,
    FrameLabel -> {"Latitude", "Temperature (°F)"},
    PlotLegends -> {"West Coast", "East Coast"},
    PlotRange -> {Full, {0, 100}},
    PlotLabel ->
      "Average Water Temperature " <> Capitalize[ToLowerCase[month]]],
  {month, monthKeys}
]
monthKeys just grabs the column names whose first word is three letters long, so it matches columns like “JAN” and “AUG 1-15”. We pass that list to Manipulate, which gives us a nice select box to adjust the time period of our data.
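The column filter can be verified on a made-up list of keys. The name columnNames is hypothetical, and the real dataset has more columns:

```wolfram
(* columnNames is a hypothetical subset of the dataset's keys *)
columnNames = {"Location", "JAN", "FEB", "AUG 1-15", "Coordinates"};

(* Keep names whose first whitespace-separated word has three letters *)
Select[columnNames, StringLength[First[StringSplit[#]]] === 3 &]
(* {"JAN", "FEB", "AUG 1-15"} *)
```

The test relies on no other column having a three-letter first word, which holds for “Location” and “Coordinates” here.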
We immediately notice two things: (i) East Coast water temperatures are generally warmer than their West Coast equivalents, and (ii) there appear to be two patterns driving East Coast temperatures. You can see that at roughly 39 degrees latitude the slope of the East Coast stations becomes much steeper. Sure enough, 39 degrees latitude is roughly where the Gulf Stream veers away from land and heads eastward across the Atlantic.
Let’s adjust the plot to show the coldest water temperatures, in winter. Now the West Coast is the warmer of the two, which indicates a generally more temperate climate on the West Coast. We can show this explicitly by plotting a smoothed histogram of the yearly temperature swing at each station:
SmoothHistogram[
  {
    Normal@westCoastData[All, #["AUG 1-15"] - #["FEB"] &],
    Normal@eastCoastData[All, #["AUG 1-15"] - #["FEB"] &]
  },
  5, (* kernel bandwidth, in °F *)
  Filling -> Bottom,
  (* FrameLabel is only drawn when the frame itself is shown *)
  Frame -> True,
  PlotLegends -> {"West Coast", "East Coast"},
  PlotLabel -> "Change in Temperature Winter to Summer",
  FrameLabel -> {"Temperature Change (°F)", "PDF"}
]
Finally, we can plot the original temperature data directly on a map for explicit visualization:
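Manipulate[
  (* Use a local name so the imported page stored in data is not overwritten *)
  With[{bubbleData = usData[All, {GeoPosition[#Coordinates], #[month]} &]},
    Row[{
      GeoBubbleChart[
        bubbleData,
        ColorFunction -> "Rainbow",
        ImageSize -> Large,
        PerformanceGoal -> "Speed",
        GeoGridLines -> Automatic,
        PlotLabel -> month
      ],
      (* Normal turns the temperature column into a plain list for MinMax *)
      BarLegend[{"Rainbow", MinMax[Normal[bubbleData[All, 2]]]},
        LegendLabel -> "°F"]
    }]
  ],
  {month, monthKeys}
]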
We once again wrap our graphics calls in a Manipulate so we can interactively change months. We use GeoBubbleChart to plot the data, and also append a BarLegend to the graphics.
In conclusion, I’ve demonstrated how to:
- Import data from a website into Mathematica.
- Generate a variety of plots to interactively visualize this data, such as histograms, scatter plots, and even maps.
This is a fairly simple example, but it shows Mathematica’s ability to quickly analyze data using a variety of methods and visualizations.