Weather Source on the Snowflake Data Marketplace

Predicting the future starts with understanding the past, and weather data is no exception. Just think how much weather affects agriculture, construction, transportation, and even consumer behavior. Let’s focus on Weather Source, one of the premium weather data providers in the Data Marketplace.

Felipe Hoffa
Oct 5 · 4 min read

You can use Snowflake to find correlations between your own historical data and historical weather data, and also to get weather forecasts for the days ahead. I already shared how to access NOAA weather data for free in Snowflake. Today I’m going to focus on Weather Source, one of the premium weather data providers in the Data Marketplace.


With a premium global weather data provider like Weather Source, you get richer, more localized data, backed by a deep historical database that goes back 20 years and a forecast that extends 42 days into the future. Weather Source has all of the data you need to get started creating actionable business intelligence. They even have historical forecast data in a format that is perfect for back-testing.

To get you started, Weather Source publishes a free dataset covering 1,000 zip codes around the US and 6 major world cities. This sample dataset includes past, present, forecast, and climatology weather data in daily format.

For example, these are the average temperatures for each zip code — with some of the hottest in Arizona, Texas, and Florida:

Average temperature over 2 years, for each zip code in the Weather Source sample dataset.

Or we can look at the places with the most wind, and you can see there’s a lot of wind in the middle of the country — like in Oklahoma, Nebraska, and Kansas. Dorothy would know:

Average wind over 2 years, for each zip code in the Weather Source sample dataset.

Another interesting finding is the correlation between solar radiation and cloud cover. On the bottom right, you can see the places with the most solar radiation and the least cloudy days, like some in California and Nevada.

And on the top left you can see the city that is covered most frequently with clouds, and gets the least amount of sunlight. I wonder why they called it “Grayland”:

Average solar radiation vs cloud cover, for each zip code in the Weather Source sample dataset. Grayland, WA is the cloudiest one.

All of these numbers are averages over 2 years of daily observations in the free dataset. If you go premium you can go deeper and get hour-by-hour observations.

As Weather Source says, they offer a rich, statistically consistent dataset with a deep historical database that is homogenized with the forecast and climatology data. This makes it easy for Snowflake users to perform regression analyses: quantify the impact of weather using a historical time series, and then create predictive models using the forecast data.

This will allow you to predict and make better decisions for your business, once you figure out the correlations between weather and the metrics you care about.
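
As a minimal sketch of that kind of regression: assume a hypothetical table my_db.public.daily_sales (sale_date, postal_code, revenue), which is not part of any dataset mentioned here and only stands in for your own metrics. Snowflake's built-in REGR_SLOPE, REGR_INTERCEPT, and REGR_R2 aggregates can then fit a simple linear model of revenue against daily temperature for each zip code:

```sql
-- Hypothetical table (an assumption for illustration):
--   my_db.public.daily_sales(sale_date date, postal_code string, revenue float)
select s.postal_code
  -- estimated change in revenue for each additional degree Fahrenheit
  , regr_slope(s.revenue, w.avg_temperature_air_2m_f) revenue_per_degree_f
  -- estimated revenue at 0 degrees Fahrenheit (the model's intercept)
  , regr_intercept(s.revenue, w.avg_temperature_air_2m_f) intercept
  -- share of the revenue variance explained by temperature
  , regr_r2(s.revenue, w.avg_temperature_air_2m_f) r_squared
from my_db.public.daily_sales s
join weathersource_tile_sample.standard_tile.history_day w
  on w.postal_code = s.postal_code
  and w.date_valid_std = s.sale_date
group by s.postal_code;
```

If the fit looks reasonable, the slope and intercept could be applied to the temperatures in the forecast table to project the metric forward. A single-variable linear fit is only a starting point, and a real model would likely need more weather variables and seasonality.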

How-to

Naming each zip code

The sample dataset has 1,000 zip codes — but it doesn’t have a name for each. To solve this you can load GeoNames into Snowflake:

select *
from temp.public.geonames_us_zip
-- http://download.geonames.org/export/zip/
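
In case it helps, here is a rough sketch of one way to do that load. It assumes you have downloaded and unzipped the US file from the GeoNames URL above and uploaded US.txt to a stage that I'm calling @my_stage (a hypothetical name). The GeoNames postal code files are tab-delimited. Only the columns postal, place, and admin_name are used in this post, so the other column names below are my own guesses at reasonable names:

```sql
-- Column order follows the GeoNames postal code file layout.
-- Only postal, place, and admin_name are referenced in this post;
-- the remaining column names are assumptions.
create or replace table temp.public.geonames_us_zip (
  country string, postal string, place string,
  admin_name string, admin_code string,
  admin_name2 string, admin_code2 string,
  admin_name3 string, admin_code3 string,
  latitude float, longitude float, accuracy int
);

-- @my_stage is a hypothetical stage that already contains US.txt
copy into temp.public.geonames_us_zip
from @my_stage/US.txt
file_format = (type = csv field_delimiter = '\t');
```

Keeping postal as a string preserves leading zeros, which matters when joining on zip codes.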

Then it’s easy to join this table with Weather Source’s data. For example, this query finds the averages plotted earlier and adds a name to each zip code:

-- Average weather per zip code, labeled with a place name from GeoNames
select h.postal_code
  , (
    -- max() returns a single name if a zip code has several GeoNames rows
    select max(g.place || ', ' || g.admin_name)
    from temp.public.geonames_us_zip g
    where g.postal = h.postal_code
  ) city
  , avg(h.avg_temperature_air_2m_f) temp
  , avg(h.avg_radiation_solar_total_wpm2) solar_radiation
  , avg(h.avg_cloud_cover_tot_pct) cloud_cover
  , avg(h.avg_wind_speed_10m_mph) wind
  , min(h.date_valid_std) since
  , max(h.date_valid_std) until
from weathersource_tile_sample.standard_tile.history_day h
where h.country = 'US'
group by 1;
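
To put a number on the solar radiation versus cloud cover relationship plotted earlier, one option is Snowflake's CORR aggregate over the same per-zip-code averages. This is a sketch rather than the query behind the chart:

```sql
-- Pearson correlation between average solar radiation and average
-- cloud cover, computed across the zip codes in the free sample
with per_zip as (
  select postal_code
    , avg(avg_radiation_solar_total_wpm2) solar_radiation
    , avg(avg_cloud_cover_tot_pct) cloud_cover
  from weathersource_tile_sample.standard_tile.history_day
  where country = 'US'
  group by postal_code
)
select corr(solar_radiation, cloud_cover) solar_vs_clouds
from per_zip;
```

A result close to -1 would mean that cloudier zip codes consistently get less solar radiation, while a result near 0 would mean no linear relationship.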

Predictions for each zip code

This query labels each row as “today + X” days, so the results make it clear that these are predictions:

select postal_code zip
  , 'today + ' || (date_valid_std - current_date()) date
  , avg_temperature_air_2m_f temp
  , avg_pressure_2m_mb press
  , avg_wind_speed_10m_mph wind
  , avg_cloud_cover_tot_pct clouds
  , avg_radiation_solar_total_wpm2 solar
from WEATHERSOURCE_TILE_SAMPLE.STANDARD_TILE.FORECAST_DAY
where postal_code = '94164'
order by date_valid_std;

Weather predictions for an arbitrary zip code.

Next steps

Check out Weather Source in the Snowflake Data Marketplace — the free dataset is ready for you to query, and is updated daily.

Want more?

I’m Felipe Hoffa, Data Cloud Advocate for Snowflake. Thanks for joining me on this adventure. You can follow me on Twitter and LinkedIn. Check reddit.com/r/snowflake for the most interesting Snowflake news.
