95% Confidence Interval In Snowflake

Soonmo Seong
Published in Cloud Villains
3 min read · Jan 14, 2024

One of the most important concepts in statistics meets one of the fastest-emerging technologies in this post.

We run into confidence intervals every time an election comes around. Polls that predict or influence political decisions, such as a presidential election, are reported with confidence intervals, so we should understand this statistical concept correctly.

Snowflake is becoming an easy and fast tool for people who work with data. So let’s implement a confidence interval in Snowflake. We use only SQL, since SQL is the main language in Snowflake.

We are going to use the global weather dataset from Weather Source, LLC. Anyone can get this dataset from Snowflake Marketplace, and it is free to use.

Let’s compute a confidence interval for Manhattan’s daily average temperature from January 14, 2022 to January 13, 2024.

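First, the formula. A confidence interval for the mean is the sample mean plus or minus the critical value times the standard deviation divided by the square root of the sample size. At 95% confidence, the critical value is 1.96. The sample size is 730, which is two years of daily observations.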
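The formula referred to above can be written in symbols as the usual normal-approximation interval for a mean. The symbols $\bar{x}$ (sample mean), $s$ (sample standard deviation), $n$ (sample size), and $z$ (critical value) are labels chosen here for illustration, not notation taken from the original post.

```latex
\[
\text{CI} = \bar{x} \pm z \cdot \frac{s}{\sqrt{n}},
\qquad z = 1.96,\; n = 730
\;\Rightarrow\;
\text{CI} \approx \bar{x} \pm 0.0725\, s
\]
```

With 730 observations, the margin of error is therefore only about 7% of the standard deviation.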

-- Point the session at the Weather Source dataset from Snowflake Marketplace.
use database global_weather__climate_data_for_bi;
use schema standard_tile;

select
    round(avg(avg_temperature_air_2m_f), 3) as average_temperature,
    round(stddev(avg_temperature_air_2m_f), 3) as std_temperature,
    -- The bounds reuse the two aliases defined above: mean +/- 1.96 * std / sqrt(n).
    round(average_temperature + (1.96 * std_temperature / sqrt(count(1))), 3) as confidence_interval_upper,
    round(average_temperature - (1.96 * std_temperature / sqrt(count(1))), 3) as confidence_interval_lower,
    -- First and last day covered by the data.
    min(date_valid_std) as first_day,
    max(date_valid_std) as last_day
from
    history_day
where
    postal_code = '10060'
;

Second, the SQL query above returns the mean, the standard deviation, and the upper and lower bounds of the confidence interval. It also returns the first and last day covered by the dataset.
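The query above has no date filter, so it appears to rely on the dataset holding exactly the two-year window at the time it was run. A minimal sketch of a variant that pins the window explicitly and also reports the sample size and Celsius bounds might look like this. It assumes the same database and schema are already selected, that date_valid_std is a date column, and the names stats, bounds, n, mean_f, std_f, and margin_f are illustrative choices rather than anything from the dataset.

```sql
-- Sketch only: same table and columns as the query above, with an
-- explicit date window. CTE and alias names are illustrative.
with stats as (
    select
        count(*)                         as n,
        avg(avg_temperature_air_2m_f)    as mean_f,
        stddev(avg_temperature_air_2m_f) as std_f
    from history_day
    where postal_code = '10060'
      and date_valid_std between to_date('2022-01-14') and to_date('2024-01-13')
),
bounds as (
    select
        n,
        mean_f,
        std_f,
        1.96 * std_f / sqrt(n) as margin_f   -- margin of error at 95% confidence
    from stats
)
select
    n                                          as sample_size,
    round(mean_f, 3)                           as average_temperature,
    round(std_f, 3)                            as std_temperature,
    round(mean_f - margin_f, 3)                as confidence_interval_lower,
    round(mean_f + margin_f, 3)                as confidence_interval_upper,
    round((mean_f - margin_f - 32) * 5 / 9, 3) as confidence_interval_lower_c,
    round((mean_f + margin_f - 32) * 5 / 9, 3) as confidence_interval_upper_c
from bounds;
```

Because this version rounds only at the end, its bounds may differ from the original query's in the last decimal place.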

Third, let’s interpret the result. The confidence interval is (56.088, 58.428). This means we are 95% confident that the population mean lies between 56.088 and 58.428. In other words, we estimate that the average temperature in Manhattan over the last two years is between 56.088℉ (13.382℃) and 58.428℉ (14.682℃) at 95% confidence.
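As a quick check on these numbers, the sample mean and margin of error can be recovered from the two reported bounds, and the Celsius values follow from the standard Fahrenheit-to-Celsius conversion. The symbol $\bar{x}$ and the label "margin" are used here for illustration only.

```latex
\begin{align*}
\bar{x} &= \frac{56.088 + 58.428}{2} = 57.258\ ^\circ\mathrm{F} \\
\text{margin} &= \frac{58.428 - 56.088}{2} = 1.170\ ^\circ\mathrm{F} \\
(56.088 - 32) \times \tfrac{5}{9} &\approx 13.382\ ^\circ\mathrm{C} \\
(58.428 - 32) \times \tfrac{5}{9} &\approx 14.682\ ^\circ\mathrm{C}
\end{align*}
```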

If we plan a trip to Manhattan, we should keep this confidence interval in mind while preparing.

We implemented a confidence interval using SQL in Snowflake and interpreted this important statistical concept.
