Precipitation in Seattle Over Time

When examining data measured over time, it is important to look for the trends and patterns present in it. For this analysis tutorial, I have chosen a dataset on precipitation in Seattle, a city well known for its frequent rainfall, which makes the data interesting to examine. The dataset was found on Kaggle and is available here: https://www.kaggle.com/rtatman/did-it-rain-in-seattle-19482017

Now, the first step when examining a dataset is to load the file and take a first look at its contents. That step would look something like this:
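Since the original code snippet is not reproduced here, a minimal sketch in pandas might look like the following. The column names (DATE, PRCP, TMAX, TMIN, RAIN) are assumptions based on the Kaggle dataset's description, and a small inline sample stands in for the downloaded CSV:

```python
import io
import pandas as pd

# A small inline sample stands in for the downloaded Kaggle CSV;
# the column names here are assumed from the dataset's description.
sample_csv = io.StringIO(
    "DATE,PRCP,TMAX,TMIN,RAIN\n"
    "1948-01-01,0.47,51,42,True\n"
    "1948-01-02,0.59,45,36,True\n"
    "1948-01-03,0.42,45,35,True\n"
)

# With the real file you would pass its path instead:
# rain = pd.read_csv("seattleWeather_1948-2017.csv", parse_dates=["DATE"])
rain = pd.read_csv(sample_csv, parse_dates=["DATE"])
print(rain.head())
```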

Once the data is loaded, the next step is to decide what we want to examine. In this example, I want to see the average precipitation per month in Seattle since 1948. The next step is to extract the month from the date column and create a new column that numerically categorizes each row by month. It should look something like this:
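A sketch of that step, assuming the date column has already been parsed to datetimes and the column names from the Kaggle dataset:

```python
import pandas as pd

# Minimal stand-in frame; in the tutorial `rain` comes from read_csv.
rain = pd.DataFrame({
    "DATE": pd.to_datetime(["1948-01-01", "1948-02-15", "1948-11-30"]),
    "PRCP": [0.47, 0.10, 0.25],
})

# New column categorizing each row numerically by month (1-12).
rain["MONTH"] = rain["DATE"].dt.month
print(rain)
```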

Once that is accomplished, you can then use Timestamp and Timedelta to create a new variable that will start at the first date in the data, which would be January 1, 1948 for this dataset. An example of this can be seen below:
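One way to sketch this with pandas, assuming the goal is a column counting days elapsed since the first date in the data:

```python
import pandas as pd

rain = pd.DataFrame({
    "DATE": pd.to_datetime(["1948-01-01", "1948-01-02", "1948-01-11"]),
    "PRCP": [0.47, 0.59, 0.10],
})

# Anchor at the first date in the data (January 1, 1948).
start = pd.Timestamp("1948-01-01")

# Days elapsed since the start date, computed via Timedelta arithmetic.
rain["DAYS_ELAPSED"] = (rain["DATE"] - start) / pd.Timedelta(days=1)
print(rain)
```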

After this step, you want to create a series containing the data points that you desire to examine. In the Seattle dataset, we want to examine the average precipitation, so we will use that for the value, and date for the index. Here is what this should look like:
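A sketch of building that series, with the precipitation values and the date index as described:

```python
import pandas as pd

rain = pd.DataFrame({
    "DATE": pd.to_datetime(["1948-01-01", "1948-01-02", "1948-01-03"]),
    "PRCP": [0.47, 0.59, 0.42],
})

# Series of daily precipitation, indexed by date.
precip = pd.Series(rain["PRCP"].values, index=rain["DATE"])
print(precip.head())
```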

Now, once this is done, you can begin to look at the data with different visualization techniques. As stated previously, we want to look at the average precipitation in Seattle, so for our first visualization we will look at the “rolling mean” of the precipitation from 1948–2017.
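Since the original plot's code is not shown, here is one way to sketch a rolling mean with pandas and matplotlib. The 365-day window and the synthetic stand-in series are assumptions, not details from the original post:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere
import matplotlib.pyplot as plt

# Synthetic daily series stands in for the real PRCP series.
dates = pd.date_range("1948-01-01", "2017-12-14", freq="D")
rng = np.random.default_rng(0)
precip = pd.Series(rng.gamma(0.3, 0.4, len(dates)), index=dates)

# A 365-day rolling mean smooths daily noise into a long-run trend;
# the window length is an assumption, not from the original post.
rolling = precip.rolling(window=365).mean()

ax = rolling.plot(title="Rolling mean of daily precipitation, 1948-2017")
ax.set_ylabel("Precipitation (inches)")
plt.savefig("rolling_mean.png")
```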

As you can see, this graph plots the mean precipitation of our dataset since 1948. There is clearly considerable variance in precipitation levels over the decades. We also notice a large spike around the year 2000, which could be a target for closer examination. Plotting the average is also useful because it gives a sense of what counts as a typical day of precipitation: based on this graph alone, the average appears to be roughly 0.1 inches of precipitation per day. Although this graph shows a few key points, it is important to look closer at the data to understand its trends and patterns. The next thing we will do is take a closer look at the monthly average of precipitation. Here is the visualization:
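The monthly-average chart is not reproduced here, but a sketch of how it could be computed and drawn follows. The bar-chart style and the synthetic stand-in series are assumptions:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Synthetic daily series stands in for the real PRCP series.
dates = pd.date_range("1948-01-01", "2017-12-14", freq="D")
rng = np.random.default_rng(1)
precip = pd.Series(rng.gamma(0.3, 0.4, len(dates)), index=dates)

# Group all days by calendar month and average across the years.
monthly_avg = precip.groupby(precip.index.month).mean()

ax = monthly_avg.plot(kind="bar",
                      title="Average daily precipitation by month")
ax.set_xlabel("Month")
ax.set_ylabel("Precipitation (inches)")
plt.savefig("monthly_avg.png")
```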

This chart tells us much more about the dataset. Here we can quickly and easily see that the month with the highest average precipitation is November, while the lowest average precipitation is in July. This is not surprising: July is typically among the hottest, driest months of the year, while November brings rain and snow both more often and at higher rates. Another takeaway is that the period from early September through January is when the most precipitation is likely. Viewing this historical data could support many other avenues of exploration. One way it could be used is to predict precipitation averages for the next 10 years. This can be done using code such as this to set up the dataframe for predictions:
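The original setup code is not shown, so here is one plausible sketch: a feature frame for a simple regression (days elapsed plus calendar month) and a matching frame covering the next 10 years of dates to predict on. All names and the feature choice are assumptions:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the loaded dataset.
dates = pd.date_range("1948-01-01", "2017-12-14", freq="D")
rng = np.random.default_rng(2)
rain = pd.DataFrame({"DATE": dates,
                     "PRCP": rng.gamma(0.3, 0.4, len(dates))})

# Features for a simple regression: days since the start of the data
# and the calendar month (hypothetical choices for illustration).
start = rain["DATE"].iloc[0]
model_df = pd.DataFrame({
    "days_elapsed": (rain["DATE"] - start) / pd.Timedelta(days=1),
    "month": rain["DATE"].dt.month,
    "prcp": rain["PRCP"].values,
})

# Future frame covering the next 10 years of days to predict on.
future_dates = pd.date_range("2017-12-15", periods=365 * 10, freq="D")
future_df = pd.DataFrame({
    "days_elapsed": (future_dates - start) / pd.Timedelta(days=1),
    "month": future_dates.month,
})
```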

Once that code runs, the next step would be to actually make the predictions and create a visual to display the findings. This would be helpful for checking whether trends continue, as well as for anticipating events such as a major storm or even a natural disaster. Based on this data, it seems safe to say that the months identified earlier as the wettest will continue to have the most precipitation. That said, I also expect precipitation to remain common in Seattle, though nothing like the spike seen in the visualization above. It is vital to examine historical data such as this because we learn so much about the past while actively thinking about, and predicting, the future.
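The prediction step above could be sketched as follows. The straight-line trend fit with `np.polyfit` is a deliberately simple model chosen for illustration; the original post does not specify one, and the history here is synthetic:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Synthetic monthly-average history stands in for the real series.
dates = pd.date_range("1948-01-01", "2017-12-01", freq="MS")
rng = np.random.default_rng(3)
monthly = pd.Series(rng.gamma(2.0, 0.06, len(dates)), index=dates)

# Fit a straight-line trend to the monthly averages.
x = np.arange(len(monthly))
slope, intercept = np.polyfit(x, monthly.values, 1)

# Project the trend forward 10 years (120 months) and plot both.
future_x = np.arange(len(monthly), len(monthly) + 120)
future_dates = pd.date_range(monthly.index[-1] + pd.DateOffset(months=1),
                             periods=120, freq="MS")
forecast = pd.Series(slope * future_x + intercept, index=future_dates)

ax = monthly.plot(label="history")
forecast.plot(ax=ax, label="10-year trend projection")
ax.legend()
ax.set_ylabel("Precipitation (inches)")
plt.savefig("forecast.png")
```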
