<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Nathan W. Doctor on Medium]]></title>
        <description><![CDATA[Stories by Nathan W. Doctor on Medium]]></description>
        <link>https://medium.com/@nathanwdoctor?source=rss-114984698fa5------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*oYBslbzadFPnzcHuwsfYZg.png</url>
            <title>Stories by Nathan W. Doctor on Medium</title>
            <link>https://medium.com/@nathanwdoctor?source=rss-114984698fa5------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Thu, 28 May 2026 12:17:11 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@nathanwdoctor/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Visualizing climate data for Ann Arbor, MI]]></title>
            <link>https://medium.com/@nathanwdoctor/visualizing-climate-data-for-ann-arbor-mi-65ecf4220d05?source=rss-114984698fa5------2</link>
            <guid isPermaLink="false">https://medium.com/p/65ecf4220d05</guid>
            <category><![CDATA[python]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[Nathan W. Doctor]]></dc:creator>
            <pubDate>Sun, 21 Jun 2020 00:59:36 GMT</pubDate>
            <atom:updated>2020-06-25T15:15:18.518Z</atom:updated>
            <content:encoded><![CDATA[<p>Using a NOAA dataset, we’ll write some Python code that plots a line graph of the record high and record low temperatures for Ann Arbor, Michigan, for each day of the year over the period 2005–2014. Then we’ll overlay a scatter plot of the 2015 data, showing only the points (highs and lows) where 2015 broke the ten-year (2005–2014) record high or record low.</p><p>The data comes from a subset of the National Centers for Environmental Information (NCEI) <a href="https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt">Daily Global Historical Climatology Network</a> (GHCN-Daily). The GHCN-Daily consists of daily climate records from thousands of land surface stations across the globe.</p><p>The map below shows the stations the data comes from.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-_ws1z9emuxys8es1z7HBg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Lzv161p7mvjO15YphfWFZA.png" /></figure><p>To start, let’s take a look at the data.</p><p>The <em>ID</em> column is the ID of the station where the temperature was recorded. The <em>Element</em> column indicates whether the recorded temperature was a maximum or a minimum. The <em>Data_Value</em> column is the temperature in tenths of a degree Celsius.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/887/1*tXmlSwLkDsnwLOFcijFy-w.png" /></figure><p>We’re working with a pretty clean dataframe here, but we’ll need a few minor adjustments to get everything into a more suitable format.</p><p>The point of the exercise is to compare data from 2005–2014 with 2015. Since 2015 has no leap day, the February 29 entries from the earlier years will have to be removed. We’ll also split the Year-Month-Day dates into two columns: one for the Month-Day and one for the Year. Lastly, I’m American, so I like Fahrenheit. 
Temperatures in tenths of a degree Celsius will be converted to degrees Fahrenheit.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/877/1*LKvJrn22ZYuzQtLlopbUeg.png" /></figure><p>Next, let’s split the dataframe into two dataframes: one for data from 2005–2014 and one for data from 2015.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/882/1*f7JwKV103zizLhOcEP-cKg.png" /></figure><p>Looks good to me. Let’s just check to make sure.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/884/1*51gucbawoBcEWriOeXCQUQ.png" /><figcaption>The shapes of the split dataframes match up with the cleaned version. And to be extra cautious, I’ll also check that the 2005–2014 dataframe and the 2015 dataframe each contain the right years.</figcaption></figure><p>Now, I’ll create one final dataframe for all the maximum and minimum temperatures.</p><ul><li>The <em>Max_05_14</em> column will represent the <strong>maximum</strong> recorded temperature for each day (01-01, 01-02, etc.) 
<strong>from 2005 to 2014</strong>.</li><li>The <em>Min_05_14</em> column will represent the <strong>minimum</strong> recorded temperature for each day <strong>from 2005 to 2014</strong>.</li><li>The <em>Max_15</em> column will represent the <strong>maximum</strong> recorded temperature for each day <strong>in 2015</strong>.</li><li>The <em>Min_15</em> column will represent the <strong>minimum</strong> recorded temperature for each day <strong>in 2015</strong>.</li><li>The <em>Max_15_Higher_Prev_10_Years</em> column will represent the <strong>maximum</strong> recorded temperature for each day <strong>in 2015, but only if it is <em>higher</em> than the maximum temperature from 2005–2014</strong>.</li><li>The <em>Min_15_Lower_Prev_10_Years</em> column will represent the <strong>minimum</strong> recorded temperature for each day <strong>in 2015, but only if it is <em>lower</em> than the minimum temperature from 2005–2014</strong>.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/884/1*oLThINjo6dsswWaSBPjgGg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/856/1*MvrpSbKn0tkMc7OEbYgapA.png" /></figure><p>The dataframe has 365 rows, each representing a day of the year. So, the entry of <strong>60.08</strong> in row <strong>01-01</strong> of column <strong>Max_05_14</strong> means the highest temperature recorded on January 1 from 2005 to 2014, at any of the stations around Ann Arbor, was 60.08 °F.</p><p>The NaN entries signify that, on that day in 2015, no recorded high was higher (or no recorded low was lower) than the corresponding record for 2005–2014. Conversely, a value appears wherever 2015 broke the record: the low of 4.1 °F on January 5, 2015, is lower than the 2005–2014 record low for that day of 5.00 °F. 
Accordingly, that entry is included.</p><p>Now, let’s get to the main point of all this: plotting the data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*k4JwzZ1hpVNjXKxn-elfwA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/789/1*sFdR3b_7BLT__TtjGJlGYw.png" /></figure><p>At last, I’m able to visualize what we’re working with, but the chart still needs some work. First, the legend is unnecessarily large and sits too close to the record-low line; it may even be hiding some of the 2015 record-low dots. Second, the x-axis is labeled with day-of-year positions [0, 50, 100, 150, 200, 250, 300, 350]. I know exactly what they represent, and readers can probably work it out too, but this is certainly not ideal. The goal of a chart like this is for the viewer to make sense of it in as little time as possible: the more they look at the data and the less they read, the better. Third, it may look nice to fill in the area between the two lines. Why not try that out? Finally, there are a lot of unnecessary lines, and all the black on the chart is a bit too harsh. Let’s tone that down.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*I3-7K01PJCrIY10ZWtk-CQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/700/1*7OjQVmzpQVDXwRLGp0KAnQ.png" /></figure><p>And that’s it. Thanks for reading.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Hypothesis Testing: Are university towns more resilient than non-university towns to recession?]]></title>
            <link>https://medium.com/analytics-vidhya/hypothesis-testing-are-university-towns-more-resilient-than-non-university-towns-to-recession-662286a914ed?source=rss-114984698fa5------2</link>
            <guid isPermaLink="false">https://medium.com/p/662286a914ed</guid>
            <category><![CDATA[economics]]></category>
            <category><![CDATA[housing-prices]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[statistics]]></category>
            <dc:creator><![CDATA[Nathan W. Doctor]]></dc:creator>
            <pubDate>Sat, 20 Jun 2020 05:21:09 GMT</pubDate>
            <atom:updated>2020-06-24T15:46:41.817Z</atom:updated>
            <content:encoded><![CDATA[<h3>Hypothesis Testing: Are university towns more resilient to recession than non-university towns?</h3><p>For an early project, I used Python to examine whether university towns are more resilient to economic downturns than non-university towns. More specifically, I asked: <em>are housing prices in university towns less affected by recession?</em></p><p>To start, a <em>university town</em> is a city in which university students make up a high percentage of the total population.</p><p>The hypothesis is that housing prices in such cities should be less affected by recession, mostly because we should expect similar numbers of students, staff, and other workers connected to university life to keep living in these towns, regardless of the economic outlook.</p><p>To get a list of university towns, I simply used <a href="https://en.wikipedia.org/wiki/List_of_college_towns#College_towns_in_the_United_States">Wikipedia</a>, which maintains a list of college towns in the United States. For housing prices across the United States, I used <a href="https://www.zillow.com/research/data/">Zillow</a>, whose <a href="http://files.zillowstatic.com/research/public/City/City_Zhvi_AllHomes.csv">City_Zhvi_AllHomes.csv</a> file includes data from 1996–2020. Lastly, I used data from the U.S. Department of Commerce’s Bureau of Economic Analysis (BEA) to determine exactly when the ‘Great Recession’ of 2007–2009 started and when it reached its bottom, i.e., the quarter within the recession with the lowest GDP. 
This was necessary because I wanted to compare housing prices at the start of the recession with prices at its bottom.</p><p>To match the format of the Wikipedia list of university towns to the list of all cities on Zillow, I needed to clean the text file derived from Wikipedia a bit.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5s-Mr6MOQdsdLd0fIjGUGA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*HkkrdEVGRlfSzsdieGdezw.png" /><figcaption>Not the most elegant solution here, but at least it works.</figcaption></figure><p>Next, let’s load the GDP data from the BEA and find the recession’s start. Here, a <em>recession</em> is defined as starting with two consecutive quarters of GDP decline and ending with two consecutive quarters of GDP growth.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*92IKjbjr6oBWNRf4Bg-ezA.png" /><figcaption>As you can see, we’ll need to clean this dataframe a bit.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*obE829B7hKqql3GD3UVHWw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Dp71iwNTyXQZHSz-GeQnDA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wDVsqI-uUD21Y8JrClEMvA.png" /></figure><p>Now, let’s convert the monthly Zillow housing data to quarters.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*E4pR8B1eglRzeTl3ilaWtQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zaJKOA0zV0WckKKAzS4LaA.png" /></figure><p>Next, we’ll create new data showing the decline or growth of housing prices between the recession start and the recession bottom.</p>
<p>Finally, we’ll run a <a href="https://en.wikipedia.org/wiki/Student%27s_t-test">t-test</a> comparing the university town values to the non-university town values, returning whether we can reject the null hypothesis (that the two groups have the same mean) along with the <a href="https://en.wikipedia.org/wiki/P-value">p-value</a>.</p><p>The function will return the tuple (different, p, better), where different=True if the t-test is significant at p &lt; 0.01 (we reject the null hypothesis), or different=False otherwise (we cannot reject the null hypothesis). The variable p is the exact p-value returned by scipy.stats.ttest_ind(). The value of better is either “university town” or “non-university town”, whichever has the lower mean price ratio (which is equivalent to a smaller market loss).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*2kOFFjp5DDvXr7tCDhCAag.png" /></figure><p>As we can see, the p-value is less than 0.01, so we can reject the null hypothesis that there is no difference between the mean price ratios of university towns and non-university towns. In other words, the two groups differ significantly, and university towns were, indeed, less affected by the recession.</p><hr><p><a href="https://medium.com/analytics-vidhya/hypothesis-testing-are-university-towns-more-resilient-than-non-university-towns-to-recession-662286a914ed">Hypothesis Testing: Are university towns more resilient than non-university towns to recession?</a> was originally published in <a href="https://medium.com/analytics-vidhya">Analytics Vidhya</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>