<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Felipe Hoffa on Medium]]></title>
        <description><![CDATA[Stories by Felipe Hoffa on Medium]]></description>
        <link>https://medium.com/@hoffa?source=rss-279fe54c149a------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/0*ahXIMiIgudZTyqJS.jpeg</url>
            <title>Stories by Felipe Hoffa on Medium</title>
            <link>https://medium.com/@hoffa?source=rss-279fe54c149a------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sat, 24 Jul 2021 02:26:44 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@hoffa/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Measuring Customer Engagement with Braze and Snowflake]]></title>
            <link>https://medium.com/snowflake/measuring-customer-engagement-with-braze-and-snowflake-49c37b432db4?source=rss-279fe54c149a------2</link>
            <guid isPermaLink="false">https://medium.com/p/49c37b432db4</guid>
            <category><![CDATA[analytics]]></category>
            <category><![CDATA[customer-engagement]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[snowflake]]></category>
            <category><![CDATA[open-data]]></category>
            <dc:creator><![CDATA[Felipe Hoffa]]></dc:creator>
            <pubDate>Wed, 14 Jul 2021 22:58:54 GMT</pubDate>
            <atom:updated>2021-07-16T02:09:06.101Z</atom:updated>
<content:encoded><![CDATA[<h4>The Braze Engagement Benchmarks give Snowflake users access to industry-by-industry data on message engagement, app retention, user acquisition, and purchasing behavior, updated daily. All data in Benchmarks are anonymized and aggregated. The data are pulled from Braze’s customer base of over 1,000 global brands across 14 major industries and cover the past year from the current date. Here’s how to query them.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*QWg_XsOElefCuXJF" /><figcaption>Photo by <a href="https://unsplash.com/@polarmermaid?utm_source=medium&amp;utm_medium=referral">Anne Nygård</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2Fyxe5M5Ipj0s&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dyxe5M5Ipj0s&amp;image=http%3A%2F%2Fi.ytimg.com%2Fvi%2Fyxe5M5Ipj0s%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/ab57589f833a0255bbb87856f14b4839/href">https://medium.com/media/ab57589f833a0255bbb87856f14b4839/href</a></iframe><p>Let’s say you have a health and fitness app, and you’re wondering: What’s the best day to send notifications to your users? It’s Monday:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/462/0*lBXBkmMs1IXgehgE.png" /><figcaption>Engagement to notifications for health &amp; fitness apps on iOS. Source: Braze</figcaption></figure><p>As the chart shows, Monday is when 7% of users of health apps open their notifications.</p><p>Now, what about gaming apps? 
The best day to send notifications is Sunday:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/486/0*PQeMIxCCc8Il1HA4.png" /><figcaption>Engagement to notifications for gaming apps on iOS. Source: Braze</figcaption></figure><p>But that’s only on iOS. Android users are most likely to open gaming related notifications on Wednesdays:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/487/0*OWY5K5HMa1j0iTIc.png" /><figcaption>Engagement to notifications for gaming apps on Android. Source: Braze</figcaption></figure><p>This data comes out of hundreds of companies that use <a href="https://www.braze.com/">Braze</a> as their customer engagement platform, and you can use the data as a benchmark for your own applications. You can find the data in<a href="https://www.braze.com/resources/benchmarks"> Braze’s public dashboard powered by Snowflake</a>, or even better, this data is ready for you to<a href="https://www.snowflake.com/datasets/braze-engagement-benchmarks/"> query on the Snowflake Data Marketplace</a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/753/0*YMuAvD5YqW5sBvyc.png" /><figcaption>Braze engagement benchmarks in the Snowflake Data Marketplace</figcaption></figure><p>This is how I learned that people finally started reengaging with travel related apps in March — after a year with not much travel:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/480/0*ZwpEw-d8QSMhDB9j.png" /><figcaption>Engagement to notifications for travel apps on iOS. Source: Braze</figcaption></figure><p>Or that in retail and e-commerce, the first purchase happens in 26 days, while the second one takes 43:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/636/0*5Viuh3GeAlFP9OnF.png" /><figcaption>Average # of days to first and second purchase by industry. Source: Braze</figcaption></figure><p>You don’t need to be a Braze customer to access all this data. 
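</p><p>For example, a query along the lines of the following could reproduce the day-of-week engagement charts above. This is only a sketch: the table and column names here are hypothetical, so check the Marketplace listing for the actual schema:</p><pre>-- hypothetical table and columns, for illustration only<br>select day_of_week, avg(pct_opened) avg_open_rate<br>from benchmarks_push_engagement<br>where industry = &#39;Health &amp; Fitness&#39;<br>and platform = &#39;iOS&#39;<br>group by 1<br>order by 2 desc<br>;</pre><p>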
It’s all available and updated for free in <a href="https://www.snowflake.com/">the Data Cloud</a>. Just start writing your SQL queries, like this one to check the purchase propensity by industry:</p><pre>// Pull percent of buyers and repeat buyers for each industry included in Benchmarks<br>SELECT<br> industry,<br> COALESCE(SUM(pct_first_time_buyers), 0) AS pct_first_time_buyers,<br> COALESCE(SUM(pct_repeat_buyers), 0) AS pct_repeat_buyers<br>FROM BENCHMARKS_AVG_TIME_TO_PURCHASE<br>WHERE industry!= &#39;All Industries&#39;<br>GROUP BY 1<br>ORDER BY 2 DESC<br>;</pre><p>And if you are a Braze customer, then you can also get <a href="https://www.snowflake.com/datasets/braze-raw-events/">your own data from Braze shared automatically with your Snowflake account</a>.</p><p>Are you ready to query? You will find the Braze benchmarks for 14 different industry segments:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/529/0*xNyqCMavumdEYScl.png" /><figcaption>Propensity to first purchase by industry. Source: Braze</figcaption></figure><p>If you are working to reengage customers, this is the data you want.</p><h4>Learn more</h4><ul><li>Check out <a href="https://www.snowflake.com/podcast/interview-with-jon-hyman/">our podcast with Braze</a> to see why they trust Snowflake to handle their data from billions of monthly active users</li><li>Read “<a href="https://www.braze.com/resources/articles/how-braze-built-support-for-snowflake-data-sharing">How Braze Built Support for Snowflake Data Sharing</a>”.</li></ul><h3>Want more?</h3><ul><li>Try this out with a <a href="http://bit.ly/sf-free-trial">Snowflake free trial</a> account — you only need an email address to get started.</li><li>Find plenty more datasets in the <a href="https://www.snowflake.com/data-marketplace/">Snowflake Data Marketplace</a>.</li></ul><p>I’m Felipe Hoffa, Data Cloud Advocate for Snowflake. Thanks for joining me on this adventure. 
You can <a href="https://twitter.com/felipehoffa">follow me on Twitter</a> and <a href="https://www.linkedin.com/in/hoffa/">LinkedIn</a>, and check <a href="https://www.reddit.com/r/snowflake/">reddit.com/r/snowflake</a> for the most interesting Snowflake news.</p><hr><p><a href="https://medium.com/snowflake/measuring-customer-engagement-with-braze-and-snowflake-49c37b432db4">Measuring Customer Engagement with Braze and Snowflake</a> was originally published in <a href="https://medium.com/snowflake">Snowflake</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[NOAA weather data in Snowflake (free)]]></title>
            <link>https://towardsdatascience.com/noaa-weather-data-in-snowflake-free-20e90ee916ed?source=rss-279fe54c149a------2</link>
            <guid isPermaLink="false">https://medium.com/p/20e90ee916ed</guid>
            <category><![CDATA[snowflake]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[sql]]></category>
            <category><![CDATA[open-data]]></category>
            <category><![CDATA[weather]]></category>
            <dc:creator><![CDATA[Felipe Hoffa]]></dc:creator>
            <pubDate>Tue, 22 Jun 2021 16:45:01 GMT</pubDate>
            <atom:updated>2021-07-14T20:11:13.347Z</atom:updated>
<content:encoded><![CDATA[<h4>NOAA GSOD’s worldwide daily weather data is refreshed every day in Snowflake, and in this post we’ll make it even more useful. Check inside for pivots, geo-joins, finding the closest station to each city, and pattern matching with MATCH_RECOGNIZE().</h4><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FBJhY-CL6zjA%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DBJhY-CL6zjA&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FBJhY-CL6zjA%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/36dab5d7c95bbcb76bd38dd44fe02757/href">https://medium.com/media/36dab5d7c95bbcb76bd38dd44fe02757/href</a></iframe><h3>The source</h3><p>To access the daily NOAA GSOD weather data, just go to the Marketplace and <a href="https://docs.snowflake.com/en/user-guide/data-marketplace-intro.html#label-standard-listing">create a database</a> with <a href="https://www.snowflake.com/datasets/knoema-environment-data-atlas/">Knoema’s Environment Data Atlas</a> in your account.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/851/1*BgQlKbRbt0I2gmWKRecv9A.png" /><figcaption><a href="https://www.snowflake.com/datasets/knoema-environment-data-atlas/">Knoema’s Environment Data Atlas</a> in Snowflake</figcaption></figure><p>With a couple of exploratory queries you’ll notice that:</p><ul><li>Having this data automatically refreshed in your account is cool!</li><li>Making this table useful is not that easy, because:</li><li>Stations have a lat/lon — but no way to tell which city or zip code they belong to.</li><li>The original NOAA rows have been split into multiple rows per day — with each row containing only one value for each measure that a station can produce.</li></ul><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/1024/1*ygaPVZf23OKOiZTd9kfsAQ.png" /><figcaption>NOAA GSOD transformed by Knoema: One row for each measure</figcaption></figure><p>So let’s transform this data here, to make it easier to incorporate in your data pipelines.</p><h3>Pivot multiple rows into one</h3><p>This was easy with Snowflake. The following query will take the multiple rows per day per station, and transform them into only one:</p><pre>--create or replace table noaa_gsod<br>--cluster by (station_id, date)<br>--as (<br>    select *<br>    from (<br>        select &quot;Stations&quot;, &quot;Date&quot;, &quot;Stations Name&quot;, &quot;Country&quot;, &quot;Indicator Name&quot;, &quot;Value&quot;<br>        from KNOEMA_ENVIRONMENT_DATA_ATLAS.ENVIRONMENT.NOAACD2019R<br>        where &quot;Date&quot;&gt;=&#39;2020-01-01&#39;<br>        and &quot;Measure&quot;=&#39;M1&#39;<br>        -- and &quot;Country&quot;=&#39;US&#39;<br>    )<br>    pivot(max(&quot;Value&quot;) for &quot;Indicator Name&quot; in (&#39;Mean visibility (miles)&#39;,&#39;Maximum temperature (Fahrenheit)&#39;,&#39;Mean dew point (Fahrenheit)&#39;,&#39;Maximum wind gust (Number)&#39;,&#39;Minimum temperature (Fahrenheit)&#39;,&#39;Maximum sustained wind speed (knots)&#39;,&#39;Mean wind speed (knots)&#39;,&#39;Mean station pressure (millibars)&#39;,&#39;Precipitation amount (inches)&#39;,&#39;Mean temperature (Fahrenheit)&#39;,&#39;Mean sea level pressure (millibars)&#39;,&#39;Snow depth (inches)&#39;))<br>    as p(station_id, date, name, country_fips, visibility, max, dew, wind_max, min, wind_sustained_max, wind_mean, pressure, rain, temp, pressure_sea, snow_depth)<br>--)<br>;</pre><p>Since Snowflake supports native pivots, we were able to merge multiple rows into only one, creating new columns and renaming them for easier usage:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TelQdgq_KVhIEcexk7Jaqg.png" /><figcaption>Pivoted NOAA GSOD by Knoema back into a useful wide 
table</figcaption></figure><h3>Find cities</h3><p>Now, how do we find the city closest to each station? Or the zip code?</p><p>Let’s start by finding a list of worldwide cities. In this case I used data from Wikidata (we can talk more about Wikidata on Snowflake in a future post).</p><h3>Geolocate stations</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/624/0*JtLo7emVRASGlIU0.png" /><figcaption>Big cities with NOAA weather stations</figcaption></figure><p>With a list of cities and Snowflake’s GIS support we can find the city closest to each station:</p><pre>create or replace table stations_city<br>as (<br>    select label city, station, st_distance(st_makepoint(a.lon, a.lat), st_makepoint(b.lon, b.lat)) distance, country_fips, b.country<br>        , station_id<br>    from weather_stations a<br>    join wikimedia.public.cities b<br>    on a.country=b.country_fips<br>    and st_distance(st_makepoint(a.lon, a.lat), st_makepoint(b.lon, b.lat)) &lt; 50000<br>    qualify row_number() over(partition by station order by distance) = 1<br>    order by views desc<br>)<br>;</pre><p>In that SQL query we limited the st_distance search to a 50km radius. This geo-join performed pleasantly fast: 3 seconds with a small warehouse.</p><p>Note that the qualify row_number() over(partition by station order by distance) = 1 brings us the city closest to each station. 
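</p><p>In general, this qualify idiom keeps only the top-ranked row per group (here, the nearest city for each station). A minimal sketch of the pattern, where the station_city_distances table and its columns are hypothetical, just for illustration:</p><pre>-- keep only the nearest city for each station<br>select station, city, distance<br>from station_city_distances<br>qualify row_number() over(partition by station order by distance) = 1<br>;</pre><p>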
To get the station closest to each city we would need to change it to over(partition by city order by distance).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*R6hLAHGzGsRMZqw7soK40A.png" /><figcaption>City closest to each station.</figcaption></figure><h3>Pattern matching: Rainiest cities</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/517/0*Bi8sg8fAh6a5KuDN.png" /><figcaption>Photo by <a href="https://www.flickr.com/photos/donabelandewen/3908987446">Ewen Roberts</a></figcaption></figure><p>Now let’s have some fun with pattern matching, thanks to our <a href="https://hoffa.medium.com/funnel-analytics-with-sql-match-recognize-on-snowflake-8bd576d9b7b1">new MATCH_RECOGNIZE powers</a>.</p><p>For example, let’s find the cities in the US that had the most consecutive days of rain:</p><pre>   select *<br>    from (select * from noaa_gsod where country_fips=&#39;US&#39;)<br>    match_recognize(<br>        partition by station_id, name<br>        order by date<br>        measures avg(temp) avg_temp, sum(rain) total_rain, min(rain) min_rain, count(*) rainy_days, min(date) date<br>        one row per match<br>        pattern(rain{5,})<br>        define<br>            rain as rain &gt; 0.1<br>    )</pre><p>I love how easy it is to define a pattern between rows with <a href="https://hoffa.medium.com/funnel-analytics-with-sql-match-recognize-on-snowflake-8bd576d9b7b1">MATCH_RECOGNIZE</a>:</p><ul><li>First we define rain (for the pattern) as any row that has rain &gt; 0.1 inches. 
If the rain that day didn’t hit that minimum, we won’t count it as a rainy day.</li><li>Then we ask Snowflake to look for any series of rows that meet that definition of rain for at least 5 consecutive days with pattern(rain{5,}).</li><li>Check in the query how the combination of one row per match, partition by station_id, name order by date, and the measures min(date) date will help us get the desired results.</li></ul><p>If we order the weather stations by the most consecutive rainy days, these are the results:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZjJjbi1st-LYZDbF6QhQPg.png" /><figcaption>Stations in the US with the most consecutive # of rainy days</figcaption></figure><p>As you can see, Quillayute Airport saw 25 days of consecutive rain in January 2020, and then 16 days in December 2020. That’s a lot of non-stop rain.</p><p>But where is that airport? What’s the closest city? We can solve that with a join to our previous results:</p><pre>select *<br>from (<br>    select *<br>    from (select * from noaa_gsod where country_fips=&#39;US&#39;)<br>    match_recognize(<br>        partition by station_id, name<br>        order by date<br>        measures avg(temp) avg_temp, sum(rain) total_rain, min(rain) min_rain, count(*) rainy_days, min(date) date<br>        one row per match<br>        pattern(rain{5,})<br>        define<br>            rain as rain &gt; 0.1<br>    )<br>) a<br>join (<br>    select *<br>    from stations_city<br>) b<br>using (station_id)<br>order by rainy_days desc, total_rain desc<br>;</pre><p>Turns out these are the rainiest cities in the US. Can you guess before you look?</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ofxUhL67t8oL9m7D-KLFDQ.png" /><figcaption>Rainiest cities in the US</figcaption></figure><p>Before looking, my guess was Seattle. And it was correct: Seattle, followed by Tacoma, its twin city. 
They truly deserve the title “top rainy cities in the US, by number of consecutive rainy days”.</p><p>But where did Quillayute airport go? Turns out it’s not within the 50km radius of any of the major cities in my query above. If we grow that radius to 200km, these are the results:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*nKzqdclTn5KBvdUOUGSxiQ.png" /><figcaption>Rainiest cities in the US (weather stations within 200km radius)</figcaption></figure><p>And there it is, Seattle and Tacoma again. It sure rains a lot around Washington state.</p><h3>Everything together in dbt</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1018/1*sTnosu7KNjliK-4_l8Dh_w.png" /><figcaption>Data lineage in this project, by dbt</figcaption></figure><p>Check out my <a href="https://github.com/Snowflake-Labs/fhoffa/tree/main/knoema_noaa_dbt">full dbt project on GitHub</a>.</p><p>(Thanks to Noel Gomez and Mike Weinberg for their review on the dbt Slack)</p><h3>Next steps</h3><ul><li>Explore premium weather providers in the Snowflake Data Marketplace.</li><li>Try this out with a <a href="http://bit.ly/sf-free-trial">Snowflake free trial</a> account — you only need an email address to get started.</li><li>Find plenty more datasets in the <a href="https://www.snowflake.com/data-marketplace/">Snowflake Data Marketplace</a>.</li></ul><h3>Want more?</h3><p>I’m Felipe Hoffa, Data Cloud Advocate for Snowflake. Thanks for joining me on this adventure. 
You can <a href="https://twitter.com/felipehoffa">follow me on Twitter</a> and <a href="https://www.linkedin.com/in/hoffa/">LinkedIn</a>, and check <a href="https://www.reddit.com/r/snowflake/">reddit.com/r/snowflake</a> for the most interesting Snowflake news.</p><hr><p><a href="https://towardsdatascience.com/noaa-weather-data-in-snowflake-free-20e90ee916ed">NOAA weather data in Snowflake (free)</a> was originally published in <a href="https://towardsdatascience.com">Towards Data Science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[New in Snowflake: Java UDFs (with a Kotlin NLP example)]]></title>
            <link>https://medium.com/snowflake/new-in-snowflake-java-udfs-with-a-kotlin-nlp-example-e52d94d33468?source=rss-279fe54c149a------2</link>
            <guid isPermaLink="false">https://medium.com/p/e52d94d33468</guid>
            <category><![CDATA[kotlin]]></category>
            <category><![CDATA[java]]></category>
            <category><![CDATA[snowflake]]></category>
            <category><![CDATA[nlp]]></category>
            <category><![CDATA[sql]]></category>
            <dc:creator><![CDATA[Felipe Hoffa]]></dc:creator>
            <pubDate>Wed, 16 Jun 2021 20:49:05 GMT</pubDate>
            <atom:updated>2021-07-14T20:10:54.785Z</atom:updated>
<content:encoded><![CDATA[<h4>Snowflake now lets you easily create Java UDFs, which is an incredibly powerful and versatile feature. Let’s check it out by running a library written in Kotlin to detect written languages. Out of GitHub and into your SQL code, in 3 easy steps.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/688/1*yC-L_0er1BlFCrBNMCPQpg.png" /><figcaption>Detect written languages with a Java UDF</figcaption></figure><h3>Quick example: Detect written languages</h3><p>Snowflake has made it really easy to <a href="https://docs.snowflake.com/en/developer-guide/udf/java/udf-java-creating.html">create Java UDFs</a>. You just need to do something like this:</p><pre>create function add(x integer, y integer)<br>returns integer<br>language java<br>handler=&#39;Test.add&#39;<br>as<br>$$<br>    class Test {<br>        public static int add(int x, int y) {<br>          return x + y;<br>        }<br>    }<br>$$;</pre><p>Then you can use that function in your SQL statements:</p><pre>select add(1, 3)</pre><pre>// 4</pre><p>That’s easy — and then the real power comes with the ability to load and use jar packages. 
For example, to detect written languages using <a href="https://github.com/pemistahl/lingua">Lingua</a>:</p><pre>create or replace function detect_lang(x string)<br>returns string<br>language java<br>imports = (&#39;@~/lingua-1.1.0-with-dependencies.jar&#39;)<br>handler=&#39;MyClass.detect&#39;<br>as<br>$$<br>import com.github.pemistahl.lingua.api.*;<br>import static com.github.pemistahl.lingua.api.Language.*;</pre><pre>class MyClass {<br>     static LanguageDetector detector = LanguageDetectorBuilder.fromLanguages(ENGLISH, FRENCH, GERMAN, SPANISH).build();</pre><pre>    public static String detect(String x) {<br>          return detector.detectLanguageOf(x).toString();<br>        }<br>    }<br>$$;</pre><p>If you have the right jar staged into your Snowflake account, that’s all it takes:</p><pre>select $1, detect_lang($1)<br>from values(&#39;languages are awesome&#39;),(&#39;hola mi amigo&#39;),(&#39;hallo Freunde&#39;),(&#39;je ne parle pas&#39;);</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/470/1*9t7V5di1qKHbZdotxFaCcw.png" /></figure><h4>Notes</h4><ul><li>I love the ability to write custom Java code while defining UDFs within SQL (Snowflake takes care of compiling it). This allows you to handle all your glue code in one place, within your SQL scripts and dbt projects.</li></ul><h3>Setup</h3><p><a href="https://github.com/pemistahl/lingua">Lingua</a> is “<em>The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike</em>” (according to them). 
It’s written in Kotlin, which is a language that runs on the JVM.</p><p>To package this library into a single fat jar for Snowflake, I followed these steps:</p><ol><li>Clone the git project</li><li>Build a jar with dependencies with Gradle.</li></ol><pre>git clone https://github.com/pemistahl/lingua.git<br>./gradlew jarWithDependencies</pre><p>Then you can PUT this jar into your Snowflake account with SnowSQL:</p><pre>put &#39;file://build/libs/lingua-1.1.0-with-dependencies.jar&#39; @~;</pre><h3>Read more</h3><ul><li>“<a href="https://www.snowflake.com/blog/welcome-to-snowpark-new-data-programmability-for-the-data-cloud/">Welcome to Snowpark: New Data Programmability for the Data Cloud</a>”</li><li>“<a href="https://stackoverflow.com/questions/68013613/what-jdk-version-does-snowflake-support-for-java-udfs">What JDK version does Snowflake support for Java UDFs?</a>”</li><li><a href="https://stackoverflow.com/questions/68029047/how-can-i-parse-html-with-a-snowflake-udf">How can I parse HTML with a Snowflake UDF?</a></li><li><a href="https://businesstechnologiesjourney.blogspot.com/2021/06/lets-create-dataset-for-demonstrations.html">Let’s create fake data with few lines of SQL and Snowflake Java UDF</a></li><li><a href="https://docs.snowflake.com/en/developer-guide/udf/java/udf-java-creating.html">Java UDF Snowflake docs</a></li><li><a href="https://github.com/pemistahl/lingua">pemistahl<strong>/</strong>lingua</a> on GitHub</li></ul><h4>Acknowledgements</h4><p>Java UDF support is in active development by a great team at Snowflake, including <a href="https://twitter.com/ElliottBrossard">Elliott Brossard</a> and <a href="https://twitter.com/isaackunen">Isaac Kunen</a>. 
Stay tuned for more!</p><blockquote class="twitter-tweet" data-conversation="none" data-align="center" data-dnt="true"><p>&#x200a;&mdash;&#x200a;<a href="https://twitter.com/Di_Ku/status/1405224399238623234">@Di_Ku</a></p></blockquote><p>Thanks to Peter M. Stahl for the awesome Lingua library:</p><p><a href="https://github.com/pemistahl/lingua">pemistahl/lingua</a></p><h3>Want more?</h3><ul><li>Try this out with a <a href="http://bit.ly/sf-free-trial">Snowflake free trial</a> account — you only need an email address to get started.</li><li>Find plenty of interesting datasets in the <a href="https://www.snowflake.com/data-marketplace/">Snowflake Data Marketplace</a>.</li></ul><p>I’m Felipe Hoffa, Data Cloud Advocate for Snowflake. 
Thanks for joining me on this adventure. You can <a href="https://twitter.com/felipehoffa">follow me on Twitter</a> and <a href="https://www.linkedin.com/in/hoffa/">LinkedIn</a>, and check <a href="https://www.reddit.com/r/snowflake/">reddit.com/r/snowflake</a> for the most interesting Snowflake news.</p><ul><li><a href="https://towardsdatascience.com/forecasts-in-snowflake-facebook-prophet-on-cloud-run-with-sql-71c6f7fdc4e3">Forecasts in Snowflake: Facebook Prophet on Cloud Run with SQL</a></li><li><a href="https://medium.com/snowflake/dynamic-data-masks-protect-sensitive-data-with-snowflake-superhero-edition-20a00773f951">Dynamic Data Masks: Protect sensitive data with Snowflake (superhero edition)</a></li></ul><hr><p><a href="https://medium.com/snowflake/new-in-snowflake-java-udfs-with-a-kotlin-nlp-example-e52d94d33468">New in Snowflake: Java UDFs (with a Kotlin NLP example)</a> was originally published in <a href="https://medium.com/snowflake">Snowflake</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Lessons learned, 10 years after moving to the US to join Google]]></title>
            <link>https://hoffa.medium.com/lessons-learned-10-years-after-moving-to-the-us-to-join-google-92a213dc93f9?source=rss-279fe54c149a------2</link>
            <guid isPermaLink="false">https://medium.com/p/92a213dc93f9</guid>
            <category><![CDATA[software-engineering]]></category>
            <category><![CDATA[google]]></category>
            <category><![CDATA[developer-relations]]></category>
            <category><![CDATA[careers]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[Felipe Hoffa]]></dc:creator>
            <pubDate>Wed, 02 Jun 2021 01:18:43 GMT</pubDate>
            <atom:updated>2021-06-02T06:49:20.247Z</atom:updated>
<content:encoded><![CDATA[<h4>It’s exactly 10 years since my first day at Google. What I’ve learnt:</h4><blockquote class="twitter-tweet" data-conversation="none" data-align="center" data-dnt="true"><p>&#x200a;&mdash;&#x200a;<a href="https://twitter.com/felipehoffa/status/1399881435457069058">@felipehoffa</a></p></blockquote><ol><li>In 2011 there was a career path for me, but <strong>I had to leave Chile to find it</strong>.</li><li><strong>There is a career path outside management</strong>: At Google I discovered that software engineers could keep moving forward without the need to become a manager.</li><li><strong>There are jobs that combine passions with skills</strong>: I wasn’t the best SWE at Google, but someone saw my passions 
and skills, and invited me to join Developer Relations.</li><li><strong>Know your strengths</strong>: Taking the <a href="https://www.gallup.com/cliftonstrengths/en/252137/home.aspx">Clifton StrengthsFinder test</a> changed how I viewed myself. The results made sense, and gave me a way to communicate to others and myself who I was and what I could do.</li><li><strong>Know what you want to do</strong>: If I had to summarize the 3 things that have motivated me the most throughout life, they’re Travel, Storytelling, and Analyzing Data. DevRel was the perfect fit for me.</li><li><strong>Choose a manager who values your strengths</strong>: You won’t always have a choice of who your next manager will be, but make sure to establish as early as possible what your strengths are and what you want to do.</li><li><strong>There is life outside Google</strong>: After almost 10 years at Google with 13 different managers, I finally had enough. I used all I had learned to find my new home at <a href="https://www.snowflake.com/">Snowflake</a>, and it has proven to be <a href="https://www.snowflake.com/blog/5-reasons-why-i-joined-snowflake-after-google/">the right choice</a>.</li><li><strong>Making mistakes is OK</strong>: I’ve made people feel uncomfortable. I’ve missed deadlines. I should have replied to more emails and messages than I have. Hopefully my contributions have been bigger than my mistakes, and I’m grateful to everyone who has forgiven me.</li><li><strong>Learn from your own mistakes and shortcomings</strong>. Have empathy for those who are still learning from their own.</li><li><strong>Showing up and building relationships is more important than a plan</strong>: I never planned to move to the US, or to join Google, DevRel, or Snowflake. All these happened thanks to friends I made along the way. I’m grateful to them for opening doors and inviting me to join.</li><li><strong>Pay it forward</strong>. Open doors. Share what you learned. 
Understand that everyone has different strengths. Forgive.</li></ol><p><a href="https://www.linkedin.com/posts/hoffa_its-exactly-10-years-since-my-first-day-activity-6805650636831641600-Dp96">Felipe Hoffa on LinkedIn: It&#39;s exactly 10 years since my first day at Google. What I&#39;ve learnt</a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*bJHCof75hq-haC5h" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*LR62W3vVpQzELLmw" /></figure><p><a href="https://www.snowflake.com/blog/5-reasons-why-i-joined-snowflake-after-google/">5 Reasons Why I Joined Snowflake After Google | Snowflake Blog</a></p><blockquote class="twitter-tweet" data-conversation="none" data-align="center" data-dnt="true"><p>&#x200a;&mdash;&#x200a;<a href="https://twitter.com/felipehoffa/status/1211739663632863235">@felipehoffa</a></p></blockquote><blockquote class="twitter-tweet" data-conversation="none" data-align="center" data-dnt="true"><p>&#x200a;&mdash;&#x200a;<a href="https://twitter.com/felipehoffa/status/1314377101617754112">@felipehoffa</a></p></blockquote><blockquote class="twitter-tweet" data-conversation="none" data-align="center" data-dnt="true"><p>&#x200a;&mdash;&#x200a;<a href="https://twitter.com/felipehoffa/status/1298300356204535809">@felipehoffa</a></p></blockquote><blockquote class="twitter-tweet" data-conversation="none" data-align="center" data-dnt="true"><p>&#x200a;&mdash;&#x200a;<a href="https://twitter.com/felipehoffa/status/1294347399494688768">@felipehoffa</a></p></blockquote><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=92a213dc93f9" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Car rental prices at record highs: Explore the trends with data from QL2 and Snowflake]]></title>
            <link>https://medium.com/snowflake/car-rental-prices-at-record-highs-explore-the-trends-with-data-from-ql2-and-snowflake-ba0f3a946782?source=rss-279fe54c149a------2</link>
            <guid isPermaLink="false">https://medium.com/p/ba0f3a946782</guid>
            <category><![CDATA[visualization]]></category>
            <category><![CDATA[travel]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[snowflake]]></category>
            <category><![CDATA[data]]></category>
            <dc:creator><![CDATA[Felipe Hoffa]]></dc:creator>
            <pubDate>Thu, 27 May 2021 18:02:30 GMT</pubDate>
            <atom:updated>2021-05-27T18:02:30.633Z</atom:updated>
<content:encoded><![CDATA[<h4>Car rental rates are surprisingly high — and I have the data to prove it. In this post, learn where to find all this data and how to analyze it with SQL, thanks to QL2 and Snowflake.</h4><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2Fa-3LmloaExc&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Da-3LmloaExc&amp;image=http%3A%2F%2Fi.ytimg.com%2Fvi%2Fa-3LmloaExc%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/9560f69ea437373e2840ae2bb5af2490/href">https://medium.com/media/9560f69ea437373e2840ae2bb5af2490/href</a></iframe><p>Have you tried renting a car lately? The rates are surprisingly high. And I have the data to prove it:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*uV1vWcRb5gTiD50P" /><figcaption>Average car rental rates in Las Vegas, LA, Miami, Chicago, and San Francisco — starting in 2019.</figcaption></figure><p>Each of these lines represents the average quoted price for car rentals in LA, Las Vegas, Miami, Chicago, and San Francisco — starting in 2019, and ending in record-high prices right now.</p><p>The highlight of this story is having access to all this data, and wondering what else you could do with it. So let me introduce <a href="https://www.snowflake.com/datasets/ql2-consumer-travel-data/">QL2</a>, a company whose mission is to provide competitive pricing data across many industries, including airlines, hotels, and a lot more — among them, car rental prices.</p><p>If you want to play with their data, you can <a href="https://www.snowflake.com/datasets/ql2-consumer-travel-data/">find them in the Snowflake Data Marketplace</a>, within the Travel category. You can get immediate access to their sample dataset, which has hotel pricing around Las Vegas. 
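</p><p>A quick aside on the averaging used in the queries below: they rely on exp(avg(ln(rate))), a geometric mean, rather than a plain average. A minimal sketch in plain Python, with hypothetical rental quotes rather than QL2 data, shows why:</p>

```python
import math

def geometric_mean(values):
    # Same idea as SQL's exp(avg(ln(rate))): average the logs, then exponentiate.
    return math.exp(sum(math.log(v) for v in values) / len(values))

rates = [40.0, 45.0, 50.0, 900.0]      # daily quotes with one outlier
arithmetic = sum(rates) / len(rates)   # 258.75, dominated by the outlier
geometric = geometric_mean(rates)      # ~95, much closer to a typical quote
```

<p>A single extreme quote drags the plain average far above typical prices, while the geometric mean stays close to them.</p><p>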
And you can request access to one of their premium datasets, like the car rental pricing dataset we are exploring here.</p><p>Then you can use SQL to analyze several years of car rental pricing data from all over the world. That’s how we can tell that people in Sydney, Australia, have seen a similar pattern of high prices, while in Paris, France, prices have stayed relatively stable:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*RQXiEYMkE1DEiwHZ" /><figcaption>Average car rental rates in Sydney and Paris — starting in 2019.</figcaption></figure><h3>Queries</h3><h4>Weekly summary</h4><p>My first step was to reduce the billions of rows that QL2 offers to a weekly summary:</p><pre>create table ql2_cars_weekly<br>cluster by (location, week)<br>as<br>select location, currency, rate_type, mileage, site<br>  , date_trunc(week, qts) week<br>  , exp(avg(ln(rate))) avg_price<br>  , count(*) requests<br>from ql2_cars.public.raw_carrental<br>where qts&gt;&#39;2019-01-01&#39;<br>and rate&gt;0<br>and location rlike &#39;[A-Z][A-Z][A-Z]&#39;<br>group by 1,2,3,4,5,6<br>having requests&gt; 150;</pre><p>Notes:</p><ul><li>To avoid the effect of outliers, exp(avg(ln(rate))) gives a <a href="https://jlmc.medium.com/understanding-three-simple-statistics-for-data-visualizations-2619dbb3677a">geometric mean</a> instead of a simple average.</li><li>The dataset includes multiple locations, so to extract only airports we look for exactly 3 letters with location rlike ‘[A-Z][A-Z][A-Z]’.</li><li>We’ll keep groups of location, currency, rate_type, mileage, site only where there are more than 150 samples per week.</li></ul><h4>Pick currency and car rental type per location</h4><p>Within the weekly summary we find that each location has multiple combinations of rate_type, mileage, and currency.</p><p>The following query filters to keep only daily rates and unlimited mileage, and guesses the preferred currency per airport:</p><pre>create or replace table 
ql2_cars_weekly_viz<br>as<br>select location, currency, week, avg(avg_price) avg_rate, sum(requests) requests<br>from (<br>    select *<br>    from ql2_cars_weekly<br>    --where week = (select max(week) from ql2_cars_weekly)<br>    where rate_type like &#39;D%&#39;<br>    and mileage like &#39;U%&#39;<br>    qualify currency = first_value(currency) over(partition by location order by iff(site=&#39;XP_US&#39;, 0, requests) desc)<br>    order by sum(requests) over(partition by location) desc, requests desc<br>)<br>group by 1,2,3;</pre><p>These are the results that we ended up visualizing with a simple filter:</p><pre>select location, week, avg_rate, requests, currency<br>from ql2_cars_weekly_viz<br>where location IN <br>-- (&#39;SFO&#39;, &#39;MIA&#39;, &#39;LAX&#39;, &#39;LAS&#39;, &#39;ORD&#39;)<br>   (&#39;CDG&#39;, &#39;SYD&#39;)<br>;</pre><h3>Next steps</h3><p>Think about how your business could use these datasets. Competitive pricing data for travel means you’ll be able to understand trends as people start planning their vacations and future business trips. What are they planning? Where do they want to go? Where are prices increasing due to higher demand — or constraints? All of it, updated daily in the Data Cloud.</p><ul><li>Try this out with a <a href="http://bit.ly/sf-free-trial">Snowflake free trial</a> account — you only need an email address to get started.</li><li>Find plenty more datasets in the <a href="https://www.snowflake.com/data-marketplace/">Snowflake Data Marketplace</a>.</li></ul><h3>Want more?</h3><p>I’m Felipe Hoffa, Data Cloud Advocate for Snowflake. Thanks for joining me on this adventure. 
You can <a href="https://twitter.com/felipehoffa">follow me on Twitter</a> and <a href="https://www.linkedin.com/in/hoffa/">LinkedIn</a>, and check <a href="https://www.reddit.com/r/snowflake/">reddit.com/r/snowflake</a> for the most interesting Snowflake news.</p><p><a href="https://github.com/Snowflake-Labs/awesome-snowflake">Snowflake-Labs/awesome-snowflake</a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/960/0*D9beHvwqMibIq8IO.jpg" /><figcaption>Photo: <a href="https://pixabay.com/photos/car-keys-open-handle-unlock-5246178/">Pixabay</a></figcaption></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=ba0f3a946782" width="1" height="1" alt=""><hr><p><a href="https://medium.com/snowflake/car-rental-prices-at-record-highs-explore-the-trends-with-data-from-ql2-and-snowflake-ba0f3a946782">Car rental prices at record highs: Explore the trends with data from QL2 and Snowflake</a> was originally published in <a href="https://medium.com/snowflake">Snowflake</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Forecasts in Snowflake: Facebook Prophet on Cloud Run with SQL]]></title>
            <link>https://towardsdatascience.com/forecasts-in-snowflake-facebook-prophet-on-cloud-run-with-sql-71c6f7fdc4e3?source=rss-279fe54c149a------2</link>
            <guid isPermaLink="false">https://medium.com/p/71c6f7fdc4e3</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[sql]]></category>
            <category><![CDATA[snowflake]]></category>
            <category><![CDATA[time-series-analysis]]></category>
            <category><![CDATA[facebook-prophet]]></category>
            <dc:creator><![CDATA[Felipe Hoffa]]></dc:creator>
            <pubDate>Fri, 14 May 2021 17:06:18 GMT</pubDate>
            <atom:updated>2021-05-14T22:07:52.922Z</atom:updated>
<content:encoded><![CDATA[<h4>By creating a Snowflake external function we can get predictions out of Facebook Prophet for any time series. Learn how to easily integrate these forecasts into your SQL pipelines — with Snowflake connecting to Prophet running on a Docker container inside Google Cloud Run.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/956/1*yG1KfBEQLjHTuvCQb_U9hg.png" /><figcaption>Photo: <a href="https://pixabay.com/photos/woman-spirit-ball-5439365/">Pixabay</a></figcaption></figure><p>The goal for this post is to build a function you could use within Snowflake to forecast time series. A great open source tool for this is <a href="https://facebook.github.io/prophet/">Facebook Prophet</a>, and we just need a way to use it within our Snowflake environment. This is easy with Snowflake’s ability to run <a href="https://docs.snowflake.com/en/sql-reference/external-functions-introduction.html">external functions </a>— hence we only need to host an instance of Prophet and add the necessary plumbing to end up with a prophetize(timeseries) function within Snowflake.</p><p>Let’s start with a demo; any time series in Snowflake will do. For example, the temperature around New York City, at Newark airport, since 2018:</p><pre>select date, temp<br>from noaa_gsod<br>where country_fips=&#39;US&#39;<br>and station = &#39;NEWARK LIBERTY INTERNATIONAL AP&#39;;</pre><pre>-- TODO: Show how to get NOAA GSOD from Knoema in a future post.</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*kifPyhE9OloDlmBmS3psyg.png" /><figcaption>Daily temperature at Newark Airport</figcaption></figure><p>Then we can call our function prophetize() (see below for how to create it) by aggregating the previous time series into an array with dates and values:</p><pre>select prophetize(array_construct(<br>        array_agg(temp::float) <br>          within group(order by date)<br>        , array_agg(date::date) <br>          within group(order 
by date))::string) str<br>from table(result_scan(last_query_id(-1)));</pre><p>And that’s it. What we get back is an array with predictions. An easy way to visualize these results is to combine the values of the previous two queries:</p><pre>select date, temp::float avg_temp<br>    , 0 forecast, 0 fore_min, 0 fore_max<br>from table(result_scan(last_query_id(-2)))<br>union all<br>select x.value[0]::date, 0, x.value[1]::int forecast, x.value[2]::int fore_min, x.value[3]::int fore_max<br>from table(result_scan(last_query_id(-1))) a, table(flatten(a.$1)) x;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*6fS-3Tvz-cz_fxBKgdpKcg.png" /><figcaption>Daily temperature at Newark Airport, plus forecast bands for the next 365 days.</figcaption></figure><p>Interesting notes on the above:</p><ul><li>Prophet was able to easily detect seasonal patterns and predict future values.</li><li>A lot of how Prophet works is tunable, but what you get out of the box works too.</li><li>On Snowflake I used last_query_id(-1) and last_query_id(-2) to combine the results of the previous two queries. 
It’s a nifty feature.</li></ul><p>Now let’s check the details of how to connect our Snowflake SQL function prophetize() to Facebook Prophet running on Cloud Run.</p><h3>How to</h3><h4>Deploy Prophet on a container to Cloud Run</h4><p>The requirements.txt to build this container is straightforward:</p><pre>flask==1.1.4<br>requests==2.25.1<br>gunicorn==20.1.0<br>pandas==1.2.4<br>pystan==2.19.1.1  # &lt;3.0<br>fbprophet==0.7.1</pre><p>As is the Dockerfile:</p><pre>FROM python:3.8</pre><pre># Allow statements and log messages to immediately appear in the Cloud Run logs<br>ENV PYTHONUNBUFFERED True</pre><pre>COPY requirements.txt .<br>RUN pip install -r requirements.txt</pre><pre>ENV APP_HOME /app<br>WORKDIR $APP_HOME<br>COPY main.py ./</pre><pre>CMD exec gunicorn --bind :$PORT --workers 1 --threads 1 --timeout 0 main:app</pre><p>And this is main.py, a basic web server that parses incoming arrays into a dataframe that Prophet uses to forecast an arbitrary number of periods. Then it returns a serialized array with a forecast and uncertainty intervals that Snowflake will receive:</p><pre>import json<br>import logging<br>import os</pre><pre>from fbprophet import Prophet<br>from flask import Flask, request<br>import pandas as pd</pre><pre>log = logging.getLogger()<br>app = Flask(__name__)</pre><pre>def forecast(df: pd.DataFrame, periods=365) -&gt; pd.DataFrame:<br>    df[&quot;ds&quot;] = pd.to_datetime(df[&quot;ds&quot;])</pre><pre>    model = Prophet()<br>    model.fit(df)</pre><pre>    future_df = model.make_future_dataframe(<br>      periods=periods, include_history=False)</pre><pre>    return model.predict(future_df)[[&quot;ds&quot;, &quot;yhat&quot;, &quot;yhat_lower&quot;, &quot;yhat_upper&quot;]]</pre><pre>@app.route(&quot;/&quot;, methods=[&quot;POST&quot;])<br>def index():<br>    payload = request.get_json()<br>    logging.info(payload)<br>    # <a 
href="https://docs.snowflake.com/en/sql-reference/external-functions-data-format.html">https://docs.snowflake.com/en/sql-reference/external-functions-data-format.html</a></pre><pre>    rows = payload[&quot;data&quot;]<br>    return_value = []<br>    for row in rows:<br>        row_number = row[0]<br>        function_input = json.loads(row[1])<br>        df = pd.DataFrame({&#39;ds&#39;: function_input[1], &#39;y&#39;: function_input[0]})<br>        fc = forecast(df)<br>        fc[&#39;ds&#39;] = fc[&#39;ds&#39;].dt.strftime(&#39;%Y-%m-%d&#39;)<br>        row_to_return = [row_number, fc.to_numpy().tolist()]<br>        return_value.append(row_to_return)</pre><pre>    json_compatible_string_to_return = json.dumps({&quot;data&quot;: return_value})<br>    return (json_compatible_string_to_return)</pre><pre>if __name__ == &quot;__main__&quot;:<br>    app.run(debug=True, host=&quot;0.0.0.0&quot;, port=int(os.environ.get(&quot;PORT&quot;, 8080)))</pre><p>If we want to build and run this container on Google Cloud Run, we need to run:</p><pre>gcloud builds submit --config cloudbuild.yaml;<br>gcloud run deploy --image gcr.io/fhoffa/prophetize --platform managed</pre><p>Building the image on Cloud Build is slow the first time, as it takes time to compile Prophet — but this cloudbuild.yaml makes it fast with an image cache on further builds:</p><pre>steps:<br>- name: &#39;gcr.io/cloud-builders/docker&#39;<br>  entrypoint: &#39;bash&#39;<br>  args:<br>  - &#39;-c&#39;<br>  - |<br>    docker pull gcr.io/fhoffa/prophetize:latest || exit 0</pre><pre>- name: &#39;gcr.io/cloud-builders/docker&#39;<br>  args: [<br>          &#39;build&#39;,<br>          &#39;--cache-from&#39;, &#39;gcr.io/fhoffa/prophetize:latest&#39;,<br>          &#39;-t&#39;, &#39;gcr.io/fhoffa/prophetize:latest&#39;,<br>          &#39;.&#39;<br>        ]</pre><pre>images: [&#39;gcr.io/fhoffa/prophetize:latest&#39;]</pre><h4>Cloud Run behind Google API Gateway</h4><p>One of the main goals I had behind this project 
was to celebrate that <a href="https://docs.snowflake.com/en/sql-reference/external-functions-creating-gcp.html">Snowflake now supports external functions for GCP</a>. Hence my choice to deploy on Cloud Run.</p><p>Now, to run external functions through GCP, we need to set up a connection from Snowflake to Google API Gateway, and from API Gateway to Cloud Run.</p><p>First we need a gateway.yaml for API Gateway to know that it will act as a proxy to the service we deployed on Cloud Run:</p><pre>swagger: &#39;2.0&#39;<br>info:<br>  title: API Gateway config for Snowflake external function.<br>  description: This configuration file connects the API Gateway resource to the remote service (Cloud Run).<br>  version: 1.0.0<br>schemes:<br>  - https<br>produces:<br>  - application/json<br>paths:<br>  /test:<br>    post:<br>      summary: Prophetize<br>      operationId: prophetize<br>      x-google-backend:<br>        address: <a href="https://prophetize-zqnzinxyeq-wl.a.run.app/">https://prophetize-zqnzinxyeq-wl.a.run.app/</a> <br>        protocol: h2<br>      responses:<br>        &#39;200&#39;:<br>          description: &lt;DESCRIPTION&gt;<br>          schema:<br>            type: string</pre><p>Then you can follow the GCP docs to <a href="https://console.cloud.google.com/api-gateway/gateway/create">create an API Gateway</a> with this configuration. 
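</p><p>Before deploying the gateway config, it can help to sanity-check the payload shape it will proxy. Per the external functions data format mentioned above, Snowflake POSTs a JSON body of the form {"data": [[row_number, argument], ...]} and expects a response with the same row numbers. A minimal stand-in handler in plain Python (no Prophet involved, purely illustrative) looks like this:</p>

```python
import json

def handle(request_body: str, fn) -> str:
    # Snowflake external-function contract: one result row per input row,
    # each result row echoing its row number back in position 0.
    rows = json.loads(request_body)["data"]
    return json.dumps({"data": [[row[0], fn(row[1])] for row in rows]})

# A toy fn that just counts the data points in the serialized time series,
# standing in for the real forecast() in main.py.
body = json.dumps({"data": [[0, '[[41, 43, 62], ["2019-12-30", "2020-01-06", "2020-01-13"]]']]})
print(handle(body, lambda arg: len(json.loads(arg)[0])))  # {"data": [[0, 3]]}
```

<p>The main.py above follows this same loop, just with a real Prophet forecast per row.</p><p>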
Oh, and make sure to replace the values above with your own service URLs.</p><h4>Snowflake to Google API Gateway</h4><p>This is how I connected the dots on Snowflake to create an integration with API Gateway:</p><pre>use role accountadmin;<br>use schema temp.public;</pre><pre>create or replace api integration prophet_test<br>    api_provider = google_api_gateway<br>    google_audience = &#39;test1-3s4aecfho43ih.apigateway.fhoffa.cloud.goog&#39;<br>    api_allowed_prefixes = (&#39;<a href="https://prophetize-4r3ddv95.wl.gateway.dev&#39;">https://prophetize-4r3ddv95.wl.gateway.dev&#39;</a>)<br>    enabled = true;<br>    <br>describe integration prophet_test;<br>create or replace external function prophetize(x string)<br>    returns variant<br>-- IMMUTABLE<br>    api_integration = prophet_test<br>    as &#39;<a href="https://prophetize-4r3ddv95.wl.gateway.dev/test&#39;">https://prophetize-4r3ddv95.wl.gateway.dev/test&#39;</a> ;<br>grant usage on function prophetize(string) to role sysadmin;</pre><p>And that’s all you need. Now you can call the freshly minted prophetize() like any other function in a Snowflake query:</p><pre>select prophetize(&#39;[[41,43,62,43],[&quot;2019-12-30&quot;,&quot;2020-01-06&quot;,&quot;2020-01-13&quot;,&quot;2020-01-20&quot;]]&#39;);</pre><p>Which gives results like:</p><pre>[<br>  [<br>    &quot;2020-01-21&quot;,<br>    51.3641167654911,<br>    40.85673826625397,<br>    61.745184538148166<br>  ],<br>  [<br>    &quot;2020-01-22&quot;,<br>    51.72223221323965,<br>    41.87259513681375,<br>    61.29144225035811<br>  ],<br>  [<br>    &quot;2020-01-23&quot;,<br>    52.0803476609882,<br>    41.66374622035821,<br>    61.55883149200517<br>  ], [...]<br>]</pre><h4>Securing the flows</h4><p>You might have noticed many URLs in my configs above — now that you’ve seen them, you might want to start calling my functions from your accounts. That would be fine, but I’d rather protect them.</p><p>Snowflake makes this easy. 
Once you create the integration above, a service account for GCP will be automatically provisioned. You can get its value with describe integration prophet_test — and then use that service account to update the gateway.yaml so no one else can call it:</p><pre>swagger: &#39;2.0&#39;<br>info:<br>  title: API Gateway config for Snowflake external function.<br>  description: This configuration file connects the API Gateway resource to the remote service (Cloud Function).<br>  version: 1.0.0<br>securityDefinitions:<br>  snowflakeAccess01:<br>    authorizationUrl: &quot;&quot;<br>    flow: &quot;implicit&quot;<br>    type: &quot;oauth2&quot;<br>    x-google-issuer: &quot;ftyqmxcfyo@sfc-prod2-1-947.iam.gserviceaccount.com&quot;<br>    x-google-jwks_uri: &quot;<a href="https://www.googleapis.com/robot/v1/metadata/x509/">https://www.googleapis.com/robot/v1/metadata/x509/</a>ftyqmxcfyo@sfc-prod2-1-947.iam.gserviceaccount.com&quot;<br>schemes:<br>  - https<br>produces:<br>  - application/json<br>paths:<br>  /test:<br>    post:<br>      summary: Prophetize.<br>      operationId: prophetize<br>      security:<br>        - snowflakeAccess01: []<br>      x-google-backend:<br>        address: <a href="https://prophetize-zqnzinxyeq-wl.a.run.app/">https://prophetize-zqnzinxyeq-wl.a.run.app/</a> <br>        protocol: h2<br>      responses:<br>        &#39;200&#39;:<br>          description: &lt;DESCRIPTION&gt;<br>          schema:<br>            type: string</pre><p>Then follow these <a href="https://docs.snowflake.com/en/sql-reference/external-functions-creating-gcp-set-security-policy.html#reload-the-updated-configuration-file">Snowflake docs to update your GCP API Gateway</a> with the above secured config.</p><p>Note that this GCP service account is provisioned by Snowflake regardless of what cloud you are using to host your Snowflake account. 
In this case, I ran this whole demo on Snowflake on AWS, and it was able to call the GCP services effortlessly.</p><p>Meanwhile, in Cloud Run, make sure to stop allowing unauthenticated invocations. With this, only calls authorized through the API Gateway will be served:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wCmG7goQ9O02E8ou8UPidA.png" /><figcaption>Stop allowing unauthenticated invocations on Cloud Run.</figcaption></figure><h3>Learn more</h3><p><a href="https://facebook.github.io/prophet/">Facebook Prophet</a> is a versatile tool with many levers and ways to tune:</p><blockquote>Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.</blockquote><p>Read more from:</p><ul><li>“<a href="https://blog.doit-intl.com/seven-tips-for-predicting-the-future-with-facebooks-prophet-51cf8d6c89dc">Seven Tips for Forecasting Cloud Costs (with FB’s Prophet)</a>” from <a href="https://medium.com/u/b45fa95a7293">Gad Benram</a>.</li><li>“<a href="https://towardsdatascience.com/building-a-serverless-containerized-batch-prediction-model-using-google-cloud-run-and-terraform-82c97ab17515">Building a serverless, containerized batch prediction model using Google Cloud Run, Pub/Sub, Cloud Storage and Terraform</a>” by <a href="https://medium.com/u/b7f2b6fc948e">Sebastian Telsemeyer</a>.</li><li><a href="https://github.com/ahmetb/cloud-run-faq">Cloud Run costs and features</a>, by <a href="https://medium.com/u/2cac56571879">Ahmet Alp Balkan</a></li></ul><h3>Next steps</h3><ul><li>Try this out with a <a href="http://bit.ly/sf-free-trial">Snowflake free trial</a> account — you only need an email address to get started.</li><li>I shall set up 
a GitHub project with all these files (and update here when done)</li></ul><h4>Play with Prophet: Bitcoin predictions</h4><p><a href="https://medium.com/snowflake/knoema-datasets-and-the-snowflake-data-marketplace-challenge-95ff0d7ac6eb">Knoema</a> already has the valuation of multiple crypto coins in the Snowflake Marketplace, thus building a time series forecast with Prophet is straightforward:</p><pre>-- define the time series<br>select &quot;Value&quot; value, &quot;Date&quot; date<br>from KNOEMA_FINANCE_DATA_ATLAS.FINANCE.CMCCD2019 <br>where &quot;Cryptocurrency Name&quot; = &#39;Bitcoin (btc)&#39; <br>and &quot;Measure&quot; = &#39;PriceUSD&#39; <br>and date &gt; &#39;2017-01-01&#39;<br>order by &quot;Date&quot;;</pre><pre>-- prophetize<br>select prophetize(array_construct(<br>        array_agg(value::float) within group(order by date)<br>        , array_agg(date::date) within group(order by date))::string) str<br>from table(result_scan(last_query_id(-1)));</pre><pre>-- prepare the viz<br>select date, value, 0 forecast, 0 fore_min, 0 fore_max<br>from table(result_scan(last_query_id(-2)))<br>union all<br>select x.value[0]::date, 0, x.value[1]::int forecast, x.value[2]::int fore_min, x.value[3]::int fore_max<br>from table(result_scan(last_query_id(-1))) a, table(flatten(a.$1)) x;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/833/0*Qkr8NN9tNnnNo34q" /><figcaption>Prophet’s forecast for Bitcoin’s price — ran on Snowflake with data from <a href="https://medium.com/snowflake/knoema-datasets-and-the-snowflake-data-marketplace-challenge-95ff0d7ac6eb">Knoema</a></figcaption></figure><p>The more you play with Prophet, the more you’ll see that the forecast will depend heavily on what values you feed it, and how you tune it. 
In this case, the forecasts varied widely depending on what starting date I used for the time series — starting in 2016 meant Prophet would observe a more complex pattern than feeding it data starting in 2018, and so forth.</p><p>Best part for me? I can pull all this off without leaving my comfy Snowflake SQL web UI:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*kBU4pf-7szi12-SjG8zRXw.png" /><figcaption>Visualizing forecasts inside Snowsight, Snowflake’s web UI</figcaption></figure><h3>Want more?</h3><p>I’m Felipe Hoffa, Data Cloud Advocate for Snowflake. Thanks for joining me on this adventure. You can <a href="https://twitter.com/felipehoffa">follow me on Twitter</a> and <a href="https://www.linkedin.com/in/hoffa/">LinkedIn</a>, and check <a href="https://www.reddit.com/r/snowflake/">reddit.com/r/snowflake</a> for the most interesting Snowflake news.</p><ul><li><a href="https://github.com/Snowflake-Labs/awesome-snowflake">Snowflake-Labs/awesome-snowflake</a></li><li><a href="https://medium.com/snowflake/knoema-datasets-and-the-snowflake-data-marketplace-challenge-95ff0d7ac6eb">Knoema Datasets and the Snowflake Data Marketplace Challenge</a></li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=71c6f7fdc4e3" width="1" height="1" alt=""><hr><p><a href="https://towardsdatascience.com/forecasts-in-snowflake-facebook-prophet-on-cloud-run-with-sql-71c6f7fdc4e3">Forecasts in Snowflake: Facebook Prophet on Cloud Run with SQL</a> was originally published in <a href="https://towardsdatascience.com">Towards Data Science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Dynamic Data Masks: Protect sensitive data with Snowflake (superhero edition)]]></title>
            <link>https://medium.com/snowflake/dynamic-data-masks-protect-sensitive-data-with-snowflake-superhero-edition-20a00773f951?source=rss-279fe54c149a------2</link>
            <guid isPermaLink="false">https://medium.com/p/20a00773f951</guid>
            <category><![CDATA[privacy-protection]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[compliance]]></category>
            <category><![CDATA[snowflake]]></category>
            <category><![CDATA[sql]]></category>
            <dc:creator><![CDATA[Felipe Hoffa]]></dc:creator>
            <pubDate>Tue, 04 May 2021 14:33:31 GMT</pubDate>
            <atom:updated>2021-05-05T15:26:14.130Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/960/0*VXFxBQXgUDPd6baT.png" /><figcaption>Source: <a href="https://pixabay.com/vectors/mask-black-costume-superhero-304132/">Pixabay</a></figcaption></figure><h4>Your customers trust you to protect their privacy and identities — and Snowflake makes this easy with <a href="https://docs.snowflake.com/en/user-guide/security-column-ddm-use.html">Dynamic Data Masks</a>. With this you can get rid of complex systems of secure views or copies of data: Just create policies that allow everyone to share the same tables and queries, while protecting sensitive data.</h4><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FOjm11Wq-wR4%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DOjm11Wq-wR4&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FOjm11Wq-wR4%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/be36cee9f4b207901b45405894dc456e/href">https://medium.com/media/be36cee9f4b207901b45405894dc456e/href</a></iframe><p>Let’s say your customers are <a href="https://community.snowflake.com/s/dataheroes">superheroes</a>, and you have a table that contains data about them. You know a lot about your customers, and they trust you to protect their identities and privacy — not only from third parties, but also from people in your company that need this data to do their job, but not all of this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/960/1*sQmBRfeqV5Ar-N6uZemo9g.png" /></figure><p>For example, your lawyers need to see the real names of your heroes, but not their superhero identities. And customer service — they need to see the first name to talk to them, but not their full identity. 
Maybe they need to know how famous each of them is, but not necessarily the exact number. And so on.</p><p>So we need to solve this data problem, and we don’t want to make multiple copies of the data and we don’t want to create a complex system of views.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/960/1*ExR5OcTyS193W3U7vcMCFg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/960/1*kV4ZCoOnTqplToVJm0Rk0g.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/960/1*wdSs4_g865I8UxALdJgOsA.png" /></figure><p>What we want here is <a href="https://docs.snowflake.com/en/user-guide/security-column-ddm-use.html">dynamic masks</a>. With dynamic masks, we can define policies in one place and Snowflake will make sure that people in your company can share tables and queries, but they will only see what they’re allowed to see, depending on their role.</p><p>So I wrote this policy in SQL that only allows people with the lawyer role to see the full name. Meanwhile people with the customer service role only see the first name — and no one else will be able to see the legal name of our customers:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/960/1*rA2rdR8csooh2jX6MaHBlw.png" /></figure><p>Let’s try it out with a SELECT *: I get some data, and if I change my role, the same SELECT * gives me different results.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/696/1*nw35SIrHTpqmO5Zvx6yxWQ.gif" /></figure><p>It’s the same table. It’s the same query. My role determines what I see. Even better, these masks are applied at query time. They even work with external tables that might live in your Data Lake.</p><p>And it’s that simple because Snowflake cares about your data and the privacy of your customers. And this is just a start. 
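The policies in the script below amount to simple per-role transformations: lawyers get the full name, customer service gets only its first word, and fame gets rounded to the nearest power of ten. A rough Python sketch of that same logic (the function names are mine; only the role names and expressions come from the script):

```python
import math
import re

def mask_name(val: str, role: str) -> str:
    # Mirrors the mask_name policy: full name for lawyers, first
    # word for customer service (regexp_substr(val, '[^ ]*')),
    # empty string for everyone else.
    if role in ("LAWYER", "SYSADMIN"):
        return val
    if role == "CUST_SERVICE":
        return re.match(r"[^ ]*", val).group()
    return ""

def mask_fame(val: int, role: str) -> int:
    # Mirrors the mask_fame policy: exact count for marketing,
    # nearest order of magnitude (pow(10, round(log(10, val))))
    # for customer service, -1 for everyone else.
    if role in ("MARKETING", "SYSADMIN"):
        return val
    if role == "CUST_SERVICE":
        return 10 ** round(math.log10(val))
    return -1
```

The point of the real feature is that Snowflake applies these transformations at query time, based on <code>current_role()</code>, so nobody needs to call anything explicitly.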
We will go deeper in the near future but in the meantime, check below the full SQL script to reproduce these results.</p><h3>Script</h3><p>For dynamic masks to work, you need to use <a href="https://docs.snowflake.com/en/user-guide/intro-editions.html#enterprise-edition">Snowflake’s Enterprise edition</a> or higher:</p><pre>use role sysadmin;</pre><pre>// Let&#39;s assume this database and a warehouse exists, otherwise create them.<br>use masks_db;</pre><pre>// Quick creation of base data<br>create or replace table customers<br>as<br>select $1[0]::string hero_name, $1[1]::string birth_name, replace($1[2], &#39;,&#39;)::int fame, $1[3]::string birth_place<br>from (<br>  select split(x.value, &#39;|&#39;)<br>  from table(split_to_table(&#39;Batman|Bruce Wayne|104,302|Gotham City<br>  Iron Man|Tony Stark|64,098|Manhattan<br>  Wolverine|James Howlett|43,482|Alberta<br>  Black Widow|Natasha Romanoff|42,453|Volgograd<br>  Catwoman|Selina Kyle|40,763|Gotham City<br>  Magneto|Max Eisenhardt|35,691|Dusseldorf<br>  Doctor Manhattan|Jonathan Osterman|30,225|Heidelberg<br>  Kitty Pryde|Katherine Anne Pryde|28,226|Deerfield<br>  Spawn|Albert Francis Simmons|28,080|Detroit<br>  Daredevil|Matthew Michael Murdock|27,574|New York City<br>  Martian Manhunter|J\’onn J\‘onzz|27,547|Mars<br>  Luke Cage|Carl Lucas|24,799|New York City<br>  Storm|Ororo Munroe|24,221|New York City<br>  Wolfsbane|Rahne Sinclair|23,413|Ullapool<br>  Colossus|Пётр Николаевич Распутин|22,823|Siberia<br>  Legion|David Charles Haller|22,769|Haifa<br>  Tim Drake|Timothy Jackson Drake|21,957|Gotham City<br>  Sunspot|Roberto da Costa|21,872|Rio de Janeiro<br>  Elasti-Girl|Rita Farr|20,650|<br>  Polaris|Lorna Dane|20,532|San Francisco<br>  Zatanna|Zatanna Zatara|19,782|Gotham City<br>  Emma Frost|Emma Grace Frost|19,427|Boston<br>  Cable|Nathan Christopher Charles Summers|18,894|Westchester County<br>  Nightcrawler|Kurt Wagner|18,886|Bavaria<br>  Spider-Woman|Gwendolyne Maxine Stacy|18,375|Forest Hills<br>  
Cyclops|Scott Summers|18,344|Anchorage<br>  Psylocke|Elizabeth Braddock|18,168|Maldon&#39;, &#39;\n&#39;)) x<br>)<br>;</pre><pre>// securityadmin can create roles<br>use role securityadmin;</pre><pre>create role marketing;<br>create role cust_service;<br>create role lawyer;</pre><pre>// our user &#39;felipe&#39; will be able to use these roles<br>grant role marketing to user felipe;<br>grant role cust_service to user felipe;<br>grant role lawyer to user felipe;</pre><pre>// create a role that we&#39;ll use for common permissions<br>create role masks_demo_role;<br>grant role masks_demo_role to role marketing;<br>grant role masks_demo_role to role cust_service;<br>grant role masks_demo_role to role lawyer;</pre><pre>// back to sysadmin, to grant permissions to tables and create policies<br>use role sysadmin;<br>grant usage on database masks_db to masks_demo_role;<br>grant usage on schema masks_db.public to masks_demo_role;<br>grant select on table masks_db.public.customers to masks_demo_role;</pre><pre>// creating and applying &quot;empty&quot; policies, making it easier to modify later<br>create or replace masking policy mask_hero as (val string) returns string -&gt; val;<br>create or replace masking policy mask_name as (val string) returns string -&gt; val;<br>create or replace masking policy mask_fame as (val int) returns int -&gt; val;<br>alter table customers modify column hero_name set masking policy mask_hero; <br>alter table customers modify column birth_name set masking policy mask_name; <br>alter table customers modify column fame set masking policy mask_fame;</pre><pre>// these are the real policy implementations, we can play here with their code<br>alter masking policy mask_hero set body -&gt;<br>case <br>when current_role() in (&#39;MARKETING&#39;, &#39;SYSADMIN&#39;) then val<br>else &#39;&#39;<br>end;</pre><pre>alter masking policy mask_name set body -&gt;<br>case <br>when current_role() in (&#39;LAWYER&#39;, &#39;SYSADMIN&#39;) then val<br>when 
current_role() in (&#39;CUST_SERVICE&#39;) then regexp_substr(val, &#39;[^ ]*&#39;)<br>else &#39;&#39;<br>end;</pre><pre>alter masking policy mask_fame set body -&gt;<br>case <br>when current_role() in (&#39;MARKETING&#39;, &#39;SYSADMIN&#39;) then val<br>when current_role() in (&#39;CUST_SERVICE&#39;) then pow(10, round(log(10, val)))::int<br>else -1<br>end;</pre><pre>// try the different roles and see what results they get<br>// <a href="https://i.imgur.com/q1jGMZe.gif">https://i.imgur.com/q1jGMZe.gif</a><br>use role marketing;<br>use role cust_service;<br>use role lawyer;</pre><pre>select *<br>from customers;</pre><pre>// introspection: what policies were created and applied<br>use role sysadmin;<br>describe table customers;<br>show masking policies;</pre><h3>Next steps</h3><ul><li>Check the <a href="https://docs.snowflake.com/en/user-guide/security-column-ddm-use.html">docs</a>.</li><li>Discover further features, like <a href="https://www.snowflake.com/blog/masking-semi-structured-data-with-snowflake/">dynamic masks on semi-structured data</a>.</li><li>Can we go beyond roles? 
In future posts we can review masking data based on the individual characteristics of the person running the query.</li><li>New in Snowflake: <a href="https://docs.snowflake.com/en/user-guide/security-row-intro.html">Row level policies</a>.</li><li>Check how <a href="https://www.youtube.com/watch?v=NPQOTm1XITw">JetBlue Secures and Protects Data Using dbt and Snowflake</a>.</li></ul><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FNPQOTm1XITw%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DNPQOTm1XITw&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FNPQOTm1XITw%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/4079ca074005c1bcb28133660daa9c6c/href">https://medium.com/media/4079ca074005c1bcb28133660daa9c6c/href</a></iframe><h3>Want more?</h3><p>I’m Felipe Hoffa, Data Cloud Advocate for Snowflake. Thanks for joining me on this adventure. You can <a href="https://twitter.com/felipehoffa">follow me on Twitter</a> and <a href="https://www.linkedin.com/in/hoffa/">LinkedIn</a>, and check <a href="https://www.reddit.com/r/snowflake/">reddit.com/r/snowflake</a> for the most interesting Snowflake news.</p><p><a href="https://github.com/Snowflake-Labs/awesome-snowflake">Snowflake-Labs/awesome-snowflake</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=20a00773f951" width="1" height="1" alt=""><hr><p><a href="https://medium.com/snowflake/dynamic-data-masks-protect-sensitive-data-with-snowflake-superhero-edition-20a00773f951">Dynamic Data Masks: Protect sensitive data with Snowflake (superhero edition)</a> was originally published in <a href="https://medium.com/snowflake">Snowflake</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Open source boards and tenure terms: The FSF has stalled]]></title>
            <link>https://medium.com/swlh/open-source-boards-and-tenure-terms-e8cadb02035?source=rss-279fe54c149a------2</link>
            <guid isPermaLink="false">https://medium.com/p/e8cadb02035</guid>
            <category><![CDATA[sql]]></category>
            <category><![CDATA[visualization]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[board-of-directors]]></category>
            <category><![CDATA[open-source]]></category>
            <dc:creator><![CDATA[Felipe Hoffa]]></dc:creator>
            <pubDate>Wed, 14 Apr 2021 03:49:09 GMT</pubDate>
            <atom:updated>2021-04-14T20:26:22.556Z</atom:updated>
<content:encoded><![CDATA[<h4>There’s plenty of research establishing that boards should refresh their members regularly. Let’s see the numbers for the Free Software Foundation, Apache Software Foundation, Python Software Foundation, and Open Source Initiative.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/591/1*eLOqrnSxK1k8Y32Fx02bIg.png" /></figure><h3>Rotation in open source boards: Average # of years served by members</h3><p>The ASF, PSF, and OSI keep renewing their board members. On average each member serves less than 8 years.</p><p>Meanwhile, the FSF’s average # of years served per board member keeps growing — currently &gt;13.83 years. Or more, if the count started before 2001.</p><h3>Why boards should rotate</h3><p>The following research articles show why institutions need to keep boards fresh, regardless of whether they are non-profit:</p><p><a href="https://www.boardeffect.com/blog/best-practices-nonprofit-board-term-limits/">Best Practices for Nonprofit Board Term Limits | BoardEffect</a></p><blockquote>Rotating new board directors into the boardroom and on committees prevents the board from becoming stale. The IRS favors term limits because they believe that static board membership leads to unhealthy attitudes, which can cause boards to govern out of self-interest rather than community interest. Boards that have a majority of longstanding members may intimidate newer members, causing them to hold back with new thoughts and ideas.</blockquote><p><a href="https://corpgov.law.harvard.edu/2017/02/09/board-refreshment-trends-at-sp-1500-firms/">Board Refreshment Trends at S&amp;P 1500 Firms</a></p><blockquote><strong>Tenure Trends Reversing… And May Reverse Again: </strong>Investors’ concern — warranted or not — over rising director/board tenure is based in reality. Average boardroom tenure steadily rose from 8.4 years in 2008 to a peak of nine years in 2013 before slowly reversing course from 2014 to 2016 (YTD). 
As a result, average director tenure at S&amp;P 1500 firms now stands at a level — 8.7 years — last recorded in 2010. Moving in a similar pattern, median board tenure across all S&amp;P 1500 directorships rose from six years to seven years in 2009, but has remained steady from 2010 to 2016. Absent intervention by boards, however, structural issues — especially rising mandatory retirement ages — could cause average and median tenures to climb again in a few years.</blockquote><h3>Source data</h3><p><a href="https://medium.com/u/1b52561d9756">Brian Fitzpatrick</a> researched and shared this data in a Google sheet:</p><blockquote>Thread: This tweet got me to thinking: is there a way to show just how broken the FSF is as an organization? Is there a way to show how their board isn’t really a functioning board, but rather an enabling body for RMS? Turns out, there is! And I will show you in *four* tweets.</blockquote><style>body[data-twttr-rendered="true"] {background-color: transparent;}.twitter-tweet {margin: auto !important;}</style><blockquote class="twitter-tweet" data-conversation="none" data-align="center" data-dnt="true"><p>&#x200a;&mdash;&#x200a;<a href="https://twitter.com/therealfitz/status/1382020165664190464">@therealfitz</a></p></blockquote><script src="//platform.twitter.com/widgets.js" charset="utf-8"></script><script>function notifyResize(height) {height = height ? 
height : document.documentElement.offsetHeight; var resized = false; if (window.donkey && donkey.resize) {donkey.resize(height);resized = true;}if (parent && parent._resizeIframe) {var obj = {iframe: window.frameElement, height: height}; parent._resizeIframe(obj); resized = true;}if (window.location && window.location.hash === "#amp=1" && window.parent && window.parent.postMessage) {window.parent.postMessage({sentinel: "amp", type: "embed-size", height: height}, "*");}if (window.webkit && window.webkit.messageHandlers && window.webkit.messageHandlers.resize) {window.webkit.messageHandlers.resize.postMessage(height); resized = true;}return resized;}twttr.events.bind('rendered', function (event) {notifyResize();}); twttr.events.bind('resize', function (event) {notifyResize();});</script><script>if (parent && parent._resizeIframe) {var maxWidth = parseInt(window.frameElement.getAttribute("width")); if ( 500  < maxWidth) {window.frameElement.setAttribute("width", "500");}}</script><style>body[data-twttr-rendered="true"] {background-color: transparent;}.twitter-tweet {margin: auto !important;}</style><blockquote class="twitter-tweet" data-conversation="none" data-align="center" data-dnt="true"><p>&#x200a;&mdash;&#x200a;<a href="https://twitter.com/therealfitz/status/1382021585050173445">@therealfitz</a></p></blockquote><script src="//platform.twitter.com/widgets.js" charset="utf-8"></script><script>function notifyResize(height) {height = height ? 
height : document.documentElement.offsetHeight; var resized = false; if (window.donkey && donkey.resize) {donkey.resize(height);resized = true;}if (parent && parent._resizeIframe) {var obj = {iframe: window.frameElement, height: height}; parent._resizeIframe(obj); resized = true;}if (window.location && window.location.hash === "#amp=1" && window.parent && window.parent.postMessage) {window.parent.postMessage({sentinel: "amp", type: "embed-size", height: height}, "*");}if (window.webkit && window.webkit.messageHandlers && window.webkit.messageHandlers.resize) {window.webkit.messageHandlers.resize.postMessage(height); resized = true;}return resized;}twttr.events.bind('rendered', function (event) {notifyResize();}); twttr.events.bind('resize', function (event) {notifyResize();});</script><script>if (parent && parent._resizeIframe) {var maxWidth = parseInt(window.frameElement.getAttribute("width")); if ( 500  < maxWidth) {window.frameElement.setAttribute("width", "500");}}</script><h3>Sheet to SQL</h3><p>I found it difficult to manipulate these numbers within the format that Brian chose for these sheets, so I brought everything to SQL.</p><p>I <a href="https://github.com/fhoffa/snowflake_snippets/blob/main/open_source_boards/board_tenure.sql">copy pasted the whole sheets into Snowflake, parsed with SQL</a>, and added window functions to count the number of years that each member served:</p><pre>// see <a href="https://github.com/fhoffa/snowflake_snippets/blob/main/open_source_boards/board_tenure.sql">https://github.com/fhoffa/snowflake_snippets/blob/main/open_source_boards/board_tenure.sql</a></pre><pre>unpivot as (<br>    select f<br>        , split(y.value, &#39;\t&#39;)[0]::string who, z.index+2000-5 year<br>        , row_number() over(partition by f, who order by year) years_served<br>    from sheet4<br>        , table(split_to_table(x, &#39;\n&#39;)) y<br>        , table(split_to_table(y.value, &#39;\t&#39;)) z <br>    where z.index&gt;5<br>    and 
z.value=&#39;1&#39;<br>)</pre><pre>-- select *<br>-- from unpivot;</pre><pre>select f, year, avg(years_served)<br>from unpivot<br>group by 1,2<br>order by 1,2</pre><h3>Above average terms</h3><p>With counts starting in 2001:</p><p><strong>FSF</strong>: Gerald Sussman (21), Geoffrey Knauth (21), Richard Stallman (20), Henry Poole (19), Benjamin Mako Hill (13), Hal Abelson (10)</p><p><strong>ASF</strong>: Jim Jagielski (17), Greg Stein (16), Sam Ruby (14), Bertrand Delacretaz (11), Shane Curcuru (10), Roy T. Fielding (9), Brett Porter (9)</p><p><strong>PSF</strong>: Tim Peters (13), Martin von Löwis (11), Van Lindberg (9), Steve Holden (9)</p><p><strong>OSI</strong>: Simon Phipps (9)</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/544/1*FLMWUD07UwePOJnvRcOMqQ.png" /><figcaption>Board members that have served for 9 years or more</figcaption></figure><h3>Shared sheet</h3><p>Find the <a href="https://docs.google.com/spreadsheets/d/1cjPQSEaDIzDzSbtp3kjSMlw1tu6YN1Q6hc1m2Lp7yMY/edit#gid=0">unpivoted data on my own shared sheet</a>:</p><p><a href="https://docs.google.com/spreadsheets/d/1cjPQSEaDIzDzSbtp3kjSMlw1tu6YN1Q6hc1m2Lp7yMY/edit#gid=0">Open source boards and tenure terms</a></p><h3>Learn more</h3><p>I’m looking forward to <a href="https://medium.com/u/6da8e4bf1445">Matt Asay</a>’s take on this data!</p><style>body[data-twttr-rendered="true"] {background-color: transparent;}.twitter-tweet {margin: auto !important;}</style><blockquote class="twitter-tweet" data-conversation="none" data-align="center" data-dnt="true"><p>&#x200a;&mdash;&#x200a;<a href="https://twitter.com/mjasay/status/1382150951562215427">@mjasay</a></p></blockquote><script src="//platform.twitter.com/widgets.js" charset="utf-8"></script><script>function notifyResize(height) {height = height ? 
height : document.documentElement.offsetHeight; var resized = false; if (window.donkey && donkey.resize) {donkey.resize(height);resized = true;}if (parent && parent._resizeIframe) {var obj = {iframe: window.frameElement, height: height}; parent._resizeIframe(obj); resized = true;}if (window.location && window.location.hash === "#amp=1" && window.parent && window.parent.postMessage) {window.parent.postMessage({sentinel: "amp", type: "embed-size", height: height}, "*");}if (window.webkit && window.webkit.messageHandlers && window.webkit.messageHandlers.resize) {window.webkit.messageHandlers.resize.postMessage(height); resized = true;}return resized;}twttr.events.bind('rendered', function (event) {notifyResize();}); twttr.events.bind('resize', function (event) {notifyResize();});</script><script>if (parent && parent._resizeIframe) {var maxWidth = parseInt(window.frameElement.getAttribute("width")); if ( 500  < maxWidth) {window.frameElement.setAttribute("width", "500");}}</script><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=e8cadb02035" width="1" height="1" alt=""><hr><p><a href="https://medium.com/swlh/open-source-boards-and-tenure-terms-e8cadb02035">Open source boards and tenure terms: The FSF has stalled</a> was originally published in <a href="https://medium.com/swlh">The Startup</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Eva’s first days of Snowflake+Tableau visualizations]]></title>
            <link>https://medium.com/snowflake/evas-first-days-of-snowflake-tableau-visualizations-604ffcaf6e4d?source=rss-279fe54c149a------2</link>
            <guid isPermaLink="false">https://medium.com/p/604ffcaf6e4d</guid>
            <category><![CDATA[open-data]]></category>
            <category><![CDATA[snowflake]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[visualization]]></category>
            <category><![CDATA[tableau]]></category>
            <dc:creator><![CDATA[Felipe Hoffa]]></dc:creator>
            <pubDate>Mon, 12 Apr 2021 19:35:58 GMT</pubDate>
            <atom:updated>2021-04-12T19:37:32.324Z</atom:updated>
            <content:encoded><![CDATA[<h3>Eva’s first 7 days of Snowflake+Tableau visualizations</h3><h4>Eva’s challenge: Produce a new daily visualization with Snowflake and Tableau for 30 days. Let’s check on her progress during her first week of this ongoing challenge.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*kaqbcbDX6g5i8FeZ.png" /><figcaption><a href="https://www.youtube.com/watch?v=D4QLU_JFD2M">Visualizing Snowflake’s Marketplace Data in Tableau</a></figcaption></figure><p><a href="https://twitter.com/TriMyData">Eva Murray</a> is a Tableau Zen master and a leader for #<a href="https://www.makeovermonday.co.uk/">MakeOverMonday</a> — and she just joined Snowflake! She loves data storytelling, and when she found thousands of datasets in the Snowflake Data Marketplace — she decided to take this challenge. We taped <a href="https://www.youtube.com/watch?v=D4QLU_JFD2M">a call discussing her first 7 days of visualizations </a>— and you can find it here:</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FD4QLU_JFD2M%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DD4QLU_JFD2M&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FD4QLU_JFD2M%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/24594c0abd4a7e6d756f15f0a69ea123/href">https://medium.com/media/24594c0abd4a7e6d756f15f0a69ea123/href</a></iframe><p>Let’s check some of her visualizations and my thoughts:</p><h3>The visualizations</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*u871Syc-24QlHzt5jnuQYQ.png" /><figcaption>Eva’s first 7 days of visualizations</figcaption></figure><ul><li><a href="https://www.linkedin.com/pulse/employment-sports-data-marketplace-day-1-eva-murray/">Employment in Sports</a></li><li><a 
href="https://www.linkedin.com/pulse/sports-footwear-exports-data-marketplace-day-2-eva-murray/">Sports Footwear Exports</a></li><li><a href="https://www.linkedin.com/pulse/stanley-cup-winners-data-marketplace-day-3-eva-murray/">Stanley Cup Winners</a></li><li><a href="https://www.linkedin.com/pulse/employment-sports-2019-gender-split-data-day-4-eva-murray/">Employment in Sports 2019 (Gender Split)</a></li><li><a href="https://www.linkedin.com/pulse/twenty20-cricket-rankings-2020-data-marketplace-day-eva-murray/">Twenty20 Cricket Rankings in 2020</a></li><li><a href="https://www.linkedin.com/pulse/starbucks-visits-during-start-pandemic-data-day-6-eva-murray/">Starbucks visits during the start of the pandemic</a></li><li><a href="https://www.linkedin.com/pulse/tourism-brand-performance-2019-data-marketplace-day-eva-murray/">Tourism Brand Performance 2019</a></li></ul><h4>My Notes</h4><p>First, I’m in awe of her courage to attempt this challenge. As you can see above, every day she picks up a new domain — so she needs to quickly understand it enough to work within it. Then she digs into the data to find an interesting story and a suitable visualization. And then she performs the most dangerous step: Posting the visualizations and findings on the Internet. This means she opens up to criticism and complaints, from anyone on the Internet with the desire to reply. 
Fortunately her experience with the #MakeOverMonday project has prepared her exactly for this — leveraging the power of community to learn and improve.</p><h4>My least favorite visualization: “<a href="https://www.linkedin.com/pulse/employment-sports-2019-gender-split-data-day-4-eva-murray/">Employment in Sports 2019 (Gender Split)</a>”</h4><p>As I looked at this visualization I started wondering — “why are these slopes and not bars?”, “what is the pattern?”, “is there an interesting story here?”.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*x9-XQ8qFfc-INxjp" /><figcaption><a href="https://www.linkedin.com/pulse/employment-sports-2019-gender-split-data-day-4-eva-murray/">Employment in Sports 2019 (Gender Split)</a> by Eva Murray</figcaption></figure><p>As you look at this data you’ll find that the differences between countries and genders are pretty small — all between 0.3% and 2.3%. To me this looks almost like a random distribution, where any gender could have the lead for arbitrary reasons.</p><p>But then that becomes a highlight of Eva’s chosen visualization: It makes it easy for me to figure that out. Thanks to this visualization choice and annotations I could quickly figure out that all numbers are constrained to a small range — and the slope shows that when there’s a difference, in many cases it’s almost unnoticeable. 
And then, where there’s a significant difference, the slope quickly reveals it.</p><p>All these thoughts and ideas that first made this one my “least favorite” ended up making it one of the most interesting choices, and full of lessons for me.</p><h4>My favorite visualization: “<a href="https://www.linkedin.com/pulse/sports-footwear-exports-data-marketplace-day-2-eva-murray/">Sports Footwear Exports</a>”</h4><p>This one I found fascinating: By playing with the data, Eva was able to uncover a major unnoticed trend — in less than 10 years someone in Belgium was able to create an industry that makes almost 3 billion euros a year by importing and exporting shoes.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*PzchO3-Xjo8bduok" /><figcaption><a href="https://www.linkedin.com/pulse/sports-footwear-exports-data-marketplace-day-2-eva-murray/">Sports Footwear Exports</a> by Eva Murray</figcaption></figure><p>How did this happen and what’s the story behind? This chart has <a href="https://www.worldfootwear.com/news/belgium-continues-to-strength-its-position/2893.html">made me really curious to learn more</a>. I foresee many business schools teaching this story for years, once they reach this same discovery.</p><h3>How-to</h3><h4>Tableau and Snowflake</h4><p><a href="https://help.tableau.com/current/pro/desktop/en-us/examples_snowflake.htm">Connecting Tableau and Snowflake</a> is really easy — and also navigating all the metadata to find interesting datasets. Check the video where we show how Eva managed to discover these datasets within Tableau.</p><h4>Datasets</h4><p>Most of the data for these visualizations came out of the Knoema listings in the Snowflake Data Marketplace. 
To learn more about Knoema — <a href="https://medium.com/snowflake/knoema-datasets-and-the-snowflake-data-marketplace-challenge-95ff0d7ac6eb">check my previous posts</a>, and <a href="https://www.youtube.com/watch?v=cxirafrDXkc">my conversation with Knoema’s CEO</a>:</p><p><a href="https://medium.com/snowflake/knoema-datasets-and-the-snowflake-data-marketplace-challenge-95ff0d7ac6eb">Knoema Datasets and the Snowflake Data Marketplace Challenge</a></p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FcxirafrDXkc&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DcxirafrDXkc&amp;image=http%3A%2F%2Fi.ytimg.com%2Fvi%2FcxirafrDXkc%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/5e8d03ff624bba648c3154d89776a0f9/href">https://medium.com/media/5e8d03ff624bba648c3154d89776a0f9/href</a></iframe><h3>Want more?</h3><p>Follow Eva on <a href="https://twitter.com/TriMyData">Twitter</a> and <a href="https://www.linkedin.com/in/evamurray1/">LinkedIn</a>.</p><p>I’m Felipe Hoffa, Data Cloud Advocate for Snowflake. Thanks for joining me on this adventure. 
You can <a href="https://twitter.com/felipehoffa">follow me on Twitter</a>, <a href="https://www.linkedin.com/in/hoffa/">LinkedIn</a>, and check <a href="https://www.reddit.com/r/snowflake/">reddit.com/r/snowflake</a> for the most interesting Snowflake news.</p><p><a href="https://github.com/Snowflake-Labs/awesome-snowflake">Snowflake-Labs/awesome-snowflake</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=604ffcaf6e4d" width="1" height="1" alt=""><hr><p><a href="https://medium.com/snowflake/evas-first-days-of-snowflake-tableau-visualizations-604ffcaf6e4d">Eva’s first days of Snowflake+Tableau visualizations</a> was originally published in <a href="https://medium.com/snowflake">Snowflake</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Funnel analytics with SQL: MATCH_RECOGNIZE() on Snowflake]]></title>
            <link>https://towardsdatascience.com/funnel-analytics-with-sql-match-recognize-on-snowflake-8bd576d9b7b1?source=rss-279fe54c149a------2</link>
            <guid isPermaLink="false">https://medium.com/p/8bd576d9b7b1</guid>
            <category><![CDATA[snowflake]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[sql]]></category>
            <category><![CDATA[analytics]]></category>
            <category><![CDATA[google-analytics]]></category>
            <dc:creator><![CDATA[Felipe Hoffa]]></dc:creator>
            <pubDate>Fri, 26 Mar 2021 16:18:28 GMT</pubDate>
            <atom:updated>2021-03-29T19:56:38.337Z</atom:updated>
            <content:encoded><![CDATA[<h4>SQL is a great tool, but it can’t do everything. For example, figuring out a way to analyze funnels and sessions with SQL is truly difficult — until now. Let’s discover the power of MATCH_RECOGNIZE() by analyzing a Google Analytics e-commerce funnel with Snowflake.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/960/0*MAjUuFLLpv85R1sK.jpg" /><figcaption>Image: <a href="https://pixabay.com/photos/architecture-buildings-cars-city-1837176/">Pixabay</a></figcaption></figure><p>Check <a href="https://medium.com/u/eb13b4c116f4">TJ Murphy</a>’s reaction when he discovered that Snowflake now has support for MATCH_RECOGNIZE:</p><style>body[data-twttr-rendered="true"] {background-color: transparent;}.twitter-tweet {margin: auto !important;}</style><blockquote class="twitter-tweet" data-conversation="none" data-align="center" data-dnt="true"><p>&#x200a;&mdash;&#x200a;<a href="https://twitter.com/teej_m/status/1374868964598194178">@teej_m</a></p></blockquote><script src="//platform.twitter.com/widgets.js" charset="utf-8"></script><h3>Sample use case: Google Analytics e-commerce funnel with Snowflake</h3><h4><strong>How would you answer this funnel question with SQL?</strong></h4><blockquote>“We have an e-commerce shop. We need to identify sessions that land on the website from an arbitrary referrer (Instagram, Facebook, etc.), and that end up buying something. How many steps did it take for these users to buy something, and how many times did they try to pay before succeeding?”</blockquote><p>That’s not an easy question to answer with plain SQL. But MATCH_RECOGNIZE finally gives us an elegant way to find answers to questions like this one.</p><p>With MATCH_RECOGNIZE we simply define the steps we want to find within a series of events and the pattern these steps should follow; MATCH_RECOGNIZE does the rest.</p><p>For this example we’re going to use a Google Analytics 360 dataset from an e-commerce shop. 
Nothing better than real data to prove the power of our tools.</p><h4>Step 1: Load Google Analytics data into Snowflake</h4><p>Taking data out of BigQuery and into Snowflake is easy. You might need tools to build a reliable pipeline, but the basic steps are simple:</p><ul><li>Export from BigQuery to GCS: You can use your own GA data, or a <a href="https://support.google.com/analytics/answer/7586738?hl=en">sample dataset</a> with data from the Google Merchandise store.</li><li><a href="https://stackoverflow.com/a/64168195/132438">Load GA data into Snowflake</a>: This is straightforward. Snowflake will happily ingest files from GCS, even if your Snowflake account lives in AWS. Loading JSON works like magic too. You might want to set up authentication and permissions for maximum security; for this sample data I just used a public bucket.</li></ul><pre>create or replace table ga_demo2(src variant);</pre><pre>copy into ga_demo2<br>from &#39;gcs://fhoffa/ga360/ga_sessions000000000000&#39;<br>file_format=(type=&#39;JSON&#39;);</pre><ul><li>Query in Snowflake: Notice above that we loaded the JSON rows into a variant type. Snowflake makes working with semi-structured data a joy; you can easily navigate the Google Analytics schema. 
We can also create a view out of the basic query, to make all the following steps easier to follow:</li></ul><pre>create or replace view ga360_parsed<br>as<br>select src:visitStartTime::timestamp ts<br>    , hit.value:hitNumber::int hit<br>    , src:visitId::string visit<br>    , hit.value:eCommerceAction.action_type::int action_t<br>    , hit.value:eCommerceAction.step::int action_s<br>    , hit.value:eCommerceAction.option::string action_o<br>    , hit.value:page.pagePath::string path<br>    , hit.value:referer::string referer<br>    , src:fullVisitorId::string visitor<br>    , src:totals.transactions::int total_transactions<br>from ga_demo2, lateral flatten(input =&gt; src:hits) hit</pre><h4>Step 2: Querying the funnel is fun</h4><p>This “straightforward” query answers the question we started with:</p><pre>select count(*) visits<br>  , avg(steps) avg_steps<br>  , avg(payment_attempts) avg_pay_attempts<br>  , max(payment_attempts) max_pay_attempts<br>from ga360_parsed<br>match_recognize(<br>    partition by visit<br>    order by hit<br>    measures count(*) steps, count(payment.*) payment_attempts<br>    one row per match<br>    pattern(from_plex (payment | anything)*  completed)<br>    define<br>        from_plex as referer = &#39;https://mall.googleplex.com/&#39;<br>        , completed as path like &#39;/ordercompleted.html&#39;<br>        , payment as action_o = &#39;Payment&#39;<br>);</pre><p>It turns out 21 visits landed on our site from that specific referrer and ended up purchasing something. 
On average it took them 35.6 steps to do so, and 1.7 attempts to pay — with at least one session having nine pay attempts before success.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/676/1*PMkINOqoHFWTNMo0VukFdQ.png" /><figcaption>Results for our funnel question</figcaption></figure><h3>Understanding MATCH_RECOGNIZE</h3><p>You can follow the <a href="https://docs.snowflake.com/en/user-guide/match-recognize-introduction.html">official docs</a> to discover all the powers of MATCH_RECOGNIZE — here we’ll look at the basics of the query above.</p><h4>define</h4><p>Let’s start with the define section, which names the steps in the user’s funnel:</p><pre>define<br>        from_plex as referer = &#39;https://mall.googleplex.com/&#39;<br>        , completed as path like &#39;/ordercompleted.html&#39;<br>        , payment as action_o = &#39;Payment&#39;</pre><p>You can see that we gave an arbitrary name to each step, and that the conditions can be whatever you need them to be. 
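</p><p>As a quick sketch (the last line below is a hypothetical addition, not part of the original query), we could define one more step for add-to-cart events, relying on the GA360 convention that eCommerceAction.action_type 3 represents adding a product to the cart:</p><pre>define<br>        from_plex as referer = &#39;https://mall.googleplex.com/&#39;<br>        , completed as path like &#39;/ordercompleted.html&#39;<br>        , payment as action_o = &#39;Payment&#39;<br>        , added_to_cart as action_t = 3  -- hypothetical extra step</pre><p>A pattern like pattern(from_plex anything* added_to_cart anything* completed) would then keep only the funnels where users added something to their cart along the way; any boolean expression over the view’s columns can serve as a step definition.</p><p>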
One of our steps looks at the referer, another looks for users landing on a specific path, and the payment steps are identified by the action variable that Google Analytics stores.</p><h4>pattern</h4><p>The pattern section is basically regular expressions for funnel steps:</p><pre>pattern(from_plex (payment | anything)*  completed)</pre><p>This says “look for sessions that start with the <em>from_plex</em> step, followed by any number of <em>payment</em> or <em>anything</em> steps, and that end in the <em>completed</em> state.”</p><h4>match_recognize</h4><p>With define and pattern established, we can do a basic match_recognize query:</p><pre>select classified_as, hit, visit, action_o, referer ref, path<br>from ga360_parsed<br>match_recognize(<br>    partition by visit<br>    order by hit<br>    measures classifier() as classified_as<br>    all rows per match<br>    pattern(from_plex (payment | anything)*  completed)<br>    define<br>        from_plex as referer = &#39;https://mall.googleplex.com/&#39;<br>        , completed as path like &#39;/ordercompleted.html&#39;<br>        , payment as action_o = &#39;Payment&#39;<br>);</pre><p>We added three constructs to that query. First, how to split and order our sessions: partition by visit order by hit. Then an added column that shows how each row is classified in our funnel: measures classifier(). And finally our desire to see all rows of each funnel, instead of just a one-row summary: all rows per match. That’s what allows us to see the following results, with the steps classified as “payment” highlighted:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TYqDcTo8RA3tmBFluXexJg.png" /><figcaption>Showing all rows per match</figcaption></figure><h4>Answering funnel questions</h4><p>With each funnel step identified, we can answer the opening question. 
Instead of selecting all rows, we can ask for a one-row summary per funnel: one row per match. We add the measures we are interested in: measures count(*) steps, count(payment.*) payment_attempts. MATCH_RECOGNIZE does the rest of the job, and we can go back to our usual SQL ways to get counts, averages, and other metrics.</p><h3>Next steps</h3><p>Funnel analytics is just one example. Think of all the possibilities with MATCH_RECOGNIZE in your toolbox: threat detection, finding trends in weather and stock-price time series, and much more.</p><p>I’ll explore these topics in future posts; please share your best ideas and results.</p><h3>Read more</h3><ul><li>Official Snowflake MATCH_RECOGNIZE() <a href="https://docs.snowflake.com/en/user-guide/match-recognize-introduction.html">docs</a>.</li></ul><h3>Want more?</h3><p>I’m Felipe Hoffa, Data Cloud Advocate for Snowflake. Thanks for joining me on this adventure. You can <a href="https://twitter.com/felipehoffa">follow me on Twitter</a>, and check <a href="https://www.reddit.com/r/snowflake/">reddit.com/r/snowflake</a> for the most interesting Snowflake news.</p><p><a href="https://github.com/Snowflake-Labs/awesome-snowflake">Snowflake-Labs/awesome-snowflake</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=8bd576d9b7b1" width="1" height="1" alt=""><hr><p><a href="https://towardsdatascience.com/funnel-analytics-with-sql-match-recognize-on-snowflake-8bd576d9b7b1">Funnel analytics with SQL: MATCH_RECOGNIZE() on Snowflake</a> was originally published in <a href="https://towardsdatascience.com">Towards Data Science</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>