Two years ago, I was listening to Dylan Baker — my future co-founder — speak on a panel at a meetup. Dylan was venting about a frustrating experience LookML developers encounter daily: the drudgery of manually testing dimensions and measures from the Explore page to make sure they don’t error.
Dylan sighed, “I would pay for a tool that would go through each explore, click on every dimension, and run it to make sure there are no SQL errors.”
After the meetup, I found Dylan and said, “That tool you described… I think we can build it. Should we…
This post is adapted from an article I wrote for Looker about increasing data literacy and usage at Milk Bar.
“Wait, why does a bakery need a data engineer?” I get that a lot. I’m a data team of one at Milk Bar, the popular dessert brand by chef Christina Tosi, of Chef’s Table and MasterChef fame. In addition to sampling literally every cake, cookie, or truffle that our R&D team sends over to our office, I’m responsible for wrangling information across our omni-channel business. …
I occasionally need to grant a non-technical colleague the ability to input information into our data warehouse on an ad-hoc basis. For example, our customer service team at Milk Bar maintains a list of special wedding cake orders in Google Sheets that we need to collect data from for downstream calculations.
This is a tricky problem for a data engineer — my colleagues don’t have the technical skill to interact directly with our data stack, and I don’t want to have to support my own web form or similarly involved infrastructure to collect this information. …
When it comes to customer lifetime value (CLV), most people are doing it wrong, according to Wharton marketing professor Peter Fader. At face value, CLV is an easy concept to understand: it’s a measurement of how much a business’s customers are worth over their lifetime. In practice, it’s deceptively hard to implement in a way that accurately captures the variation in customer behavior. CLV is so valuable to every business that it’s worth putting in the time and study to estimate it properly.
To help you estimate CLV the right way, we’ll walk through the formal definition, examine the pitfalls…
At Milk Bar, we use Looker to serve up business intelligence across our company. Looker is our data buffet, and I expect our department heads to be able to self-serve the majority of their data requests using Looker. It’s been immensely popular, but I’ve also noticed that some people are slower to adopt Looker than others. Why might this be?
Looker provides some helpful usage charts in Admin > Usage via the i__looker Explore. Using this Explore, I can monitor usage across the company, but it’s hard to know why someone with low usage is not using the tool. …
There’s no time like the present to teach yourself data science, analytics, or engineering. A quick search on Udemy shows over 2,000 results for courses about “data.” People have even compiled their own Master’s degree programs in data science composed entirely of free online courses.
In my experience as a self-taught data engineer, taking dozens of massive open online courses (MOOCs) is not the best approach. It didn’t work for me.
I didn’t have hours every night and weekend to spend studying. The lectures didn’t feel practical enough to launch me from a non-technical field to a job in data…
Like many data professionals at small and mid-size companies, I’m a data team of one at Milk Bar. As the first data hire, I’ve had the rather terrifying privilege of building our data stack from the ground up. I’ve spent the first few months of my time here building data loaders, modeling our data in BigQuery and dbt, and deploying and training our teams on Looker. Now that our data stack is functional, it’s time to plan for next quarter.
The value of business intelligence and analytics is quickly becoming apparent at Milk Bar, and more people are coming to…
Over time, software engineers have developed a strong philosophy for testing applications. Concepts like unit testing, the test pyramid, code coverage, and continuous integration have made application testing robust and have established solid design patterns. Good testing practices are taught and practiced in most computer science programs.
In my experience, a unified testing philosophy is missing in the data world. As a data professional, I tell people that my goal is to provide accurate and timely information to enhance decision-making. However, if I supply our decision-makers with inaccurate data, they might make far-reaching, strategic mistakes. If our website goes down…
tl;dr: BuzzFeed published an interesting collection of data today — disciplinary case files for about 1,800 New York Police Department (NYPD) employees who were “accused of misconduct.” I wrote a scraper to download the data in PDF and plain text format for large-scale analysis.
Unfortunately, the case files aren’t stored in a way that makes large-scale analysis very easy. Each case file is stored as a separate PDF, but there’s no clear way to download all of them. The raw text is stored behind a tab in a JavaScript interface.
Co-founder of @SpectaclesCI, analytics engineering @Spotify