In contrast with many analytics groups in the industry, Ro’s data analytics warehouse is updated in something close to “real time.” Ro’s Data team supports almost all departments in the company, including all operations teams. As a simple example of real-time data for operations: the pharmacy operations team watches Looker dashboards all day to see if pharmacy order backlogs warrant an intervention; we promise 2-day delivery and we are as good as our word!
The data team is often asked some variant of the question “How current is the data that I’m looking at?”
The answer depends on how many processes are required to get the data from its source to its destination. It may be anywhere from one process (a periodic syncing process) to several (simple example: a periodic syncing process plus a periodic chain of processing tasks that refreshes several linked derived tables). Zooming out, the picture is several processes whose timing is uncoupled, each running periodically (ex: every ~15 minutes or so) but with no guarantee of starting or ending at a specific point in the hour. Essentially, data traveling to an access point in the data warehouse goes through a number of independently timed processes, with each process recurring at a fixed rate. …
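To make the idea of chained, independently timed processes concrete, here is a minimal simulation sketch. It is not the data team's actual tooling: the function names, the random phase offsets, and the simplification that each process finishes instantly once it starts are all assumptions made for illustration.

```python
import math
import random


def next_run(t, period, offset):
    """First scheduled run at or after time t for a job firing at offset + k * period."""
    k = math.ceil((t - offset) / period)
    return offset + k * period


def staleness(arrival, periods, offsets):
    """Minutes between a row landing at the source and clearing the last process.

    Assumption: each process completes instantly once it starts, so the only
    delay is waiting for the next scheduled run of each stage in turn.
    """
    t = arrival
    for period, offset in zip(periods, offsets):
        t = next_run(t, period, offset)
    return t - arrival


def simulate(periods, trials=10_000, seed=0):
    """Average and worst observed staleness over random, uncoupled schedules."""
    rng = random.Random(seed)
    delays = []
    for _ in range(trials):
        # Each process gets its own phase: no guaranteed point in the hour.
        offsets = [rng.uniform(0, p) for p in periods]
        arrival = rng.uniform(0, 1000)
        delays.append(staleness(arrival, periods, offsets))
    return sum(delays) / len(delays), max(delays)


if __name__ == "__main__":
    mean, worst = simulate([15, 15])  # e.g. a sync job plus one derived-table refresh
    print(f"mean staleness ~{mean:.1f} min, worst observed ~{worst:.1f} min")
```

Under these assumptions, each stage adds a wait of up to one full period and about half a period on average, so two 15-minute processes give data that is roughly 15 minutes old on average and never more than 30 minutes old.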
Estimated time for completion: 1.5 days
Before starting, make an account on linuxacademy.com.
Start by running through AWS basic concepts:
You can skim the “Conclusion” section
The worst but most important part of AWS is permissions (called IAM, short for Identity and Access Management). Do the “Identity and Access Management” section here, including the lab:
The takeaway from this section is not that you become a master AWS permissions admin, but that you remember the general ideas about how policies and roles work. Pay special attention to the idea of an IAM user vs. …
Embedded Analysts work with a specific business team. They “solid line” report to the Data Team and “dotted line” report to their Stakeholder Team. An Embedded Analyst’s Dotted Line Manager will work with the Solid Line Manager and the analyst to set the analyst’s work priorities. An Embedded Analyst will spend large amounts of time physically sitting with their Stakeholder Team. Their top-level goal is to move their Stakeholder Team forward.
Here’s our pitch
Have we reached out to you for a Data position? Do you think that Ro could be the place for you?
We believe that Ro can offer you the following:
Ro is a direct-to-consumer telehealth company that launched two years ago with its first digital health clinic, Roman (maybe you’ve seen the ads?). Since then, we’ve added two more clinics, many more medical conditions, and some very effective team members. We are tracking steadily toward our very ambitious goals for improving the patient experience. 2020 is set to be quite a year, with several major product launches that will increase the scope of our offerings and operations well beyond even what we’ve built so far. …
Here on Ro’s data team we’ve implemented what we call the “Project Ownership” model. Each of our quarterly OKRs and projects is assigned an owner from the team who, along with their manager(s), bears ultimate responsibility for the success of said goal/project.
These individual contributors (“ICs”) are owners of the project regardless of how many other individuals — from the data team or otherwise — need to contribute work and thought towards it. …
Here are some of the processes, tools, and resources we provide to business units at Ro to get them the data support they need:
Every day at 10:30am and 4pm we hold “data clinic” in the company kitchen area. This is an office-hours style resource where anyone can come for any sort of data assistance, from pulling a specific number from Looker to unstructured training to Excel tips and tricks to having a sounding board for analytic brainstorms.
Data clinic was implemented for two main reasons. The first is to provide easy access to the data team for anyone at Ro, no matter what it is you want to figure out or get done. The second is to keep the data team generally focused on the projects that have been strategically prioritized by all of the business departments; the data team is a helpful group, and we found it hard not to treat every ad hoc question arriving via Slack or in person as the most urgent thing we needed to be doing. With data clinic, business departments get their day-to-day questions answered and the data team gets to be its happy and helpful self while still remaining focused on top strategic projects. …
On a quarterly basis, every individual on the Ro data team engages in ~2 days of self-guided training. The curriculum and materials for this training are decided in advance, either by the individual or, more often, with the help of their colleagues. The curriculum is designed to augment that individual’s knowledge and skills in a manner that is practical for their long-term work at Ro; in certain cases, that team member may also be the first individual on the team to acquire said skills.
The training period culminates in a “sip-and-share” session, where each member of the team informally discusses the topics that they covered during their training. It usually doesn’t take much more than an example or a graph before the rest of the team dives headfirst into the topic at hand, demanding everything from practical tips to esoteric and completely unnecessary derivations. The “lesson” will often transition into a group discussion as different team members contribute answers to questions posed. …
This winter we switched from Redshift to Snowflake for our data analytics warehouse. Our top drivers in switching to Snowflake were:
Real-time data. Snowflake’s architecture “separates storage from compute”, meaning that reading and writing can occur fully in parallel without interfering with each other. With Snowflake, there is no performance impact from real-time data syncing, and all data in our warehouse is current to within 30 minutes.
Handling concurrent queries. A Looker user refreshing a dashboard might generate 15–25 queries at once. …
There can be a lot of ambiguity around definitions for user retention and value. Here is the most cogent combination of definitions that we’ve found.
Member Churn: A member has churned if you believe that the member is gone for good. Churn is a boolean value (i.e., yes, a customer has churned, or no, a customer has not churned); there is no in-between.
Member Retention is the inverse of churn. A member is retained if they have not churned. …
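As a rough illustration of these two definitions, the sketch below treats churn as a boolean and retention as its inverse. The rule used to decide that a member is "gone for good" (no activity for 90 days) and all of the names are hypothetical choices made for this example, not definitions taken from the article.

```python
from datetime import date, timedelta

# Hypothetical rule: we "believe" a member is gone for good after 90 idle days.
CHURN_THRESHOLD = timedelta(days=90)


def has_churned(last_active, today, threshold=CHURN_THRESHOLD):
    """Boolean churn flag: True or False, with nothing in between."""
    return (today - last_active) > threshold


def is_retained(last_active, today, threshold=CHURN_THRESHOLD):
    """Retention is the inverse of churn."""
    return not has_churned(last_active, today, threshold)


def retention_rate(last_active_dates, today):
    """Share of members who have not churned as of `today`."""
    retained = sum(is_retained(d, today) for d in last_active_dates)
    return retained / len(last_active_dates)
```

Whatever rule ends up standing in for "gone for good," keeping it in one function means churn and retention can never disagree with each other.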
(article #1 in series)
Ro uses Looker as its BI tool. Looker is based on the idea of analysts building Looker “models” written in its LookML language; these models describe a graphical point-and-click interface that a business-side individual can use to query the specified data sets without needing to write code.
Operations, data model additions/modifications, product offerings, and medical condition offerings are scaling far too quickly at Ro for the standard Looker development model to be sustainable without a small army of a data team to provide the labor required. Applying our general analytic tendencies to the situation, we found that much of the creation and maintenance labor required for LookML models was repetitive and error-prone. Separately, LookML lacked certain features that we were having trouble doing without, particularly around documentation and the idea of view blocks. …
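One way to picture taking the repetition out of LookML maintenance is to render view files from a small column specification. The sketch below is only an illustration of that general idea and should not be read as Ro's actual approach; the function names, the `(name, type, description)` tuple layout, and the example table are all hypothetical.

```python
def render_dimension(name, lookml_type, description):
    """Render one LookML dimension block from a column spec."""
    return (
        f"  dimension: {name} {{\n"
        f"    type: {lookml_type}\n"
        f"    sql: ${{TABLE}}.{name} ;;\n"
        f'    description: "{description}"\n'
        f"  }}\n"
    )


def render_view(view_name, table, columns):
    """Render a LookML view; `columns` is a list of (name, type, description)."""
    body = "".join(render_dimension(*col) for col in columns)
    return f"view: {view_name} {{\n  sql_table_name: {table} ;;\n{body}}}\n"


if __name__ == "__main__":
    # Hypothetical table and columns, for illustration only.
    print(render_view(
        "orders",
        "analytics.orders",
        [("id", "number", "Primary key"), ("status", "string", "Order status")],
    ))
```

Because each description lives in the spec next to its column, documentation is written once and emitted with every dimension. Note that this sketch does not escape double quotes inside descriptions.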