EA Certification Study Guide Part 3: Einstein Discovery Story Design

Kelsey Shannon
5 min read · May 12, 2019

For this edition of the study guide, we will review the area of Einstein Analytics I knew the least about going into this exam: Einstein Discovery. Recently included in the “Einstein Analytics Plus” bundle by Salesforce, Einstein Discovery is a self-service data science tool that lets anyone analyze a dataset and generate predictions from it.

Source: Einstein Analytics Tech Lounge, 4/4/2019

Sources:

  1. Trailhead Badges
    -Salesforce Einstein Basics
    -Einstein Discovery Basics
    -Einstein Discovery Stories
    -Einstein Discovery Story Insights
  2. Salesforce Help Documentation
    -Einstein Discovery
  3. Documentation in Exam Guide
    -Salesforce Winter ’19 Release Notes
  4. Einstein Analytics Training Videos
    -Video 15: Einstein Discovery
  5. External Resources
    -Salesforce Einstein…What?
    -The Big Book of Customer Predictions
    -Einstein Discovery Demo
    -Einstein Analytics Winter ’19 Sneak Peak
    -Einstein Discovery
    -What is Einstein Discovery and What Can It Do For Your Business?
    -How to set up an Einstein Discovery Writeback

For this section, I recommend not only going through the Trailheads but also uploading a dataset from your Salesforce environment and seeing what insights you can find.

Notes:

Einstein Discovery

  • Data science tool that predicts outcomes, analyzes data, and finds trends
    — Utilizes data from Einstein Analytics datasets
    — Allows you to quickly generate insights and drill down from there
    — Use as an additional team member, not a replacement for your BI team
  • Setting up ED
    — Users must be granted the Einstein Analytics Plus User or Einstein Analytics Plus Admin permission set
    — Next, enable Einstein Discovery in your Einstein “Getting Started” settings
  • Terms
    Story: A collection of insights that help you explore relationships between a business-relevant metric and possible contributing factors
    Model metrics: Describe the performance of the predictive model and its underlying statistical details
    Outcome variable: The single, predominant measure used as the focus of the story

Prepare Data

  • Data for the story is generated from a single dataset in Einstein Analytics. This could be an individually imported CSV table or sourced from a dataflow. Limitations:
    — ED doesn’t respect security predicates during the analysis and you must have “ignore predicate” permissions to create a story that contains predicates
    — Datasets with inherited sharing won’t work with ED
    — The chosen dataset must have at least 10,000 rows
    — Chosen variables in the dataset must have at least 25 populated rows
  • Choosing your variables
    — Start with a few columns and continually expand the variables as you iterate
    — After selecting the outcome variable, choose the input variables that could contribute to that outcome and include them in the dataset. Ideally, get the maximum amount of information from the fewest variables
    Overfitting: Common mistake of including too many variables in a model. The extra variables add noise to the algorithm and overcomplicate the results.
    Underfitting: Excessively simplifying the model, making it hard for the algorithm to find patterns
  • Prepare your data ahead of time to get better results. High quality input = high quality output
    — Remove extreme values and outliers
    — Repair or remove records with missing or incorrectly input values
    — Review distributions for skew
    — Don’t include fields that contain too many unique values. Ex: zip codes, names
    — Minimize duplicate or highly correlated values
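Some of the checks above can be roughed out on a CSV before you upload it. The following is a minimal sketch in plain Python, not anything built into Einstein Discovery. It assumes the row minimums quoted in these notes; the `preflight` name and the 50% uniqueness cutoff are my own illustrative choices.

```python
import csv
import io

MIN_ROWS = 10_000        # dataset row minimum quoted in the notes above
MIN_POPULATED = 25       # per-variable populated-row minimum quoted above
MAX_UNIQUE_RATIO = 0.5   # assumption: flag columns where over half the values are unique


def preflight(csv_text):
    """Return a list of warnings for a CSV supplied as a string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    warnings = []
    if len(rows) < MIN_ROWS:
        warnings.append(f"only {len(rows)} rows (need {MIN_ROWS})")
    for column in reader.fieldnames or []:
        # Treat missing or whitespace-only cells as unpopulated.
        values = [(row[column] or "").strip() for row in rows]
        populated = [v for v in values if v]
        if len(populated) < MIN_POPULATED:
            warnings.append(f"{column}: only {len(populated)} populated rows")
        elif len(set(populated)) / len(populated) > MAX_UNIQUE_RATIO:
            # Zip codes, names, and IDs tend to land here.
            warnings.append(f"{column}: too many unique values")
    return warnings
```

Calling `preflight(open("opportunities.csv").read())` (the file name is hypothetical) returns an empty list when none of these checks trip. Outliers, skew, and correlated columns still need a separate look.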

Analyze Results

  • An Einstein Discovery story displays four insight types users can navigate through and analyze.
    — Insights are ordered from the most to least statistically significant
    — Insights at the top explain the most variation for the outcome variable
    — In a chart, a grey bar means values aren’t statistically significant or are statistically unsound

Insight types:

  • What Happened — Descriptive insight, finds elements that have the most statistical significance. Terms:
    — Difference from overall: How far above or below average the category is
    — Average: Sum of every value divided by the count of values
    — Standard deviation: How much items in the category differ from the average
What Happened Insight: Source
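To make the three "What Happened" terms concrete, here is a small sketch using Python's `statistics` module. The records are made up, the `describe` helper is hypothetical, and I am assuming a population standard deviation; the notes do not say which variant Einstein Discovery reports.

```python
from statistics import mean, pstdev

# Hypothetical records: (category, outcome value). The numbers are made up.
records = [
    ("Email", 10.0), ("Email", 14.0),
    ("Phone", 20.0), ("Phone", 24.0), ("Phone", 22.0),
]


def describe(records, category):
    """Average, standard deviation, and difference from overall for one category."""
    overall_avg = mean(value for _, value in records)
    values = [value for cat, value in records if cat == category]
    avg = mean(values)
    return {
        "average": avg,
        "std_dev": pstdev(values),  # assumption: population standard deviation
        "difference_from_overall": avg - overall_avg,
    }
```

With this data the overall average is 18, so `describe(records, "Email")` reports an average of 12 and a difference from overall of -6: Email sits six units below the dataset as a whole.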
  • Why It Happened — Diagnostic insight, what factors led to this outcome? Why = a high correlation, not causation. Terms:
    — Impact: The effect the variable has on the outcome
    — Coefficient: The value you attribute to the analysis
    — Precluded sum/count: Impact for the average variable that isn’t being included in this particular analysis
    — Global frequency: How often the variable appears in the entire dataset
    — Conditional frequency: How often the variable appears in this section of the dataset
Why It Happened Insight: Source
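Global and conditional frequency are easy to mix up, so here is a minimal sketch of the difference as I read the definitions above. The rows are invented and `frequency` is a hypothetical helper, not an Einstein Discovery function.

```python
# Hypothetical rows: (region, product). The values are made up.
rows = [
    ("West", "Laptop"), ("West", "Laptop"), ("West", "Phone"),
    ("East", "Phone"), ("East", "Phone"), ("East", "Laptop"),
    ("East", "Phone"), ("East", "Phone"),
]


def frequency(rows, product, region=None):
    """Share of rows with `product`, restricted to `region` when one is given."""
    subset = [r for r in rows if region is None or r[0] == region]
    matches = [r for r in subset if r[1] == product]
    return len(matches) / len(subset)


global_freq = frequency(rows, "Laptop")               # 3 of 8 rows
conditional_freq = frequency(rows, "Laptop", "West")  # 2 of 3 West rows
```

Laptops make up 3 of 8 rows overall but 2 of 3 rows in the West, which is the kind of gap between the two frequencies worth a second look.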
  • Predictions and Improvements — Predictive insight
    — Predictions: Waterfall charts
    — Improvements: Bar charts
    — In this section select different variables to build out recommendations and “what if” analyses
Prediction Insight: Source
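A waterfall chart starts from a baseline and stacks each positive or negative contribution on top of the running total. As a rough sketch of the numbers behind such a chart (the labels and values are invented, and `waterfall` is a hypothetical helper):

```python
from itertools import accumulate


def waterfall(baseline, contributions):
    """Return (label, running_total) steps for a waterfall chart."""
    labels = ["Baseline"] + [label for label, _ in contributions]
    deltas = [baseline] + [delta for _, delta in contributions]
    return list(zip(labels, accumulate(deltas)))


# Hypothetical contributions to a predicted outcome.
steps = waterfall(100.0, [("Region = West", 15.0), ("Product = Laptop", -5.0)])
```

Here the running total goes from 100 to 115 and then back down to 110, so the last step is the predicted value after every contribution has been applied.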
  • What is the Difference — Comparative insight
    — Helps you understand the relationship between variables using waterfall charts
    — Similar to predictions and improvements, you must choose a variable for an analysis to appear
    — Can choose a secondary variable and compare the two
    — Optionally, add a filter to further focus the analysis to a subset
What is the Difference Insight: Source
  • Other Terms
    First order analysis: Type of insight that examines how one variable explains the variation in the outcome variable
    Second order analysis: Type of insight that examines how two variables together explain variation in the outcome variable
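One way to picture first order versus second order analysis is to average the outcome by one variable, then by a pair of variables. This is only an illustration with made-up rows and a hypothetical `group_means` helper; it is not how Einstein Discovery computes its insights.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (region, product, outcome). The values are made up.
rows = [
    ("West", "Laptop", 30.0), ("West", "Phone", 10.0),
    ("East", "Laptop", 12.0), ("East", "Phone", 12.0),
]


def group_means(rows, key):
    """Average outcome for each group produced by `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {group: mean(values) for group, values in groups.items()}


first_order = group_means(rows, key=lambda r: r[0])           # region only
second_order = group_means(rows, key=lambda r: (r[0], r[1]))  # region and product
```

The first order view shows West averaging 20 against 12 for East. The second order view shows that the gap comes entirely from West laptops at 30, while West phones sit at 10, below either East group.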

Adjust Parameters and Alter Data

  • Continuously improve
    — Can schedule the analysis to continually add new data to the model
    — Periodically review and update model variables over time
  • Improvements toolbar: ED feature that detects extreme or identical values, variables with little variation, and highly correlated variables. Use it to improve your story.
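Two of the issues the improvements toolbar looks for, identical values and highly correlated variables, can be approximated by hand on numeric columns. The sketch below is my own stand-in, not the toolbar's actual logic; `flag_columns` and the 0.9 threshold are assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def flag_columns(columns, threshold=0.9):
    """Flag constant columns and pairs whose absolute correlation exceeds threshold."""
    constant = {name for name, values in columns.items() if len(set(values)) == 1}
    findings = [f"{name}: identical values" for name in sorted(constant)]
    # Skip constant columns here; their correlation is undefined (zero spread).
    varying = [name for name in columns if name not in constant]
    for a, b in combinations(varying, 2):
        if abs(pearson(columns[a], columns[b])) > threshold:
            findings.append(f"{a} and {b}: highly correlated")
    return findings
```

For example, a dataset holding both `amount` and an `amount_usd` column that is always double it would have the pair flagged, which is a hint to keep only one of them in the story.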

Writeback to Salesforce

  1. Install Einstein Discovery Managed Package
  2. Create a story for the Salesforce object in question and use the “Deploy Model” feature
  3. Create three custom fields on the object to display the outcome, explanation, and prescriptive information from the model
  4. Inside settings, connect the ED module to these custom fields
  5. Create an Apex trigger to write to these fields (you can use the Bulk API with Data Loader or Workbench to fire this trigger on all existing records)
  6. Add the Einstein Discovery Recommendation widget to the object’s Lightning page

Utilize the Model Manager inside ED to manage the predictions and models that have been deployed to Salesforce.

  • Other display options:
    — Export story to Quip
    — Display in Einstein Analytics
    — Export results into a PowerPoint, Word Document, or PDF

Limitations of Einstein Discovery

  • Max of 20 stories run per day
  • Max of 500 stories created per month
  • Max of 2 concurrent stories per org

Next: Dashboard Implementation
