EA Certification Study Guide Part 3: Einstein Discovery Story Design
For this edition of the study guide we will be reviewing the area of Einstein Analytics I knew the least about going into this exam: Einstein Discovery. Recently included in the “Einstein Analytics Plus” bundle by Salesforce, Einstein Discovery is a self-service data science tool that allows anyone to analyze a dataset and generate predictions from it.
Sources:
- Trailhead Badges
-Salesforce Einstein Basics
-Einstein Discovery Basics
-Einstein Discovery Stories
-Einstein Discovery Story Insights
- Salesforce Help Documentation
-Einstein Discovery
- Documentation in Exam Guide
-Salesforce Winter ’19 Release Notes
- Einstein Analytics Training Videos
-Video 15: Einstein Discovery
- External Resources
-Salesforce Einstein…What?
-The Big Book of Customer Predictions
-Einstein Discovery Demo
-Einstein Analytics Winter ’19 Sneak Peak
-Einstein Discovery
-What is Einstein Discovery and What Can It Do For Your Business?
-How to set up an Einstein Discovery Writeback
For this section, I recommend not only going through the Trailheads but also uploading a dataset from your Salesforce environment and seeing what insights you can find.
Notes:
Einstein Discovery
- Data science tool that predicts outcomes, analyzes data, and finds trends
— Utilizes data from Einstein Analytics datasets
— Allows you to quickly generate insights and drill down from there
— Use as an additional team member, not a replacement for your BI team
- Setting up ED
— Users must be granted the Einstein Analytics Plus User or Einstein Analytics Plus Admin permission set
— Next, enable Einstein Discovery in your Einstein “Getting Started” settings
- Terms
— Story: A collection of insights that help you explore relationships between a business-relevant metric and possible contributing factors
— Model metrics: Describe the performance of the predictive model and its underlying statistical details
— Outcome variable: The single, predominant measure used as the focus of the story
Prepare Data
- Data for the story is generated from a single dataset in Einstein Analytics. This could be an individually imported CSV table or sourced from a dataflow.
- Limitations:
— ED doesn’t respect security predicates during the analysis and you must have “ignore predicate” permissions to create a story that contains predicates
— Datasets with inherited sharing won’t work with ED
— The chosen dataset must have at least 10,000 rows
— Chosen variables in the dataset must have at least 25 populated rows
- Choosing your variables
— Start with a few columns and continually expand the variables as you iterate
— After selecting the outcome variable, choose input variables that could contribute to that outcome to include in the dataset. Ideally, try to get the maximum amount of information from a minimal number of variables
— Overfitting: The common mistake of including too many variables in a model, which adds noise to the algorithm and overcomplicates the results
— Underfitting: Excessively simplifying the model, making it hard for the algorithm to find patterns
- Prepare your data ahead of time to get better results. High quality input = high quality output
— Remove extreme values and outliers
— Repair or remove records with missing or incorrectly input values
— Review distributions for skew
— Don’t include fields that contain too many unique values. Ex: zip codes, names
— Minimize duplicate or highly correlated values
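To make the checklist above more concrete, here is a minimal sketch of how I might screen a dataset before handing it to Einstein Discovery. This is not anything ED runs internally, just plain Python over a list of dict rows. The function names `pearson` and `screen_columns` are hypothetical, the 10,000-row and 25-populated-row defaults come from my notes above, and the `max_unique_ratio` and `max_corr` thresholds are assumptions I picked purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no meaningful correlation
    return cov / math.sqrt(var_x * var_y)


def screen_columns(rows, min_rows=10_000, min_populated=25,
                   max_unique_ratio=0.5, max_corr=0.9):
    """Return human-readable warnings for a dataset given as a list of dicts.

    max_unique_ratio and max_corr are illustrative assumptions,
    not documented Einstein Discovery thresholds.
    """
    warnings = []
    if len(rows) < min_rows:
        warnings.append(f"dataset has {len(rows)} rows, fewer than {min_rows}")

    columns = list(rows[0].keys()) if rows else []
    numeric = {}
    for col in columns:
        # Treat None and empty strings as unpopulated.
        values = [r[col] for r in rows if r.get(col) not in (None, "")]
        if len(values) < min_populated:
            warnings.append(f"{col}: only {len(values)} populated rows")
        if values and all(isinstance(v, (int, float)) for v in values):
            # Only fully populated numeric columns are compared pairwise.
            if len(values) == len(rows):
                numeric[col] = values
        elif values and len(set(values)) / len(values) > max_unique_ratio:
            # Text fields such as zip codes or names.
            warnings.append(f"{col}: too many unique values")

    names = sorted(numeric)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(numeric[a], numeric[b])) > max_corr:
                warnings.append(f"{a} and {b}: highly correlated")
    return warnings
```

Calling `screen_columns(rows)` on an export of your dataset returns a list of warnings to review before you create the story. An empty list just means none of these particular checks fired.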
Analyze Results
- An Einstein Discovery story displays four insight types users can navigate through and analyze.
— Insights are ordered from the most to least statistically significant
— Insights at the top explain the most variation for the outcome variable
— In a chart, a grey bar means values aren’t statistically significant or are statistically unsound
Insight types:
- What Happened — Descriptive insight, finds elements that have the most statistical significance. Terms:
— Difference from overall: How far above or below average the category is
— Average: Sum of every value divided by the count of values
— Standard deviation: How much items in the category differ from the average
- Why It Happened — Diagnostic insight, what factors led to this outcome? Why = a high correlation, not causation. Terms:
— Impact: The effect the variable has on the outcome
— Coefficient: The value you attribute to the analysis
— Precluded sum/count: Impact for the average variable that isn’t being included in this particular analysis
— Global frequency: How often the variable appears in the entire dataset
— Conditional frequency: How often the variable appears in this section of the dataset
- Predictions and Improvements — Predictive insight
— Predictions: Waterfall charts
— Improvements: Bar charts
— In this section select different variables to build out recommendations and “what if” analyses
- What is the Difference — Comparative insight
— Helps you understand the relationship between variables using waterfall charts
— Similar to predictions and improvements, you must choose a variable for an analysis to appear
— Can choose a secondary variable and compare the two
— Optionally, add a filter to further focus the analysis to a subset
- Other Terms
— First order analysis: Type of insight that examines how one variable explains the variation in the outcome variable
— Second order analysis: Type of insight that examines how two variables together explain variation in the outcome variable
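The terms above clicked for me once I computed them by hand on a tiny dataset. The sketch below is only my own illustration of the definitions, not how Einstein Discovery calculates its insights. The records, the column names, and the helpers `group_stats` and `frequency` are all hypothetical, and I am assuming a population standard deviation because my notes don't say which variant ED uses.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical toy data: "amount" is the outcome variable,
# "region" and "product" are the input variables.
records = [
    {"region": "West", "product": "A", "amount": 100},
    {"region": "West", "product": "A", "amount": 120},
    {"region": "West", "product": "B", "amount": 140},
    {"region": "East", "product": "A", "amount": 60},
    {"region": "East", "product": "B", "amount": 100},
    {"region": "East", "product": "B", "amount": 80},
]


def group_stats(rows, keys, outcome):
    """Average, standard deviation and difference from overall per group."""
    overall = mean(r[outcome] for r in rows)
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[k] for k in keys)].append(r[outcome])
    return {
        group: {
            "average": mean(values),
            "std_dev": pstdev(values),  # population std dev (assumption)
            "diff_from_overall": mean(values) - overall,
        }
        for group, values in groups.items()
    }


def frequency(rows, column, value, where=None):
    """Share of rows where column == value, optionally within a subset."""
    subset = [r for r in rows if where is None or where(r)]
    return sum(r[column] == value for r in subset) / len(subset)


# First order: one variable explains variation in the outcome.
first_order = group_stats(records, ["region"], "amount")
# Second order: two variables together explain variation in the outcome.
second_order = group_stats(records, ["region", "product"], "amount")

# Global frequency: how often product A appears in the whole dataset.
global_freq = frequency(records, "product", "A")
# Conditional frequency: how often product A appears within the West rows.
conditional_freq = frequency(
    records, "product", "A", where=lambda r: r["region"] == "West"
)
```

Here the overall average is 100, so the West region sits 20 above it and the East region 20 below. Product A makes up half of all rows but two thirds of the West rows, which is the global versus conditional frequency distinction.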
Adjust Parameters and Alter Data
- Continuously improve
— Can schedule the analysis to continually add new data to the model
— Periodically review and update model variables over time
- Improvements toolbar: ED feature that detects extreme or identical values, variables with little variation, and highly correlated variables. Use it to improve your story.
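My notes don't cover how ED actually detects extreme values or low-variation variables, so the sketch below swaps in two common techniques to show the idea: Tukey's interquartile-range rule for extreme values, and a simple "one value dominates" check for little variation. The function names and the `k` and `max_share` defaults are my own assumptions, not Einstein Discovery settings.

```python
from statistics import quantiles


def extreme_values(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def has_little_variation(values, max_share=0.95):
    """True when a single value accounts for more than max_share of rows."""
    most_common = max(values.count(v) for v in set(values))
    return most_common / len(values) > max_share
```

For example, `extreme_values` applied to a mostly single-digit "days to close" column would return the handful of records worth inspecting by hand. A status field where nearly every record shares one value would make `has_little_variation` return `True`, suggesting the variable adds little to the story.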
Writeback to Salesforce
- Install Einstein Discovery Managed Package
- Create a story for the Salesforce object in question and use the “Deploy Model” feature
- Create three custom fields on the object to display the outcome, explanation, and prescriptive information from the model
- Inside settings, connect the ED module to these custom fields
- Create an Apex Trigger to write to these fields (can use Bulk API with data loader or workbench to kick off this trigger on all existing records)
- Add the Einstein Discovery Recommendation widget to the object’s Lightning page
Utilize the Model Manager inside ED to manage the predictions and models that have been deployed to Salesforce.
- Other display options:
— Export story to Quip
— Display in Einstein Analytics
— Export results into a PowerPoint, Word Document, or PDF
Limitations of Einstein Discovery
- Max of 20 stories run per day
- Max of 500 stories created per month
- Max of 2 concurrent stories per org
Next: Dashboard Implementation