How to deliver supercharged self-service insights with ThoughtSpot and Snowflake

Steve McGill
Cervello, a Kearney Company
16 min readJul 31, 2020

Data practitioners have all lived through the dashboard re-development, re-designs and re-visioning cycles that plague business intelligence initiatives. The reality is that there is “no one size fits all” solution. End users come with varied needs and capabilities both from an insights perspective as well as a visualization maturity perspective. This, in conjunction with the massive amounts of data being analyzed and reviewed, creates endless development cycles. Unfortunately for business users, there is only a limited number of data-able team members to support them.

One platform seeks to change this paradigm, ThoughtSpot. Today, I will tell you what ThoughtSpot is and how to quickly get it up and running using Snowflake data.

Introduction to ThoughtSpot

ThoughtSpot is a next-generation business intelligence (BI) platform that allows end users to generate insights directly from database contents using Google-like search expressions. Results can be displayed in tables, charts and pinboards. ThoughtSpot also generates automated AI driven insights into the data.

ThoughtSpot is a game-changer. ThoughtSpot is one of the few (or only) new entrant to be rated in the Leaders Gartner quadrant, and has been so rated for multiple consecutive years.

Image of Magic Quadrant from Qlik (https://www.qlik.com/us/gartner-magic-quadrant-business-intelligence)

ThoughtSpot & Snowflake

ThoughtSpot is a Snowflake technology partner and can be integrated with Snowflake using point-and-click selections, no coding. The advantages of integrating ThoughtSpot with Snowflake are:

  • Business users have direct, live access to a Snowflake data warehouse.
  • It no longer takes months, weeks or days to generate reports. Reports can be generated at the speed of thought.

The remainder of this article provides step-by-step instructions for integrating ThoughtSpot with Snowflake. Two sets of instructions are provided: a simple integration and a more robust integration. Both provide direct, live access to Snowflake data. The difference is that:

  • The simple integration assumes that business semantics have been captured in Snowflake, as they would be in a star-schema data mart.
  • The robust integration adds business semantics and user-based security within ThoughtSpot.

If you are interested in learning more about ThoughtSpot, feel free to reach out to me. I’ll be happy to provide you with a quick demo or, even better, help you understand how you can leverage it to superpower your organization’s access to data.

Simple Integration

The simple integration involves two basic steps: connect and search. You search the Snowflake data the exact same way that you would search data stored within ThoughtSpot. A few Snowflake prerequisites are suggested to provide a smooth integration.

Prerequisites. A ThoughtSpot integration requires an existing Snowflake account, user, role and warehouse to support data queries. We assume you already have a Snowflake account. If a suitable user, role and warehouse does not already exist, we suggest that:

  • A dedicated user be created within Snowflake named ThoughtSpot. This user should be assigned the role that will be used to execute ThoughtSpot queries.
  • A dedicated role be created within Snowflake named THOUGHTSPOT_ANALYST. This role should be assigned usage and read accesses to the databases, schemas and tables that will be queried using ThoughtSpot.
  • A dedicated warehouse be created within Snowflake named THOUGHTSPOT_QUERIES. The warehouse should be sized as small as possible and configured for auto-suspend and auto-resume.
  • The auto-suspend time limit should probably be set to at least 30 minutes so that the warehouse cache can support successive queries. This cache is cleared when the warehouse suspends. The likelihood is that this warehouse will be active for long periods of time since it will support multiple business users on the ThoughtSpot side.

Step 1. Connect — Create a ThoughtSpot connection to Snowflake. Within ThoughtSpot, navigate to the Data > Connections web page and click on the + Add Connection button. The image shown below is displayed.

Enter a connection name and optional description and select the Snowflake icon. Click on the Continue button in the upper right (not shown in graphic). The image below is displayed.

Enter a valid account name, user, password, role and warehouse.

Optionally enter a specific database and schema. If none are entered, ThoughtSpot will display a list of available database and schemas, as depicted below.

Click on the ▼ icons to see the tables in specific databases and schemas. Select the desired tables. When a list of columns is displayed, select the desired columns, and then click on Create Connection. The new connection will be display on the ThoughtSpot Data > Connections page.

Step 2. Search — You search the Snowflake data the exact same way that you would search data stored within in ThoughtSpot. Navigate to the ThoughtSpot Search page and click on Choose Sources.

A list of available sources will be displayed, similar to the image shown below.

Select the desired Snowflake data sources and click on Close. The names of the selected data sources will be displayed in the left-hand pane. Click on the ▼ icons to see the columns in each data source.

Double click on any desired columns or type the column names directly into the search bar. In the search bar, add any desired data values, like south (show below), and keywords, like top 5 (show below). Data values act as filters. Keywords are used by ThoughtSpot to refine the search. ThoughtSpot has over a hundred keywords, ranging from simple things like top 5 or bottom 5 (the value 5 can be changed) to more complex things like year over year and growth of.

As soon as a measure is entered, ThoughtSpot automatically generates a total. When a measure and at least one attribute are entered, ThoughtSpot automatically generates a chart. The chart may change as additional measures and attributes are entered or if the new attributes are geographic identifiers (like state, country, latitude or longitude). A simple search bar and its associated chart are shown below.

There is a great deal more that can be done to search the data. A full description of search capabilities is beyond the scope of this article.

Robust Integration

The more robust integration involves four steps: connect, configure, secure and search.

Prerequisites — The same prerequisites apply as described in the simple integration above.

Step 1. Connect — The same connect procedures apply as described in the simple integration above.

Step 2. Configure — A variety of things can be done to add business semantics to the data retrieved from Snowflake. All are dynamic, in the sense that rules are applied at the metadata layer, not to specific records and values. Each of the sub steps listed below can be applied independently based on the needs of the Snowflake data model.

The sub-steps below apply equally to data retrieved from Snowflake and data stored within ThoughtSpot. Emphasis is given to features that have special significance to Snowflake.

2.1 Define Joins between tables — To translate Search bar entries into a database query, ThoughtSpot needs to know how the different Snowflake tables join to one another. This is true for both star schema and third normal form (3NF) data models. Joins will be automatically downloaded from Snowflake. The issue is that Snowflake does not enforce primary key (PK) and foreign key (FK) constrains, so most Snowflake developers do not build these into their data models. Instead, joins based on PK — FK relationships must be defined in ThoughtSpot. To define a join between two Snowflake source tables, navigate to the ThoughtSpot Data > Tables > table name > Schema web page, were table name identifies the table on the many side of a many-to-one join, such as a fact table. Click on the È Add Join button in the upper right. An Add Join dialog box will open, similar to the one shown below.

The Source Table field will be pre-populated with the table represented by table name. Enter a Destination Table name and select Source Columns and Destination Columns from the pick lists provided. Typically, source and destination fields will have the same name, but this is not necessary. Click on the Next button. The next Add Join dialog box will open, similar to the one shown below.

Enter a join name and optionally a join description. Click on the Next button. You will be returned to the Schema web page, but this time, the join you just created will be displayed, similar to the image shown below.

2.2 Review Column Properties — ThoughtSpot automatically assigns metadata properties to database columns based on the limited information it retrieves from Snowflake, namely data type. (For data loaded directly into Snowflake, column properties are based on data values.) For simple data models, this may be sufficient. For most models, these properties must be manually reviewed and edited. Column properties can be edited by navigating to the ThoughtSpot Data > Tables > table name > Columns web page. A typical page is shown below. The actual web page scrolls to the right and contains a total of 18 properties.

Typical edits that must be performed within ThoughtSpot are:

  • Hidden Columns — Record IDs are generally database generated and don’t mean anything to users. Examples are sales order ID, customer ID, product ID, purchase order ID and so on. A corresponding “name” field is used instead. For fields that have no meaning to users, set the HIDDEN property to YES.
  • Attributes vs. Measures — By default, ThoughtSpot will designate numeric data types that are not measures, like sales order number, as measures. To fix this, change the COLUMN TYPE property to ATTRIBUTE and the ADDITIVE property to NO.
  • Default Aggregation — The default aggregation for all measures is SUM. This is how the measure will be aggregated if not explicitly overridden in the Search bar. For some measures, it may be appropriate to change the default aggregation to average, or unique count. To effect these choices, the AGGREGATION field can be changed to COUNT, COUNT DISTINCT, SUM, AVERAGE, MIN, MAX, STD_DEVIATION or VARIANCE.
  • Default Number Format — The default format for displaying numeric quantities can be set using the FORMAT PATTERN field. Format patterns are defined as a string of zeros, pound signs (#), percent signs (%), periods and commas, similar to MS Excel formatting. Date patterns are defined use the symbols y, M, D, d, H, h, m, s, and a.
  • Currency Format — The default format for displaying currency values can be set by using the CURRENCY TYPE field. This can be set to NONE, SPECIFY ISO CODE from a pick list, INFER FROM BROWSER properties, or infer FROM A COLUMN in the data that contains a currency code. The latter is useful when different records contain different currency values.
  • Identify Columns That contain Geographic Values — ThoughtSpot contains 3 types of geography-based charts: Geo Area, Geo Bubble, and Geo Heatmap. To use these charts, columns that contain geographic values (points or areas) must be manually identified. This does not happen by default. Set the GEO CONFIG property using the pick lists provided. Choices are None, Latitude, Longitude, Country, State, County, and Zip Code. The latter 3 choices are known as “sub-nation region” designations and change from country to country. Some values for non-US countries are Postal Code, Province, Census Division, District, Region, Prefecture, and so forth.
  • Search Suggestions — ThoughtSpot suggests read-ahead values as users type into the Search bar. For example, if you type an “S”, ThoughtSpot may suggest that the full text is Sales (a measure), State (an attribute), South (a value), or Sausage (a product … maybe). The order in which choices are presented is determined by usage-based ranking. If you normally type in Sales, then Sales will appear as your first choice. If most people in your group type in State, then State will appear as your second choice. Two column properties can be set that affect the order of search suggestions. Change the INDEX PRIORITY to push a column to the head of the list. The default priority is 1. Higher values take precedence. Columns with the same priority remain prioritized by usage-based ranking. Change the INDEX TYPE property to determine how ThoughtSpot matches the text you type into column values. This field only applies to textual values and functions differently for columns with short text strings, long text strings, or high cardinality (over 10 million rows). Please refer to the online help for details. In short, options are available that tell ThoughtSpot to only match text at the beginning of column value, the beginning of any word in a column value, or any text in the column value.

2.3 Resolve Chasm Traps — A chasm trap results when a user’s search parameters cause ThoughtSpot to report on measures from two fact tables, generally connected by a shared dimension. A report that asks for amount sold versus amount ordered, by product, would do this since amount sold comes from a sales transaction and amount ordered comes from a purchase order. A SQL query that uses PK-FK relationships to join the dimension and two fact tables creates a Cartesian product that joins each row from the 1st fact table to each row from the 2nd fact table. This results in severe over counting. ThoughtSpot may be the only product where this is an issue since it is the only product that offers this kind of flexibility. ThoughtSpot also automatically fixes the issue by not using a “normal” SQL query. ThoughtSpot pre-aggregates each measure before doing the join. An issue can still occur if the two fact tables have more than one shared dimension. In the above example, sales transactions and purchase orders share both the product and date dimensions. In this situation, ThoughtSpot cannot determine which dimension to use to link the two fact tables. To resolve this issue, navigate to the ThoughtSpot Data > Tables > table name > Columns web page (graphic provided above) and use the ATTRIBUTE DIMENSION property to indicate that the date field in each fact table should not be used to resolve chasm traps; set the value to NO.

2.4 Define Business Friendly Column Names — The columns names in most data warehouses are anything but business friendly. Business friendly names can be defined in ThoughtSpot by navigating to the ThoughtSpot Data > Tables > table name > Columns web page (graphic provided above). Change the COLUMN NAME attribute to the business-friendly name. A link to the original Snowflake name is maintained behind the scenes. Additional business friendly names can be entered into the SYNONYMS column. Some best practice guidelines for defining business friendly names are:

  • Use spaces instead of underscores between names
  • Use initial caps instead of all caps
  • Prefix each attribute with name of its dimension — separate Store City from Customer City
  • Spell out all acronyms — each acronym loses a percentage of the audience
  • Universally drop the suffix Name — no one says what “state name” do you live in
  • Name measure fields the way business users expect them — drop the suffix “amount”

2.5 Define Worksheet for Specific User Groups — Worksheets contain subsets of the data in the data model. For many business users, offering a list of several hundred attributes is confusing when they only use a dozen or so. Additionally, different user groups aggregate data using different dates. Sales users want to see aggregates based on transaction date. Finance users want to see aggregates based on posting date. Worksheets can be defined containing just the columns a specific user group wants to see. Worksheets can also be created to isolate chasm traps for users interested in such things.

Worksheet creation can be quite involved and is fully described in the online ThoughtSpot documentation. The basic procedures are described here. To create a worksheet, navigate to the ThoughtSpot Data web page, click on the ▪▪▪ icon in the upper left followed by the Create Worksheet menu item. A web page similar to the one shown below will be displayed.

Click on the + icon to the right of the word Sources. Select the Snowflake sources needed to support the worksheet using the checkboxes to the left of each source name. A sample image is shown below.

Click on the Close button in the upper right and a web page that allows you to select specific columns will be displayed, similar to the image shown below.

Click on the ▼ icon to the left of a source name to see the columns in that source. Select a source and click on the È Add columns button to add that column to the worksheet. When all columns have been added, click on the ▪▪▪ icon in the upper left followed by the Save menu item. Give the worksheet a name and optional description and click on the Save button.

Step 3. Security — A ThoughtSpot data connection retrieves Snowflake data using a single Snowflake user name. This data may support many ThoughtSpot business users. There are several ways of limiting access to the data based on ThoughtSpot user name and user group.

3.1 Different Connections — Different connections can be defined for different ThoughtSpot user groups. This will generate different source tables for each user group. Access to a data source can be restricted to a one or more user groups by navigating to the ThoughtSpot Data > Tables table name > Share web page. A sample image is shown below.

Click on + Add users or groups to share this data source (make it accessible) with specific user groups. Specific columns can be shared if desired.

This may seem like the simplest method of managing table access. On the plus side, it keeps the Snowflake security model in sync with the ThoughtSpot security model. On the negative side, the ThoughtSpot users and groups have probably not been defined in Snowflake and doing so will cause significant administrative overhead. It also may generate highly redundant data sources and mandate redundant data configuration steps (step 2 above). Other methods of defining user security are described below.

The use of multiple connections is the only method of applying data security that is unique to integrating ThoughtSpot with Snowflake (vs. storing data within ThoughtSpot).

3.2 Different Worksheets — Different worksheets can be defined for different ThoughtSpot user groups. Each ThoughtSpot user group is only given access to their worksheets. The procedures for limiting worksheet access are the same as the procedures for limiting table access, except that a worksheet is selected instead of a table. Also note, sharing a worksheet does not share the underlying tables.

3.3 Row Level Security — Rules can be defined that filter the data retrieved based on ThoughtSpot user name or user group. These rules are applied automatically to any query before the data is exposed to users. Rules are defined on the source tables, but they affect everything derived from those tables: worksheets, views, search suggestion, answers and pinboards.

To define a rule, navigate to the ThoughtSpot Data > Tables > table name > Row Security web page and click on the È Add users or groups button. Enter rules using a popup similar to the one shown below.

Rules are defined using the ts_username and ts_groups functions provided by ThoughtSpot. ts_username returns the name of the user current logged into ThoughtSpot. ts_groups returns a list of the groups to which that user belongs. Rule expressions take one of the forms shown below. Note that multiple clauses can be connected by “ands” or “ors” and if-then-else logic is supported. Any expression containin ts_groups is evaluated once for each group in its list.

ts_groups = [column name] or ts_username != “abc”

if (ts_groups = “abc”) then true else false

The popup includes a Rule assistant in the upper right and provides read-ahead suggestions as you type. When you see Good to go! in the lower left of the popup, enter a rule name and optional description. Click on the Done and Save buttons.

Step 4. Search — The same search procedures apply as described in the simple integration above.

I hope you found this helpful! Please comment below if you have any questions.

About Cervello, a Kearney company

Cervello, is a data and analytics consulting firm and part of Kearney, a leading global management consulting firm. We help our leading clients win by offering unique expertise in data and analytics, and in the challenges associated with connecting data. We focus on performance management, customer and supplier relationships, and data monetization and products, serving functions from sales to finance. We are a Solution Partner of Snowflake due to its unique architecture. Find out more at Cervello.com.

About Snowflake

Thousands of customers deploy Snowflake’s cloud data platform to derive insights from their data by all their business users. Snowflake equips organizations with a single, integrated platform that offers the data warehouse built for the cloud; instant, secure, and governed access to their network of data; and a core architecture to enable many other types of data workloads, such as developing modern data applications. Find out more at Snowflake.com.

--

--