Zendesk & Matillion Integration — Challenges & Solution

Saama Analytics · May 7, 2021

Zendesk provides a complete customer service solution that is easy to use and scales with the business. Conversations are seamlessly connected irrespective of the chosen medium, be it messaging, email or voice. It builds an integrated help center and community forum, so customers can resolve issues at their own pace while reducing resolution times for agents. It also gives a picture of a team's bandwidth and SLA clarity by business, i.e. how effective a team is in delivering to customers. It helps agents, administrators and internal teams do their best work, and offers robust out-of-the-box tools that are quick to set up and easy to use.

Matillion provides a powerful integration query component that retrieves data from the Zendesk database and loads it into a table. We will take the example of the Ticket table (an important table) and design an ingestion pipeline end to end. For this use case, Matillion with Redshift is used.

Challenges with the Ticket table in Matillion

  • As part of the data warehouse load, if the ticket volume is more than 70K+, the query component runs into a timeout issue.
  • Instead, Zendesk provides an Incremental Export endpoint to get items that changed or were created since the last request. For example:

— Request at 5pm: “Give me all the tickets that have changed since noon today.”

— Response: “Here are the tickets that changed since noon up until, and including, 5pm.”

— Request at 7pm: “Give me the tickets that have changed since 5pm.”

— Response: “Here are the tickets that changed since 5pm up until, and including, 7pm.”

  • Now, while getting data via the connector, we always have to pass the start_time parameter in advanced mode:

— Select * from ticket_incremental where start_time = '2021-03-31'

  • Since in the DWH we need the entire export to be stored, or at least an incremental export that captures all changes done in a given window, no data can be missed. With the above point, we might run into missing-data conditions, as start_time has to be provided as a single value.
  • There is a column called "custom_fields", which is a dynamic array of key/value pairs for the custom fields filled in on Zendesk tickets (an illustrative sample is shown after this list). We need an extra parser to fetch these custom fields in columnar format. As this is dynamic in nature, columns can be added or removed.
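For illustration, a single ticket's custom_fields value is an array of objects, each with an id and a value, matching the id/value keys extracted later in the parser query. The sample below is made up purely to show the shape that must be parsed:

[
  {"id": 360001234567, "value": "hardware"},
  {"id": 360007654321, "value": "high"},
  {"id": 360009876543, "value": null}
]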

To overcome the challenges mentioned above, we can design custom ticket-load orchestration jobs to load data for a date range and also parse the custom fields.

  1. Define the following metadata, which holds the custom field ID, value and column name (a sketch of this metadata table follows the points below):

— Contains the table name and schema name where the ticket table will be stored

— ID, custom field name and data type (based on the content of the data)

— custom_column_name can be derived by replacing characters such as (, ), space, > and ? in custom_field_name with "_" and prefixing custom_ to it

— This can be derived from the ticket_fields endpoint in Zendesk [https://{subdomain}.zendesk.com/api/v2/ticket_fields.json]. It contains two fields, i.e. ID and Raw_Title, which is the custom field name.
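A minimal sketch of what this metadata table could look like, assuming the name audit_config.etl_zendesk_custom_field_config used by the queries later in this post; column names other than id, custom_column_name, is_active, datalake_schema_name and datalake_table_name are illustrative:

CREATE TABLE audit_config.etl_zendesk_custom_field_config (
    id                    NUMERIC       NOT NULL,   -- Zendesk ticket_fields ID
    custom_field_name     VARCHAR(255),             -- Raw_Title from the ticket_fields endpoint
    custom_column_name    VARCHAR(255),             -- derived: special characters replaced with "_", prefixed with custom_
    data_type             VARCHAR(50),              -- based on the content of the data
    datalake_schema_name  VARCHAR(128),             -- schema where the ticket table is stored
    datalake_table_name   VARCHAR(128),             -- table where the ticket table is stored
    is_active             BOOLEAN DEFAULT TRUE      -- set to false to drop a custom field without code changes
);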

2. Matillion Pipeline Step-by-Step Process Design

  • Drop all intermediate/temp tables created.
  • Get the date range for which the job needs to run via the high-water mark (HWM) set in metadata from the last run. A -1 day offset is added as a safety net.
  • With Diff_Days, get the loop iterator and derive the start time to pull data from the Zendesk connector.
  • Load_ticket_landing will calculate the start time as start_range + the loop iterator passed from step 3, run iteratively, and load data into a pre-staging table.

— select * from ${jv_source_table_name} where start_time = '${jv_start_time}'

— Please note — Set ReCreate_Table to OFF

— Custom fields after staging appear as key/value pairs (see the sample shown earlier)

  • Remove duplicates, if any, and store the result in a temp table.
  • Set the partition so the custom fields JSON parser code runs in parallel. As this is nested JSON, it might run into performance issues if the number of custom fields for a given Zendesk instance is high; it can consume more credits or Redshift can time out. Running it in batches of N records results in a performance improvement. For this example, jv_common_partition_cnt is set to 2000.
  • This is a very important step. This code parses the custom fields from JSON into columns dynamically, depending on the metadata defined in Step 1, and stores the result into a temp table.
  • Query used:

— This code will convert the custom fields into valid JSON

— Using json_extract_path_text, it will parse them into rows

— audit_config.zendesk_numbers_cfg should be a simple table with one column, i.e. ordinal, which contains only numbers, e.g. 0 to 100 if there are up to 100 custom fields on a ticket. Set a higher number to automatically pick up custom fields that are added or removed without manual intervention (one possible way to seed this table is sketched after the query output below).

INSERT INTO landing_incremental.ticket_zendesk_desktopsupport_custom
WITH tickets AS (
    -- Wrap the custom_fields array so it becomes a valid JSON object; one partition of records per loop iteration
    SELECT id,
           '{"custom_fields":' || custom_fields || '}' AS custom_fields,
           created_at
    FROM landing_incremental.ticket_zendesk_desktopsupport_sub
    WHERE unq_key >= (${jvLoop} - 1) * ${jvpartitionset}
      AND unq_key <  (${jvLoop}) * ${jvpartitionset}      -- ${jvpartitionset} = jv_common_partition_cnt, e.g. 2000
)
, ticket_dtl AS (
    -- Extract the custom_fields array out of the wrapper object
    SELECT id,
           json_extract_path_text(custom_fields, 'custom_fields', true) AS custom_fields,
           created_at
    FROM tickets
)
, joined_tickets AS (
    -- Explode the array into one row per custom field, driven by the ordinal numbers table
    SELECT ticket_dtl.id,
           json_array_length(ticket_dtl.custom_fields, true) AS number_of_items,
           json_extract_array_element_text(
               ticket_dtl.custom_fields,
               numbers.ordinal::int,
               true
           ) AS custom_fields
    FROM ticket_dtl
    CROSS JOIN (SELECT ordinal FROM audit_config.zendesk_numbers_cfg ORDER BY 1) numbers
    WHERE numbers.ordinal < json_array_length(ticket_dtl.custom_fields, true)
)
SELECT * FROM (
    -- Keep only the custom fields that are active in the metadata, as ticket_id / column / value rows
    SELECT joined_tickets.id AS ticket_id,
           cfg.custom_column_name AS custom_column_name,
           json_extract_path_text(custom_fields, 'id')::numeric AS custom_field_id,
           json_extract_path_text(custom_fields, 'value') AS value
    FROM joined_tickets, audit_config.etl_zendesk_custom_field_config cfg
    WHERE json_extract_path_text(custom_fields, 'id')::numeric = cfg.id
      AND cfg.is_active = true
) parsed
WHERE value IS NOT NULL;

Output ⇒ one row per ticket and custom field (ticket_id, custom_column_name, custom_field_id, value)
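As mentioned above, here is one possible way to seed audit_config.zendesk_numbers_cfg. Using the ticket pre-staging table as a row source is purely an assumption; any table with at least a couple of hundred rows, or a manual insert, works just as well:

-- Create the ordinal numbers table (illustrative DDL)
CREATE TABLE IF NOT EXISTS audit_config.zendesk_numbers_cfg (ordinal INT);

-- Seed ordinals 0..199; keep the upper bound higher than the maximum number of custom fields expected
INSERT INTO audit_config.zendesk_numbers_cfg (ordinal)
SELECT ordinal
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY unq_key) - 1 AS ordinal
    FROM landing_incremental.ticket_zendesk_desktopsupport_sub   -- any sufficiently large table
) t
WHERE ordinal < 200;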

  • Now comes the step to PIVOT the custom columns so they are stored at the TICKET level. The above steps produce the data set in row format. This again has to be dynamic in order to support the addition of new columns.

Dynamic derivation of the pivot Redshift query from the custom-field metadata:

jv_custom_select :

SELECT 'SELECT ticket_id, '
       || RTRIM(
              LISTAGG('MAX(CASE WHEN custom_column_name = ''' || custom_column_name
                      || ''' THEN value ELSE null END) AS ' || custom_column_name || ', ')
            , ', ')
       || ' FROM ${jv_landing_schema_name}.${jv_landing_table_name}_unq_custom GROUP BY ticket_id' AS select_
FROM (
    SELECT DISTINCT custom_column_name
    FROM audit_config.etl_zendesk_custom_field_config
    WHERE datalake_schema_name = '${jv_datalake_schema_name}'
      AND datalake_table_name = '${jv_datalake_table_name}'
      AND is_active
) cols;
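For illustration only, if the metadata held two hypothetical custom columns, custom_category and custom_priority_level, and the landing table resolved to landing_incremental.ticket_zendesk_desktopsupport_unq_custom, the generated statement stored in jv_custom_select would look roughly like this:

SELECT ticket_id,
       MAX(CASE WHEN custom_column_name = 'custom_category' THEN value ELSE null END) AS custom_category,
       MAX(CASE WHEN custom_column_name = 'custom_priority_level' THEN value ELSE null END) AS custom_priority_level
FROM landing_incremental.ticket_zendesk_desktopsupport_unq_custom
GROUP BY ticket_id;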

  • Once this custom CREATE TABLE DDL is automatically generated, create the table and store the results of the final pivoted data into the custom final table.

Output generated in pivot format: one row per ticket_id with one column per active custom field

  • Last step:

— Get all the column names for the custom columns dynamically, so that the LISTAGG value can be used when joining.

  • Join the Ticket table with the main custom table on Ticket ID, and the final table is ready for use (a sketch of this join follows below).
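A minimal sketch of that final join, assuming illustrative names for the main ticket table and the pivoted custom table (the real names come from the job variables used earlier):

SELECT t.*,
       c.*   -- in practice, the dynamically derived list of custom columns from the metadata
FROM landing_incremental.ticket_zendesk_desktopsupport t                       -- main ticket table (illustrative name)
LEFT JOIN landing_incremental.ticket_zendesk_desktopsupport_custom_final c     -- pivoted custom table (illustrative name)
  ON t.id = c.ticket_id;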

Final pipeline:

Here is the final output:

Towards the end of this implementation, the advantages are as follows (but not limited to):

  • As Matillion does not have a direct way to flatten this out, the above method gives the flexibility to pivot columns for further processing.
  • Since it is metadata-driven, adding a column only needs an insert into the metadata (no code change), and removing a column only needs the is_active flag in the metadata set from true to false (no code change required).
  • In case the custom fields are excessive in number, the parallel partitioning ensures the best performance for pivoting columns.
  • All intermediate tables where results are stored are dropped towards the end.
  • This provides more modularity for deriving the custom fields that are in JSON format.
  • This does not require any special processing or components; Redshift's built-in query power and the metadata set drive this solution.

Author: Prajakta Borkar
Enterprise Data Warehouse Architect with 14+ years of experience across a host of technologies including Oracle Exadata, Snowflake, SQL Server, Shell/PowerShell/PLSQL scripting, DataStage and Ab Initio, specialized in modern cloud technologies like Snowflake, Matillion, Denodo, Power BI and Business Objects, with data modelling, data analytics, etc.

Originally published at https://saamaanalytics.com on May 7, 2021.
