Tableau REST API: Update data in a published data source
Learn how to use the new REST API endpoints to update a published Hyper data source.
The Tableau REST API helps you manage and automate tasks for Tableau Server, Tableau Online, and Prep Conductor resources. Tableau 2021.2 introduced REST API 3.12, which brought several new features:
· You can now modify (insert, update, upsert, replace, and delete) data in a published data source that has a live-to-Hyper connection.
· With the appropriate permissions, you can batch add (or update) and delete multiple data quality warning (DQW) items to and from different content and external assets using the following methods: Batch Add Data Quality Warnings and Batch Delete Data Quality Warnings.
· Server and site administrators can now directly find the Tableau resource responsible for a problematic data source query when they enable query tagging through update workbook connections and update data source connections. They can view enablement status through query workbook connections and query data source connections.
You can refer to the links above for an overview of each feature, but for the scope of this blog we will go into detail on the new ability to update data in a published data source and walk you through sample code to help you get started.
Use case: Source Hyper to Target Hyper in a Published Data Source
Let us assume that we have a published Hyper data source on Tableau Server with 1 million rows, and we need to update some specific rows and insert a few new records, say 1,000 rows. Previously, to achieve an incremental load that included updates, we were forced to download the whole Hyper file, unpack it, apply the updates and inserts, and then repackage and republish it.
All of these steps are highly memory intensive and often practically impossible on local systems. They are also network intensive, since the Hyper files being downloaded and republished are often several GB in size.
The image above is from DATAdev Day 2021 and shows the overall flow for the new REST endpoint. For the scenario described earlier, we now only need to create a small .hyper file containing the changed dataset. We pass a JSON payload with the request describing the actions we want applied to the data in the target Hyper data source. Once the request reaches Tableau Server, the Backgrounder service picks it up and applies the requested actions to the target Hyper file.
Because the heavy operation is pushed to Tableau Server, we avoid the memory-intensive work on the client and any failures that come with it. Network usage is also much lower, since we only send the changed dataset over the network instead of moving the large target Hyper file.
Code Overview:
At the time of writing, the Tableau Server Client (TSC) library, which provides an abstraction over the Tableau Server REST API, has not yet been updated with this new feature, so we will not be leveraging that library for this walkthrough. Once it is updated, we will update the blog with GitHub links to the equivalent code.
You can still get a good starting point from the sample code at the link, which uses Python's "requests" module to make the HTTP requests.
As mentioned before, the Tableau REST API works through HTTP requests to its endpoints for tasks such as signing in, setting permissions, and publishing workbooks or data sources. For this use case we focus on uploading the data source. The single-request upload method is limited to 64 MB, so for larger files we break the file into standard 5 MB chunks. We first initialize an upload session, which generates an upload id identifying all the chunks as parts of a single file. Each chunk is then sent with a PUT request until the entire .hyper file has been pushed.
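To make this concrete, here is a minimal sketch of the sign-in and chunked-upload flow, assuming Python's "requests" module, a Personal Access Token, and REST API version 3.12; the server URL, token names, and file path are placeholders rather than values from the sample repository.

```python
import requests
from urllib3.fields import RequestField
from urllib3.filepost import encode_multipart_formdata

SERVER_URL = "https://my-tableau-server"   # placeholder server URL
API_VERSION = "3.12"
CHUNK_SIZE = 5 * 1024 * 1024               # 5 MB chunks, as described above


def make_multipart(parts):
    """Build the multipart/mixed body Tableau expects for file uploads."""
    fields = []
    for name, (filename, blob, content_type) in parts.items():
        field = RequestField(name=name, data=blob, filename=filename)
        field.make_multipart(content_type=content_type)
        fields.append(field)
    body, content_type = encode_multipart_formdata(fields)
    return body, content_type.replace("multipart/form-data", "multipart/mixed")


def sign_in(token_name, token_secret, site_content_url=""):
    """Sign in with a Personal Access Token; returns the auth token and site id."""
    payload = {"credentials": {
        "personalAccessTokenName": token_name,
        "personalAccessTokenSecret": token_secret,
        "site": {"contentUrl": site_content_url},
    }}
    resp = requests.post(f"{SERVER_URL}/api/{API_VERSION}/auth/signin",
                         json=payload, headers={"Accept": "application/json"})
    resp.raise_for_status()
    creds = resp.json()["credentials"]
    return creds["token"], creds["site"]["id"]


def upload_in_chunks(auth_token, site_id, hyper_path):
    """Initiate an upload session and PUT the changeset .hyper file in 5 MB chunks."""
    headers = {"X-Tableau-Auth": auth_token, "Accept": "application/json"}

    # Initiate File Upload: returns the upload id shared by all chunks.
    resp = requests.post(
        f"{SERVER_URL}/api/{API_VERSION}/sites/{site_id}/fileUploads",
        headers=headers)
    resp.raise_for_status()
    upload_id = resp.json()["fileUpload"]["uploadSessionId"]

    # Append to File Upload: one PUT per 5 MB chunk, multipart/mixed body.
    with open(hyper_path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            body, content_type = make_multipart({
                "request_payload": (None, b"<tsRequest />", "text/xml"),
                "tableau_file": ("file", chunk, "application/octet-stream"),
            })
            requests.put(
                f"{SERVER_URL}/api/{API_VERSION}/sites/{site_id}"
                f"/fileUploads/{upload_id}",
                data=body,
                headers={**headers, "Content-Type": content_type},
            ).raise_for_status()
    return upload_id
```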
With the new REST API feature we can now send a PATCH request whose body is JSON (Content-Type: application/json) and specifies the actions to be performed on the target Hyper data source using the uploaded file. The available action types are listed below and documented further in the Tableau documentation. Note that multiple actions can be combined in one JSON body to achieve more complex operations, as shown in the sketch after this list.
- insert — Inserts rows from a named source-table in your uploaded data source into a named target-table in your published Hyper. A simplified example: {"action": "insert", "source-table": "added_users", "target-table": "current_users"}
- update — Updates existing tuples in a named target-table in your published Hyper with values from a named source-table in your uploaded data source. The tuples to be updated are selected by a condition declaration. A simplified example: {"action": "update", "target-table": "my_data", "source-table": "uploaded_table", "condition": {"op": "eq", "target-col": "row_id", "source-col": "update_row_id"}}
- upsert — Updates a tuple inside the target-table if such a tuple exists; if no such tuple exists, a new tuple is inserted. The tuples to be updated are selected by a condition declaration. A simplified example: {"action": "upsert", "target-table": "my_data", "source-table": "uploaded_table", "condition": {"op": "gt", "target-col": "row_id", "source-col": "update_row_id"}}
- replace — Deletes all tuples from a target-table and inserts all data from a source-table with a matching schema. A simplified example: {"action": "replace", "source-table": "added_users", "target-table": "current_users"}
- delete — Deletes tuples from a target-table. The tuples to be deleted are defined in a condition declaration. A simplified example: {"action": "delete", "target-table": "my_extract_table", "source-table": "deleted_row_id_table", "condition": {"op": "gt", "target-col": "id", "source-col": "deleted_id"}}
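Several of these actions can be batched into a single request body. As a hedged illustration (table and column names here are placeholders, not names from the sample data source), the JSON content for an insert followed by a conditional update could be built as a Python dictionary:

```python
# Actions are applied in order by the Backgrounder on the target Hyper file.
actions_payload = {
    "actions": [
        {"action": "insert",
         "source-table": "new_users",
         "target-table": "current_users"},
        {"action": "update",
         "source-table": "changed_users",
         "target-table": "current_users",
         "condition": {"op": "eq",
                       "target-col": "row_id",
                       "source-col": "row_id"}},
    ]
}
```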
Once the source Hyper file is uploaded and the JSON content is formulated, we create a PATCH request, providing the target data source id, the source upload id, and the JSON content in the request body.
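A hedged sketch of that PATCH request is below, reusing SERVER_URL, API_VERSION, and actions_payload from the earlier snippets; datasource_id is assumed to be the LUID of the target published data source, looked up separately (for example via the Query Data Sources endpoint).

```python
import json
import uuid

import requests


def patch_datasource(auth_token, site_id, datasource_id, upload_id, actions_payload):
    """Apply the requested actions to the published Hyper data source."""
    url = (f"{SERVER_URL}/api/{API_VERSION}/sites/{site_id}"
           f"/datasources/{datasource_id}/data")
    headers = {
        "X-Tableau-Auth": auth_token,
        "Accept": "application/json",
        "Content-Type": "application/json",
        # A unique RequestID lets the server deduplicate retries of this request.
        "RequestID": str(uuid.uuid4()),
    }
    resp = requests.patch(url,
                          params={"uploadSessionId": upload_id},
                          data=json.dumps(actions_payload),
                          headers=headers)
    resp.raise_for_status()
    # The actions run asynchronously on the Backgrounder; the response returns
    # a job that can be polled for completion (e.g. with the Query Job endpoint).
    return resp.json()["job"]["id"]
```

Note that this request only queues the work: the Backgrounder applies the actions asynchronously, so polling the returned job id is how you confirm the update actually succeeded.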
The sample code is available at the GitHub link, which includes the .hyper file creation along with a rest_api.py file containing sample code that inserts data from a .csv file into a published Hyper data source.
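For reference, building the small changeset .hyper file can look roughly like the sketch below, assuming the Tableau Hyper API (tableauhyperapi) and a changes.csv with matching columns; the table name, column names, and file names are placeholders rather than the ones used in the repository.

```python
import csv

from tableauhyperapi import (Connection, CreateMode, HyperProcess, Inserter,
                             SqlType, TableDefinition, TableName, Telemetry)

# Placeholder schema for the changeset table that gets uploaded to Tableau Server.
changeset_table = TableDefinition(
    table_name=TableName("public", "uploaded_table"),
    columns=[
        TableDefinition.Column("row_id", SqlType.int()),
        TableDefinition.Column("name", SqlType.text()),
    ],
)

with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    with Connection(endpoint=hyper.endpoint,
                    database="changes.hyper",
                    create_mode=CreateMode.CREATE_AND_REPLACE) as connection:
        connection.catalog.create_table(changeset_table)
        # Copy the changed rows from the CSV into the changeset .hyper file.
        with Inserter(connection, changeset_table) as inserter:
            with open("changes.csv", newline="") as f:
                for row in csv.DictReader(f):
                    inserter.add_row([int(row["row_id"]), row["name"]])
            inserter.execute()
```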
Thanks for reading. Please post your experiences, questions, and feedback below!