Deploying Fivetran HVR for Google Cloud Cortex Framework

Kaijun Xu
Google Cloud - Community
11 min read · Aug 7, 2023

Fivetran HVR (High-Volume Replicator) enables users to manage the complex, high-volume data needs essential for Google Cloud Cortex Framework and turn SAP data into faster insights.

[September 2024] This post has been edited to reflect the name change of ‘Local Data Processing’ back to ‘HVR’.

Most users will know Fivetran for their automated, no-code and fully-managed data movement platform. With the acquisition of HVR in 2021 and subsequent integration with the rest of Fivetran’s platform, Fivetran now offers another scalable, secure and efficient option for working with SAP data in Google Cloud Cortex Framework.

We successfully leveraged Fivetran HVR in a recent proof of concept. At first glance, the requirements and components seem somewhat daunting — a hub and repository database, an agent, ODBC drivers — but we have put together the walkthrough below so that you can quickly get up and running.

Before starting, you will need:

  • A non-production SAP HANA instance.
  • A Google Cloud project, with access to Compute Engine, Cloud SQL and BigQuery — you can get started for free if you don’t already have an account.
  • A Fivetran account — you can use your existing account if you already use Fivetran today, or sign up for a new one for free. The Fivetran team will need this account to provide you with a trial license and downloads for HVR.
Simplified component diagram showing HANA source system, Fivetran High Volume Agent and Hub, and BigQuery destination with staging bucket in Cloud Storage.
What you will be deploying — the Fivetran Hub and High Volume Agent

Create a Hub machine

The HVR Hub provides a UI and access point for remote connection to the HVR installation. It can be installed on Windows or Linux. We chose an e2-standard-8 Compute Engine VM running Red Hat Enterprise Linux 9. To simplify access and license management, we configured a public IP address and internet connectivity for the VM.

You should create your Hub VM in the same region as you plan to use for BigQuery, or a region that is close to both source and destination. For our POC, we used a machine size that corresponds to a “small” sized hub in Fivetran’s guidelines, but production deployments may require a larger machine.
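
If you prefer the command line, a VM matching this spec can be created with gcloud. This is a minimal sketch; the instance name, zone and defaults (including the ephemeral public IP) are assumptions for illustration:

# Sketch: create the Hub VM (name and zone are placeholders)
gcloud compute instances create hvr-hub \
  --zone=us-central1-a \
  --machine-type=e2-standard-8 \
  --image-family=rhel-9 \
  --image-project=rhel-cloud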

Create a repository database

For trial and POC purposes, you could use a local database installation, but a fully-managed service like Cloud SQL is a good option if you want to quickly get started with minimal fuss.

We created a Cloud SQL instance for PostgreSQL in the same region as the Hub VM, a database user with built-in authentication and a database for use as the repository database for the Hub. You can also use any other supported repository database version if you prefer.
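
For reference, a minimal sketch of provisioning such a repository with gcloud follows; the instance name, region, tier, user and database names are placeholder assumptions:

# Sketch: provision the repository database (names are placeholders)
gcloud sql instances create hvr-repo \
  --database-version=POSTGRES_15 \
  --region=us-central1 \
  --tier=db-custom-2-8192
gcloud sql users create hvruser --instance=hvr-repo --password=<PASSWORD>
gcloud sql databases create hvrrepo --instance=hvr-repo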

Install the Hub software

We copied all our downloads (e.g. the Fivetran package and the BigQuery ODBC driver tarball) to the home directory of the fivetran user; you can edit the sample commands below to match your own environment. If in doubt, you can also refer to the step-by-step instructions in Fivetran’s documentation.

# Install Perl (the Hub needs this)
sudo yum install -y perl

# Create Fivetran user and switch to the user
sudo useradd fivetran
sudo passwd fivetran
su fivetran

# Add these lines to the end of .bashrc as well
# e.g. vi ~/.bashrc
export HVR_HOME=/home/fivetran/hvr_home
export HVR_CONFIG=/home/fivetran/hvr_config
export HVR_TMP=/home/fivetran/hvr_tmp
export PATH=$PATH:$HVR_HOME/bin

# Extracting the Hub installation files
umask 022
mkdir $HVR_HOME
cd $HVR_HOME
tar -xvf \
../fivetran-6.1.0_30-hub_and_agent-linux_glibc2.12-x64-64bit_ga_patch.tar.gz

# Configuring the Hub to use HTTP only - not recommended for production use
# To use HTTPS, provide a private key and public certificate
# and follow the steps in Fivetran documentation
hvrhubserverconfig HTTP_Port=4340
hvrhubserver

At this point, the Hub service will be running and you can navigate to the UI at http://< HUB IP OR HOSTNAME >:4340 (if using HTTP) or https://< HUB IP OR HOSTNAME >:4341 (if using HTTPS) and configure the hub following the instructions in Fivetran’s documentation. You will need to connect the installation to your Fivetran account to apply a license at this point.

Once you have the license applied to your Hub installation, you can then configure the connection to the repository database and user authentication for the Hub UI. We went with a local user for simplicity, but other authentication methods are also supported.

Once you are done, you should return to the terminal and configure the service so that it restarts after a server reboot. The Fivetran documentation provides a sample systemd unit file that you can copy into your own file:

sudo su
touch /etc/systemd/system/hvrhubserver.service
vi /etc/systemd/system/hvrhubserver.service
# Paste the contents of the unit file

chmod 644 /etc/systemd/system/hvrhubserver.service
systemctl daemon-reload
systemctl enable hvrhubserver
systemctl start hvrhubserver
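
For orientation, the unit file has roughly the following shape. This is an illustrative sketch only; the service type and ExecStart flags are assumptions, so copy the actual contents from Fivetran’s documentation:

# Illustrative sketch of hvrhubserver.service - use the real file from Fivetran's docs
[Unit]
Description=Fivetran HVR Hub Server
After=network.target

[Service]
Type=forking
User=fivetran
Environment=HVR_HOME=/home/fivetran/hvr_home
Environment=HVR_CONFIG=/home/fivetran/hvr_config
Environment=HVR_TMP=/home/fivetran/hvr_tmp
ExecStart=/home/fivetran/hvr_home/bin/hvrhubserver

[Install]
WantedBy=multi-user.target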

Working with SELinux

If SELinux is in Enforcing mode, you may see errors similar to the following:

Job for hvrhubserver.service failed because the control process exited with error code.

See “systemctl status hvrhubserver.service” and “journalctl -xe” for details.

You will need to set SELinux to a more permissive mode, or set the SELinux context (think semanage fcontext) of the Fivetran binaries.
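
For example, assuming the install paths used in this walkthrough, relabelling the binaries might look like this (on RHEL, semanage is provided by the policycoreutils-python-utils package):

# Sketch: label the HVR binaries so systemd can execute them under SELinux
sudo semanage fcontext -a -t bin_t "/home/fivetran/hvr_home/bin(/.*)?"
sudo restorecon -Rv /home/fivetran/hvr_home/bin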

Installing the ODBC driver for BigQuery

We are also using the Hub as the destination agent to write to BigQuery. This requires the ODBC driver for BigQuery to be installed on the Hub machine. We use the Fivetran OS user to perform the installation.

This section uses version 3.0.0.1001 of the driver and version 3.0.0 of the ODBC driver installation guide. You can install the ODBC driver at any time before or after the HVR Hub installation; there is no need to restart the Hub service or machine after completing these steps.

# Install unixODBC if you do not already have a driver manager installed
sudo yum install unixODBC

# Extract the downloaded tarball to a temporary directory
mkdir -p ~/temp
cd ~/temp
tar -xvf ../SimbaODBCDriverforGoogleBigQuery_3.0.0.1001-Linux.tar.gz

# Extract the inner 64-bit driver tarball (exact path may differ slightly)
tar -xvf \
SimbaODBCDriverforGoogleBigQuery_3.0.0.1001-Linux/SimbaODBCDriverforGoogleBigQuery64_3.0.0.1001.tar.gz

# Copy contents of extracted inner tarball to install location
mkdir ~/bq_odbc
cp -r SimbaODBCDriverforGoogleBigQuery64_3.0.0.1001/* ~/bq_odbc/

# Copy required contents of main tarball to install location
cp -r SimbaODBCDriverforGoogleBigQuery_3.0.0.1001-Linux/Tools ~/bq_odbc/
cp SimbaODBCDriverforGoogleBigQuery_3.0.0.1001-Linux/GoogleBigQueryODBC.did \
~/bq_odbc/lib/

# Copy ini files to install location
cp SimbaODBCDriverforGoogleBigQuery_3.0.0.1001-Linux/setup/odbcinst.ini \
~/bq_odbc/
cp SimbaODBCDriverforGoogleBigQuery_3.0.0.1001-Linux/setup/\
simba.googlebigqueryodbc.ini ~/bq_odbc/lib/

Updating ini files

After installation, you must edit the following ini files to update the driver installation directory location. If you have used the example above, the installation directory will be /home/fivetran/bq_odbc.

odbcinst.ini (Located at /home/fivetran/bq_odbc/odbcinst.ini)
🛈 NOTE 🛈 This is a complete sample; you can replace the entire contents of the existing file with it.

# To use this INI file, replace <INSTALLDIR> with the 
# directory the tarball was extracted to.

[ODBC Drivers]
Simba ODBC Driver for Google BigQuery 32-bit=Installed
Simba ODBC Driver for Google BigQuery 64-bit=Installed

[Simba ODBC Driver for Google BigQuery 32-bit]
Description=Simba ODBC Driver for Google BigQuery(32-bit)
Driver=<INSTALLDIR>/lib/libgooglebigqueryodbc_sb32.so

[Simba ODBC Driver for Google BigQuery 64-bit]
Description=Simba ODBC Driver for Google BigQuery(64-bit)
# Update this line
Driver=/home/fivetran/bq_odbc/lib/libgooglebigqueryodbc_sb64.so

## The option below is for using unixODBC when compiled with -DSQL_WCHART_CONVERT.
## Execute 'odbc_config --cflags' to determine if you need to uncomment it.
# IconvEncoding=UCS-4LE

simba.googlebigqueryodbc.ini (Located at /home/fivetran/bq_odbc/lib/simba.googlebigqueryodbc.ini, as copied above)
🛈 NOTE 🛈 This is a complete sample; you can replace the entire contents of the existing file with it.

# To use this INI file, replace <INSTALLDIR> with the 
# directory the tarball was extracted to.

[Driver]
# Encoding should be UTF-16 if using unixODBC
DriverManagerEncoding=UTF-16
# Update with installation directory
ErrorMessagesPath=/home/fivetran/bq_odbc/ErrorMessages
LogLevel=0
LogPath=


## - Note that the path to your ODBC Driver Manager must be specified in LD_LIBRARY_PATH.
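
The LD_LIBRARY_PATH note above only matters if your driver manager is installed somewhere other than the default library path. As a sketch, assuming a custom unixODBC build location, you would add a line like this to the fivetran user’s .bashrc:

# Sketch: only needed if unixODBC is not in the default /usr/lib64 path
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/unixODBC/lib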

Configuring the BigQuery connection

At this point, you are ready to configure the BigQuery location in the Hub UI. Expand the “Advanced Settings” section and ensure that the following fields are filled:

  • LINUX / UNIX ODBC DRIVER MANAGER LIBRARY PATH: Can be left blank to use the default (/usr/lib64). If there is an existing ODBC driver installation at a different path, update this field accordingly.
  • LINUX / UNIX ODBCSYSINI: Should be the path to odbcinst.ini.
  • ODBC Driver: If the path to the ini files is correct, there should be an option for “Simba ODBC Driver for Google BigQuery 64-bit”. Select that option.

If you have followed the instructions here, the configuration should look like this:

An example configuration for BigQuery in the Hub UI, showing the path to the ini file and the driver selection.
Sample configuration for BigQuery location in Hub UI

Make sure to check “Test connection before saving”.

You will also need a staging bucket on Cloud Storage to load data into BigQuery. Select “Google Cloud Storage (HTTPS)” and provide the name of a bucket and a staging directory within the bucket. The staging directory does not need to be created in advance.
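
If the bucket does not exist yet, it can be created with gcloud; the bucket name and location below are placeholders:

# Sketch: create a staging bucket in the same region as BigQuery
gcloud storage buckets create gs://my-hvr-staging --location=us-central1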

The credentials provided above for BigQuery will also be used to access the staging bucket. Ensure these credentials have all required permissions.

Required permissions

HVR requires a number of BigQuery permissions, plus Storage Admin for Cloud Storage. A reasonably minimal set of roles for the service account would therefore be (see the gcloud sketch after this list):

  • BigQuery Job User (for bigquery.jobs.create)
  • BigQuery Data Editor
  • BigQuery Resource Viewer (to query INFORMATION_SCHEMA.JOBS_BY_PROJECT)
  • Storage Admin
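
As a sketch, granting these roles to a service account with gcloud could look like the following; the project and service account names are placeholders:

# Sketch: grant the minimal roles to the HVR service account
PROJECT=my-project
SA=hvr-sa@my-project.iam.gserviceaccount.com
for ROLE in roles/bigquery.jobUser roles/bigquery.dataEditor \
    roles/bigquery.resourceViewer roles/storage.admin; do
  gcloud projects add-iam-policy-binding $PROJECT \
    --member=serviceAccount:$SA --role=$ROLE
done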

The Hub UI will perform tests when creating the BigQuery location to ensure that the provided credentials have sufficient permissions; if any are missing, an error message will explain which ones.

Troubleshooting DBMS error number [50404]

If you see an error similar to:

DBMS error [[]. DBMS error number [50404]. SQL State [H]. DBMS version [unknown] Last query [none]

while saving and testing the connection to BigQuery, it could be that the DriverManagerEncoding parameter in simba.googlebigqueryodbc.ini has not been set correctly. The encoding should be:

  • unixODBC: UTF-16
  • iODBC: UTF-32
  • unixODBC compiled with -DSQL_WCHART_CONVERT: UTF-32

There is no need to reinstall the ODBC driver or restart the Hub service after updating the simba.googlebigqueryodbc.ini file. You can simply return to the location connection dialog and retry saving the connection configuration.

Installing the High Volume Agent

For our HANA data source, we install a High Volume Agent (HVA) on the same local machine as HANA. The installation package for Linux is the same as for the Hub, although there are also agent-only installation packages for AIX, Solaris and PowerPC.

Like the Hub, we create a separate OS user to install and run the HVA:

# Create Fivetran user
sudo useradd fivetran
sudo passwd fivetran
su fivetran

# Add these lines to the end of .bashrc as well
export HVR_HOME=/home/fivetran/hvr_home
export HVR_CONFIG=/home/fivetran/hvr_config
export HVR_TMP=/home/fivetran/hvr_tmp
export PATH=$PATH:$HVR_HOME/bin

# Extracting the agent installation files
umask 022
mkdir $HVR_HOME
cd $HVR_HOME
tar -xvf \
../fivetran-6.1.0_30-hub_and_agent-linux_glibc2.12-x64-64bit_ga_patch.tar.gz

We need to configure the HVA to listen on a TCP/IP port. There are several options for this; we chose to use systemd so that the agent would also continue functioning after a server reboot. As before, Fivetran’s documentation includes sample files that we can simply copy:

# Setup HVA listener using systemd
sudo su
cd /etc/systemd/system
touch hvr.socket
vi hvr.socket
# Copy file contents from the link above
chmod 644 hvr.socket
touch hvr@.service
vi hvr@.service
# Copy file contents from the link above
chmod 644 hvr@.service
systemctl daemon-reload
systemctl enable hvr.socket
systemctl start hvr.socket
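
For orientation, the socket unit has roughly the following shape; the templated hvr@.service is then instantiated per incoming connection, which is why the troubleshooting logs later in this post show instance names like hvr@3-10.0.0.7:4343-10.0.0.3:44250.service. Treat this as an illustrative sketch and copy the real contents from Fivetran’s documentation:

# Illustrative sketch of hvr.socket - use the real file from Fivetran's docs
[Unit]
Description=Fivetran HVR agent listener

[Socket]
ListenStream=4343
Accept=yes

[Install]
WantedBy=sockets.target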

The HVA will initially be in setup mode, and we can connect to it by accessing the Hub UI and creating a new location, then following the documented steps to configure the HVA from a browser. For our POC environment, we decided to keep things simple:

  • Do not limit agent access to specific hub systems.
  • Configure the connection authentication mode to “All connections require an agent user”.
  • Create two separate users, both using “Local user” authentication.
  • One user, e.g. agent-admin, should only have “AgentAdmin” access.
  • The other user, e.g. agent-user, should only have “Use for current location” selected.
  • Check “Test agent connection before confirming” and click on “Confirm Connection Method”.

Once we know that the agent is working and can be reached from our Hub machine, we can cancel the location creation and come back to it later. We can still use the agent-user we created when we later configure our connection to SAP HANA.

Troubleshooting agent connection issues

You may see an error message like the one below after clicking on “Confirm Connection Method” above if SELinux is in enforcing mode on the HVA machine:

Example error message from the Hub UI when connecting to a HVA running on a machine with SELinux in enforcing mode.
TLS errors if SELinux is in enforcing mode

Running systemctl status hvr.socket on the HVA machine shows that the socket is active (listening) but systemd logs may contain lines similar to:

hvr@3-10.0.0.7:4343-10.0.0.3:44250.service: Failed at step EXEC spawning /home/fivetran/hvr_home/bin/hvragent: Permission denied

fivetran-hva2 systemd[1]: hvr@3-10.0.0.7:4343-10.0.0.3:44250.service: Main process exited, code=exited, status=203/EXEC

We have already discussed above (see “Working with SELinux”) how to solve this for the Hub, and the same options can be used for the HVA.

Installing the client and ODBC driver for SAP HANA

The HVA requires the ODBC driver for HANA to be able to capture changes from the database. A HANA installation will typically include the client and driver, and SAP’s documentation covers how to install the driver if needed. You will also need to install a driver manager such as unixODBC.

After the driver is installed, you do not need to create or edit odbc.ini, but you must create or edit odbcinst.ini to have a section similar to what is shown below. You can either edit the existing system file (e.g. in /etc) or create a separate one exclusively for use with the HVA.

[HDBODBC]
Description=64-bit HANA ODBC Driver
Driver=< CLIENT INSTALL LOCATION >/hdbclient/libodbcHDB.so
FileUsage=1
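
Before returning to the Hub UI, you can check that unixODBC sees the new driver entry. This is a sketch: the directory below is a made-up example, and ODBCSYSINI must point at the directory containing your odbcinst.ini:

# Sketch: list registered drivers; expect HDBODBC in the output
ODBCSYSINI=/home/fivetran/hana_odbc odbcinst -q -d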

Configuring the SAP HANA location connection

When configuring the SAP HANA location in the Hub UI, expand the “Advanced Settings” section and ensure that the following fields are filled:

  • LINUX / UNIX ODBC DRIVER MANAGER LIBRARY PATH: can be left blank to use the default (/usr/lib64). If there is an existing ODBC driver installation at a different path, update this field accordingly.
  • LINUX / UNIX ODBCSYSINI: Should be the path to odbcinst.ini.
  • ODBC Driver: If the path to the odbcinst.ini file is correct, and it has been created following the example above, there should be an option for “HDBODBC”. Select that option.

As before, be sure to check “Test connection before saving”.

Before activating replication: HANA column names with slash “/” character

Some column names in HANA contain a slash (“/”) character, which is not allowed in BigQuery column names. You will get error messages when attempting to integrate changes to BigQuery from source tables whose column names contain disallowed characters.

To avoid this, consider adding tables to the channel but not activating replication. This will allow you to review the column names in each table. If there are column names with disallowed characters, you can use the ColumnProperties action with the Name and BaseName parameters to rename the column. For example, you can replace the slash character with an underscore.
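
As an illustration of the idea (the action is normally configured in the Hub UI, the notation below is HVR’s slash-parameter style, and the column name is a made-up example), the two parameters would be paired like this:

# Sketch (illustrative): map the HANA column "/BIC/ZMATERIAL" to BIC_ZMATERIAL
ColumnProperties /Name=BIC_ZMATERIAL /BaseName="/BIC/ZMATERIAL"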

SAP HANA high isolation level

Some HANA systems may have the high isolation level configured. Fivetran’s documentation regarding OS permissions for HANA as a source only applies if the HANA system is using the default (low) isolation level. With high isolation, each tenant DB uses a different, dedicated OS user and group, and log files are only readable by members of that group, as shown in the example below.

drwx------ 2 sysadmin sapsys 180 Jun 18 18:35 hdb00001
drwxrwx--- 2 tenant01 grp01 195 Jun 18 23:21 hdb00002.00003
drwxrwx--- 2 tenant02 grp02 160 Jun 19 01:41 hdb00002.00004
drwxrwx--- 2 tenant03 grp03 300 Jun 19 01:34 hdb00002.00005
drwxrwx--- 2 tenant04 grp04 160 Jun 19 02:04 hdb00002.00006
drwx------ 2 sysadmin sapsys 195 Jun 18 15:04 hdb00003.00003

If you have high isolation level configured on HANA, you will see file permission errors in the Hub logs after activating replication (specifically capture) on the source:

Open of file ‘/< PATH TO LOGS >/hdb00002.00003/logsegment_000_directory.dat’ for operation ‘read’ failed. Error: Permission denied.

If this happens, check with your HANA administrator to confirm that high isolation level is configured. If it is, you can work with them to add the HVA user to the various OS groups as secondary groups:

sudo usermod -a -G grp01,grp02,grp03,grp04 fivetran

Congratulations!

You have now set up Fivetran HVR and are ready to activate replication from your SAP HANA tables to BigQuery. You can now kickstart insights with Google Cloud Cortex Framework, using its predefined data models, data processing templates, example ML code with Vertex AI and sample Looker dashboards.

Kaijun Xu is a Partner Customer Engineer, Data Analytics Specialist at Google Cloud.