How to consume data from AWS Marketplace using Teradata Vantage

AWS Data Exchange and Teradata VantageCloud

Sebastian
Teradata
18 min read · Jan 5, 2023


Photo by Shubham Dhage on Unsplash

Hello, data folks! If you are here, it’s probably because you’re interested in integrating Teradata Vantage with Amazon Web Services (AWS) first-party services. This story will help you connect Teradata Vantage with the AWS Data Exchange service. Let’s review the procedure!

Note: The procedure offered in this guide has been implemented and tested by Teradata. However, it is offered on an as-is basis. Neither AWS nor Teradata provides validation of Teradata Vantage with AWS Data Exchange. This guide includes content from both AWS and Teradata product documentation.

Overview

This post describes how to subscribe to a dataset in AWS Data Exchange, export it to Amazon S3, and then query it with Teradata Vantage, using either the Native Object Store (NOS) capability or the Teradata Parallel Transporter (TPT) utility.

About AWS Data Exchange

AWS Data Exchange makes it easy to find, subscribe to, and use third-party data in the cloud. Today, AWS Data Exchange contains data products from a broad range of domains, including financial services, healthcare and life sciences, geospatial, weather, and mapping. Once you have subscribed to a data product, you can use the AWS Data Exchange API to export the data to Amazon S3, and then use Teradata Vantage to explore and analyze it.

About Teradata Vantage

Teradata Vantage is the cloud data analytics platform that unifies everything — data lakes, data warehouses, analytics, and new data sources and types. Leading the way with multi-cloud environments and priced for flexibility, Vantage leverages 100 percent of a business’s data, regardless of scale, volume, or complexity.

Vantage combines descriptive, predictive, prescriptive analytics, autonomous decision-making, machine learning (ML) functions, and visualization tools into a unified, integrated platform that uncovers real-time business intelligence at scale, no matter where the data resides.

Vantage enables companies to start small and elastically scale compute or storage, paying only for what they use, harnessing low-cost object stores, and integrating their analytic workloads. Vantage supports R, Python, Teradata Studio, and any other SQL-based tools. You can deploy Vantage across public clouds, on-premises, on optimized or commodity infrastructure, or as-a-service.

Teradata Vantage Native Object Store (NOS) can be used to explore data in external object stores, like Amazon S3, using standard SQL. No special object storage-side compute infrastructure is required to use NOS. You can explore data located in an Amazon S3 bucket by simply creating a NOS table definition that points to your bucket. With NOS, you can quickly import data from Amazon S3 or even join it with other tables in the database.

Alternatively, the Teradata Parallel Transporter (TPT) utility can be used to import data from Amazon S3 to Teradata Vantage in a bulk fashion. Once loaded, data can be efficiently queried within Vantage.

Prerequisites

You are expected to be familiar with the AWS Data Exchange service, Teradata Vantage, and (optionally) Teradata Tools and Utilities (TTU).

You will need the following accounts and systems:

If you will be using NOS:

  • A Teradata Vantage 2.0 instance (version 17.0) or later

If you will not be using NOS:

  • A Teradata Vantage 1.1 instance (version 16.20) or later
  • An account on the Teradata Developer website
  • A client system to run Teradata Parallel Transporter scripts

Procedure

Once you have met the prerequisites, follow these steps:

1. Create an Amazon S3 bucket

2. Subscribe to a Data Product on AWS Data Exchange

3. Export a dataset to your S3 bucket

If you have NOS capability, you may follow these steps:

4. Configure NOS access to S3

5. Query the dataset in S3

6. Load data from S3 into Vantage (optional)

If you do not have NOS capability or do not want to use NOS, follow these steps:

7. Configure Teradata Parallel Transporter

8. Load data from S3 into Vantage using TPT

9. Query the dataset in Vantage

Create an Amazon S3 bucket

Open the Amazon S3 console in a browser. Chrome, Firefox, and Safari work well.

Amazon S3 is region sensitive. Ensure that your console reflects the region where you want to access a dataset.

AWS Data Exchange is also region sensitive. You will want to place your Amazon S3 bucket and your Data Exchange data in the same region.

You may use an existing bucket. We will create a new bucket.

AWS continues to enhance the Amazon S3 console, so these screenshots might not exactly match what you see.

Click the Create bucket button.

Figure 1 Create Amazon S3 bucket button

Enter a bucket name. We will use dataexchange-dataset in this article.

Figure 2 Create Amazon S3 bucket dialog

Amazon S3 bucket names must be globally unique, so you will not be able to reuse the bucket name from this article. Choose a different bucket name and use it in place of ours throughout this article.

Click the Create button.
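If you prefer the command line and already have the AWS CLI installed and configured, you can create the bucket with a single command (a quick sketch; substitute your own globally unique bucket name and preferred region):

aws s3 mb s3://dataexchange-dataset --region us-east-1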

It is beyond the scope of this article to discuss permissions for the Amazon S3 bucket needed by Vantage.

Subscribe to a Data Product on AWS Data Exchange

Open the AWS Data Exchange console in a browser.

Click on Explore available data products to search for a dataset.

Enter “historical weather data” and click the Search button.

Scroll down to the “OnPoint Historical Weather Data” dataset. Ensure that the price is Free.

Datasets are products, many with recurring fees. Some can be quite expensive (several thousand dollars). Ensure that you select a free dataset for this article to avoid being charged.

Figure 3 OnPoint Historical Weather Data dataset in AWS Data Exchange

Click on the OnPoint Historical Weather Data dataset.

Click on Continue to subscribe.

Ensure that the price is zero.

Under Renewal settings, click on No to avoid being charged in the future.

Figure 4 Dataset pricing and renewal settings

Read the Data subscription agreement.

Scroll down and click on Subscribe. The request will take a few minutes.

You will see a message when your subscription has been successfully processed.

Figure 5 Successful subscription

Export Dataset to Amazon S3

In the AWS Data Exchange console, click on My Subscriptions > Subscriptions.

You will see the OnPoint Historical Weather Data subscription (plus any others you have subscribed to).

Click on the OnPoint Historical Weather Data subscription.

Figure 6 OnPoint Historical Weather Data subscription

Scroll down to Entitled data sets.

Click on the 2018 Historical Hourly Weather for 50 U.S. Capital Cities dataset.

Figure 7 Entitled dataset

Click on the Nov 5, 2019 revision.

Figure 8 Nov 5, 2019 dataset revision

Select the checkbox next to the Nov 5, 2019 asset.

Figure 9 Nov 5, 2019 asset

Click on the Export to Amazon S3 button.

Select the Amazon S3 bucket you created. You can leave the encryption set to None.

Click the Export button.

Figure 10 Export to Amazon S3

The export should take less than a minute. You will see a success message when the export is completed.

Figure 11 Export to S3 success

You can view the CSV object in your Amazon S3 bucket.

Figure 12 CSV object stored in Amazon S3 bucket
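If you prefer to script the export, the AWS Data Exchange API exposes the same operation. The sketch below assumes you have the AWS CLI installed and have already looked up the data set, revision, and asset IDs for your subscription; the IDs and bucket name shown are placeholders.

# Create an export job that copies the asset into your S3 bucket
aws dataexchange create-job --type EXPORT_ASSETS_TO_S3 --details '{
  "ExportAssetsToS3": {
    "DataSetId": "<data-set-id>",
    "RevisionId": "<revision-id>",
    "AssetDestinations": [
      { "AssetId": "<asset-id>", "Bucket": "dataexchange-dataset" }
    ]
  }
}'

# Start the job using the JobId returned by the previous command
aws dataexchange start-job --job-id <job-id>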

Configure NOS access to Amazon S3

You can choose to use Native Object Store (NOS) to read objects on Amazon S3 directly, or you can load the data from Amazon S3 into Vantage with the Teradata Parallel Transporter utility. NOS is only available with Vantage 2.0 (version 17.0) or later. Teradata Parallel Transporter (TPT) is available with all versions of Vantage.

If you are using TPT, please skip to step 7.

Native Object Store (NOS) can directly read data in Amazon S3, which allows you to explore and analyze data in Amazon S3 without explicitly loading the data.

Create a foreign table definition

A foreign table definition allows data in Amazon S3 to be easily referenced within the Advanced SQL Engine and makes the data available in a structured, relational format.

NOS supports data in CSV, JSON, and Parquet formats.

Log in to your Vantage system with Teradata Studio.

Create an AUTHORIZATION object to access your Amazon S3 bucket with the following SQL command.

CREATE AUTHORIZATION DefAuth_S3
AS DEFINER TRUSTED
USER 'A*****' /* AccessKeyId */
PASSWORD '*****' /* SecretAccessKey */
;

Replace the string for USER with your AWS access key.

Replace the string for PASSWORD with your AWS secret access key.

Create a foreign table definition for the CSV file on Amazon S3 with the following SQL command.

CREATE MULTISET FOREIGN TABLE WeatherData,
EXTERNAL SECURITY DEFINER TRUSTED DefAuth_S3
(
Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC,
Payload DATASET INLINE LENGTH 64000 STORAGE FORMAT CSV
)
USING
(
LOCATION ('/S3/s3.amazonaws.com/dataexchange-dataset/')
);

At a minimum, the foreign table definition must include a table name (WeatherData) and a location clause, which points to the object store data.

The LOCATION clause requires a top-level name, which is the bucket name (dataexchange-dataset in the example above). You will need to replace this with your own bucket name.

If the object doesn’t have a standard extension (e.g. ".json", ".csv", ".parquet"), then the Location and Payload column definitions are also needed, and the LOCATION phrase needs to include the file name. For example: LOCATION ('/S3/s3.amazonaws.com/dataexchange-dataset/samplefile').

Foreign tables are always defined as No Primary Index (NoPI) tables.

Query the Dataset in Amazon S3

Run the following SQL command to query the dataset.

SELECT * FROM WeatherData SAMPLE 10;

The foreign table contains only two columns: Location and Payload. Location is the object’s address in the object store system. The data itself is carried in the Payload column, with each record’s payload value representing a single CSV row.

Figure 13 WeatherData table

Run the following SQL command to focus on the data in the object.

SELECT payload..* FROM WeatherData SAMPLE 10;
Figure 14 WeatherData table payload

Create a view

Views can simplify the names associated with the payload attributes, can make it easier to code SQL against the object data, and can hide the Location references in the foreign table.

Vantage foreign tables use the “..” (double dot or double period) operator to separate the object name from the column name.

Run the following SQL command to create a view.

REPLACE VIEW WeatherData_view AS (
SELECT
CAST(payload..postal_code AS VARCHAR(10)) Postal_code,
CAST(payload..country AS CHAR(2)) Country,
CAST(payload..time_valid_utc AS TIMESTAMP(0) FORMAT 'YYYY-MM-DDBHH:MI:SS') Time_Valid_UTC,
CAST(payload..doy_utc AS INTEGER) DOY_UTC,
CAST(payload..hour_utc AS INTEGER) Hour_UTC,
CAST(payload..time_valid_lcl AS TIMESTAMP(0) FORMAT 'YYYY-MM-DDBHH:MI:SS') Time_Valid_LCL,
CAST(payload..dst_offset_minutes AS INTEGER) DST_Offset_Minutes,
CAST(payload..temperature_air_2m_f AS DECIMAL(4,1)) Temperature_Air_2M_F,
CAST(payload..temperature_wetbulb_2m_f AS DECIMAL(3,1)) Temperature_Wetbulb_2M_F,
CAST(payload..temperature_dewpoint_2m_f AS DECIMAL(3,1)) Temperature_Dewpoint_2M_F,
CAST(payload..temperature_feelslike_2m_f AS DECIMAL(4,1)) Temperature_Feelslike_2M_F,
CAST(payload..temperature_windchill_2m_f AS DECIMAL(4,1)) Temperature_Windchill_2M_F,
CAST(payload..temperature_heatindex_2m_f AS DECIMAL(4,1)) Temperature_Heatindex_2M_F,
CAST(payload..humidity_relative_2m_pct AS DECIMAL(3,1)) Humidity_Relative_2M_Pct,
CAST(payload..humidity_specific_2m_gpkg AS DECIMAL(3,1)) Humdity_Specific_2M_GPKG,
CAST(payload..pressure_2m_mb AS DECIMAL(5,1)) Pressure_2M_Mb,
CAST(payload..pressure_tendency_2m_mb AS DECIMAL(2,1)) Pressure_Tendency_2M_Mb,
CAST(payload..pressure_mean_sea_level_mb AS DECIMAL(5,1)) Pressure_Mean_Sea_Level_Mb,
CAST(payload..wind_speed_10m_mph AS DECIMAL(3,1)) Wind_Speed_10M_MPH,
CAST(payload..wind_direction_10m_deg AS DECIMAL(4,1)) Wind_Direction_10M_Deg,
CAST(payload..wind_speed_80m_mph AS DECIMAL(3,1)) Wind_Speed_80M_MPH,
CAST(payload..wind_direction_80m_deg AS DECIMAL(4,1)) Wind_Direction_80M_Deg,
CAST(payload..wind_speed_100m_mph AS DECIMAL(3,1)) Wind_Speed_100M_MPH,
CAST(payload..wind_direction_100m_deg AS DECIMAL(4,1)) Wind_Direction_100M_Deg,
CAST(payload..precipitation_in AS DECIMAL(3,2)) Precipitation_in,
CAST(payload..snowfall_in AS DECIMAL(3,2)) Snowfall_in,
CAST(payload..cloud_cover_pct AS INTEGER) Cloud_Cover_Pct,
CAST(payload..radiation_solar_total_wpm2 AS DECIMAL(5,1)) Radiation_Solar_Total_WPM2
FROM WeatherData
)
;

Run the following SQL command to validate the view.

SELECT * FROM WeatherData_view SAMPLE 10;
Figure 15 WeatherData_view

Now that you have created a view, you can easily reference the object store data in a query and combine it with other tables, both relational tables in Vantage as well as foreign tables in an object store. This allows you to leverage the full analytic capabilities of Vantage on 100% of the data, no matter where the data is located.
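For example, if you had a (hypothetical) relational table of retail stores keyed by postal code, you could join it directly to the view:

SELECT s.store_id,
       AVG(w.Temperature_Air_2M_F) AS Avg_Temp_F,
       SUM(w.Precipitation_in) AS Total_Precip_In
FROM store_locations s            /* hypothetical in-database table */
JOIN WeatherData_view w
  ON s.postal_code = w.Postal_code
GROUP BY s.store_id;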

Load Data from Amazon S3 into Vantage (optional)

Having a persistent copy of the Amazon S3 data can be useful when repetitive access to the same data is expected. NOS does not automatically make a persistent copy of the Amazon S3 data. Each time you reference a foreign table, Vantage fetches the data from Amazon S3. (Some data may be cached, but this depends on the size of the data in Amazon S3 and other active workloads in Vantage.)

In addition, you may be charged network fees for data transferred from Amazon S3. If you will be referencing the data in Amazon S3 multiple times, you may reduce your cost by loading it into Vantage, even temporarily.

You can select among the approaches below to load the data into Vantage.

Create the table and load the data in a single statement

You can use a single statement to both create the table and load the data. You can choose the desired attributes from the foreign table payload and what they will be called in the relational table.

A CREATE TABLE AS … WITH DATA statement can be used with the foreign table definition as the source table.

Run the following SQL command to create the relational table and load the data.

CREATE MULTISET TABLE WeatherData_temp AS (
SELECT
CAST(payload..postal_code AS VARCHAR(10)) Postal_code,
CAST(payload..country AS CHAR(2)) Country,
CAST(payload..time_valid_utc AS TIMESTAMP(0) FORMAT 'YYYY-MM-DDBHH:MI:SS') Time_Valid_UTC,
CAST(payload..doy_utc AS INTEGER) DOY_UTC,
CAST(payload..hour_utc AS INTEGER) Hour_UTC,
CAST(payload..time_valid_lcl AS TIMESTAMP(0) FORMAT 'YYYY-MM-DDBHH:MI:SS') Time_Valid_LCL,
CAST(payload..dst_offset_minutes AS INTEGER) DST_Offset_Minutes,
CAST(payload..temperature_air_2m_f AS DECIMAL(4,1)) Temperature_Air_2M_F,
CAST(payload..temperature_wetbulb_2m_f AS DECIMAL(3,1)) Temperature_Wetbulb_2M_F,
CAST(payload..temperature_dewpoint_2m_f AS DECIMAL(3,1)) Temperature_Dewpoint_2M_F,
CAST(payload..temperature_feelslike_2m_f AS DECIMAL(4,1)) Temperature_Feelslike_2M_F,
CAST(payload..temperature_windchill_2m_f AS DECIMAL(4,1)) Temperature_Windchill_2M_F,
CAST(payload..temperature_heatindex_2m_f AS DECIMAL(4,1)) Temperature_Heatindex_2M_F,
CAST(payload..humidity_relative_2m_pct AS DECIMAL(3,1)) Humidity_Relative_2M_Pct,
CAST(payload..humidity_specific_2m_gpkg AS DECIMAL(3,1)) Humdity_Specific_2M_GPKG,
CAST(payload..pressure_2m_mb AS DECIMAL(5,1)) Pressure_2M_Mb,
CAST(payload..pressure_tendency_2m_mb AS DECIMAL(2,1)) Pressure_Tendency_2M_Mb,
CAST(payload..pressure_mean_sea_level_mb AS DECIMAL(5,1)) Pressure_Mean_Sea_Level_Mb,
CAST(payload..wind_speed_10m_mph AS DECIMAL(3,1)) Wind_Speed_10M_MPH,
CAST(payload..wind_direction_10m_deg AS DECIMAL(4,1)) Wind_Direction_10M_Deg,
CAST(payload..wind_speed_80m_mph AS DECIMAL(3,1)) Wind_Speed_80M_MPH,
CAST(payload..wind_direction_80m_deg AS DECIMAL(4,1)) Wind_Direction_80M_Deg,
CAST(payload..wind_speed_100m_mph AS DECIMAL(3,1)) Wind_Speed_100M_MPH,
CAST(payload..wind_direction_100m_deg AS DECIMAL(4,1)) Wind_Direction_100M_Deg,
CAST(payload..precipitation_in AS DECIMAL(3,2)) Precipitation_in,
CAST(payload..snowfall_in AS DECIMAL(3,2)) Snowfall_in,
CAST(payload..cloud_cover_pct AS INTEGER) Cloud_Cover_Pct,
CAST(payload..radiation_solar_total_wpm2 AS DECIMAL(5,1)) Radiation_Solar_Total_WPM2
FROM WeatherData
Where Postal_Code = '36101'
)
WITH DATA
NO PRIMARY INDEX
;

Run the following SQL command to validate the contents of the table.

SELECT * FROM WeatherData_temp SAMPLE 10;
Figure 16 WeatherData_temp

Create the table and load the data in multiple statements

You can also use multiple statements to first create the relational table and then load the data. An advantage of this choice is that you can perform multiple loads, possibly selecting different data or loading in smaller increments if the object is very large.

Run the following SQL command to create the relational table.

CREATE MULTISET TABLE WeatherData_temp (
Postal_code VARCHAR(10),
Country CHAR(2),
Time_Valid_UTC TIMESTAMP(0) FORMAT 'YYYY-MM-DDBHH:MI:SS',
DOY_UTC INTEGER,
Hour_UTC INTEGER,
Time_Valid_LCL TIMESTAMP(0) FORMAT 'YYYY-MM-DDBHH:MI:SS',
DST_Offset_Minutes INTEGER,
Temperature_Air_2M_F DECIMAL(4,1),
Temperature_Wetbulb_2M_F DECIMAL(3,1),
Temperature_Dewpoint_2M_F DECIMAL(3,1),
Temperature_Feelslike_2M_F DECIMAL(4,1),
Temperature_Windchill_2M_F DECIMAL(4,1),
Temperature_Heatindex_2M_F DECIMAL(4,1),
Humidity_Relative_2M_Pct DECIMAL(3,1),
Humdity_Specific_2M_GPKG DECIMAL(3,1),
Pressure_2M_Mb DECIMAL(5,1),
Pressure_Tendency_2M_Mb DECIMAL(2,1),
Pressure_Mean_Sea_Level_Mb DECIMAL(5,1),
Wind_Speed_10M_MPH DECIMAL(3,1),
Wind_Direction_10M_Deg DECIMAL(4,1),
Wind_Speed_80M_MPH DECIMAL(3,1),
Wind_Direction_80M_Deg DECIMAL(4,1),
Wind_Speed_100M_MPH DECIMAL(3,1),
Wind_Direction_100M_Deg DECIMAL(4,1),
Precipitation_in DECIMAL(3,2),
Snowfall_in DECIMAL(3,2),
Cloud_Cover_Pct INTEGER,
Radiation_Solar_Total_WPM2 DECIMAL(5,1)
)
UNIQUE PRIMARY INDEX ( Postal_Code, Time_Valid_UTC )
;

Run the following SQL to load the data into the table.

INSERT INTO WeatherData_temp
SELECT
CAST(payload..postal_code AS VARCHAR(10)) Postal_code,
CAST(payload..country AS CHAR(2)) Country,
CAST(payload..time_valid_utc AS TIMESTAMP(0) FORMAT 'YYYY-MM-DDBHH:MI:SS') Time_Valid_UTC,
CAST(payload..doy_utc AS INTEGER) DOY_UTC,
CAST(payload..hour_utc AS INTEGER) Hour_UTC,
CAST(payload..time_valid_lcl AS TIMESTAMP(0) FORMAT 'YYYY-MM-DDBHH:MI:SS') Time_Valid_LCL,
CAST(payload..dst_offset_minutes AS INTEGER) DST_Offset_Minutes,
CAST(payload..temperature_air_2m_f AS DECIMAL (4,1)) Temperature_Air_2M_F,
CAST(payload..temperature_wetbulb_2m_f AS DECIMAL(3,1)) Temperature_Wetbulb_2M_F,
CAST(payload..temperature_dewpoint_2m_f AS DECIMAL(3,1)) Temperature_Dewpoint_2M_F,
CAST(payload..temperature_feelslike_2m_f AS DECIMAL(4,1)) Temperature_Feelslike_2M_F,
CAST(payload..temperature_windchill_2m_f AS DECIMAL(4,1)) Temperature_Windchill_2M_F,
CAST(payload..temperature_heatindex_2m_f AS DECIMAL(4,1)) Temperature_Heatindex_2M_F,
CAST(payload..humidity_relative_2m_pct AS DECIMAL(3,1)) Humidity_Relative_2M_Pct,
CAST(payload..humidity_specific_2m_gpkg AS DECIMAL(3,1)) Humdity_Specific_2M_GPKG,
CAST(payload..pressure_2m_mb AS DECIMAL(5,1)) Pressure_2M_Mb,
CAST(payload..pressure_tendency_2m_mb AS DECIMAL(2,1)) Pressure_Tendency_2M_Mb,
CAST(payload..pressure_mean_sea_level_mb AS DECIMAL(5,1)) Pressure_Mean_Sea_Level_Mb,
CAST(payload..wind_speed_10m_mph AS DECIMAL(3,1)) Wind_Speed_10M_MPH,
CAST(payload..wind_direction_10m_deg AS DECIMAL(4,1)) Wind_Direction_10M_Deg,
CAST(payload..wind_speed_80m_mph AS DECIMAL(3,1)) Wind_Speed_80M_MPH,
CAST(payload..wind_direction_80m_deg AS DECIMAL(4,1)) Wind_Direction_80M_Deg,
CAST(payload..wind_speed_100m_mph AS DECIMAL(3,1)) Wind_Speed_100M_MPH,
CAST(payload..wind_direction_100m_deg AS DECIMAL(4,1)) Wind_Direction_100M_Deg,
CAST(payload..precipitation_in AS DECIMAL(3,2)) Precipitation_in,
CAST(payload..snowfall_in AS DECIMAL(3,2)) Snowfall_in,
CAST(payload..cloud_cover_pct AS INTEGER) Cloud_Cover_Pct,
CAST(payload..radiation_solar_total_wpm2 AS DECIMAL(5,1)) Radiation_Solar_Total_WPM2
FROM WeatherData
WHERE Postal_Code = '30301'
;

Run the following SQL command to validate the contents of the table.

SELECT * FROM WeatherData_temp SAMPLE 10;
Figure 17 WeatherData_temp

READ_NOS — An alternative method for foreign tables

An alternative to defining a foreign table is to use the READ_NOS table operator. This table operator allows you to access data directly from an object store without first creating a foreign table, or to view a list of the keys associated with all the objects specified by a LOCATION clause.

You can use the READ_NOS table operator to explore the data in an object.

Run the following command to explore the data in an object.

SELECT TOP 5 payload..*
FROM READ_NOS (
ON (
SELECT CAST( NULL AS DATASET STORAGE FORMAT CSV))
USING
LOCATION ('/S3/s3.amazonaws.com/dataexchange-dataset/')
ACCESS_ID('A*****')
ACCESS_KEY('*****')
) AS THE_TABLE
ORDER BY 1
;

The LOCATION clause requires a top-level name, which is the bucket name (dataexchange-dataset in the example above). You will need to replace this with your own bucket name.

Replace the string for ACCESS_ID with your AWS access key.

Replace the string for ACCESS_KEY with your AWS secret access key.

Figure 18 READ_NOS

You can also leverage the READ_NOS table operator to get the length (size) of the object.

Run the following SQL command to view the size of the object.

SELECT location(CHAR(120)), ObjectLength
FROM READ_NOS (
ON (
SELECT CAST( NULL AS DATASET STORAGE FORMAT CSV))
USING
LOCATION ('/S3/s3.amazonaws.com/dataexchange-dataset/')
ACCESS_ID('A*****')
ACCESS_KEY('*****')
RETURNTYPE('NOSREAD_KEYS')
) AS THE_TABLE
ORDER BY 1
;

As above, replace the values for LOCATION, ACCESS_ID, and ACCESS_KEY.

Figure 19 READ_NOS object length

You can substitute the READ_NOS table operator for the foreign table definition used in the section above to load the data into a relational table.

CREATE MULTISET TABLE WeatherData_temp AS (
SELECT
CAST(payload..postal_code AS VARCHAR(10)) Postal_code,
CAST(payload..country AS CHAR(2)) Country,
CAST(payload..time_valid_utc AS TIMESTAMP(0) FORMAT 'YYYY-MM-DDBHH:MI:SS') Time_Valid_UTC,
CAST(payload..doy_utc AS INTEGER) DOY_UTC,
CAST(payload..hour_utc AS INTEGER) Hour_UTC,
CAST(payload..time_valid_lcl AS TIMESTAMP(0) FORMAT 'YYYY-MM-DDBHH:MI:SS') Time_Valid_LCL,
CAST(payload..dst_offset_minutes AS INTEGER) DST_Offset_Minutes,
CAST(payload..temperature_air_2m_f AS DECIMAL (4,1)) Temperature_Air_2M_F,
CAST(payload..temperature_wetbulb_2m_f AS DECIMAL(3,1)) Temperature_Wetbulb_2M_F,
CAST(payload..temperature_dewpoint_2m_f AS DECIMAL(3,1)) Temperature_Dewpoint_2M_F,
CAST(payload..temperature_feelslike_2m_f AS DECIMAL(4,1)) Temperature_Feelslike_2M_F,
CAST(payload..temperature_windchill_2m_f AS DECIMAL(4,1)) Temperature_Windchill_2M_F,
CAST(payload..temperature_heatindex_2m_f AS DECIMAL(4,1)) Temperature_Heatindex_2M_F,
CAST(payload..humidity_relative_2m_pct AS DECIMAL(3,1)) Humidity_Relative_2M_Pct,
CAST(payload..humidity_specific_2m_gpkg AS DECIMAL(3,1)) Humdity_Specific_2M_GPKG,
CAST(payload..pressure_2m_mb AS DECIMAL(5,1)) Pressure_2M_Mb,
CAST(payload..pressure_tendency_2m_mb AS DECIMAL(2,1)) Pressure_Tendency_2M_Mb,
CAST(payload..pressure_mean_sea_level_mb AS DECIMAL(5,1)) Pressure_Mean_Sea_Level_Mb,
CAST(payload..wind_speed_10m_mph AS DECIMAL(3,1)) Wind_Speed_10M_MPH,
CAST(payload..wind_direction_10m_deg AS DECIMAL(4,1)) Wind_Direction_10M_Deg,
CAST(payload..wind_speed_80m_mph AS DECIMAL(3,1)) Wind_Speed_80M_MPH,
CAST(payload..wind_direction_80m_deg AS DECIMAL(4,1)) Wind_Direction_80M_Deg,
CAST(payload..wind_speed_100m_mph AS DECIMAL(3,1)) Wind_Speed_100M_MPH,
CAST(payload..wind_direction_100m_deg AS DECIMAL(4,1)) Wind_Direction_100M_Deg,
CAST(payload..precipitation_in AS DECIMAL(3,2)) Precipitation_in,
CAST(payload..snowfall_in AS DECIMAL(3,2)) Snowfall_in,
CAST(payload..cloud_cover_pct AS INTEGER) Cloud_Cover_Pct,
CAST(payload..radiation_solar_total_wpm2 AS DECIMAL(5,1)) Radiation_Solar_Total_WPM2
FROM READ_NOS (
ON (
SELECT CAST( NULL AS DATASET STORAGE FORMAT CSV))
USING
LOCATION ('/S3/s3.amazonaws.com/dataexchange-dataset/')
ACCESS_ID('**********')
ACCESS_KEY('***********')
) AS THE_TABLE
Where Postal_Code = '36101'
)
WITH DATA
;

Configure Teradata Parallel Transporter

If you do not have access to Native Object Store (NOS) or do not want to use NOS, you can use the Teradata Parallel Transporter (TPT) utility to import data into Vantage. With this method, you must load the data into Vantage before performing any analysis on it.

TPT works on Windows, Mac, and Linux platforms. This article demonstrates TPT on Windows, but the instructions also apply to macOS and Linux.

Install Teradata Tools and Utilities

If you already have Teradata Tools and Utilities (TTU) installed, you can skip to the next section, Set up AWS configuration files.

Download Teradata Tools and Utilities — Windows to your Windows system. While this will download the current version of TTU, it will work with previous versions of Vantage.

Figure 20 Download Teradata Tools and Utilities (TTU)

Unzip the download file.

Run Setup.bat.

At the minimum, install the following packages:

  • Teradata Parallel Transporter Base
  • Teradata Access Module for Amazon S3
Figure 21 Select TPT Base and Access Module for Amazon S3

The installer will add any necessary dependent packages. You may find it useful to add the following features:

  • ODBC Driver for Teradata
  • Teradata GSS Administration Package
  • BTEQ
  • .NET Data Provider for Teradata
  • Teradata Access Module for Azure

The installer will automatically uninstall previous versions of any features that you have selected. If you have third-party dependencies that require a specific version you already have installed, de-select the Uninstall previous versions checkbox.

Set up AWS configuration files

The Access Module for Amazon S3 uses the AWS configuration files for credentials and the region.

If you have previously installed the AWS CLI, you can use the aws configure command at the command prompt to view and edit your default access credentials and region.

Figure 22 Configure AWS CLI
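For reference, a typical aws configure session looks like the following (the values shown are placeholders):

C:\> aws configure
AWS Access Key ID [None]: ABCDEFGHIJKL
AWS Secret Access Key [None]: 1234567890
Default region name [None]: us-east-1
Default output format [None]: json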

By default, the two configuration files are located in your %USERPROFILE%\.aws directory. (Note that there is a period in front of “aws”.)

You can also create the files manually, which is necessary if you do not have the AWS CLI installed.

The config file holds the default region for actions on AWS.

[default]
region = us-east-1

The credentials file holds the default access keys for actions on AWS.

[default]
aws_access_key_id = ABCDEFGHIJKL
aws_secret_access_key = 1234567890

The region and access keys specified in these two files must be able to access the S3 bucket you created.
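A quick way to confirm that these credentials and region can reach your bucket is to list its contents with the AWS CLI (assuming it is installed; substitute your bucket name):

aws s3 ls s3://dataexchange-dataset/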

The Access Module for Amazon S3 needs the location of the AWS configuration files and uses the HOME environment variable to find them. On Windows, this variable does not exist by default, so you will need to create it manually.

Open the System control panel applet by going into Control Panel > System and Security > System.

Figure 23 System control panel

Click on Advanced system settings.

Figure 24 System properties

Click on Environment Variables….

Click on New… in the User variables section.

Figure 25 User environment variables

Enter HOME for the Variable name.

Enter the directory where your AWS configuration files are located for the Variable value. You may use the configuration files created by the AWS CLI, or you may create a separate directory that holds the configuration files that you created (or copied from the AWS CLI).

Figure 26 User HOME environment variable
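Alternatively, you can create the variable from a command prompt. The sketch below assumes your AWS configuration files are in %USERPROFILE%\.aws, the default location used by the AWS CLI; open a new command prompt afterward for the change to take effect.

setx HOME "%USERPROFILE%"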

Load data from Amazon S3 into Vantage using TPT

TPT uses the following script files to load data into Vantage:

  • Job variables
  • Create table
  • Load data

Create the job variables file

The job variables file defines the values for variables used by TPT when running jobs. Using a job variables file allows the variables to be defined once and then used in multiple jobs.

Create the following jobVars script file on your Windows system.

/********************************************************/
/* TPT attributes - Common for all Samples */
/********************************************************/
TargetUserName = '<TargetUserName>'
,TargetUserPassword = '<TargetUserPassword>'
,TargetTdpId = '<VantageSystemName>'
,TargetErrorList = [ '3706','3803','3807' ]
,DDLPrivateLogName = 'DDL_OPERATOR_LOG'
/********************************************************/
/* TPT LOAD Operator attributes */
/********************************************************/
,LoadPrivateLogName = 'LOAD_OPERATOR_LOG'
,LoadTargetTable = 'onpoint'
,LoadLogTable = 'onpoint_log'
,LoadErrorTable1 = 'onpoint_e1'
,LoadErrorTable2 = 'onpoint_e2'
/********************************************************/
/* TPT DataConnector Producer Operator */
/********************************************************/
,FileReaderFormat = 'Delimited'
,FileReaderTextDelimiter = ','
,FileReaderPrivateLogName = 'onpointP2_1'
,FileReaderFileName = 'onpointDT'
,FileReaderDirectoryPath = '.'
,FileReaderOpenMode = 'Read'
/********************************************************/
/* APPLY STATEMENT parameters */
/********************************************************/
,LoadInstances = 1
,FileReaderInstances = 1

Replace <TargetUserName>, <TargetUserPassword>, and <VantageSystemName> with appropriate values. (Remember to remove the brackets, too.)

Create the CREATE TABLE script file

The CREATE TABLE script file cleans up the temporary load tables (if any) and creates the target table in Vantage. You only need to run this script file if the target table needs to be created.

Create the following CreateTable script file on your Windows system.

DEFINE JOB CREATE_TABLE
DESCRIPTION 'Clean up and create target table'
(
STEP CLEANUP_CREATE_TABLE_STEP
(
APPLY
(' DROP TABLE WeatherData '),
(' DROP TABLE WeatherData_e1 '),
(' DROP TABLE WeatherData_e2 '),
(' DROP TABLE WeatherData_log '),
('CREATE MULTISET TABLE WeatherData
(
Postal_code VARCHAR(5),
Country VARCHAR(3),
Time_Valid_UTC VARCHAR(20),
DOY_UTC VARCHAR(2),
Hour_UTC VARCHAR(3),
Time_Valid_LCL VARCHAR(20),
DST_Offset_Minutes VARCHAR(5),
Temperature_Air_2M_F VARCHAR(5),
Temperature_Wetbulb_2M_F VARCHAR(5),
Temperature_Dewpoint_2M_F VARCHAR(5),
Temperature_Feelslike_2M_F VARCHAR(5),
Temperature_Windchill_2M_F VARCHAR(5),
Temperature_Heatindex_2M_F VARCHAR(5),
Humidity_Relative_2M_Pct VARCHAR(5),
Humdity_Specific_2M_GPKG VARCHAR(5),
Pressure_2M_Mb VARCHAR(8),
Pressure_Tendency_2M_Mb VARCHAR(5),
Pressure_Mean_Sea_Level_Mb VARCHAR(8),
Wind_Speed_10M_MPH VARCHAR(5),
Wind_Direction_10M_Deg VARCHAR(8),
Wind_Speed_80M_MPH VARCHAR(5),
Wind_Direction_80M_Deg VARCHAR(8),
Wind_Speed_100M_MPH VARCHAR(5),
Wind_Direction_100M_Deg VARCHAR(8),
Precipitation_in VARCHAR(5),
Snowfall_in VARCHAR(5),
Cloud_Cover_Pct VARCHAR(5),
Radiation_Solar_Total_WPM2 VARCHAR(5)
);
')
TO OPERATOR ($DDL);
);
);

Create the load job script file

The load job script file performs the actual loading of data from the Amazon S3 bucket into Vantage.

Create the following LoadDataFromS3 script file on your Windows system.

DEFINE JOB LOAD_DATA_FROM_S3_TO_VANTAGE
DESCRIPTION 'Load data from Amazon S3 bucket into Vantage'
(
SET TargetTable = 'WeatherData';
STEP IMPORT_THE_DATA
(
APPLY $INSERT TO OPERATOR ($LOAD)
SELECT * FROM OPERATOR ($FILE_READER ()
ATTR
(
SkipRows = 1,
AccessModuleName = 'libs3axsmod.dll',
AccessModuleInitStr = 'S3Bucket=<S3BucketName> S3Prefix="/"
S3Object=onpoint_history_postal-code_hour_201801010000-201812312359.csv S3SinglePartFile=True'
)
);
);
);

The SkipRows parameter determines how many rows will be ignored at the start of the file. Typically, this is one (1) because the first row has the column headers, not the data.

Replace the S3BucketName parameter with the name of your bucket. (We are using dataexchange-dataset in this article. Your bucket name will be different.) Remember to remove the brackets.

If the dataset object is at the "root" of your bucket, you may leave the S3Prefix parameter as "/". If you have created "directories", replace it with the "directory" path, ensuring that it ends with a forward slash. (Technically, Amazon S3 does not support directories, but it behaves as if it does.)

The S3Object parameter defines the name of the object to be loaded. onpoint_history_postal-code_hour_201801010000-201812312359.csv is the name of the object used in this article.
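As an illustration, if the CSV had been exported under a weather/ prefix in your bucket (a hypothetical layout), the access module initialization string would look like this:

AccessModuleInitStr = 'S3Bucket=<S3BucketName> S3Prefix="weather/" S3Object=onpoint_history_postal-code_hour_201801010000-201812312359.csv S3SinglePartFile=True'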

Run the scripts

Ensure that all of the script files are in the same directory.

In your command prompt, change to the directory with the script files.

Run the following command to create the target table.

tbuild -f CreateTable -v jobVars -j createTable

Run the following command to load the target table.

tbuild -f LoadDataFromS3 -v jobVars -j loadData

Query the Dataset in Vantage

Log in to your Vantage system with Teradata Studio.

Run the following SQL command to validate the contents of the table.

SELECT * FROM WeatherData SAMPLE 10;
Figure 27. Final Results
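Because the TPT-created table stores every column as VARCHAR, cast the values when you aggregate. For example, a simple sketch that ranks postal codes by average air temperature:

SELECT Postal_code,
       AVG(CAST(Temperature_Air_2M_F AS DECIMAL(4,1))) AS Avg_Temp_F
FROM WeatherData
WHERE Temperature_Air_2M_F <> ''   /* skip rows with missing readings */
GROUP BY Postal_code
ORDER BY Avg_Temp_F DESC;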

Further Information

Don’t forget to visit teradata.com for more information about Teradata products, and follow us on Medium to get notifications about news and important developer content.

And remember…. Always have fun, and happy coding!!

Sebastian
Teradata

Data & Analytics Nerd. Product Manager. Former Enterprise Architect. Professor. Speaker. Amateur Musician. Traveller