Import Data from the Web in the Repods Cloud Data Platform

Record Evolution
5 min readDec 29, 2018

--

Author: Marko Petzold from Record Evolution

Image by Johannes Groll

In this article, you will learn how to import web data into the Repods Cloud Data Warehouse Platform based on an existing data set. To import files, you first need to have an account and a Data Pod in the platform. Sign up for a free Data Pod here.

Example Use Cases

  1. You have a portfolio of stocks and want to import the stocks’ close prices daily. You want to narrow down your criteria as you want to import the close prices of the stocks from your portfolio only and not the close prices of all stocks globally.
  2. You have a real estate portfolio and want to track the weather conditions at your real estate locations by importing weather data relevant to these locations only.
  3. You have a set of companies and want to perform a sentiment analysis based on tweets referencing these companies only. In this case, you will need a Twitter import instead of a web import.

Prepare Your Demo Portfolio

Your Data Pod already contains a demo portfolio. Once you have signed up to Repods and have created a Data Pod, you can create a simple csv file stock_symbol.csv with the following content:

stock_symbol
MMM
AXP
AAPL
BA
CAT

Enter your Pod, go to the Import Panel, and click on File Import.

Figure 1. Import New File

The list of Available Files should already include your uploaded file. To use a shortcut to load the file into a table for our demo, click on the file name and then click on Quick Load to Raw.

This action creates a new table S_TMP_STOCK_SYMBOL_1 and loads the file into this table as a new data package with a new package_id. Now the Import overview panel contains a new line representing this new table. To preview the data, click on the grey stripe.

Figure 2. Import Overview

Configure the Web Import

To set up the Web Import, go to the Import Panel again and click on the Web Import panel.

Figure 3. Import Data from Web

Expand the Web Polling panel and hit the small orange plus sign to create a new Web Import rule.

To obtain examples for our demo upload, we will now use the free service provided by IEX trading. For instance, if you want to get a quote for the Apple stock, you can simply follow the link below. This is the process that we want to automate now.

https://iextrading.com/apps/stocks/AAPL

Now enter the following fields as shown in the screenshot:

Figure 4. Enter Web Poll Rule

Note that we have replaced the stock symbol part “aapl” in the original URL with the term “{stock_symbol}” surrounded by curly braces. This is a template expression meaning that we want to replace that part by the “stock_symbol” that we define below in the SQL panel.

To test your template request, you can click on Test API. The test result will be displayed in a panel popping up from below.

Figure 5. Test Result

Now we are going to take our portfolio of stocks in the table S_TMP_STOCK_SYMBOL_1 and edit the SQL Query in the panel to look like this:

Figure 6. Edit the SQL Query

Click on Test API again to see the quotes for the stocks in your portfolio. Loading may take a few seconds depending on the IEX service quality.

Figure 7. Display Quotes

You can now save the Web Import Rule by hitting Save. Get back to the Web Import Rule List and hit the green Play button to activate the rule.

Figure 8. Activate the Rule

Now you can watch the stock quotes relevant to your portfolio populating your table:

Figure 9. Stock Quotes

Clicking on one of these boxes shows you the content of that package. The payload column contains the request-response as a text field called a payload. Since the type of response cannot be known in advance (json or csv or …), the response is first stored as a text. To convert this into native JSON in PostgreSQL (e.g. in a Workbook), you can do the following:

SELECT
payload::jsonb
FROM
S_TMP_STOCK_QUOTE

Here you can find further details on how to handle JSON values in PostgreSQL.

Conclusion

Now you know how to use web resources to enhance your existing data in real time. It is recommended that you use a Core Table as the basis for your web requests in place of the Raw Table S_TMP_STOCK_SYMBOL_1. Note that even if you modify the items in your portfolio (e.g. add stocks), these modifications will be automatically included in your future Web Import requests.

About me:
My name is Marko Petzold from Record Evolution, the creators of the Repods Platform. I hold a PhD in Mathematics from the University of Münster, Germany and have extensive experience in data engineering, analytics, and data science.

--

--