OBIEE and Pivotal HAWQ
Oracle Business Intelligence Enterprise Edition 11g (OBIEE) is a comprehensive business intelligence platform that delivers a full range of capabilities — including interactive dashboards, ad hoc queries, notifications and alerts, enterprise and financial reporting, scorecard and strategy management, business process invocation, search and collaboration, mobile, integrated systems management and more.
HAWQ is a parallel SQL query engine that combines the key technological advantages of the industry-leading Greenplum Database with the scalability and convenience of Hadoop. Using HAWQ, you can interact with petabyte-range data sets through a complete, standards-compliant SQL interface.
Pivotal HAWQ enables analysis of a variety of Hadoop-based data formats using the Pivotal Extension Framework (PXF), without duplicating or converting the underlying HDFS files.
In this blog post, we will look at setting up OBIEE to run the SampleAppLite dashboards, sourcing data from Hadoop via Pivotal HAWQ and PXF.
Technology Stack:-
- Oracle Business Intelligence 11g (11.1.1.9.0)
- Pivotal HAWQ 1.3.0.3
- AWS EC2 m3.large instance type
- AWS EC2 t2.small instance type
- Hortonworks HDP 2.2 Sandbox on Microsoft Azure (Basic A3: 4 cores, 7 GB)
Initial Setup:-
- Set up OBIEE 11g (11.1.1.9.0) in AWS EC2 by following the steps in the blog post OBIEE SampleApp in The Cloud
- Set up the OBIEE 11g (11.1.1.9.0) Developer Client Tools in an AWS EC2 t2.small instance
- Launch the Hortonworks HDP 2.2 Sandbox on Microsoft Azure by following the steps in the blog post Getting Started with Hortonworks Sandbox on Azure
- Set up Pivotal HAWQ and PXF by following the steps in the blog post Pivotal HAWQ Lands In The Hortonworks Sandbox
Setting up the SampleAppLite data store in HAWQ:-
Step 1:- Place the source XML files in HDFS
The SampleAppLite source data comes from the following XML files:
SAMP_OFFICES_D.xml
SAMP_PRODUCTS_D.xml
SAMP_REVENUE_A.xml
SAMP_REVENUE_F.xml
SAMP_TARGETS_F.xml
SAMP_TIME_DAY_D.xml
SAMP_TIME_MTH_D.xml
Place the above XML files in HDFS, for example:
hdfs dfs -mkdir /obi_hdfs
hdfs dfs -put SAMP_OFFICES_D.xml /obi_hdfs
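To upload all seven files in one pass, a simple shell loop can be used (a minimal sketch, assuming the XML files sit in the current working directory):

# Upload every SampleAppLite XML source file to /obi_hdfs
for f in SAMP_*.xml; do
  hdfs dfs -put "$f" /obi_hdfs
done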
Step 2:- Transform the XML files into tab-delimited files with Pig
The XML files are converted to tab-delimited files using the Pig scripts shown below. Each script replaces carriage returns and line feeds with the '|' character and tabs with spaces, so that each record ends up on a single delimited line.
SAMP_Dimensions_Pig_Script:-

REGISTER piggybank.jar;

A = LOAD '/obi_hdfs/SAMP_PRODUCTS_D.xml'
    USING org.apache.pig.piggybank.storage.XMLLoader('SAMP_PRODUCTS_D')
    AS (doc:chararray);
clean = foreach A generate REPLACE(REPLACE(REPLACE(REPLACE(doc, '\\u000D', '|'), '\\u000A', '|'), '> <', '> <'), '\t', ' ');
store clean into '/obi_hdfs/SAMP_PRODUCTS_D';

B = LOAD '/obi_hdfs/SAMP_OFFICES_D.xml'
    USING org.apache.pig.piggybank.storage.XMLLoader('SAMP_OFFICES_D')
    AS (doc:chararray);
clean = foreach B generate REPLACE(REPLACE(REPLACE(REPLACE(doc, '\\u000D', '|'), '\\u000A', '|'), '> <', '> <'), '\t', ' ');
store clean into '/obi_hdfs/SAMP_OFFICES_D';

C = LOAD '/obi_hdfs/SAMP_TIME_DAY_D.xml'
    USING org.apache.pig.piggybank.storage.XMLLoader('SAMP_TIME_DAY_D')
    AS (doc:chararray);
clean = foreach C generate REPLACE(REPLACE(REPLACE(REPLACE(doc, '\\u000D', '|'), '\\u000A', '|'), '> <', '> <'), '\t', ' ');
store clean into '/obi_hdfs/SAMP_TIME_DAY_D';

D = LOAD '/obi_hdfs/SAMP_TIME_MTH_D.xml'
    USING org.apache.pig.piggybank.storage.XMLLoader('SAMP_TIME_MTH_D')
    AS (doc:chararray);
clean = foreach D generate REPLACE(REPLACE(REPLACE(REPLACE(doc, '\\u000D', '|'), '\\u000A', '|'), '> <', '> <'), '\t', ' ');
store clean into '/obi_hdfs/SAMP_TIME_MTH_D';

SAMP_Facts_Pig_Script:-

REGISTER piggybank.jar;

A = LOAD '/obi_hdfs/SAMP_REVENUE_A.xml'
    USING org.apache.pig.piggybank.storage.XMLLoader('SAMP_REVENUE_A')
    AS (doc:chararray);
clean = foreach A generate REPLACE(REPLACE(REPLACE(REPLACE(doc, '\\u000D', '|'), '\\u000A', '|'), '> <', '> <'), '\t', ' ');
store clean into '/obi_hdfs/SAMP_REVENUE_A';

B = LOAD '/obi_hdfs/SAMP_REVENUE_F.xml'
    USING org.apache.pig.piggybank.storage.XMLLoader('SAMP_REVENUE_F')
    AS (doc:chararray);
clean = foreach B generate REPLACE(REPLACE(REPLACE(REPLACE(doc, '\\u000D', '|'), '\\u000A', '|'), '> <', '> <'), '\t', ' ');
store clean into '/obi_hdfs/SAMP_REVENUE_F';

C = LOAD '/obi_hdfs/SAMP_TARGETS_F.xml'
    USING org.apache.pig.piggybank.storage.XMLLoader('SAMP_TARGETS_F')
    AS (doc:chararray);
clean = foreach C generate REPLACE(REPLACE(REPLACE(REPLACE(doc, '\\u000D', '|'), '\\u000A', '|'), '> <', '> <'), '\t', ' ');
store clean into '/obi_hdfs/SAMP_TARGETS_F';
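The two scripts can then be run from the Pig CLI (a sketch, assuming they are saved under the illustrative names below and that piggybank.jar is in the working directory):

# Run the dimension and fact transformations in MapReduce mode
pig -x mapreduce SAMP_Dimensions_Pig_Script.pig
pig -x mapreduce SAMP_Facts_Pig_Script.pig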
Step 3:- Create external tables in HAWQ

CREATE EXTERNAL TABLE SAMP_TIME_MTH_D_T
(
xml_data text
)
LOCATION (
'pxf://sandbox.hortonworks.com:51200/obi_hdfs/SAMP_TIME_MTH_D/part*?profile=HdfsTextSimple'
)
FORMAT 'TEXT' (delimiter E'\t');

CREATE EXTERNAL TABLE SAMP_TIME_DAY_D_T
(
xml_data text
)
LOCATION (
'pxf://sandbox.hortonworks.com:51200/obi_hdfs/SAMP_TIME_DAY_D/part*?profile=HdfsTextSimple'
)
FORMAT 'TEXT' (delimiter E'\t');

CREATE EXTERNAL TABLE SAMP_OFFICES_D_T
(
xml_data text
)
LOCATION (
'pxf://sandbox.hortonworks.com:51200/obi_hdfs/SAMP_OFFICES_D/part*?profile=HdfsTextSimple'
)
FORMAT 'TEXT' (delimiter E'\t');

CREATE EXTERNAL TABLE SAMP_PRODUCTS_D_T
(
xml_data text
)
LOCATION (
'pxf://sandbox.hortonworks.com:51200/obi_hdfs/SAMP_PRODUCTS_D/part*?profile=HdfsTextSimple'
)
FORMAT 'TEXT' (delimiter E'\t');

CREATE EXTERNAL TABLE SAMP_REVENUE_A_T
(
xml_data text
)
LOCATION (
'pxf://sandbox.hortonworks.com:51200/obi_hdfs/SAMP_REVENUE_A/part*?profile=HdfsTextSimple'
)
FORMAT 'TEXT' (delimiter E'\t');

CREATE EXTERNAL TABLE SAMP_REVENUE_F_T
(
xml_data text
)
LOCATION (
'pxf://sandbox.hortonworks.com:51200/obi_hdfs/SAMP_REVENUE_F/part*?profile=HdfsTextSimple'
)
FORMAT 'TEXT' (delimiter E'\t');

CREATE EXTERNAL TABLE SAMP_TARGETS_F_T
(
xml_data text
)
LOCATION (
'pxf://sandbox.hortonworks.com:51200/obi_hdfs/SAMP_TARGETS_F/part*?profile=HdfsTextSimple'
)
FORMAT 'TEXT' (delimiter E'\t');
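A quick row count against one of the external tables verifies that PXF can read the Pig output from HDFS (the exact count depends on the SampleAppLite data set):

-- Sanity check: PXF should stream the HDFS files through the external table
SELECT COUNT(*) FROM SAMP_PRODUCTS_D_T;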
Step 4:- Create a simple SQL function to parse the data

CREATE OR REPLACE FUNCTION fn_extract_xml_value(p_tag text, p_xml text) RETURNS TEXT AS
$$
SELECT SPLIT_PART(SUBSTRING($2 FROM '<' || $1 || '>(.*)</' || $1 || '>'), '<', 1)
$$
LANGUAGE SQL;

Step 5:- Create views in HAWQ

CREATE VIEW SAMP_TIME_MTH_D AS
SELECT
(fn_extract_xml_value('MTH_KEY', xml_data))::int AS MTH_KEY,
(fn_extract_xml_value('CAL_HALF', xml_data))::int AS CAL_HALF,
(fn_extract_xml_value('CAL_MONTH', xml_data))::int AS CAL_MONTH,
(fn_extract_xml_value('CAL_QTR', xml_data))::int AS CAL_QTR,
(fn_extract_xml_value('CAL_YEAR', xml_data))::int AS CAL_YEAR,
(fn_extract_xml_value('FSCL_HALF', xml_data))::int AS FSCL_HALF,
(fn_extract_xml_value('FSCL_MONTH', xml_data))::int AS FSCL_MONTH,
(fn_extract_xml_value('FSCL_QTR', xml_data))::int AS FSCL_QTR,
(fn_extract_xml_value('FSCL_YEAR', xml_data))::int AS FSCL_YEAR,
(fn_extract_xml_value('JULIAN_MONTH_NUM', xml_data))::int AS JULIAN_MONTH_NUM,
(fn_extract_xml_value('JULIAN_QTR_NUM', xml_data))::int AS JULIAN_QTR_NUM,
(fn_extract_xml_value('MONTH_AGO_WID', xml_data))::int AS MONTH_AGO_WID,
(fn_extract_xml_value('QUARTER_AGO_WID', xml_data))::int AS QUARTER_AGO_WID,
(fn_extract_xml_value('YEAR_AGO_WID', xml_data))::int AS YEAR_AGO_WID,
(fn_extract_xml_value('PER_NAME_HALF', xml_data))::text AS PER_NAME_HALF,
(fn_extract_xml_value('PER_NAME_MONTH', xml_data))::text AS PER_NAME_MONTH,
(fn_extract_xml_value('PER_NAME_QTR', xml_data))::text AS PER_NAME_QTR,
(fn_extract_xml_value('PER_NAME_YEAR', xml_data))::text AS PER_NAME_YEAR,
(fn_extract_xml_value('DAYS_IN_QTR', xml_data))::int AS DAYS_IN_QTR,
(fn_extract_xml_value('DAYS_IN_MTH', xml_data))::int AS DAYS_IN_MTH,
(fn_extract_xml_value('BEG_OF_QTR_WID', xml_data))::int AS BEG_OF_QTR_WID,
(fn_extract_xml_value('LOADED', xml_data))::int AS LOADED,
(fn_extract_xml_value('CAL_MONTH_END_DATE', xml_data))::date AS CAL_MONTH_END_DATE,
(fn_extract_xml_value('CAL_QTR_END_DATE', xml_data))::date AS CAL_QTR_END_DATE,
(fn_extract_xml_value('CAL_HALF_YEAR_END_DATE', xml_data))::date AS CAL_HALF_YEAR_END_DATE,
(fn_extract_xml_value('CAL_YEAR_END_DATE', xml_data))::date AS CAL_YEAR_END_DATE,
(fn_extract_xml_value('FSCL_MONTH_END_DATE', xml_data))::date AS FSCL_MONTH_END_DATE,
(fn_extract_xml_value('FSCL_QTR_END_DATE', xml_data))::date AS FSCL_QTR_END_DATE,
(fn_extract_xml_value('FSCL_HALF_YEAR_END_DATE', xml_data))::date AS FSCL_HALF_YEAR_END_DATE,
(fn_extract_xml_value('FSCL_YEAR_END_DATE', xml_data))::date AS FSCL_YEAR_END_DATE,
(fn_extract_xml_value('DAYS_IN_HALF_YEAR', xml_data))::int AS DAYS_IN_HALF_YEAR,
(fn_extract_xml_value('DAYS_IN_YEAR', xml_data))::int AS DAYS_IN_YEAR,
(fn_extract_xml_value('DAYS_IN_FSCL_MONTH', xml_data))::int AS DAYS_IN_FSCL_MONTH,
(fn_extract_xml_value('DAYS_IN_FSCL_QTR', xml_data))::int AS DAYS_IN_FSCL_QTR,
(fn_extract_xml_value('DAYS_IN_FSCL_HALF_YEAR', xml_data))::int AS DAYS_IN_FSCL_HALF_YEAR,
(fn_extract_xml_value('DAYS_IN_FSCL_YEAR', xml_data))::int AS DAYS_IN_FSCL_YEAR
FROM SAMP_TIME_MTH_D_T;

CREATE VIEW SAMP_TIME_DAY_D AS
SELECT
(fn_extract_xml_value('CALENDAR_DATE', xml_data))::date AS CALENDAR_DATE,
(fn_extract_xml_value('DAY_KEY', xml_data))::int AS DAY_KEY,
(fn_extract_xml_value('DAY_OF_MONTH', xml_data))::int AS DAY_OF_MONTH,
(fn_extract_xml_value('DAY_OF_WEEK', xml_data))::int AS DAY_OF_WEEK,
(fn_extract_xml_value('DAY_OF_YEAR', xml_data))::int AS DAY_OF_YEAR,
(fn_extract_xml_value('JULIAN_DAY_NUM', xml_data))::int AS JULIAN_DAY_NUM,
(fn_extract_xml_value('JULIAN_MONTH_NUM', xml_data))::int AS JULIAN_MONTH_NUM,
(fn_extract_xml_value('JULIAN_QTR_NUM', xml_data))::int AS JULIAN_QTR_NUM,
(fn_extract_xml_value('JULIAN_WEEK_NUM', xml_data))::int AS JULIAN_WEEK_NUM,
(fn_extract_xml_value('PER_NAME_HALF', xml_data))::text AS PER_NAME_HALF,
(fn_extract_xml_value('PER_NAME_MONTH', xml_data))::text AS PER_NAME_MONTH,
(fn_extract_xml_value('PER_NAME_QTR', xml_data))::text AS PER_NAME_QTR,
(fn_extract_xml_value('PER_NAME_WEEK', xml_data))::text AS PER_NAME_WEEK,
(fn_extract_xml_value('PER_NAME_YEAR', xml_data))::text AS PER_NAME_YEAR
FROM SAMP_TIME_DAY_D_T;

CREATE VIEW SAMP_OFFICES_D AS
SELECT
(fn_extract_xml_value('OFFICE_KEY', xml_data))::int AS OFFICE_KEY,
(fn_extract_xml_value('OFFICE_DSC', xml_data))::text AS OFFICE_DSC,
(fn_extract_xml_value('COMPANY', xml_data))::text AS COMPANY,
(fn_extract_xml_value('ORGANIZATION', xml_data))::text AS ORGANIZATION,
(fn_extract_xml_value('DEPARTMENT', xml_data))::text AS DEPARTMENT,
(fn_extract_xml_value('SEQUENCE', xml_data))::int AS SEQUENCE,
(fn_extract_xml_value('COMPANY_KEY', xml_data))::int AS COMPANY_KEY,
(fn_extract_xml_value('ORG_KEY', xml_data))::int AS ORG_KEY,
(fn_extract_xml_value('DEPT_KEY', xml_data))::int AS DEPT_KEY
FROM SAMP_OFFICES_D_T;

CREATE VIEW SAMP_PRODUCTS_D AS
SELECT
(fn_extract_xml_value('PROD_KEY', xml_data))::int AS PROD_KEY,
(fn_extract_xml_value('PROD_DSC', xml_data))::text AS PROD_DSC,
(fn_extract_xml_value('ATTRIBUTE_2', xml_data))::text AS ATTRIBUTE_2,
(fn_extract_xml_value('ATTRIBUTE_1', xml_data))::text AS ATTRIBUTE_1,
(fn_extract_xml_value('TYPE', xml_data))::text AS TYPE,
(fn_extract_xml_value('LOB', xml_data))::text AS LOB,
(fn_extract_xml_value('BRAND', xml_data))::text AS BRAND,
(fn_extract_xml_value('SEQUENCE', xml_data))::int AS SEQUENCE,
(fn_extract_xml_value('BRAND_KEY', xml_data))::int AS BRAND_KEY,
(fn_extract_xml_value('LOB_KEY', xml_data))::int AS LOB_KEY,
(fn_extract_xml_value('TYPE_KEY', xml_data))::int AS TYPE_KEY
FROM SAMP_PRODUCTS_D_T;

CREATE VIEW SAMP_REVENUE_F AS
SELECT
(fn_extract_xml_value('OFFICE_KEY', xml_data))::int AS OFFICE_KEY,
(fn_extract_xml_value('PROD_KEY', xml_data))::int AS PROD_KEY,
(fn_extract_xml_value('ORDER_NUMBER', xml_data))::int AS ORDER_NUMBER,
(fn_extract_xml_value('REVENUE', xml_data))::float8 AS REVENUE,
(fn_extract_xml_value('UNITS', xml_data))::int AS UNITS,
(fn_extract_xml_value('DISCNT_VALUE', xml_data))::float8 AS DISCNT_VALUE,
(fn_extract_xml_value('BILL_MTH_KEY', xml_data))::int AS BILL_MTH_KEY,
(fn_extract_xml_value('BILL_DAY_DT', xml_data))::date AS BILL_DAY_DT,
(fn_extract_xml_value('ORDER_DAY_DT', xml_data))::date AS ORDER_DAY_DT,
(fn_extract_xml_value('ORDER_STATUS', xml_data))::text AS ORDER_STATUS,
(fn_extract_xml_value('ORDER_TYPE', xml_data))::text AS ORDER_TYPE
FROM SAMP_REVENUE_F_T;

CREATE VIEW SAMP_REVENUE_A AS
SELECT
(fn_extract_xml_value('PROD_KEY', xml_data))::int AS PROD_KEY,
(fn_extract_xml_value('BILL_MTH_KEY', xml_data))::int AS BILL_MTH_KEY,
(fn_extract_xml_value('REVENUE', xml_data))::float8 AS REVENUE,
(fn_extract_xml_value('UNITS', xml_data))::int AS UNITS,
(fn_extract_xml_value('DISCNT_VALUE', xml_data))::float8 AS DISCNT_VALUE
FROM SAMP_REVENUE_A_T;

CREATE VIEW SAMP_TARGETS_F AS
SELECT
(fn_extract_xml_value('OFFICE_KEY', xml_data))::int AS OFFICE_KEY,
(fn_extract_xml_value('PROD_KEY', xml_data))::int AS PROD_KEY,
(fn_extract_xml_value('BILL_MTH_KEY', xml_data))::int AS BILL_MTH_KEY,
(fn_extract_xml_value('REVENUE', xml_data))::float8 AS REVENUE,
(fn_extract_xml_value('UNITS', xml_data))::int AS UNITS
FROM SAMP_TARGETS_F_T;
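With the function and views in place, a quick sanity check in HAWQ confirms the parsing works before OBIEE is wired up. The literal tag value below is just an illustrative example, and the rows returned depend on the SampleAppLite data:

-- The function should return the text between the matching tags, i.e. 10001
SELECT fn_extract_xml_value('PROD_KEY', '<PROD_KEY>10001</PROD_KEY>');

-- A few parsed rows from one of the dimension views
SELECT PROD_KEY, PROD_DSC, BRAND FROM SAMP_PRODUCTS_D LIMIT 5;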
Step 6:- Configure HAWQ to allow connectivity from the OBIEE instance

Enable connectivity from the OBIEE instance by editing the configuration file pg_hba.conf on the HAWQ master, as shown below:

[gpadmin@sandbox gpseg-1]$ pwd
/data/hawq/master/gpseg-1
[gpadmin@sandbox gpseg-1]$ tail -30 pg_hba.conf
# special characters must be quoted. Quoting one of the keywords
# "all", "sameuser", "samerole" or "replication" makes the name lose
# its special character, and just match a database or username with
# that name.
#
# This file is read on server startup and when the postmaster receives
# a SIGHUP signal. If you edit the file on a running system, you have
# to SIGHUP the postmaster for the changes to take effect. You can
# use "pg_ctl reload" to do that.
# Put your actual configuration here
# ----------------------------------
#
# If you want to allow non-local connections, you need to add more
# "host" records. In that case you will also need to make PostgreSQL
# listen on a non-local interface via the listen_addresses
# configuration parameter, or via the -i or -h command line switches.
# CAUTION: Configuring the system for local "trust" authentication allows
# any local user to connect as any PostgreSQL user, including the database
# superuser. If you do not trust all your local users, use another
# authentication method.
# TYPE  DATABASE  USER     CIDR-ADDRESS                   METHOD
# "local" is for Unix domain socket connections only
# IPv4 local connections:
# IPv6 local connections:
local   all       gpadmin                                 ident
host    all       gpadmin  127.0.0.1/28                   trust
host    all       gpadmin  10.0.0.4/32                    trust
host    all       gpadmin  ::1/128                        trust
host    all       gpadmin  fe80::20d:3aff:fe10:dee3/128   trust
host    all       gpadmin  0.0.0.0/0                      password
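As the comments in the file note, the postmaster needs a SIGHUP for the change to take effect; on HAWQ this can be done from the master as the gpadmin user:

# Reload pg_hba.conf without restarting the cluster
[gpadmin@sandbox gpseg-1]$ gpstop -u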
Setting up the SampleAppLite RPD for HAWQ:-
Step 1:- Set up OBIEE to HAWQ connectivity

Connectivity can be established by adding the entry below to the odbc.ini file on the OBIEE instance:

[ODBC Data Sources]
HAWQ_DSN=DataDirect 7.1 Greenplum Wire Protocol

[HAWQ_DSN]
Driver=/app/oracle/biee/Oracle_BI1/common/ODBC/Merant/7.1.4/lib/ARgplm27.so
Description=DataDirect 7.1 Greenplum Wire Protocol
AlternateServers=
ApplicationUsingThreads=1
ConnectionReset=0
ConnectionRetryCount=0
ConnectionRetryDelay=3
Database=gpadmin
DefaultLongDataBuffLen=2048
EnableDescribeParam=0
EnableKeysetCursors=0
EncryptionMethod=0
ExtendedColumnMetadata=0
FailoverGranularity=0
FailoverMode=0
FailoverPreconnect=0
FetchTSWTZasTimestamp=0
FetchTWFSasTime=0
HostName=<hostname>
InitializationString=
KeyPassword=
KeysetCursorOptions=0
KeyStore=
KeyStorePassword=
LoadBalanceTimeout=0
LoadBalancing=0
LoginTimeout=15
LogonID=
MaxPoolSize=100
MinPoolSize=0
Password=
Pooling=0
PortNumber=15432
QueryTimeout=0
ReportCodepageConversionErrors=0
TransactionErrorBehavior=1
XMLDescribeType=-10
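Before moving on to the RPD, it is worth confirming that the HAWQ master is reachable from the OBIEE host on port 15432. If a PostgreSQL client happens to be installed there, a quick connectivity test might look like this (replace <hostname> with the HAWQ master host, as in the DSN above):

psql -h <hostname> -p 15432 -U gpadmin -d gpadmin -c "SELECT COUNT(*) FROM SAMP_PRODUCTS_D;"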
Please note that the "DataDirect 7.1 Greenplum Wire Protocol" drivers that ship with the OBIEE installation can be used for HAWQ connectivity.

Step 2:- Change the SampleAppLite RPD connection pool to use the HAWQ DSN

Shown below are the RPD and HAWQ DSN setup on the Windows server.
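For reference, the key connection pool fields in the Administration Tool would be set roughly as follows (a sketch; the DSN name comes from Step 1 and the password placeholder is illustrative):

Call interface: ODBC 2.0
Data source name: HAWQ_DSN
User name: gpadmin
Password: <gpadmin password>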
Step 3:- Deploy the RPD and execute the SampleAppLite dashboards
As shown above, SQL is sent from OBIEE to HAWQ. The query is processed by HAWQ, and the results, drawn from the Pig-processed XML files stored in HDFS, are displayed in OBIEE.

Conclusion:-

That's all folks! I hope you enjoyed the guide. HAWQ opens up interesting ways to use HDFS as a data store for a variety of enterprise reporting requirements. Please feel free to post questions in the comments section.
Reference Blogs
http://www.pivotalguru.com/?p=782
http://blog.pivotal.io/big-data-pivotal/p-o-v/pivotal-hawq-lands-in-the-hortonworks-sandbox