Is This Article Series for Me?


What Is Serverless Computing?

Why Serverless Computing?

Cost Effective

Zero Administration

Low Overhead

Serverless Computing in Analytics

Data Lake Analytics

Scenario 1 (OSS + DLA + Quick BI)

Understanding the Data

- - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245

Ingesting the Data into OSS

ossutil mb oss://bucket [--acl=acl] [--storage-class sc] [-c file]
ossutil64 mb oss://apachelogs
ossutil cp src-url dest-url
ossutil64 cp accesslog.txt oss://apachelogs/

Processing the Data stored in OSS Using DLA

CREATE SCHEMA my_test_schema with DBPROPERTIES (
LOCATION = 'oss://xxx/xxx/',
catalog='oss' );

CREATE SCHEMA apacheloganalyticsschema with DBPROPERTIES (
LOCATION = 'oss://apachelogs/',
catalog='oss' );

CREATE EXTERNAL TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[ROW FORMAT row_format]
[STORED AS file_format] | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
LOCATION oss_path

CREATE EXTERNAL TABLE apacheloganalyticstable (
host STRING,
identity STRING,
user_id STRING,
time_stamp STRING,
request STRING,
status STRING,
size STRING )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)",
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s" )
STORED AS TEXTFILE LOCATION 'oss://apachelogs/accesslog.txt';
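
To see how the "input.regex" above maps one log line onto the seven table columns, the pattern can be tried locally before any query is run. The following is a minimal Python sketch, not part of DLA: the function name parse_log_line is hypothetical, and the host value example.host is a placeholder added so that the sample entry has a first field. It assumes the SerDe requires the whole line to match, as Hive's RegexSerDe does.

```python
import re

# Same pattern as "input.regex" in the table definition, with the SQL string
# escaping removed (\\[ becomes \[ and \" becomes ").
LOG_PATTERN = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*") (-|[0-9]*) (-|[0-9]*)'
)

# Column names in the order declared in apacheloganalyticstable.
COLUMNS = ["host", "identity", "user_id", "time_stamp", "request", "status", "size"]


def parse_log_line(line):
    """Return a dict of column name -> captured text, or None if the line does not match."""
    match = LOG_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    return dict(zip(COLUMNS, match.groups()))


# "example.host" is a placeholder host; the rest is the sample entry shown earlier.
sample = 'example.host - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245'
row = parse_log_line(sample)
for name in COLUMNS:
    print(f"{name}: {row[name]}")
```

Note that the captured time_stamp keeps its square brackets and the captured request keeps its double quotes, because both sit inside the capture groups. Every column is declared as STRING, so status and size are captured as text rather than numbers.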

select * from apacheloganalyticstable limit 5;

select count(distinct host) as "Unique Host" from apacheloganalyticstable where status = '200';



