Apache Kudu series: 1. The beginning.
Apache Kudu is a columnar data store. Storing data by column has two main benefits:
Read Efficiency: For analytical queries, you can read a single column, or a portion of that column, while ignoring other columns. This means you can fulfill your request while reading a minimal number of blocks on disk.
Data Compression: Because a given column contains only one type of data, pattern-based compression can be orders of magnitude more efficient than compressing the mixed data types found in row-based solutions.
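To make the two points above concrete, here is a minimal sketch in Python. It is not Kudu's on-disk format: the sample table, the column names, and the use of run-length encoding as the "pattern-based" scheme are all assumptions chosen for illustration.

```python
from itertools import groupby

# Hypothetical sample table with columns (id, country, status).
rows = [(i, "US" if i < 500 else "DE", "active") for i in range(1000)]

# Row-oriented layout: the values of each row are stored together,
# so values of different types are interleaved.
row_layout = [value for row in rows for value in row]

# Column-oriented layout: each column is stored contiguously.
columns = {
    "id": [row[0] for row in rows],
    "country": [row[1] for row in rows],
    "status": [row[2] for row in rows],
}


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


# Read efficiency: a query on "country" touches one list only.
country_rle = run_length_encode(columns["country"])

# Compression: the same scheme finds no repeats in the interleaved layout.
row_rle = run_length_encode(row_layout)

print(len(columns["country"]), "values ->", len(country_rle), "runs")
print(len(row_layout), "values ->", len(row_rle), "runs")
```

In this toy data the country column collapses to two runs, while the interleaved row layout does not shrink at all, because neighbouring values never repeat.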
What you will learn here:
- How to Connect?
- How to create a table in Kudu?
- How to run Kudu health check utility?
- How to delete the Kudu table?
Heart of Kudu: Raft Consensus Algorithm
The Raft consensus algorithm provides a way to elect a leader for a distributed cluster from a pool of potential leaders. If a follower cannot reach the current leader, it transitions itself to become a candidate. Given a quorum of voters, one candidate is elected to be the new leader, and the others transition back to being followers.
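The election described above can be sketched in a few lines of Python. This is a heavily simplified illustration, not Kudu's implementation: the Node class, the run_election function, and the way reachability is passed in are hypothetical, and real Raft also compares logs and uses randomized timeouts, which are left out here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    term: int = 0
    role: str = "follower"
    votes_cast: dict = field(default_factory=dict)  # term -> candidate voted for

    def request_vote(self, candidate_name: str, term: int) -> bool:
        """Grant at most one vote per term; step down on seeing a newer term."""
        if term < self.term:
            return False
        if term > self.term:
            self.term = term
            self.role = "follower"
        if self.votes_cast.get(term, candidate_name) == candidate_name:
            self.votes_cast[term] = candidate_name
            return True
        return False


def run_election(candidate: Node, reachable_peers: list, cluster_size: int) -> int:
    """Run one election round and return the number of votes received."""
    candidate.term += 1
    candidate.role = "candidate"
    candidate.votes_cast[candidate.term] = candidate.name  # vote for itself
    votes = 1 + sum(
        peer.request_vote(candidate.name, candidate.term) for peer in reachable_peers
    )
    quorum = cluster_size // 2 + 1
    candidate.role = "leader" if votes >= quorum else "follower"
    return votes


# Three-node cluster whose old leader has become unreachable.
b, c = Node("b"), Node("c")
votes_b = run_election(b, reachable_peers=[c], cluster_size=3)
print(b.role, votes_b)

# A node cut off from every peer cannot reach a quorum on its own.
isolated = Node("d")
votes_d = run_election(isolated, reachable_peers=[], cluster_size=3)
print(isolated.role, votes_d)
```

With three nodes the quorum is two, so node b wins with its own vote plus the vote from c, while the isolated node collects only its own vote and falls back to follower.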
How to Connect?
[@mac1 ~]$ impala-shell
Starting Impala Shell without Kerberos authentication
Connected to mac1.c.my-project-laca.internal:21000
Server version: impalad version 2.12.0-cdh5.15.1 RELEASE (build 64f4e19bf59fab8664ebff7e80fc70570dcd8cb8)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.12.0-cdh5.15.1 (64f4e19) built on Thu Aug 9 09:21:02 PDT 2018)
After running a query, type SUMMARY to see a summary of where time was spent.
***********************************************************************************
How to create a table in Kudu?
[mac1.c.my-project-laca.internal:21000] > CREATE TABLE my_first_table
> (
> id BIGINT,
> name STRING,
> PRIMARY KEY(id)
> )
> PARTITION BY HASH PARTITIONS 16
> STORED AS KUDU;
Query: CREATE TABLE my_first_table
(
id BIGINT,
name STRING,
PRIMARY KEY(id)
)
PARTITION BY HASH PARTITIONS 16
STORED AS KUDU
Fetched 0 row(s) in 0.95s
[mac1.c.my-project-laca.internal:21000] > INSERT INTO my_first_table VALUES (99, "sarah");
Query: INSERT INTO my_first_table VALUES (99, "sarah")
Query submitted at: 2018-11-01 06:50:26 (Coordinator: http://mac1:25000)
Query progress can be monitored at: http://mac1:25000/query_plan?query_id=164825b4f4111750:7373d09e00000000
Modified 1 row(s), 0 row error(s) in 5.31s
[mac1.c.my-project-laca.internal:21000] > INSERT INTO my_first_table VALUES (1, "john"), (2, "jane"), (3, "jim");
Query: INSERT INTO my_first_table VALUES (1, "john"), (2, "jane"), (3, "jim")
Query submitted at: 2018-11-01 06:50:41 (Coordinator: http://mac1:25000)
Query progress can be monitored at: http://mac1:25000/query_plan?query_id=3d45dde01f1dc895:437a773800000000
Modified 3 row(s), 0 row error(s) in 0.12s
[mac1.c.my-project-laca.internal:21000] >
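The PARTITION BY HASH PARTITIONS 16 clause spreads rows across 16 tablets by hashing the primary key. The Python sketch below shows the general idea only: bucket_for is a hypothetical helper, and SHA-256 is a stand-in, since Kudu uses its own internal hash function, so the bucket numbers here will not match the real tablet assignment.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 16  # matches PARTITIONS 16 in the CREATE TABLE statement


def bucket_for(key: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a BIGINT primary key to a bucket with a stand-in hash function."""
    digest = hashlib.sha256(key.to_bytes(8, "big", signed=True)).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# The ids inserted above, plus a larger range to show the spread.
for key in (99, 1, 2, 3):
    print(key, "-> bucket", bucket_for(key))

spread = Counter(bucket_for(key) for key in range(10_000))
print(len(spread), "buckets used")
```

The same key always lands in the same bucket, and a large set of keys spreads across all buckets, which is what keeps writes from piling up on a single tablet.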
How to run Kudu health check utility?
[root@mac1 ~]# sudo -u kudu kudu cluster ksck mac1.c.my-project-laca.internal,mac2.c.my-project-laca.internal,mac3.c.my-project-laca.internal
Connected to the Master
Fetched info from all 3 Tablet Servers
Table impala::default.my_first_table is HEALTHY (16 tablet(s) checked)

Table Summary
Name                           | Status  | Total Tablets | Healthy | Recovering | Under-replicated | Unavailable
-------------------------------+---------+---------------+---------+------------+------------------+------------
impala::default.my_first_table | HEALTHY | 16            | 16      | 0          | 0                | 0
The metadata for 1 table(s) is HEALTHY
OK
[root@mac1 ~]#
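If you want to check the summary from a script rather than by eye, a small parser is enough. The sketch below is an assumption-laden example: parse_summary_row is a hypothetical helper, and it relies on the pipe-separated layout shown in the output above, which may differ between Kudu versions.

```python
def parse_summary_row(header: str, row: str) -> dict:
    """Pair each column name in the header with the value in the row."""
    names = [cell.strip() for cell in header.split("|")]
    values = [cell.strip() for cell in row.split("|")]
    return dict(zip(names, values))


# Header and row copied from the ksck output above.
header = "Name | Status | Total Tablets | Healthy | Recovering | Under-replicated | Unavailable"
row = "impala::default.my_first_table | HEALTHY | 16 | 16 | 0 | 0 | 0"

summary = parse_summary_row(header, row)
is_healthy = (
    summary["Status"] == "HEALTHY"
    and summary["Healthy"] == summary["Total Tablets"]
)
print(summary["Name"], "healthy:", is_healthy)
```

All values are kept as strings; convert them with int() if you need to compare tablet counts numerically.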
How to delete the Kudu table?
[root@mac1 ~]# sudo -u kudu kudu table delete mac1.c.my-project-laca.internal,mac2.c.my-project-laca.internal,mac3.c.my-project-laca.internal 'table_name'
This series continues with the following posts:
Apache Kudu series: 2. Troubleshooting for TABLET_DATA_TOMBSTONED
Apache Kudu series: 3. Troubleshooting: Add or remove data directories.