Apache Kudu series: 1. The beginning.

Jay Bilgaye
Nov 5, 2018


Apache Kudu is a columnar data store. Storing data by column brings two main benefits.

Read Efficiency: For analytical queries, you can read a single column, or a portion of that column, while ignoring other columns. This means you can fulfill your request while reading a minimal number of blocks on disk.
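To make the read-efficiency point concrete, here is a toy Python sketch that counts how many stored values a scan touches in each layout. It is only an illustration: the `age` column and the helper names are made up, and real Kudu reads encoded blocks from disk rather than Python lists.

```python
# Toy rows; the "age" column is a made-up addition for illustration.
rows = [
    {"id": 1, "name": "john", "age": 31},
    {"id": 2, "name": "jane", "age": 27},
    {"id": 3, "name": "jim", "age": 45},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def values_read_row_store(rows):
    """A row layout keeps whole rows together, so scanning one column
    still walks past every field of every row."""
    return sum(len(row) for row in rows)


def values_read_column_store(columns, wanted):
    """A column layout only needs the values of the requested column."""
    return len(columns[wanted])


columns = to_columns(rows)
print(columns["name"])                             # ['john', 'jane', 'jim']
print(values_read_row_store(rows))                 # 9
print(values_read_column_store(columns, "name"))   # 3
```

With three columns the column scan touches a third of the values; the gap widens as tables get wider.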

Data Compression: Because a given column contains only one type of data, pattern-based compression can be orders of magnitude more efficient than compressing the mixed data types stored together in row-based solutions.
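A quick way to see why same-typed columns compress well is run-length encoding, which is one of the column encodings Kudu offers. The Python sketch below is a simplified illustration, not Kudu's implementation, and the `status_column` data is invented.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical column with long runs of repeated values.
status_column = ["ok", "ok", "ok", "ok", "fail", "fail", "ok", "ok"]

encoded = run_length_encode(status_column)
print(encoded)  # [('ok', 4), ('fail', 2), ('ok', 2)]
assert run_length_decode(encoded) == status_column
```

Eight values shrink to three pairs here. In a row layout the same values would be interleaved with ids and names, so the runs would never line up.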

What you will learn here:

  1. How to Connect?
  2. How to create a table in Kudu?
  3. How to run Kudu health check utility?
  4. How to delete the Kudu table?

Heart of Kudu: Raft Consensus Algorithm

The Raft consensus algorithm provides a way to elect a leader for a distributed cluster from a pool of potential leaders. If a follower cannot reach the current leader, it transitions itself to become a candidate. Given a quorum of voters, one candidate is elected to be the new leader, and the others transition back to being followers.
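The election described above can be sketched in a few lines of Python. This is a deliberately stripped-down model: the `Node` class and `start_election` function are hypothetical names, and it leaves out election timeouts, heartbeats, and the log comparison that real Raft (and Kudu) perform before granting a vote.

```python
from dataclasses import dataclass
from typing import Optional

FOLLOWER, CANDIDATE, LEADER = "follower", "candidate", "leader"


@dataclass
class Node:
    name: str
    role: str = FOLLOWER
    term: int = 0
    voted_for: Optional[str] = None

    def request_vote(self, candidate_name, term):
        """Grant at most one vote per term."""
        if term > self.term:
            # A newer term resets our vote and demotes us to follower.
            self.term = term
            self.voted_for = None
            self.role = FOLLOWER
        if term == self.term and self.voted_for in (None, candidate_name):
            self.voted_for = candidate_name
            return True
        return False


def start_election(candidate, reachable_peers, cluster_size):
    """A follower that lost its leader becomes a candidate and asks for votes."""
    candidate.role = CANDIDATE
    candidate.term += 1
    candidate.voted_for = candidate.name
    votes = 1  # the candidate votes for itself
    for peer in reachable_peers:
        if peer.request_vote(candidate.name, candidate.term):
            votes += 1
    quorum = cluster_size // 2 + 1
    candidate.role = LEADER if votes >= quorum else FOLLOWER
    return candidate.role


# Three-node cluster where the old leader "a" is unreachable.
b, c = Node("b"), Node("c")
print(start_election(b, [c], cluster_size=3))  # leader
```

With three nodes the quorum is two, so `b` wins with its own vote plus `c`'s. A candidate that can reach no peers stays below quorum and falls back to follower.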

How to Connect?

[@mac1 ~]$ impala-shell
Starting Impala Shell without Kerberos authentication
Connected to mac1.c.my-project-laca.internal:21000
Server version: impalad version 2.12.0-cdh5.15.1 RELEASE (build 64f4e19bf59fab8664ebff7e80fc70570dcd8cb8)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.12.0-cdh5.15.1 (64f4e19) built on Thu Aug 9 09:21:02 PDT 2018)

After running a query, type SUMMARY to see a summary of where time was spent.
***********************************************************************************

How to create a table in Kudu?

[mac1.c.my-project-laca.internal:21000] > CREATE TABLE my_first_table
> (
> id BIGINT,
> name STRING,
> PRIMARY KEY(id)
> )
> PARTITION BY HASH PARTITIONS 16
> STORED AS KUDU;
Query: CREATE TABLE my_first_table
(
id BIGINT,
name STRING,
PRIMARY KEY(id)
)
PARTITION BY HASH PARTITIONS 16
STORED AS KUDU
Fetched 0 row(s) in 0.95s
[mac1.c.my-project-laca.internal:21000] > INSERT INTO my_first_table VALUES (99, "sarah");
Query: INSERT INTO my_first_table VALUES (99, "sarah")
Query submitted at: 2018-11-01 06:50:26 (Coordinator: http://mac1:25000)
Query progress can be monitored at: http://mac1:25000/query_plan?query_id=164825b4f4111750:7373d09e00000000
Modified 1 row(s), 0 row error(s) in 5.31s
[mac1.c.my-project-laca.internal:21000] > INSERT INTO my_first_table VALUES (1, "john"), (2, "jane"), (3, "jim");
Query: INSERT INTO my_first_table VALUES (1, "john"), (2, "jane"), (3, "jim")
Query submitted at: 2018-11-01 06:50:41 (Coordinator: http://mac1:25000)
Query progress can be monitored at: http://mac1:25000/query_plan?query_id=3d45dde01f1dc895:437a773800000000
Modified 3 row(s), 0 row error(s) in 0.12s
[mac1.c.my-project-laca.internal:21000] >
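The PARTITION BY HASH PARTITIONS 16 clause spreads rows across 16 tablets based on a hash of the primary key. The Python sketch below shows the general idea only. CRC32 is a stand-in chosen for this example; Kudu uses its own internal hash function, so the bucket numbers printed here will not match real tablet assignments.

```python
import zlib

NUM_PARTITIONS = 16  # matches HASH PARTITIONS 16 in the CREATE TABLE above


def bucket_for(key, num_partitions=NUM_PARTITIONS):
    """Map a primary key to a bucket in [0, num_partitions).

    CRC32 is an illustrative stand-in, not Kudu's actual hash.
    """
    digest = zlib.crc32(str(key).encode("utf-8"))
    return digest % num_partitions


# The ids inserted in the session above.
for key in (99, 1, 2, 3):
    print(key, "-> bucket", bucket_for(key))
```

The same key always lands in the same bucket, which is what lets a lookup by id go straight to one tablet.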

How to run Kudu health check utility?

[root@mac1 ~]# sudo -u kudu kudu cluster ksck mac1.c.my-project-laca.internal,mac2.c.my-project-laca.internal,mac3.c.my-project-laca.internal
Connected to the Master
Fetched info from all 3 Tablet Servers
Table impala::default.my_first_table is HEALTHY (16 tablet(s) checked)
Table Summary
              Name              | Status  | Total Tablets | Healthy | Recovering | Under-replicated | Unavailable
--------------------------------+---------+---------------+---------+------------+------------------+-------------
 impala::default.my_first_table | HEALTHY | 16            | 16      | 0          | 0                | 0
The metadata for 1 table(s) is HEALTHY
OK
[root@mac1 ~]#
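If you want to run this check from a script, a small Python wrapper along these lines could work. It is a sketch under assumptions: the function names are made up, it reuses the exact command shown above, and it assumes ksck exits with a non-zero status when it finds problems, which is worth confirming on your version.

```python
import subprocess


def build_ksck_command(master_addresses, run_as="kudu"):
    """Build the same ksck command line used above as an argument list."""
    masters = ",".join(master_addresses)
    return ["sudo", "-u", run_as, "kudu", "cluster", "ksck", masters]


def cluster_is_healthy(master_addresses):
    """Run ksck and treat exit status 0 as healthy (an assumption)."""
    result = subprocess.run(
        build_ksck_command(master_addresses),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


# Example (requires the kudu CLI and sudo rights on the host):
# cluster_is_healthy([
#     "mac1.c.my-project-laca.internal",
#     "mac2.c.my-project-laca.internal",
#     "mac3.c.my-project-laca.internal",
# ])
```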

How to delete the Kudu table?

[root@mac1 ~]# sudo -u kudu kudu table delete mac1.c.my-project-laca.internal,mac2.c.my-project-laca.internal,mac3.c.my-project-laca.internal 'table_name'

This series continues with the following posts:

Apache Kudu series: 2. Troubleshooting for TABLET_DATA_TOMBSTONED

Apache Kudu series: 3. Troubleshooting:- Add or remove data directories.
