Pre-Release Pundun Workshop

This blog post includes material for a Pundun workshop, where I will try to cover Pundun features and some use case examples. Due to limited time, Pundun architecture and design will not be covered in this workshop; they will be covered in a future post.

Erlang/OTP 20.0 was released just before I finished this post, so we are one step closer to our stable release. Some more testing remains, and then we will conclude.

Project homepage: www.pundun.io

Project source: https://github.com/pundunlabs

Pundun Dependencies

Erlang

We need to obtain Erlang/OTP to start with. With the release of Erlang/OTP 20.0 (21 June 2017), Pundun will stop supporting earlier releases. Until we release the next version, we recommend testing with OTP 19.3. Pundun requires dirty schedulers to be enabled in the Erlang VM, so before building Erlang one must include the configuration option below.

$ # not needed for OTP >= 20.0
$ ./configure --enable-dirty-schedulers

Rocksdb

Pundun uses Rocksdb as its back-end data store. The installation process will provide a fork of the Rocksdb source, but it has its own dependencies. Obtain the libraries listed at the link below.

Rebar3

To build Pundun, we also need the rebar3 build tool. One may obtain it from the link below.

Building Pundun from source

Pre-built Pundun binaries can be found at the downloads link. Here you will find how to build it from source.

sitting:~$ git clone https://github.com/pundunlabs/pundun.git
sitting:~$ cd pundun/
sitting:pundun (master)$ rebar3 release

The above commands compile the Pundun source and create a release under the _build directory. You can create a tarball of this release as follows.

sitting:pundun (master)$ rebar3 as target tar

After the above command, you can find the tarball under the _build/target/rel/pundun/ directory. You may extract it under your desired install directory to deploy a Pundun node.

Alternatively, you can create a tarball that excludes ERTS, as below.

sitting:pundun (master)$ rebar3 as prod tar

The tarball created by the above command will not include ERTS, so Pundun will use the Erlang found on the path; you should ensure that this Erlang has dirty schedulers enabled.
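
A quick way to verify this on the target machine is to ask the VM itself: erlang:system_info/1 reports the number of dirty CPU scheduler threads (on a VM built without dirty scheduler support, this call may fail instead of returning a count).

$ erl
1> erlang:system_info(dirty_cpu_schedulers).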

[OPTIONAL] One may create a binary installer package using the commands below.

git checkout -b v1.0.6 v1.0.6

This will check out tag v1.0.6, which points to a release version of Pundun.

make package RELEASE=1

The above command will build the Pundun release and create a package under the /package/packages/ directory. Currently rpm and deb packages are supported.

Installing Pundun

We have covered how to build Pundun; the build actually installs a Pundun node under the _build/default/rel/pundun/ directory. This instance is good for development purposes. You may start a node as follows:

_build/default/rel/pundun/bin/pundun start

And you can attach to the Erlang shell:

_build/default/rel/pundun/bin/pundun attach

Instead, we will install a Pundun node on a target system.

sitting:pundun (master)$ scp _build/target/rel/pundun/pundun-1.0.6-14-gb1ffeea.tar.gz mate@kingfisher:
sitting:pundun (master)$ ssh mate@kingfisher
Last login: Thu Mar 30 13:44:04 2017 from 192.168.211.xxx
mate@kingfisher:~/pundun$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
mate@kingfisher:~/pundun$ sudo apt-get -y install libgflags-dev libsnappy-dev zlib1g-dev libbz2-dev libzstd-dev

We copied the built binaries and installed the Rocksdb dependencies on the target system.

Let's install the Pundun node.

mate@kingfisher:~$ mkdir pundun
mate@kingfisher:~$ tar -C pundun/ -xzf pundun-1.0.6-15-g8f11d8e.tar.gz
mate@kingfisher:~$ cd pundun/

And configure SSL and SSH on the target node.

mate@kingfisher:~/pundun$ cd lib/pundun-1.0.6-15-g8f11d8e/priv/
mate@kingfisher:~/pundun/lib/pundun-1.0.6-15-g8f11d8e/priv$ openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 1095 -nodes
mate@kingfisher:~/pundun/lib/pundun-1.0.6-15-g8f11d8e/priv$ cat ~/.ssh/id_rsa.pub > ssh/authorized_keys
mate@kingfisher:~/pundun/lib/pundun-1.0.6-15-g8f11d8e/priv$ cd -
mate@kingfisher:~/pundun$

Finally, start the node.

mate@kingfisher:~/pundun$ ./bin/pundun start
Generated node name pundun8ef0f8
mate@kingfisher:~/pundun$ ./bin/pundun attach
Attaching to /tmp/erl_pipes/pundun8ef0f8/erlang.pipe.1 (^D to exit)
(pundun8ef0f8@kingfisher)1> [Quit]

Connect to the CLI.

mate@kingfisher:~/pundun$ ssh localhost -p 8884
Welcome to Pundun Command Line Interface
pundun>
user usage table_info show_tables logger help exit cm cluster
pundun> Connection to localhost closed.
mate@kingfisher:~/pundun$

Building Clusters of Pundun Nodes

Building a cluster of Pundun nodes is done through the CLI. Any node can join another node by pulling the target node; then the current node and its current cluster will be joined with the target node and its cluster. One prerequisite is that the local and target nodes are configured with the same cluster name. This configuration is stored in the etc/gb_dyno.yaml file.

mate@kingfisher:~/pundun$ ssh localhost -p 8884
Welcome to Pundun Command Line Interface
pundun> cluster
show pull add_host
pundun> cluster add_host sitting
ok
pundun> cluster pull
pundun97ae64@sitting
pundun> cluster pull pundun97ae64@sitting
ok
pundun> cluster show
Cluster: cl01
Node DC Rack Version
pundun8ef0f8@kingfisher dc01 rack01 1
pundun97ae64@sitting dc01 rack01 1
pundun>

Using Pundun Database

Pundun has Erlang, Go and Javascript client APIs, which use a binary protocol (Apollo) carried over TLS connections.

In this section I will not go through those APIs; instead I will show a native Erlang approach, running database operations through RPC calls.
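
Every example below uses the standard OTP rpc:call(Node, Module, Function, Args), which runs Module:Function(...) on the remote Pundun node and returns the result, or {badrpc, Reason} if the call itself fails. As a minimal sketch (assuming the node name used later in this post), one could wrap the pattern like this; the examples below call rpc:call/4 directly instead of using such a wrapper.

%% Run an enterdb call on the Pundun node; crash on RPC failure
%% (e.g. node down or wrong cookie).
Enterdb = fun(Function, Args) ->
    case rpc:call(pundun8ef0f8@kingfisher, enterdb, Function, Args) of
        {badrpc, Reason} -> error({badrpc, Reason});
        Result -> Result
    end
end.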

Let's create a table on our cluster:

sitting:~$ erl -sname demo -setcookie pundun
Erlang/OTP 19 [erts-8.3] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V8.3  (abort with ^G)
(demo@sitting)1> rpc:call(pundun8ef0f8@kingfisher, enterdb, create_table, ["trace", ["tsid"], []]).

Above we create a table called “trace” with the key defined as “tsid” and an empty list of options.
Now we can check the table information in the CLI; you will see that all attributes are created with default settings:

mate@kingfisher:~/pundun$ ssh localhost -p 8884
Welcome to Pundun Command Line Interface
pundun> show_tables
trace
pundun> table_info trace
column_mapper: '$trace'
columns: []
comparator: descending
data_model: array
distributed: true
hash_key: ["tsid"]
hashing_method: uniform
index_on: []
key: ["tsid"]
name: "trace"
nodes: [pundun8ef0f8@kingfisher,pundun97ae64@sitting]
num_of_shards: 16
replication_factor: 1
shards: [{"trace_shard0",#{"dc01" => [pundun97ae64@sitting]}},
{"trace_shard1",#{"dc01" => [pundun8ef0f8@kingfisher]}},
{"trace_shard2",#{"dc01" => [pundun97ae64@sitting]}},
{"trace_shard3",#{"dc01" => [pundun8ef0f8@kingfisher]}},
{"trace_shard4",#{"dc01" => [pundun97ae64@sitting]}},
{"trace_shard5",#{"dc01" => [pundun8ef0f8@kingfisher]}},
{"trace_shard6",#{"dc01" => [pundun97ae64@sitting]}},
{"trace_shard7",#{"dc01" => [pundun8ef0f8@kingfisher]}},
{"trace_shard8",#{"dc01" => [pundun97ae64@sitting]}},
{"trace_shard9",#{"dc01" => [pundun8ef0f8@kingfisher]}},
{"trace_shard10",#{"dc01" => [pundun97ae64@sitting]}},
{"trace_shard11",#{"dc01" => [pundun8ef0f8@kingfisher]}},
{"trace_shard12",#{"dc01" => [pundun97ae64@sitting]}},
{"trace_shard13",#{"dc01" => [pundun8ef0f8@kingfisher]}},
{"trace_shard14",#{"dc01" => [pundun97ae64@sitting]}},
{"trace_shard15",#{"dc01" => [pundun8ef0f8@kingfisher]}}]
size: 0
type: rocksdb

Let's set the debug filter for the logger application.

pundun> logger 
current_filter available_filters set_filter generate_filters
pundun> logger available_filters
emergency alert critical error warning notice info debug normal default
pundun> logger set_filter d
default debug
pundun> logger set_filter debug
Current filter set to debug

One may specify options for the above attributes. An example would be:

(demo@sitting)2> Options = [{comparator, ascending}, {data_model, kv}, {distributed, false}, {hashing_method, rendezvous}, {num_of_shards, 32}, {replication_factor, 2}, {ttl, 86400}].
[{comparator,ascending},
{data_model,kv},
{distributed,false},
{hashing_method,rendezvous},
{num_of_shards,32},
{replication_factor,2},
{ttl,86400}]
(demo@sitting)3>
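
These options can then be passed as the third argument to create_table. A hypothetical example (the table name "trace2" is illustrative and not part of the recorded session):

%% Create a second table using the Options bound above.
rpc:call(pundun8ef0f8@kingfisher, enterdb, create_table, ["trace2", ["tsid"], Options]).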

Let's write some data into the database:

(demo@sitting)3> rpc:call(pundun8ef0f8@kingfisher, enterdb, write, ["trace", [{"tsid", 1}], [{"events", #{1 => #{"label" => "paging_response", "from" => "ue", "to" => "enodeb"}}}]]).

And read it back, then write some more:

(demo@sitting)4> {ok, [{"events", Events}]} = rpc:call(pundun8ef0f8@kingfisher, enterdb, read, ["trace", [{"tsid", 1}]]).
{ok,[{"events",
#{1 =>
#{"from" => "ue","label" => "paging_response",
"to" => "enodeb"}}}]}
(demo@sitting)5> rpc:call(pundun8ef0f8@kingfisher, enterdb, write, ["trace", [{"tsid", 1}], [{"events", Events#{2 => #{"label" => "paging_response", "from" => "enodeb", "to" => "mme"}}}]]).
ok
(demo@sitting)6> rpc:call(pundun8ef0f8@kingfisher, enterdb, write, ["trace", [{"tsid", 1}], [{"events", Events#{3 => #{"label" => "lcs_ap_location_request", "from" => "mme", "to" => "esmlc", "data" => <<"Some binary data maybe">>, "ts" => "2017-04-26 07:31:32.164"}}}]]).
(demo@sitting)7> rpc:call(pundun8ef0f8@kingfisher, enterdb, write, ["trace", [{"tsid", 1}], [{"events", Events}, {"start_ts", "2017-04-26 07:31:32.163"},{"end_ts", "2017-04-26 14:28:35.347"}]]).
ok
(demo@sitting)8> rpc:call(pundun8ef0f8@kingfisher, enterdb, read, ["trace", [{"tsid", 1}]]).
{ok,[{"events",
#{1 =>
#{"from" => "ue","label" => "paging_response",
"to" => "enodeb"},
2 =>
#{"data" => <<"Some binary data maybe">>,"from" => "mme",
"label" => "lcs_ap_location_request","to" => "esmlc",
"ts" => "2017-04-26 07:31:32.164"},
3 =>
#{"data" => <<"Some binary data maybe">>,"from" => "mme",
"label" => "lcs_ap_location_request","to" => "esmlc",
"ts" => "2017-04-26 07:31:32.164"}}},
{"start_ts","2017-04-26 07:31:32.163"},
{"end_ts","2017-04-26 14:28:35.347"}]}

We can update a column to increment a counter or overwrite its value:

(demo@sitting)9> rpc:call(pundun8ef0f8@kingfisher, enterdb, update, ["trace", [{"tsid", 1}], [{"methods_tried", increment, 1, 0}]]).        
{ok,[{"events",
#{1 =>
#{"from" => "ue","label" => "paging_response",
"to" => "enodeb"},
2 =>
#{"data" => <<"Some binary data maybe">>,"from" => "mme",
"label" => "lcs_ap_location_request","to" => "esmlc",
"ts" => "2017-04-26 07:31:32.164"},
3 =>
#{"data" => <<"Some binary data maybe">>,"from" => "mme",
"label" => "lcs_ap_location_request","to" => "esmlc",
"ts" => "2017-04-26 07:31:32.164"}}},
{"start_ts","2017-04-26 07:31:32.163"},
{"end_ts","2017-04-26 14:28:35.347"},
{"methods_tried",1}]}

We can read ranges of keys:

(demo@sitting)10> rpc:call(pundun8ef0f8@kingfisher, enterdb, write, ["trace", [{"tsid", 2}], [{"events", Events}, {"start_ts", "2017-04-26 07:31:42.163"},{"end_ts", "2017-04-26 14:28:45.347"}]]).
ok
(demo@sitting)11> rpc:call(pundun8ef0f8@kingfisher, enterdb, read_range, ["trace", {[{"tsid", 2}], [{"tsid", 1}]}, 10]).
{ok,[{[{"tsid",2}],
[{"events",
#{1 =>
#{"from" => "ue","label" => "paging_response",
"to" => "enodeb"},
2 =>
#{"data" => <<"Some binary data maybe">>,
"from" => "mme","label" => "lcs_ap_location_request",
"to" => "esmlc", "ts" => "2017-04-26 07:31:32.164"},
3 =>
#{"data" => <<"Some binary data maybe">>,
"from" => "mme","label" => "lcs_ap_location_request",
"to" => "esmlc","ts" => "2017-04-26 07:31:32.164"}}},
{"start_ts","2017-04-26 07:31:42.163"},
{"end_ts","2017-04-26 14:28:45.347"}]},
{[{"tsid",1}],
[{"events",
#{1 =>
#{"from" => "ue","label" => "paging_response",
"to" => "enodeb"},
2 =>
#{"data" => <<"Some binary data maybe">>,
"from" => "mme","label" => "lcs_ap_location_request",
"to" => "esmlc","ts" => "2017-04-26 07:31:32.164"},
3 =>
#{"data" => <<"Some binary data maybe">>,
"from" => "mme","label" => "lcs_ap_location_request",
"to" => "esmlc","ts" => "2017-04-26 07:31:32.164"}}},
{"start_ts","2017-04-26 07:31:32.163"},
{"end_ts","2017-04-26 14:28:35.347"},
{"methods_tried",1}]}],
complete}
(demo@sitting)12> rpc:call(pundun8ef0f8@kingfisher, enterdb, read_range_n, ["trace", [{"tsid", 2}], 3]).
{ok,[{[{"tsid",2}],
[{"events",
#{1 =>
#{"from" => "ue","label" => "paging_response",
"to" => "enodeb"},
2 =>
#{"data" => <<"Some binary data maybe">>,
"from" => "mme","label" => "lcs_ap_location_request",
"to" => "esmlc","ts" => "2017-04-26 07:31:32.164"},
3 =>
#{"data" => <<"Some binary data maybe">>,
"from" => "mme","label" => "lcs_ap_location_request",
"to" => "esmlc","ts" => "2017-04-26 07:31:32.164"}}},
{"start_ts","2017-04-26 07:31:42.163"},
{"end_ts","2017-04-26 14:28:45.347"}]},
{[{"tsid",1}],
[{"events",
#{1 =>
#{"from" => "ue","label" => "paging_response",
"to" => "enodeb"},
2 =>
#{"data" => <<"Some binary data maybe">>,
"from" => "mme","label" => "lcs_ap_location_request",
"to" => "esmlc","ts" => "2017-04-26 07:31:32.164"},
3 =>
#{"data" => <<"Some binary data maybe">>,
"from" => "mme","label" => "lcs_ap_location_request",
"to" => "esmlc","ts" => "2017-04-26 07:31:32.164"}}},
{"start_ts","2017-04-26 07:31:32.163"},
{"end_ts","2017-04-26 14:28:35.347"},
{"methods_tried",1}]}]}

Iteration over the Pundun database can be done as follows:

(demo@sitting)13> {ok, {Key, Value}, It} = rpc:call(pundun8ef0f8@kingfisher, enterdb, first, ["trace"]).
{ok,{[{"tsid",2}],
[{"events",
#{1 =>
#{"from" => "ue","label" => "paging_response",
"to" => "enodeb"},
2 =>
#{"data" => <<"Some binary data maybe">>,"from" => "mme",
"label" => "lcs_ap_location_request","to" => "esmlc",
"ts" => "2017-04-26 07:31:32.164"},
3 =>
#{"data" => <<"Some binary data maybe">>,"from" => "mme",
"label" => "lcs_ap_location_request","to" => "esmlc",
"ts" => "2017-04-26 07:31:32.164"}}},
{"start_ts","2017-04-26 07:31:42.163"},
{"end_ts","2017-04-26 14:28:45.347"}]},
<<"C\n">>}
(demo@sitting)14> rpc:call(pundun8ef0f8@kingfisher, enterdb, next, [It]).
{ok,{[{"tsid",1}],
[{"events",
#{1 =>
#{"from" => "ue","label" => "paging_response",
"to" => "enodeb"},
2 =>
#{"data" => <<"Some binary data maybe">>,"from" => "mme",
"label" => "lcs_ap_location_request","to" => "esmlc",
"ts" => "2017-04-26 07:31:32.164"},
3 =>
#{"data" => <<"Some binary data maybe">>,"from" => "mme",
"label" => "lcs_ap_location_request","to" => "esmlc",
"ts" => "2017-04-26 07:31:32.164"}}},
{"start_ts","2017-04-26 07:31:32.163"},
{"end_ts","2017-04-26 14:28:35.347"},
{"methods_tried",1}]}}
(demo@sitting)15> rpc:call(pundun8ef0f8@kingfisher, enterdb, next, [It]).
{error,invalid}
(demo@sitting)16> {ok, _, Back} = rpc:call(pundun8ef0f8@kingfisher, enterdb, last, ["trace"]).
{ok,{[{"tsid",1}],
[{"events",
#{1 =>
#{"from" => "ue","label" => "paging_response",
"to" => "enodeb"},
2 =>
#{"data" => <<"Some binary data maybe">>,"from" => "mme",
"label" => "lcs_ap_location_request","to" => "esmlc",
"ts" => "2017-04-26 07:31:32.164"},
3 =>
#{"data" => <<"Some binary data maybe">>,"from" => "mme",
"label" => "lcs_ap_location_request","to" => "esmlc",
"ts" => "2017-04-26 07:31:32.164"}}},
{"start_ts","2017-04-26 07:31:32.163"},
{"end_ts","2017-04-26 14:28:35.347"},
{"methods_tried",1}]},
<<67,30>>}
(demo@sitting)17> rpc:call(pundun8ef0f8@kingfisher, enterdb, prev, [Back]).
{ok,{[{"tsid",2}],
[{"events",
#{1 =>
#{"from" => "ue","label" => "paging_response",
"to" => "enodeb"},
2 =>
#{"data" => <<"Some binary data maybe">>,"from" => "mme",
"label" => "lcs_ap_location_request","to" => "esmlc",
"ts" => "2017-04-26 07:31:32.164"},
3 =>
#{"data" => <<"Some binary data maybe">>,"from" => "mme",
"label" => "lcs_ap_location_request","to" => "esmlc",
"ts" => "2017-04-26 07:31:32.164"}}},
{"start_ts","2017-04-26 07:31:42.163"},
{"end_ts","2017-04-26 14:28:45.347"}]}}
(demo@sitting)18> rpc:call(pundun8ef0f8@kingfisher, enterdb, prev, [Back]).
{error,invalid}
(demo@sitting)19> {ok, _, Sought} = rpc:call(pundun8ef0f8@kingfisher, enterdb, seek, ["trace", [{"tsid", 2}]]).
{ok,{[{"tsid",2}],
[{"events",
#{1 =>
#{"from" => "ue","label" => "paging_response",
"to" => "enodeb"},
2 =>
#{"data" => <<"Some binary data maybe">>,"from" => "mme",
"label" => "lcs_ap_location_request","to" => "esmlc",
"ts" => "2017-04-26 07:31:32.164"},
3 =>
#{"data" => <<"Some binary data maybe">>,"from" => "mme",
"label" => "lcs_ap_location_request","to" => "esmlc",
"ts" => "2017-04-26 07:31:32.164"}}},
{"start_ts","2017-04-26 07:31:42.163"},
{"end_ts","2017-04-26 14:28:45.347"}]},
<<"C%">>}
(demo@sitting)20> rpc:call(pundun8ef0f8@kingfisher, enterdb, next, [Sought]).
{ok,{[{"tsid",1}],
[{"events",
#{1 =>
#{"from" => "ue","label" => "paging_response",
"to" => "enodeb"},
2 =>
#{"data" => <<"Some binary data maybe">>,"from" => "mme",
"label" => "lcs_ap_location_request","to" => "esmlc",
"ts" => "2017-04-26 07:31:32.164"},
3 =>
#{"data" => <<"Some binary data maybe">>,"from" => "mme",
"label" => "lcs_ap_location_request","to" => "esmlc",
"ts" => "2017-04-26 07:31:32.164"}}},
{"start_ts","2017-04-26 07:31:32.163"},
{"end_ts","2017-04-26 14:28:35.347"},
{"methods_tried",1}]}}
(demo@sitting)21> rpc:call(pundun8ef0f8@kingfisher, enterdb, next, [Sought]).
{error,invalid}
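
Putting the calls above together, a small helper (placed in a module of your own) could walk an entire table. This is a sketch that relies only on the return shapes seen in this session: enterdb:first/1 returning {ok, {Key, Value}, Iterator}, and enterdb:next/1 returning {ok, {Key, Value}} until it returns {error, invalid}.

%% Walk all records of Table on Node, printing each key/value pair.
walk(Node, Table) ->
    case rpc:call(Node, enterdb, first, [Table]) of
        {ok, {Key, Value}, It} ->
            io:format("~p -> ~p~n", [Key, Value]),
            walk_next(Node, It);
        Other ->
            Other
    end.

walk_next(Node, It) ->
    case rpc:call(Node, enterdb, next, [It]) of
        {ok, {Key, Value}} ->
            io:format("~p -> ~p~n", [Key, Value]),
            walk_next(Node, It);
        {error, invalid} ->
            ok
    end.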

Using Indexes and Searching Text per Column

To demonstrate the indexing on Pundun, we implemented an example application to index Wikipedia dumps.

The application can be found at:

Continuing the previous example, we can add an index on columns as follows:

(demo@sitting)22> rpc:call(pundun8ef0f8@kingfisher, enterdb, add_index, ["trace",["chosen_method", "mme"]]).
ok
(demo@sitting)23> rpc:call(pundun8ef0f8@kingfisher, enterdb, write, ["trace", [{"tsid", 4}], [{"events", Events}, {"start_ts", "2017-04-26 07:31:52.163"},{"end_ts", "2017-04-26 14:28:55.347"},{"mme", "166.181.19.34"},{"chosen_method","cellid"}]]).
(demo@sitting)24> rpc:call(pundun8ef0f8@kingfisher, enterdb, index_read, ["trace", "chosen_method", "cellid"]).
[{[{"tsid",4}],
[{"events",
#{1 =>
#{"from" => "ue","label" => "paging_response",
"to" => "enodeb"},
2 =>
#{"data" => <<"Some binary data maybe">>,"from" => "mme",
"label" => "lcs_ap_location_request","to" => "esmlc",
"ts" => "2017-04-26 07:31:32.164"},
3 =>
#{"data" => <<"Some binary data maybe">>,"from" => "mme",
"label" => "lcs_ap_location_request","to" => "esmlc",
"ts" => "2017-04-26 07:31:32.164"}}},
{"start_ts","2017-04-26 07:31:52.163"},
{"end_ts","2017-04-26 14:28:55.347"},
{"mme","166.181.19.34"},
{"chosen_method","cellid"}]}]

The index API is under development and will change later. We are considering adding support for setting different analyzers per column, so that an IP address is not indexed the same way as a text document. Index queries will also change to retrieve posting lists according to relevance and size limits. Retrieval of the actual data will be left to the user.