Apache Gora Aerospike Data Store

Nishadi Kirielle
Sep 7, 2017


Apache Gora is an open source framework that provides an in-memory data model and persistence for big data, mapping objects to data stores. It currently supports persisting objects to various database models, such as column stores like Apache HBase and Apache Cassandra, and key-value stores. In my GSoC project, I extended it to support the Aerospike database.

Before going further into the usage of the gora-aerospike module, it helps to have some background on the basic concepts of Aerospike. Aerospike is a distributed NoSQL key-value store. Its key-value operations work on records (analogous to RDBMS rows), namespaces (RDBMS databases), sets (RDBMS tables) and bins (RDBMS columns).

Aerospike Data Model

An important property of the Aerospike data model is that it is schemaless. The following terms provide basic insight into the Aerospike data model.

  • Namespaces

Compared with an RDBMS, a namespace corresponds to a database or a group of databases. As the Aerospike documentation puts it, ‘A database can specify multiple namespaces, each with different policies to fit your application. Consider namespaces physical containers that bind data to a storage device (a RAM segment, disk, or file, or none).’

  • Sets

Sets are logical groupings of records and resemble tables in an RDBMS. However, sets are optional: a record does not have to belong to any set.

  • Bins

Bins resemble columns in an RDBMS, but they do not specify a data type. The data type is defined by the value stored in the bin: bin values are strongly typed, but the bin itself is not.

  • Records

Records resemble RDBMS rows. A record consists of a key, metadata and bins.
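To make these terms concrete, here is a toy in-memory model of the data model described above. It is a sketch for illustration only, not the Aerospike client API: the class and method names (`AerospikeModelSketch`, `Key`, `StoredRecord`, `put`, `get`) are hypothetical, and the only metadata modelled is a generation counter. It shows a set being optional (a `null` set) and the same bin name holding values of different types in different records.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Toy model of the Aerospike data model; NOT the Aerospike client API.
// All names here are hypothetical.
public class AerospikeModelSketch {

    // A record is addressed by namespace + optional set + user key.
    static final class Key {
        final String namespace; // required, comparable to a database
        final String set;       // optional (may be null): records need not belong to a set
        final Object userKey;

        Key(String namespace, String set, Object userKey) {
            this.namespace = Objects.requireNonNull(namespace, "namespace");
            this.set = set;
            this.userKey = Objects.requireNonNull(userKey, "userKey");
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return namespace.equals(k.namespace) && Objects.equals(set, k.set)
                && userKey.equals(k.userKey);
        }

        @Override public int hashCode() { return Objects.hash(namespace, set, userKey); }
    }

    // A record = metadata (only a generation counter here) + bins.
    // Bins are untyped: each stored value carries its own type.
    static final class StoredRecord {
        int generation;
        final Map<String, Object> bins = new LinkedHashMap<>();
    }

    // No schema anywhere: each namespace simply holds records by key.
    private final Map<String, Map<Key, StoredRecord>> namespaces = new HashMap<>();

    void put(Key key, Map<String, Object> bins) {
        StoredRecord rec = namespaces
            .computeIfAbsent(key.namespace, ns -> new HashMap<>())
            .computeIfAbsent(key, k -> new StoredRecord());
        rec.bins.putAll(bins); // merge the written bins into the existing record
        rec.generation++;      // bump the generation on every write
    }

    StoredRecord get(Key key) {
        Map<Key, StoredRecord> ns = namespaces.get(key.namespace);
        return ns == null ? null : ns.get(key);
    }

    public static void main(String[] args) {
        AerospikeModelSketch db = new AerospikeModelSketch();
        Key inSet = new Key("test", "AccessLog", 42L);
        Key noSet = new Key("test", null, "orphan"); // a record outside any set
        db.put(inSet, Map.<String, Object>of("httpStatusCode", 200));
        db.put(noSet, Map.<String, Object>of("httpStatusCode", "OK")); // same bin name, other type
        System.out.println(db.get(inSet).bins.get("httpStatusCode").getClass().getSimpleName()); // Integer
        System.out.println(db.get(noSet).bins.get("httpStatusCode").getClass().getSimpleName()); // String
    }
}
```

Treating the set as part of the key mirrors the point above that sets are only a logical grouping: a record with no set is still an ordinary record in its namespace.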

For further information on how Aerospike works, refer to the official Aerospike website, which is the best place to get the big picture of what Aerospike is all about.

Gora — Aerospike Module

From the next Gora release onwards, the Gora Aerospike module can be used to perform data store operations against Aerospike. Let’s take a quick look at how to set up Aerospike locally and use Gora to perform the basic operations.

Setting up Aerospike Server

As a first step, follow the Aerospike getting started guide to set up the Aerospike server. Once the server is set up, you can verify that it is running on Ubuntu with the following command:

$ sudo service aerospike status

To operate on the Aerospike server, a namespace must exist. By default, the server comes with a namespace named ‘test’, which we will use throughout this guide. If you need a different namespace, you can configure one in the server configuration file (aerospike.conf).

The Aerospike server installation also includes a command line tool named aql, which we can use to interact with the server. To list the available namespaces, run the following in aql:

aql> show namespaces

Using Aerospike via Apache Gora

To try out Aerospike via Apache Gora, you currently need to check out the source code and build the master branch. I will update this post once the next release is available.

We will follow the Gora tutorial to exercise the basic functionality of the Aerospike server.

Gora supports different data stores, so in the tutorial example we need to configure which data store to use. This is done through a file named gora.properties on the classpath. We will edit gora-tutorial/conf/gora.properties to set the default data store to Aerospike:

gora.datastore.default=org.apache.gora.aerospike.store.AerospikeStore

By default, the Aerospike server IP and port are set to the values the server uses on startup: localhost and port 3000. If you need to change them, set them in the properties file as follows:

gora.aerospikestore.server.ip=localhost
gora.aerospikestore.server.port=3000

If the server requires authentication, you can provide the credentials as follows:

gora.aerospikestore.server.username=user_name
gora.aerospikestore.server.password=password
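The properties above fit together as one set of connection settings. The sketch below resolves them with `java.util.Properties`, applying the defaults stated in this post (localhost and 3000). It is illustrative only and not gora-aerospike code: `AerospikeSettingsSketch`, `ConnectionSettings` and `resolve` are hypothetical names, and the rule that authentication is used only when both username and password are present is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;
import java.util.Properties;

// Hypothetical helper, NOT gora-aerospike code: resolves the gora.properties
// entries described in this post, using the defaults the post states.
public class AerospikeSettingsSketch {

    static final class ConnectionSettings {
        final String dataStoreClass;
        final String host;
        final int port;
        final Optional<String> username;
        final Optional<String> password;

        ConnectionSettings(String dataStoreClass, String host, int port,
                           Optional<String> username, Optional<String> password) {
            this.dataStoreClass = dataStoreClass;
            this.host = host;
            this.port = port;
            this.username = username;
            this.password = password;
        }

        // Assumption: credentials are only used when both values are configured.
        boolean usesAuthentication() {
            return username.isPresent() && password.isPresent();
        }
    }

    static ConnectionSettings resolve(Properties p) {
        String store = p.getProperty("gora.datastore.default");
        if (store == null) throw new IllegalStateException("gora.datastore.default is not set");
        // Fall back to the server's startup defaults when ip/port are not given.
        String host = p.getProperty("gora.aerospikestore.server.ip", "localhost").trim();
        int port = Integer.parseInt(p.getProperty("gora.aerospikestore.server.port", "3000").trim());
        return new ConnectionSettings(store.trim(), host, port,
            Optional.ofNullable(p.getProperty("gora.aerospikestore.server.username")),
            Optional.ofNullable(p.getProperty("gora.aerospikestore.server.password")));
    }

    public static void main(String[] args) throws IOException {
        Properties p = new Properties();
        // In the tutorial these entries come from gora-tutorial/conf/gora.properties.
        p.load(new StringReader(
            "gora.datastore.default=org.apache.gora.aerospike.store.AerospikeStore\n"));
        ConnectionSettings s = resolve(p);
        System.out.println(s.host + ":" + s.port + " auth=" + s.usesAuthentication());
        // prints localhost:3000 auth=false
    }
}
```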

Once the properties are set, you need to define the mapping in the gora-aerospike mapping file. For the log analytics example, the following mapping file is used:

<gora-otd>

  <policy name="write" gen="NONE" recordExists="UPDATE" commitLevel="COMMIT_ALL" durableDelete="false"/>
  <policy name="read" priority="DEFAULT" consistencyLevel="CONSISTENCY_ONE" replica="SEQUENCE" maxRetries="2"/>

  <class name="org.apache.gora.tutorial.log.generated.Pageview" keyClass="java.lang.Long" set="AccessLog" namespace="test">
    <field name="url" bin="url"/>
    <field name="timestamp" bin="timestamp"/>
    <field name="ip" bin="ip"/>
    <field name="httpMethod" bin="httpMethod"/>
    <field name="httpStatusCode" bin="httpStatusCode"/>
    <field name="responseSize" bin="responseSize"/>
    <field name="referrer" bin="referrer"/>
    <field name="userAgent" bin="userAgent"/>
  </class>

  <class name="org.apache.gora.tutorial.log.generated.MetricDatum" keyClass="java.lang.String" set="Metrics" namespace="test">
    <field name="metricDimension" bin="metricDimension"/>
    <field name="timestamp" bin="ts"/>
    <field name="metric" bin="metric"/>
  </class>

</gora-otd>
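What the `<class>` and `<field>` elements express is a per-class namespace and set, plus a field-to-bin lookup (note that MetricDatum’s `timestamp` field is stored in a bin named `ts`). The sketch below reads that part of a mapping with the JDK’s DOM parser. gora-aerospike has its own mapping reader; `MappingSketch`, `ClassMapping` and `parse` are hypothetical names, and `<policy>` elements are deliberately ignored here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical reader for the <class>/<field> entries of a gora-aerospike mapping.
// It is NOT the module's own parser; <policy> elements are ignored.
public class MappingSketch {

    static final class ClassMapping {
        final String keyClass;
        final String namespace;
        final String set;
        final Map<String, String> fieldToBin = new LinkedHashMap<>(); // field name -> bin name

        ClassMapping(String keyClass, String namespace, String set) {
            this.keyClass = keyClass;
            this.namespace = namespace;
            this.set = set;
        }
    }

    // Returns persistent class name -> mapping, in document order.
    static Map<String, ClassMapping> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, ClassMapping> result = new LinkedHashMap<>();
            NodeList classes = doc.getElementsByTagName("class");
            for (int i = 0; i < classes.getLength(); i++) {
                Element cls = (Element) classes.item(i);
                ClassMapping m = new ClassMapping(cls.getAttribute("keyClass"),
                    cls.getAttribute("namespace"), cls.getAttribute("set"));
                NodeList fields = cls.getElementsByTagName("field");
                for (int j = 0; j < fields.getLength(); j++) {
                    Element f = (Element) fields.item(j);
                    m.fieldToBin.put(f.getAttribute("name"), f.getAttribute("bin"));
                }
                result.put(cls.getAttribute("name"), m);
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("invalid mapping XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<gora-otd>"
            + "<class name=\"org.apache.gora.tutorial.log.generated.MetricDatum\""
            + " keyClass=\"java.lang.String\" set=\"Metrics\" namespace=\"test\">"
            + "<field name=\"metricDimension\" bin=\"metricDimension\"/>"
            + "<field name=\"timestamp\" bin=\"ts\"/>"
            + "<field name=\"metric\" bin=\"metric\"/>"
            + "</class></gora-otd>";
        ClassMapping m = parse(xml).get("org.apache.gora.tutorial.log.generated.MetricDatum");
        System.out.println(m.namespace + "." + m.set + " " + m.fieldToBin);
        // prints test.Metrics {metricDimension=metricDimension, timestamp=ts, metric=metric}
    }
}
```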

For details on what these attributes do, refer to the Gora documentation.

Now that all the configuration is in place, we are ready to work with Aerospike via Gora.

Basic PUT, GET, DELETE functionality

Following the log manager example, we can use its parsing functionality [9] to add records to the Aerospike server.

$ bin/gora logmanager -parse gora-tutorial/src/main/resources/access.log
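The parse step turns each access.log line into a Pageview with the fields listed in the mapping. As an illustration only, here is a regex-based parser for one line in the Apache "combined" log format. This is a swapped-in technique, not the tutorial’s own LogManager parser; it assumes access.log uses that format, and `AccessLogLineSketch` and `LogEntry` are hypothetical names. The sample line is made up.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex parser for one Apache "combined" log line.
// NOT the tutorial's LogManager parser; a swapped-in sketch.
public class AccessLogLineSketch {

    // ip ident user [time] "method url protocol" status size "referrer" "user agent"
    static final Pattern COMBINED = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"(\\S+) (\\S+)[^\"]*\" (\\d{3}) (\\d+|-) \"([^\"]*)\" \"([^\"]*)\"");

    static final DateTimeFormatter TIME =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    // Field names mirror the Pageview fields in the mapping file.
    static final class LogEntry {
        String ip, httpMethod, url, referrer, userAgent;
        long timestamp;      // epoch milliseconds
        int httpStatusCode;
        long responseSize;   // "-" in the log is treated as 0
    }

    static LogEntry parse(String line) {
        Matcher m = COMBINED.matcher(line);
        if (!m.find()) throw new IllegalArgumentException("not a combined log line: " + line);
        LogEntry e = new LogEntry();
        e.ip = m.group(1);
        e.timestamp = OffsetDateTime.parse(m.group(2), TIME).toInstant().toEpochMilli();
        e.httpMethod = m.group(3);
        e.url = m.group(4);
        e.httpStatusCode = Integer.parseInt(m.group(5));
        e.responseSize = "-".equals(m.group(6)) ? 0L : Long.parseLong(m.group(6));
        e.referrer = m.group(7);
        e.userAgent = m.group(8);
        return e;
    }

    public static void main(String[] args) {
        LogEntry e = parse("127.0.0.1 - - [10/Mar/2009:18:44:09 +0000] "
            + "\"GET /index.php?i=0 HTTP/1.1\" 200 43 \"-\" \"Mozilla/4.0\"");
        System.out.println(e.httpMethod + " " + e.url + " " + e.timestamp);
        // prints GET /index.php?i=0 1236710649000
    }
}
```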

Once the log is parsed successfully, we can check in aql whether the records were added:

aql> show sets

Looking at the available sets, we can see that the ‘AccessLog’ set has been created.

To see the added records, run:

aql> select * from test

Data stored in the Aerospike server can be retrieved with the log manager example as follows:

$ bin/gora logmanager -get 42

INFO log.LogManager - {"url": "/index.php?i=0&a=1__rntjt9z0q9w&k=398179", "timestamp": 1236710649000, "ip": "88.240.129.183", "httpMethod": "GET", "httpStatusCode": 200, "responseSize": 43, "referrer": "http://www.buldinle.com/index.php?i=0&a=1__RnTjT9z0Q9w&k=398179", "userAgent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"}

Data stored in the Aerospike server can be deleted as follows:

$ bin/gora logmanager -delete 12

INFO log.LogManager - pageview with key: 12 deleted
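Conceptually, the -parse, -get and -delete commands above boil down to put, get and delete by key, with each object field renamed to its bin through the mapping file. The in-memory sketch below shows that round trip. It is not the AerospikeStore implementation: `MappedStoreSketch`, `put`, `get`, `delete` and `rawBins` are hypothetical names. `put` merges bins into an existing record, which is how the Aerospike write policy `recordExists="UPDATE"` behaves, and it simply skips `null` values.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in, NOT AerospikeStore: shows fields being
// renamed to bins via a field -> bin mapping on put, and back again on get.
public class MappedStoreSketch<K> {

    private final Map<String, String> fieldToBin;                   // from the mapping file
    private final Map<K, Map<String, Object>> records = new HashMap<>(); // key -> bins

    MappedStoreSketch(Map<String, String> fieldToBin) { this.fieldToBin = fieldToBin; }

    // put: every mapped, non-null field is written to its bin; bins are merged
    // into an existing record (like recordExists="UPDATE"). Unmapped fields are dropped.
    void put(K key, Map<String, Object> fields) {
        Map<String, Object> bins = records.computeIfAbsent(key, k -> new LinkedHashMap<>());
        fields.forEach((field, value) -> {
            String bin = fieldToBin.get(field);
            if (bin != null && value != null) bins.put(bin, value);
        });
    }

    // get: translate bins back to field names; null when the key is absent.
    Map<String, Object> get(K key) {
        Map<String, Object> bins = records.get(key);
        if (bins == null) return null;
        Map<String, Object> fields = new LinkedHashMap<>();
        fieldToBin.forEach((field, bin) -> {
            if (bins.containsKey(bin)) fields.put(field, bins.get(bin));
        });
        return fields;
    }

    // delete: true when a record was actually removed.
    boolean delete(K key) { return records.remove(key) != null; }

    // Raw bins as stored, to show the renaming.
    Map<String, Object> rawBins(K key) { return records.get(key); }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("metricDimension", "metricDimension");
        mapping.put("timestamp", "ts"); // renamed bin, as in the MetricDatum mapping
        mapping.put("metric", "metric");
        MappedStoreSketch<String> store = new MappedStoreSketch<>(mapping);

        Map<String, Object> fields = new HashMap<>();
        fields.put("timestamp", 1236710649000L);
        fields.put("metric", 3L);
        store.put("k1", fields);

        System.out.println(store.get("k1"));    // prints {timestamp=1236710649000, metric=3}
        System.out.println(store.delete("k1")); // prints true
        System.out.println(store.get("k1"));    // prints null
    }
}
```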

QUERY functionality

The Gora tutorial documents in more detail how to query data in any Gora-supported data store. Query functionality in Gora normally covers the following:

  1. Querying to retrieve the whole set of data without filtering
  2. Querying a single key
  3. Querying key ranges

At the moment, key range queries are not supported by the gora-aerospike module, because Aerospike does not natively support querying key ranges without secondary indexes.

We can query a single key as follows:

$ bin/gora logmanager -query 10

This returns the entry with key 10.
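Why are single-key queries cheap while key ranges are not? Aerospike addresses records by a hash digest of the key, so records are not kept in key order. The toy model below uses a `HashMap` as a stand-in for that hash-addressed storage (an assumption for illustration; it is not gora-aerospike code, and `HashStoreQuerySketch` and its methods are hypothetical). It counts how many records each query touches: a single-key query is one lookup, while a key range can only be answered by scanning everything and filtering, which is what the module avoids.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical toy model, NOT gora-aerospike code: in a hash-addressed store,
// only "all records" and "single key" queries map naturally onto the storage.
public class HashStoreQuerySketch {

    private final Map<Long, String> records = new HashMap<>(); // no key ordering
    int recordsExamined; // how many records the last query touched

    void put(long key, String value) { records.put(key, value); }

    // Query 1: the whole data set without filtering (a full scan).
    List<String> scanAll() { return scan(k -> true); }

    // Query 2: a single key is one direct lookup, regardless of store size.
    String getByKey(long key) {
        recordsExamined = 1;
        return records.get(key);
    }

    // Query 3 (not supported by the module): a key range degenerates into
    // scanning every record and filtering; results come back in hash order.
    List<String> scanRange(long startInclusive, long endInclusive) {
        return scan(k -> k >= startInclusive && k <= endInclusive);
    }

    private List<String> scan(Predicate<Long> keyFilter) {
        recordsExamined = 0;
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : records.entrySet()) {
            recordsExamined++;
            if (keyFilter.test(e.getKey())) out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        HashStoreQuerySketch store = new HashStoreQuerySketch();
        for (long k = 1; k <= 1000; k++) store.put(k, "pageview-" + k);
        System.out.println(store.scanRange(10, 12).size() + " hits, examined " + store.recordsExamined);
        // prints 3 hits, examined 1000
        System.out.println(store.getByKey(10) + ", examined " + store.recordsExamined);
        // prints pageview-10, examined 1
    }
}
```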

This concludes the basic introduction to using the Apache Gora Aerospike module to interact with an Aerospike server. In a future post, I will go deeper into the module. Hope this helps you get started :)

Thanks for reading!
