Detailed Introduction: Redis Modules, from Graphs to Machine Learning (Part 1)

ASHISH RANA
13 min read, Feb 15, 2019

I know it's late (modules have been out for quite a while), but better late than never 😅. Also, this part explains only the RedisGraph module, with an example. The others will be covered in the next part, coming next weekend.

Machine learning models and artificial neural networks are definitely the new "in" thing, not only in research but also in major enterprises. Organizations operating across multiple domains, especially the big ones, already have plenty of data. The task now is to find suitable mechanisms to make the most of this abundant resource, and to carry out out-of-the-box POCs (Proofs of Concept) on all the options available in the market to maximize results for your needs.

But in times of such great advancements in the trendy fields of ML, AI and deep learning, it would be unjustified not to discuss advancements in databases, which are the foundation for storing data in everything from research projects to industry. The right choice of database can save you millions of dollars over the years on enterprise-level applications. Databases and their architectures are evolving to serve the modern industrial need of rising ML and AI tech. One of the best examples I have seen is Redis (an in-memory key-value store with persistence), which becomes even more powerful with its modules, each adding a large set of functionality.

A time-saver tip: you can skip to any section below that interests you and start reading. They are written in a modular manner 😅.

Let's carry on our discussion in the order stated below, so that we can learn about Redis from its basics to the functionality of the modules mentioned.

1. Redis, its basics and use-cases

2. Introduction to its modules and Examples

3. Basics of RedisGraph

4. Basics of Neural-Redis

5. Basics of Redis-ML

If you are here to learn how to use Redis for ML & AI use-cases, jump directly to sections 3, 4 and 5 above. Only section 3 is covered in this article 😉.

Redis, its basics and use-cases

Redis is an in-memory key-value NoSQL data structure store that can be used either as a database or as a cache. Its specialty is that it supports the same data structures used in programming languages, like strings, hashes, lists and sets, plus some special ones, a few of which are discussed later in this article. It uses simple commands like SET and DEL for setting and deleting keys; RPUSH, LPUSH, LLEN, LRANGE, LPOP and RPOP for working with the list data structure; and many more commands to interact with the store. Structure is given to the data only through these commands, which also determine its type. Just follow along with the documentation; it's an excellent guide for getting familiar with the commands and their functionality. You can also opt for the introductory Redis course RU101.
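To make the list semantics concrete, here is a toy, pure-Python imitation of RPUSH, LPUSH, LLEN, LRANGE, LPOP and RPOP. It is not a Redis client, and the ListStore class is hypothetical; it only mirrors the documented behavior that pushes return the new length, LPUSH inserts values one by one at the head, and LRANGE's stop index is inclusive and accepts negative offsets.

```python
from collections import deque


class ListStore:
    """Toy in-memory imitation of a few Redis list commands (hypothetical, not a Redis client)."""

    def __init__(self):
        self.data = {}

    def rpush(self, key, *values):
        lst = self.data.setdefault(key, deque())
        lst.extend(values)              # append each value at the tail
        return len(lst)                 # Redis replies with the new length

    def lpush(self, key, *values):
        lst = self.data.setdefault(key, deque())
        for v in values:                # each value is pushed to the head in turn
            lst.appendleft(v)
        return len(lst)

    def llen(self, key):
        return len(self.data.get(key, ()))

    def lrange(self, key, start, stop):
        items = list(self.data.get(key, ()))
        n = len(items)
        if start < 0:
            start = max(n + start, 0)   # negative offsets count from the tail
        if stop < 0:
            stop = n + stop
            if stop < 0:
                return []
        return items[start:stop + 1]    # stop is inclusive, as in Redis

    def lpop(self, key):
        lst = self.data.get(key)
        return lst.popleft() if lst else None

    def rpop(self, key):
        lst = self.data.get(key)
        return lst.pop() if lst else None
```

For example, `rpush("jobs", "a", "b")` followed by `lpush("jobs", "x", "z")` leaves the list as z, x, a, b, because LPUSH inserts "x" first and then "z" in front of it.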

The main question is: when should you use Redis, and for what purpose?

The best way to answer that question is to go through the important architectural features of Redis and then present suitable use-case scenarios for it.

Fast, and extensible with modules: Redis is very fast, as it is written in C (and benchmarked as the fastest database), and it supports loading multiple modules as binary shared-object (.so) files. Modules are dynamic libraries that extend Redis by implementing new commands that behave as if they were implemented in the Redis core itself. Here is a sample command to load a module into your redis-server.

# redis-server --loadmodule path/to/module/src/module-name.so
redis-server --loadmodule path/to/module/src/redisgraph.so

The CAP theorem applies to distributed systems: a distributed system can guarantee only 2 of the 3 properties Consistency, Availability and Partition tolerance. Deciding these trade-offs is very important when designing systems.

So where exactly does Redis fall under CAP? If you are using only a single Redis instance or a master/slave configuration, it would be quite wrong to evaluate it against the CAP theorem, as these are not distributed systems in the true sense. Here 'P' is effectively absent, as there is no native partition tolerance; the way to scale horizontally is to partition data across multiple Redis instances. This can be done via client-side partitioning, where clients determine which instance to write to, or via proxy-assisted partitioning, where a proxy that looks like a single Redis instance partitions the data.
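A minimal sketch of client-side partitioning, assuming a hypothetical list of three instance addresses: the client hashes each key (CRC32 here, an arbitrary choice) and takes the result modulo the number of instances, so the same key always lands on the same server.

```python
import zlib

# Hypothetical instance addresses; a real deployment would list its own servers.
NODES = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]


def node_for(key: str, nodes=NODES) -> str:
    """Client-side partitioning: hash the key, then pick an instance by modulo."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


if __name__ == "__main__":
    for key in ["user:1", "user:2", "session:42"]:
        print(key, "->", node_for(key))
```

Plain modulo hashing is the simplest scheme, but adding or removing an instance remaps most keys; consistent hashing is the usual way to reduce that reshuffling.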

We can discuss the CAP theorem in the context of Redis Sentinel, a high-availability solution that monitors one or more Redis master instances and their replicating slaves. It watches the masters and promotes a slave via consensus when a master dies. But Sentinel is designed with performance and Redis's strong data model in mind. It is a general high-availability solution; horizontal scalability is provided separately by Redis Cluster.
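The "consensus" step involves two different thresholds in Sentinel's documented design: a configured quorum of Sentinels must agree the master is unreachable before it is treated as failing, and the failover itself must be authorized by a Sentinel leader elected with a majority of all Sentinel processes. The sketch below models only those two checks; the function names are hypothetical, and real Sentinel also involves timeouts, epochs and replica selection.

```python
def objectively_down(down_reports: int, quorum: int) -> bool:
    """The master is treated as failing once at least `quorum` Sentinels report it down."""
    return down_reports >= quorum


def can_authorize_failover(votes: int, total_sentinels: int) -> bool:
    """The failover itself needs a leader elected by a majority of all Sentinels."""
    return votes > total_sentinels // 2


# Example: 5 Sentinels configured with quorum 2.
print(objectively_down(2, quorum=2))                  # True: failure detected
print(can_authorize_failover(2, total_sentinels=5))   # False: 2 of 5 is not a majority
```

This is why a low quorum speeds up failure detection without letting a minority of Sentinels trigger a failover on its own.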

True consistency is given up in favor of performance, as replication is performed asynchronously. Similarly, strong availability (all writes succeed, even when nodes cannot reach each other across the network) is given up in favor of Redis's strong data model. Say two masters should hold the same data and one of them rejoins later: its data would have to be merged. But Redis never actually merges data, because it is difficult to correctly merge lists, sets and some of the other advanced data structures Redis supports. Instead, it always accepts the most recently written state of the data.
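A tiny illustration of the "accept the most recent state, never merge" behavior described above. It is a simplification: the Version class and its timestamps are hypothetical, and Redis does not literally compare wall-clock timestamps; the point is only that the losing copy is discarded rather than combined.

```python
from dataclasses import dataclass


@dataclass
class Version:
    value: list         # e.g. the contents of a Redis list on one master
    written_at: float   # hypothetical timestamp of the last write


def resolve(a: Version, b: Version) -> Version:
    """Last-write-wins: keep whichever state was written most recently.

    The two lists are NOT merged; the older state is simply discarded.
    """
    return a if a.written_at >= b.written_at else b


older = Version(["a", "b"], written_at=10.0)
newer = Version(["a", "c"], written_at=12.5)
print(resolve(older, newer).value)   # ['a', 'c']: element "b" is lost
```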

Beyond the three CAP parameters, Redis clusters are designed to retain performance and a strong data model, with scalability as their main focus. Redis won't be a good match for heavy industrial-grade applications demanding very high consistency and availability (HTAP or OLTP use-cases), but for applications whose prime criteria are high performance or analytics (OLAP use-cases), it can do wonders.

I have seen big organizations steer away from Redis and its modules as a primary datastore, because many of them demand strong consistency and high availability. Instead, Redis is mostly used as a caching layer in the existing data pipeline to provide high throughput.

So, when should you use Redis? It can be used to cache user sessions: consistency there isn't mission-critical, and with persistence Redis gains an advantage over Memcached. Real-time analytics like leaderboards, ordering by user votes and time, deletion and filtering are other prevalent use-cases. It can also broadcast messages to every server listening on a channel using Publish/Subscribe. Redis's Pub/Sub feature can be used to trigger scripts, build a list of the most common tasks on a system, propagate social network connections, and many more use-cases. Redis can also implement job queues, where a master posts jobs and workers pick them up.
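Two of these use-cases map directly onto Redis data structures: a leaderboard onto a sorted set (ZINCRBY to add points, ZREVRANGE to read the top entries) and a job queue onto a list (the master LPUSHes, workers RPOP). The sketch below is a pure-Python imitation of those semantics, not a Redis client; Leaderboard, post_job and take_job are hypothetical names.

```python
from collections import deque


class Leaderboard:
    """Toy stand-in for a Redis sorted set used as a leaderboard."""

    def __init__(self):
        self.scores = {}

    def incr(self, member, amount):
        # Like ZINCRBY: add to the member's score, creating it if needed.
        self.scores[member] = self.scores.get(member, 0) + amount
        return self.scores[member]

    def top(self, n):
        # Like ZREVRANGE: highest score first; equal scores in reverse lexicographic order.
        ranked = sorted(self.scores.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
        return ranked[:n]


def post_job(queue: deque, job: str) -> None:
    queue.appendleft(job)   # master: like LPUSH jobs <job>


def take_job(queue: deque):
    # Worker: like RPOP jobs, so the oldest job is picked up first (FIFO).
    return queue.pop() if queue else None


lb = Leaderboard()
lb.incr("alice", 10)
lb.incr("bob", 15)
lb.incr("alice", 2)
print(lb.top(2))   # [('bob', 15), ('alice', 12)]
```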

Key takeaway: do consider all the available cloud databases before choosing Redis as your primary datastore. Still, Redis boosts performance in many other use-cases. Leverage its potential to the fullest!!

Now let's explore its potential through the modules that extend its functionality with many new operations. Each module deserves its own article, but here we'll explore the modules related to AI and ML. We'll also keep the RediSearch module out of the in-depth discussion, as it is out of scope for now and requires a completely separate discussion of its features.

This page contains the list of all the modules supported by Redis. Here, I'll discuss the major ones that are often used with Redis and list some example use-cases for each. In the following sections I'll explain in detail the modules for working with graphs and their queryable relationships, ML models and neural networks. Let's get started with an introduction to these modules in this section.

RediSearch: again, benchmarked with top results!!

RediSearch: It implements search-engine-like functionality on top of Redis; it is a full-text and secondary index engine. Secondary indices are often implemented when you don't want to alter your database structure but still want certain feature-specific queries to run efficiently. For example, querying for customers with shoe size 7, or all customers in a given city, from a NoSQL store that holds customer information indexed by unique ID. It also has advanced features such as numeric filtering for text queries and exact phrase matching. Its popular use-cases are auto-complete and fuzzy suggestions, even with a typo in the prefix. When an index is big enough, it can be sharded across multiple machines. But instead of the usual approach, RediSearch uses index partitioning: it queries all the shards concurrently and returns the result after merging. This module clearly deserves more discussion, hopefully in a dedicated future article with some coding examples.
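Both ideas from this paragraph, a secondary index and querying all index partitions concurrently before merging, can be sketched in a few lines. This is not RediSearch's API or its internal index format; the Shard class, the shoe_size field and the query_all helper are hypothetical, and the code only shows the shape of the technique.

```python
from concurrent.futures import ThreadPoolExecutor


class Shard:
    """One partition: primary records plus a secondary index on shoe size."""

    def __init__(self):
        self.customers = {}      # primary storage: customer id -> record
        self.by_shoe_size = {}   # secondary index: shoe size -> set of customer ids

    def add(self, cid, record):
        self.customers[cid] = record
        self.by_shoe_size.setdefault(record["shoe_size"], set()).add(cid)

    def query_shoe_size(self, size):
        # Index lookup instead of scanning every record on the shard.
        return set(self.by_shoe_size.get(size, set()))


def query_all(shards, size):
    """Fan the query out to every shard concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = pool.map(lambda s: s.query_shoe_size(size), shards)
        return sorted(set().union(*partials))


shards = [Shard(), Shard()]
for cid, size in [(1, 7), (2, 9), (3, 7), (4, 7)]:
    shards[cid % 2].add(cid, {"shoe_size": size})   # naive placement by id
print(query_all(shards, 7))   # [1, 3, 4]
```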

RediSQL: RediSQL is a module that embeds a SQLite database, which makes it possible to have several smaller decentralized databases instead of a single giant one. It adds more structure to the in-memory store: instead of Redis commands, we can use SQL queries on our datastore. It leverages SQLite's speed, portability, simplicity and ability to work in memory. It is developed by RedBeardLabs, which offers a 14-day free trial for the enterprise version. But I personally feel its €990/year cost is quite expensive, so analyze your use-case requirements carefully before selecting it.
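To get a feel for the kind of structure SQL adds on top of an in-memory store, here is the engine RediSQL embeds, SQLite, used directly through Python's standard sqlite3 module with an in-memory database. This is not RediSQL's command set (RediSQL exposes SQLite through its own Redis commands, which are not shown here); the table and rows are hypothetical.

```python
import sqlite3

# Plain sqlite3 against an in-memory database; NOT RediSQL's Redis commands.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT, shoe_size INTEGER)")
conn.executemany(
    "INSERT INTO customers (id, city, shoe_size) VALUES (?, ?, ?)",
    [(1, "Delhi", 7), (2, "Pune", 9), (3, "Delhi", 8)],
)
rows = conn.execute(
    "SELECT id FROM customers WHERE city = ? ORDER BY id", ("Delhi",)
).fetchall()
print(rows)   # [(1,), (3,)]
```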

The remaining popular modules add new data structures to the Redis datastore. For example, ReJSON provides JSON as a native data type and allows storing, updating and fetching JSON values from Redis keys. The RedisTimeSeries module stores a time series as a linked list of memory chunks, where each chunk holds a predefined number of samples and each sample is a tuple of the time and the value.
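The chunked layout described for RedisTimeSeries can be sketched as a linked list of fixed-capacity chunks. This is an illustration of the data structure only, not the module's code: the chunk size of 4 is a hypothetical value kept tiny for readability, and samples are assumed to arrive in increasing timestamp order.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Chunk:
    samples: List[Tuple[int, float]] = field(default_factory=list)
    next: Optional["Chunk"] = None


class TimeSeries:
    """Linked list of fixed-capacity chunks, each holding (timestamp, value) samples."""

    def __init__(self, chunk_size: int = 4):   # hypothetical, tiny capacity
        self.chunk_size = chunk_size
        self.head = self.tail = Chunk()

    def add(self, ts: int, value: float) -> None:
        if len(self.tail.samples) == self.chunk_size:
            new_chunk = Chunk()               # current chunk is full: link a new one
            self.tail.next = new_chunk
            self.tail = new_chunk
        self.tail.samples.append((ts, value))

    def range(self, start: int, end: int) -> List[Tuple[int, float]]:
        out, chunk = [], self.head
        while chunk is not None:              # walk the chunk list front to back
            out.extend(s for s in chunk.samples if start <= s[0] <= end)
            chunk = chunk.next
        return out

    def chunk_count(self) -> int:
        n, chunk = 0, self.head
        while chunk is not None:
            n, chunk = n + 1, chunk.next
        return n
```

With a capacity of 4, adding ten samples produces three chunks (4 + 4 + 2); appends only ever touch the tail chunk, which keeps ingestion cheap.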

Just a quick fact to end the discussion about Redis: Redis is incredibly fast, which matters if performance is the prime criterion for your datastore. It has been benchmarked at 50 million operations per second with 0.99 milliseconds of latency, to be precise. That is incredibly high throughput for a database. It can also be used as a session cache, which is one of its prime uses. But there are other parameters to keep in mind when selecting a datastore, such as the replication mechanism and conflict handling. Do analyze them before adopting it in your organization.

Basics Of RedisGraph (Finally, it started 😉 !!)

Intelligence can be defined as the ability to see and analyze connections. In AI we always attempt to build systems that are intelligent in some sense, able to do inexplicable tasks in a fairly simple manner. This connection knowledge can be extracted and modeled as a knowledge graph to leverage AI capabilities like intent detection, sentiment analysis and connectivity analysis. Other important use-cases are connected feature extraction, graph-accelerated ML and AI explainability.

It is now clear that graphs play an important role in AI applications. A data modelling process designed specifically to analyze graph relationships will be more effective than a generic NoSQL or relational modelling approach.

For example, consider fraud ring detection (a ring of authentic-looking accounts that default together). Traditional databases analyze such rings using discrete data points, but the important part of this problem is detecting the relationships among the malicious nodes. Other use cases, like social network analysis and real-time recommendation engines, also show better results with graph databases than with other database approaches.
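To see why relationships matter more than discrete data points here, consider accounts that individually look fine but share identifiers (a phone number, an address, an SSN). Grouping them into connected components exposes the ring. The sketch below uses union-find in plain Python; the accounts, identifiers and the minimum ring size of 3 are all hypothetical, and this is not a RedisGraph query.

```python
from collections import defaultdict

# Hypothetical accounts and the identifiers they were opened with.
accounts = {
    "acc1": {"phone:111", "addr:A"},
    "acc2": {"phone:111", "ssn:X"},
    "acc3": {"ssn:X", "addr:B"},
    "acc4": {"phone:999"},
}


def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving keeps trees shallow
        x = parent[x]
    return x


def fraud_rings(accounts, min_size=3):
    """Group accounts connected through shared identifiers (connected components)."""
    parent = {a: a for a in accounts}
    owner = {}                          # identifier -> first account seen with it
    for acc, ids in accounts.items():
        for ident in ids:
            if ident in owner:
                parent[find(parent, acc)] = find(parent, owner[ident])   # union
            else:
                owner[ident] = acc
    groups = defaultdict(set)
    for acc in accounts:
        groups[find(parent, acc)].add(acc)
    return [g for g in groups.values() if len(g) >= min_size]


print(fraud_rings(accounts))   # one ring: acc1, acc2, acc3
```

No single pair of accounts shares everything, yet acc1, acc2 and acc3 are chained together through phone:111 and ssn:X; that chain is exactly the kind of relationship a graph model makes explicit.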

Let's start playing with graph databases and learn in depth how RedisGraph works, and why it could be a great choice in the future (it was one of the most innovative things I read about last year 😉). We'll do a comparative analysis with Neo4j, the current industry standard, both with a small benchmark test and in terms of other features.

RedisGraph: Adjacency matrix for graph representation instead of adjacency list.

Again, like Redis itself, its RedisGraph module is benchmarked as the fastest graph database. Behind the scenes it carries out query operations with the GraphBLAS library, i.e. it expresses graph algorithms in linear algebra over a matrix representation, instead of an adjacency-list representation, to perform faster operations. The matrix representation captures the exact relationship between each pair of nodes, and it is space efficient because it uses a sparse matrix representation, leveraging the fact that most real-life graphs are sparse. Traditional graph databases, by contrast, use hexastores or adjacency lists, which makes them relatively slower, since they follow standard graph implementations instead of matrix algebra for finding relationships.
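To show why a sparse matrix is space efficient, here is a minimal sketch of one common sparse format, CSR (compressed sparse row), built from an edge list. GraphBLAS implementations choose their own internal formats, so treat this only as an illustration of storing non-zero entries instead of all n × n cells; to_csr and neighbors are hypothetical helpers.

```python
from typing import List, Tuple


def to_csr(n: int, edges: List[Tuple[int, int]]):
    """Build a CSR adjacency matrix: only the non-zero entries (edges) are stored."""
    row_ptr = [0] * (n + 1)
    for src, _ in edges:
        row_ptr[src + 1] += 1                 # count out-edges per row
    for i in range(n):
        row_ptr[i + 1] += row_ptr[i]          # prefix sums give each row's start offset
    col_idx = [0] * len(edges)
    fill = row_ptr[:-1]                       # slicing makes a copy of the row starts
    for src, dst in sorted(edges):
        col_idx[fill[src]] = dst
        fill[src] += 1
    return row_ptr, col_idx


def neighbors(row_ptr, col_idx, node):
    """Row `node` of the matrix: its out-neighbors, read from one contiguous slice."""
    return col_idx[row_ptr[node]:row_ptr[node + 1]]


row_ptr, col_idx = to_csr(5, [(0, 1), (0, 2), (3, 4)])
print(row_ptr, col_idx)   # [0, 2, 2, 2, 3, 3] [1, 2, 4]
```

For this 5-node graph with 3 edges, CSR keeps 9 integers instead of a 25-cell dense matrix, and the gap grows quickly as real graphs get larger and sparser.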

BFS from nodes 1 and 3 (green nodes) via a linear algebra multiplication with the vector mask 'x', which yields the first BFS level for them: nodes 6, 2 and 4 (red nodes).

Algorithms expressed in matrix-multiplication terms, like the one above, can be thought of as a more abstract representation of a relation query: each multiplication advances one hop, so finding friends-of-friends takes two. Such queries can be abstracted from different graph algorithms to suit any particular use-case. In other words, many graph algorithms can be restated as matrix algebra operations, so different relationships can be found very quickly using optimized matrix multiplication algorithms.
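Here is a minimal sketch of BFS written as repeated vector-matrix products over the Boolean (OR, AND) semiring, with a mask that drops already-visited nodes, which is the idea the figure above illustrates. It uses dense Python lists for readability; GraphBLAS stores matrices sparsely and uses optimized kernels, and none of these function names are its API. The example graph is hypothetical, not the one in the figure.

```python
from typing import List

Matrix = List[List[bool]]
Vector = List[bool]


def adjacency(n: int, edges) -> Matrix:
    A = [[False] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = True                  # directed edge i -> j
    return A


def vxm_masked(frontier: Vector, A: Matrix, visited: Vector) -> Vector:
    """One BFS step: next[j] = OR_i (frontier[i] AND A[i][j]), masked by NOT visited[j]."""
    n = len(A)
    return [
        (not visited[j]) and any(frontier[i] and A[i][j] for i in range(n))
        for j in range(n)
    ]


def bfs_levels(A: Matrix, sources: List[int]) -> List[List[int]]:
    n = len(A)
    frontier = [i in sources for i in range(n)]
    visited = frontier[:]
    levels = [sorted(sources)]
    while any(frontier):
        frontier = vxm_masked(frontier, A, visited)          # one hop per product
        visited = [v or f for v, f in zip(visited, frontier)]
        level = [j for j in range(n) if frontier[j]]
        if level:
            levels.append(level)
    return levels


A = adjacency(6, [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (5, 0)])
print(bfs_levels(A, [0]))   # [[0], [1, 2], [3], [4]]
```

The second level ([3] here) is exactly the friends-of-friends set for node 0: two products, two hops.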

Do remember that it is still at an early stage, and the product is not as mature as databases like Neo4j and ArangoDB. Also, the highly awaited feature is GPU support for matrix multiplication, which would make it even faster (remember GPUs from deep learning). It is a very promising, one-of-a-kind database with huge scope for improvement, even with such great results already in hand.

How do you install and run it? Here are the instructions to install it on your Linux distro (note that there is a bug when installing on CentOS 7; I encountered a very similar issue installing it on RHEL 7).

Follow the instructions below:

# 1. Clone the repository from your Linux terminal
git clone https://github.com/RedisLabsModules/RedisGraph.git
cd RedisGraph
# 2. Make sure the build-essential tools are present. Otherwise, install them
sudo apt-get install build-essential cmake
# 3. Run make in the RedisGraph directory
make
# 4. Then load the module into the Redis server
redis-server --loadmodule /path/RedisGraph/src/redisgraph.so
# 5. In another terminal, start redis-cli and test it with a demo Cypher query (similar to Neo4j)
redis-cli
GRAPH.QUERY social "CREATE (:person {name: 'roi', age: 33, gender: 'male', status: 'married'})"

Let's do a speed test. For this, make sure Neo4j is also installed on your system. Follow the steps in the Neo4j installation documentation, which involve installing a JDK, adding the repository and running sudo apt-get install neo4j=1:3.5.2. After that, run the following commands:

# Check if neo4j is installed or not
neo4j
#Output: Usage: neo4j { console | start | stop | restart | status | version }
# Start neo4j server
sudo neo4j start

Let's run some CREATE and MATCH Cypher queries to evaluate the performance of both graph databases, using the MotoGP example from the RedisGraph documentation.

Starting with our MotoGP example, below are the queries with their execution times for both databases, followed by a chart comparing their performance.

# First, let's run it for RedisGraph. Machine used: 3 cores, 8 GB RAM
# Load graph module onto Redis Server and start redis-cli
# redis-server --loadmodule RedisGraph/src/redisgraph.so
# redis-cli
# Let's create MotoGP graph
GRAPH.QUERY MotoGP "CREATE (:Rider {name:'Valentino Rossi'})-[:rides]->(:Team {name:'Yamaha'}), (:Rider {name:'Dani Pedrosa'})-[:rides]->(:Team {name:'Honda'}), (:Rider {name:'Andrea Dovizioso'})-[:rides]->(:Team {name:'Ducati'})"
# Execution Time Output: 19.246694 milliseconds
# A match query: Who's riding for team Yamaha?
GRAPH.QUERY MotoGP "MATCH (r:Rider)-[:rides]->(t:Team) WHERE t.name = 'Yamaha' RETURN r,t"
# Execution Time Output: 13.330147 milliseconds
# Another match query: How many riders represent team Ducati?
GRAPH.QUERY MotoGP "MATCH (r:Rider)-[:rides]->(t:Team {name:'Ducati'}) RETURN count(r)"
# Execution Time Output: 0.702222 milliseconds
#-----------------------------------------------------------------
# Running the same in the Neo4j GUI | In Settings, enable the multi-statement query editor, then run the following commands.
CREATE (:Rider {name:'Valentino Rossi'})-[:rides]->(:Team {name:'Yamaha'}), (:Rider {name:'Dani Pedrosa'})-[:rides]->(:Team {name:'Honda'}), (:Rider {name:'Andrea Dovizioso'})-[:rides]->(:Team {name:'Ducati'});
MATCH (r:Rider)-[:rides]->(t:Team) WHERE t.name = 'Yamaha' RETURN r,t;
MATCH (r:Rider)-[:rides]->(t:Team {name:'Ducati'}) RETURN count(r);
# Output:
Added 6 labels, created 6 nodes, set 6 properties, created 3 relationships, completed after 129 ms.
Completed after 28 ms.
Completed after 4 ms.

Chart: RedisGraph performs better on a graph with 6 nodes and 6 properties.
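To put the numbers above side by side, here is the ratio of each Neo4j timing to the matching RedisGraph timing, computed from the figures reported in the runs. It is a single run on a tiny graph, so treat the ratios as indicative only.

```python
# Timings copied from the runs above, in milliseconds: (RedisGraph, Neo4j GUI).
timings = {
    "CREATE 3 riders + 3 teams": (19.246694, 129),
    "MATCH riders of Yamaha": (13.330147, 28),
    "COUNT riders of Ducati": (0.702222, 4),
}

speedups = {q: neo4j / redisgraph for q, (redisgraph, neo4j) in timings.items()}
for query, s in speedups.items():
    print(f"{query}: Neo4j took {s:.1f}x as long as RedisGraph")
```

That works out to roughly 6.7x, 2.1x and 5.7x respectively.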

In-detail feature comparison with Neo4j: Firstly, there is no UI available for RedisGraph in the free open-source version, whereas Neo4j has an interactive GUI that also helps visualize returned query results. For both, clustering is available in the enterprise version only. Neo4j uses causal clustering with the Raft protocol to maintain consistency and is not partition tolerant. Neo4j fully maintains data integrity and ensures good transactional behavior, supporting ACID properties, whereas Redis (in Redis Enterprise's Active-Active deployments) follows strong eventual consistency with CRDTs (conflict-free replicated data types).
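For readers new to CRDTs, here is the textbook example, a grow-only counter (G-Counter), showing how replicas can accept writes independently and still converge. This is a generic sketch, not Redis Enterprise's implementation; the GCounter class and the replica ids are hypothetical.

```python
from typing import Dict


class GCounter:
    """Grow-only counter CRDT: each replica increments only its own slot."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self) -> int:
        return sum(self.counts.values())


eu, us = GCounter("eu"), GCounter("us")
eu.increment(3)   # writes accepted independently on each replica
us.increment(2)
eu.merge(us)
us.merge(eu)
print(eu.value(), us.value())   # 5 5
```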

Now, regarding product maturity: RedisGraph is a very new product with enhancements and bug fixes still pending, whereas Neo4j is the most mature graph database in the industry, with 10 times more adoption than all other graph DBs combined. Both use the same query language, Cypher, with minor syntax variations in execution, as shown above.

RedisGraph has huge potential to become a big player in the graph industry, with Redis being the fastest-growing database and the most used image on Docker. In my view it is not ready for production environments, since the open issues are very basic in nature and would cause problems in deployment. I'll definitely spend more time on an in-depth analysis of this module, run some large-scale benchmarks and discuss them in a separate article.

Stay tuned!! The next part will discuss neural-redis (which can execute neural networks while they are still training) and Redis-ML (which implements ML models as Redis data types that can be used for either prediction or evaluation).

Want to discuss databases (any one you have in mind)? Ping me!!
