Cassandra Connect

Arvind Hari
Geek Culture
Published in
7 min readJun 28, 2021

--

Blending Cassandra with Python

Image Credits: Upslash.com, Color Crescent

What is Cassandra?

Apache Cassandra™ is a distributed NoSQL database that delivers continuous availability, high performance, and linear scalability that successful applications require.

Apache Cassandra is an open source, distributed NoSQL database that began internally at Facebook and was released as an open-source project in July 2008. Cassandra delivers continuous availability (zero downtime), high performance, and linear scalability that modern applications require, while also offering operational simplicity and effortless replication across data centers and geographies. Cassandra can handle petabytes of information and thousands of concurrent operations per second, enabling organizations to manage large amounts of data across hybrid cloud and multi cloud environments.

DataStax have provided a fantastic outlook on the “What is” and “Why” of Cassandra. Follow the link.

However, it is indeed a subtly convoluted process to get started with Cassandra considering its installation itself takes some good time and efforts. In this article, I take you through this process step by step to install and ensure Cassandra is up and running in the local system. In addition, this article also gives a glimpse of using accessing the cloud version of Cassandra through DataStax. By the end of this article, I finally discuss configuring Cassandra-Python integration to perform the usual Database operations with Cassandra through Python.

So here we start.

Cassandra Installation — Local System

Cassandra Installation follows the sequence as:

· Java Installation (Version 8u202 or earlier)

· Cassandra Installation

· Python2 Installation (Cassandra-Python Integration demands Python2 to work seamlessly as of now)

· Apache Thrift Installation

Java Installation

Follow the link. Signing into Oracle is mandatory to progress. Please ensure, the version earlier than Java SE 8u202 is chosen.

Choose the Highlighted Java Version

Once, JAVA is downloaded, set up a new System Environment Variable JAVA_HOME. This should have the link to the Java folder downloaded.

JAVA_HOME System Variable Creation

Cassandra Installation

Follow the link.

Ensure, 7zip is installed in your system to handle the “. tar.gz” files.

Extract the folder from the download the “. tar.gz” file. Create another system variable, CASSANDRA_HOME with the link to the extracted folder.

CASSANDRA_HOME System Variable Creation

Python 2 Environment

Well, this is not mandatory, but I have observed this a few times myself and checked with some friends of mine, Cassandra encounters issues when we try to launch the CQL (Cassandra Query Language) through a Python 3 environment. So, it is advisable to create a Python 2 environment and try launching Cassandra from this Python 2 environment.

I have used Anaconda Prompt to get this done. I have created a Python 2 environment named Python2.

Apache Thrift Installation

Well, one the steps which is swift to perform.

Follow the link and execute the ‘.exe’ file. This will flash within a second. No installation steps are involved, just execute the file.

With these 4 steps done, we are good to get started with Cassandra.

Cassandra Query Language Execution

Launch Command Prompt or Anaconda Prompt. Activate the Python 2 environment. It is advisable to launch the prompt in the admin mode. Change the directory to the bin folder within the Cassandra folder extracted.

I have placed the extracted Cassandra in my C drive, so the change directory command looks like:

On executing, “cassandra.bat -h”, we should get the message, Startup Complete as shown below.

Once, this is done, keep this prompt open and launch a new prompt, repeat the steps to change the directory to bin. This time, execute ‘cqlsh’.

This is exactly how we jump into CQL, the CASSANDRA QUERY LANGUAGE. With successful execution, this is how the prompt should look like.

CQL Setup in Local System Complete

Congratulations, Cassandra is up and running successfully in your local system as soon as you can notice the cqlsh>

It was long, wasn’t it, really time consuming and convoluted?

Let us take the cloud path to reach this in no time.

Cassandra Through DataStax

DataStax, Inc. is a data management company based in Santa Clara, California. Its product provides commercial support, software, and cloud database-as-a-service based on Apache Cassandra. DataStax also provides event streaming support and a cloud service based on Apache Pulsar.

We can create a free account and use their services free of cost worth 25 dollars every month. For the purposes of within the scope of this article and understand Cassandra-Python integration, free version of DataStax is more than enough.

All we need to do is, create an account through the link.

Once logged in, this is what we will be seeing.

Click on the Create Database option to proceed with it. Be wise and use the Free Version. I am emphasizing again, the free version is more than enough to get the job done for us.

Click on the Get Started button under “Start Free Now”

Now comes the fun part, Cassandra has a slightly different architecture than other databases. Tables are not stored directly inside a database, rather they are stored in keyspaces. In other terms, a table is stored in a keyspace which is stored in a database. To understand more on what a keyspace is, how it differs from a table, follow the link.

Choose the cloud and region of your interest, it really does not matter, it is just a matter of luck, how quickly we can get our database created. As we are using shared services through free account, it takes some significant time (a few minutes) to get the Database created, initialised and provide the access.

IMPORTANT, Ensure the KEYSPACE names are completely lowercase.

And, hurray, our new Cassandra Database is now ready. Wait until the Connect button gets activated. It takes a few minutes as we are working through shared servers.

Finally, it is time to access and play with Cassandra Databases through Python.

CASSANDRA-PYTHON Integration

We will start with discussing integrating the cloud version with Python, hold tight, a few steps process coming up. Cheers as it is a one-time activity per database.

As soon as we click on the new database and got to the Connect tab, we get to see the window as shown below. As we are trying to connect Cassandra with Python, select Python from the list of labels to the left. Then download the bundle from the Download Bundle option. Just download this and do nothing else yet.

Scroll down the page, and reach the bullet 4, which has a code snippet as shown below. This bunch of code is the key to connecting Cassandra with Python. Every single program through which, we are trying to reach the cloud version of Cassandra needs this piece of code. For every new database we create, a similar piece of code will be available.

All we need to do are:

· Import the CASSANDRA driver using !pip install cassandra-driver

· Replace the <</PATH/TO/>> with the folder location where the zipped bundle has been downloaded. Do not forget to place a backward slash (/) at the end of folder path.

· Replace the CLIENT ID and CLIENT SECRET with the corresponding tokens from the Token Management option. Just copy sand paste the ID and SECRET into the code.

On executing the code, no errors should be encountered, especially, it should not return “An error occurred.” If so congratulations, we have successfully established the connection between cloud version of Cassandra and Python.

All we need is, place the Cassandra Query within the execute().

To integrate the local version of Cassandra with Python, we do not need any authorisation. Hence, just ignore the lines related to cloud_config and auth_provider. Just call the Cluster(<no args>) and create a session.

Congratulations on CASSANDRA-PYTHON INTGEGRATION

More related articles on connecting Python to MySQL and MongoDB

Do follow me on LinkedIn and GitHub for more on Python and Data Science.

--

--