Build CRUD Operations with Node JS and Python on DataStax Astra DB
Author: Eric Zietlow
Welcome to the CRUD operations with Node JS and Python with Astra DB workshop! In this tutorial, part of an eight-week series on Cassandra Fundamentals, we will show you the basics of connecting to, updating, and reading records from the powerful distributed NoSQL database Apache Cassandra®.
In this post, we’ll provide an overview of our tutorial using Astra DB to cover some of the most basic operations any new developer interested in learning a new database needs to know.
CRUD is essentially the basic functionality that deals with creating, reading, updating, and deleting records within your application. Users can call these four functions to perform different types of operations on data in a database. As a concept, CRUD helps developers build a base of functionality for any app they go on to develop.
Although simple in concept, CRUD adds very important functionality when it comes to data models. It is vital to application development, so once you become more familiar with the concept in our workshop, you’ll be ready to get more creative when building your cloud-native applications.
If you’ve been following along with our eight-week series on YouTube, you’re probably familiar with this spaceship app, an IoT-style workload. We are not launching a spaceship today, but rather simulating a spaceship application with several scripts.
What is Astra?
Astra is a multi-cloud, fully managed DBaaS built on Apache Cassandra and delivered by DataStax. In the old days, we would spend half the day just getting Cassandra running so we could try a couple of CRUD operations. We don’t have to do that anymore: Astra eliminates most of the time we’d otherwise spend setting up the database, and it manages, scales, and distributes data for us at a much faster speed.
There is no commitment and no credit card is needed. You can use Astra’s free tier, which offers 5 GB of storage, to launch a database in the cloud with a few clicks. If you’ve created an Astra database in a previous workshop, you will just change two values to match today’s code. If you haven’t created one, we will walk you through it step by step. Let’s get started.
Tutorial overview
You will find step-by-step instructions with screenshots on GitHub, where all the code and values are also provided. Most of the time, you’ll simply need to copy and paste, so make sure to have your GitPod ready!
The tutorial is divided into five main steps:
- Designing your data model.
- Setting up your database.
- Connecting to Astra DB.
- Creating and updating records.
- Reading results.
These steps are separated into a total of 10 hands-on exercises where you’ll work with Astra and GitPod using your preferred driver: Node JS or Python.
1. Designing your data model
To design our data model, we reverse the relational data modeling approach: instead of starting from entities, we start from our queries, looking at what we will ask of the data and how it will be consumed, and design tables to answer them.
The conceptual data model of your spaceship application will include:
- A journey, with attributes such as the spacecraft name, a summary, and start and end times.
- Sensors that feed data back, such as speed and temperature.
Each sensor’s readings will then be denormalized into its own individual table. This way we can run a direct query against Cassandra, without any joins, and get data back very quickly.
Next, we need to:
- Look up all the journeys for a particular spacecraft.
- Look at the state of the journey.
- Create a new journey.
It’s also important to have a spacecraft journey catalog: a single table listing all of our journeys, alongside the individual tables for each of our sensors.
Another point to note is that in Cassandra, data is split up and distributed to different nodes based on partition keys, so it’s important to choose them carefully in your data model. In this case, the spacecraft’s name is the partition key. Partition keys should always be provided in your “where” clauses to get your data back. If you’d like to learn more about modeling data on Cassandra at a deeper level, take a look at our Cassandra basics tutorial on YouTube.
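As a purely illustrative sketch (the workshop’s real schema lives in Step 1f on GitHub, and these table and column names are assumptions of ours), a journey catalog keyed by spacecraft name might look like the following, held in a Python string so it can later be executed through the driver or pasted into the CQL console:

```python
# Illustrative only: the workshop's actual schema is in Step 1f on GitHub.
# spacecraft_name is the partition key, so Cassandra distributes rows by it
# and every read must supply it in the WHERE clause.
CREATE_JOURNEY_CATALOG = """
CREATE TABLE IF NOT EXISTS spacecraft_journey_catalog (
    spacecraft_name text,
    journey_id      timeuuid,
    summary         text,
    start_time      timestamp,
    end_time        timestamp,
    active          boolean,
    PRIMARY KEY ((spacecraft_name), journey_id)
) WITH CLUSTERING ORDER BY (journey_id DESC)
"""
```

The clustering order puts the most recent journey first, which suits a “show me the latest journeys” query without any sorting at read time.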
You can find the detailed data modeling process for the spaceship application here. Once you have designed your model, you are ready to create a database on Astra.
2. Setting up your database and schema
If you have previously created an Astra database, you’ll just need to change two values to match the one you created. If you are new to Astra, the code is very user-friendly, so you’ll be able to go through it easily by following the instructions on GitHub. Here are the steps to create a database:
- Log in or sign up to Astra here.
- Use the free tier account type.
- Set up your location to the one that’s closest to you geographically.
- Copy and paste the username, password, and keyspace values from GitHub here. You can create values of your own, but we recommend using ours to make it easier to follow the rest of the exercises.
Your database will be created in about one or two minutes. Once it’s active, you then need to:
- Connect your database to the CQL console.
- Create schema by copying and pasting the values in Step 1f on GitHub.
- Check that all your schema tables have been created.
3. Connecting to Astra
Now that you have the database, you’ll need to connect to it using the DataStax drivers. The drivers take care of all the difficult Cassandra internals for you, letting you just write your CQL statements and get your data back.
In this post, we focus on two main drivers, Node JS and Python, so you can pick your favorite language to work with. To set up connections to Astra:
- Install the driver for your preferred language with its package manager: Node Package Manager (NPM) for Node, and PIP (short for “Pip Installs Packages” or “Pip Installs Python”) for Python. The Node and Python drivers are mostly equivalent to one another, although there are slight syntactic differences. You’ll also need your database’s secure connect bundle, a zip file containing the certificates and configuration the driver needs to reach Astra.
- Run either the NPM or PIP installation inside GitPod, an IDE that’s 100% online, based on Eclipse Theia.
Here are the Node and Python drivers for Cassandra in action:
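As a hedged illustration on the Python side, here is a minimal connection sketch using the DataStax Python driver; the bundle path, credentials, and keyspace are placeholders you’d replace with your own values from GitHub:

```python
# Minimal Python connection sketch (pip install cassandra-driver).
# The bundle path, username, password, and keyspace are placeholders.
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

cloud_config = {"secure_connect_bundle": "/path/to/secure-connect-database.zip"}
auth_provider = PlainTextAuthProvider("your_username", "your_password")

cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
session = cluster.connect("your_keyspace")  # session is reused everywhere below
```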
It’s important to remember that once you open a session, you should close it at application shutdown, since sessions are very expensive objects: each has a connection pool associated with it and consumes a lot of RAM. Use client.shutdown() for Node and session.shutdown() for Python.
You can find the detailed instructions in Sections A and B here.
When matching the values in the database connection file, if you want to use your existing Astra database, you’ll need to change the values in GitPod to the ones for your database. This is the only change you’ll need to make to the code, but it’s very important to get the path of your secure connect bundle right; otherwise the connection can’t be established and you’ll see errors.
Remember to test your connection afterward!
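A minimal way to test it, sketched here in Python, is to ask the cluster for its version through a system table that every Cassandra cluster exposes:

```python
# If this prints a version string, the session is alive and working.
row = session.execute("SELECT release_version FROM system.local").one()
print(row.release_version)
```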
4. Create and update records
Now, we’re ready to run some CRUD operations!
Inserts using simple statements
Simple statements are raw CQL strings handed to the driver, which executes them and returns the relevant data. They are the simplest way to run a command against a Cassandra cluster. Here are examples for Node and Python:
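Below is a hedged Python sketch of a simple-statement insert; the table and column names follow the illustrative schema from earlier, not necessarily the workshop’s exact one. Note the %s placeholders, which the Python driver uses for simple statements:

```python
import uuid
from datetime import datetime

# A simple statement: raw CQL with %s placeholders, bound at execution time.
session.execute(
    """
    INSERT INTO spacecraft_journey_catalog
        (spacecraft_name, journey_id, summary, start_time, active)
    VALUES (%s, %s, %s, %s, %s)
    """,
    ("vostok1", uuid.uuid1(), "First human spaceflight", datetime.utcnow(), True),
)
```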
Inserts using prepared statements
Prepared statements are usually preferred over simple statements: the statement is compiled once on each node automatically as needed, which reduces traffic and optimizes data flow, and it is much safer when it comes to injection than a simple statement. Remember that each statement should be prepared only once per application.
There is one more difference to note if you’re switching from simple statements to prepared statements in Python: with simple statements, we use %s to mark the value that will be substituted when the statement runs, but with prepared statements, we use ? placeholders instead.
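Here is a Python sketch of the same insert as a prepared statement, again using the illustrative table from earlier:

```python
import uuid
from datetime import datetime

# Prepared once per application; note the ? placeholders instead of %s.
insert_journey = session.prepare(
    "INSERT INTO spacecraft_journey_catalog "
    "(spacecraft_name, journey_id, summary, start_time, active) "
    "VALUES (?, ?, ?, ?, ?)"
)

journey_id = uuid.uuid1()  # reused in the later sketches
session.execute(
    insert_journey,
    ("vostok1", journey_id, "First human spaceflight", datetime.utcnow(), True),
)
```

Preparing once and executing many times is also what makes this pattern cheap: only the bound values travel over the wire on each execution.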
Inserts using user-defined types and batches
Let’s briefly discuss batches. The first thing to understand is that Cassandra batches are not relational batches: they are not an optimization step where you package up all your queries and run them as one job. Batches are the simplest way, in a denormalized system, to keep tables holding the same data consistent with one another. Because their purpose is to keep live data consistent across multiple tables, we’re not going to use them as an optimization; doing so will actually cause performance degradation. If you are interested in learning more, check out the content from the earlier weeks of this learning series.
So what is a batch? In Cassandra terms, it is a set of statements grouped together and run as a single job. A batch goes through a single coordinator node and is handled as one operation made up of multiple operations.
The behavior of the drivers differs slightly when inserting batch statements. Node uses a method called “batch” which executes the batched statements. For Python, you add statements to a batch object and then execute it:
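A hedged Python sketch follows, assuming two denormalized sensor tables whose names are illustrative; one batch keeps a reading consistent across both (journey_id comes from the earlier prepared-statement sketch):

```python
from datetime import datetime
from cassandra.query import BatchStatement

# Hypothetical sensor tables; adjust the names to match your schema.
insert_speed = session.prepare(
    "INSERT INTO spacecraft_speed_over_time "
    "(spacecraft_name, journey_id, reading_time, speed) VALUES (?, ?, ?, ?)"
)
insert_temperature = session.prepare(
    "INSERT INTO spacecraft_temperature_over_time "
    "(spacecraft_name, journey_id, reading_time, temperature) VALUES (?, ?, ?, ?)"
)

now = datetime.utcnow()
batch = BatchStatement()
batch.add(insert_speed, ("vostok1", journey_id, now, 7.8))
batch.add(insert_temperature, ("vostok1", journey_id, now, 21.5))
session.execute(batch)  # one coordinator runs the group as a single operation
```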
User-defined types (UDTs) are another area where the drivers differ. With the Node driver, you can retrieve and store UDTs using plain JavaScript objects. With Python, you need to take an extra step: define a class and refer to it when you create your query:
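Here is a Python sketch, assuming a hypothetical location_udt type with three coordinate fields (the type and field names are ours, not necessarily the workshop’s):

```python
# Map a Python class onto a hypothetical location_udt type.
class Location:
    def __init__(self, x_coordinate, y_coordinate, z_coordinate):
        self.x_coordinate = x_coordinate
        self.y_coordinate = y_coordinate
        self.z_coordinate = z_coordinate

# Keep the driver's view of the type in sync with the cluster's.
# (Not needed when the statement is prepared, as noted below.)
cluster.register_user_type("your_keyspace", "location_udt", Location)
```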
It’s also recommended that you register the UDT with the cluster instance, so that what the driver knows about the type stays aligned and in sync with what the cluster actually knows about it. When using prepared statements, however, registering isn’t necessary.
Mark the journey as completed with an end time
Once the journey is over, mark it as complete.
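A hedged Python sketch under the illustrative schema from earlier: the UPDATE flips the journey to inactive and records an end time (journey_id again comes from the earlier sketches).

```python
from datetime import datetime

# Close out the journey in the catalog table (illustrative column names).
complete_journey = session.prepare(
    "UPDATE spacecraft_journey_catalog "
    "SET active = ?, end_time = ? "
    "WHERE spacecraft_name = ? AND journey_id = ?"
)
session.execute(complete_journey, (False, datetime.utcnow(), "vostok1", journey_id))
```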
In the following GitHub tutorials, you’ll follow the instructions in Section C, Exercises 3, 4, 5, and 6 to run these operations.
5. Read results
In the last part of the post, you’ll be selecting, partitioning, parsing, and paging records from journeys.
Select all records from a table
In this exercise, you’ll select all the records from a table to list journeys. You use the same kind of statement that has been running this whole time; the only difference is that it is now bound with a “where” clause. You’re essentially referencing the statement and asking it for all the journeys the spacecraft has gone on. Since we’ve only gone on one journey, it returns that one record.
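Here’s what that might look like in Python, using the illustrative catalog table from earlier; note the partition key (the spacecraft’s name) in the WHERE clause, as always:

```python
# List every journey this spacecraft has gone on.
rows = session.execute(
    "SELECT * FROM spacecraft_journey_catalog WHERE spacecraft_name = %s",
    ("vostok1",),
)
for row in rows:
    print(row.journey_id, row.summary)
```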
Partition
Here, you’ll read a specific journey. You submit a spacecraft name and the journey ID in your query and get the exact same data back, broken up into a nice format. This is very similar to a data frame and is a really easy way to work with data.
Parsing records
In this exercise, you’ll get all of the data back from a specific journey of a specific spacecraft by looking at the sensor tables and selecting by spacecraft name and journey ID. Each row is parsed out into its individual values and printed.
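A Python sketch against the illustrative speed table introduced in the batch example:

```python
# Read one journey's speed readings and parse each row's values.
rows = session.execute(
    "SELECT * FROM spacecraft_speed_over_time "
    "WHERE spacecraft_name = %s AND journey_id = %s",
    ("vostok1", journey_id),
)
for row in rows:
    print(row.reading_time, row.speed)
```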
Paging
Paging is how you get back a subset of the whole result set you want to return. When you run the code, it returns one page of data along with a pointer to the next set of data.
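A hedged Python sketch of paging with the DataStax driver, still against the illustrative speed table: fetch_size sets the page size, and paging_state is the opaque pointer to the next page.

```python
from cassandra.query import SimpleStatement

# Ask for five rows per page.
query = SimpleStatement(
    "SELECT * FROM spacecraft_speed_over_time "
    "WHERE spacecraft_name = %s AND journey_id = %s",
    fetch_size=5,
)
result = session.execute(query, ("vostok1", journey_id))
first_page = result.current_rows   # the first page of rows
next_page = result.paging_state    # pointer to the next page, or None

if next_page:
    # Resume exactly where the first page left off.
    result = session.execute(query, ("vostok1", journey_id),
                             paging_state=next_page)
```

Because the paging state is just bytes, it can be handed to a client and sent back on the next request, which is how “load more” style endpoints are typically built on Cassandra.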
In the following GitHub tutorials, you’ll follow the instructions in Section C, Exercises 7, 8, 9, and 10 to read results.
Summary
If you’ve followed the instructions on GitHub and our YouTube video, by now you will have:
- Learned to design a data model systematically.
- Created a cloud-native database using Astra built on Cassandra.
- Worked with your preferred driver: Node JS or Python.
- Run basic CRUD operations and read the results.
We hope this post has helped you grasp the basics of building a CRUD application using Astra DB and that it will inspire you to create many more cloud-native applications! We’d also love to hear from you: share with us a cool CRUD application you built, or just tell us more about your experience working with Astra DB in the comments!
Explore more tutorials on our DataStax Developers YouTube channel and subscribe to our event alerts to get notified about new developer workshops. For exclusive posts on all things data, including Cassandra, streaming, and Kubernetes, follow DataStax on Medium.