Build a Goodreads Clone with Spring Boot and Astra DB — Part 6
Author: Pieter Humphrey
This is the sixth post in a series that walks you through building a simple, highly available Spring Boot application that can handle millions of data records. In this post, we will set up our database schema and load our data into the app. To understand how to set up an Astra DB database and connect to it, check part five. To get the full story, see parts one, two, three, and four in the series.
In the previous post, we uploaded all our Author data from our file into our DataStax Astra DB instance by creating a Spring Boot data loader application. There are multiple ways we could have accomplished data loading. Since we had already connected Spring Data Cassandra to our main web app, we reused some of that work to load our data as well.
Normally, this is not a recommended practice for a production app; using DS Bulk would, in most cases, sidestep the performance problems of this approach. Since data loading is not the focus of this tutorial, we used a Spring Boot data loader app with Spring Data Cassandra for simplicity. To recap, that application worked by parsing the data file and creating author records in the database.
Now the book information needs to be uploaded. We need to read each row from the “works” file and create rows in the books_by_id table. This is going to be very similar to what we did for the authors_by_id table, except that we will also refer back to the author data. Joins are unnecessary in Apache Cassandra®, so we will parse each book line by line, use the author_id in the book record to fetch the name from the authors_by_id table, and insert the author name alongside each row of book data. This is how we denormalize the data, so fetching an individual book is going to be super fast and efficient.
Let’s get to work getting the book data into our Astra DB instance.
In BetterreadsDataLoaderApplication.java, which is our main application file, underneath the initAuthors() method we will create the initWorks() method. The underlying concept is the same as initAuthors(): we get the path from workDumpsLocation, then we skip the initial part of each line (everything before the first curly brace). A new JSON object is created from what is left (the book object). Then we get the book information, put it into a Book instance, and save it through the BookRepository.
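The line-trimming step can be sketched in plain Java. Everything before the first opening curly brace is metadata we discard; only the JSON object at the end of each line matters. (The class and method names here are illustrative, not from the series code.)

```java
public class WorksLineParser {

    // Each line of the works dump looks roughly like:
    //   /type/work  /works/OL45883W  3  2020-08-01T00:00:00  {"title": "...", ...}
    // Only the JSON object at the end is useful, so we cut everything
    // before the first opening curly brace.
    public static String jsonPart(String line) {
        int start = line.indexOf('{');
        return start >= 0 ? line.substring(start) : "";
    }

    public static void main(String[] args) {
        String line = "/type/work\t/works/OL45883W\t{\"title\": \"Flatland\"}";
        System.out.println(jsonPart(line)); // prints {"title": "Flatland"}
    }
}
```

In initWorks(), this substring is what gets handed to the JSON parser to build the book object.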
So we have an Author model class and an AuthorRepository. Now we need a Book model class and a BookRepository. When someone loads a page with a particular book_id, all the book information for that book needs to load, which means we will need a table keyed by book ID. For this, we will create a Book entity mapped to the books_by_id table so we can create that object and save it to the database. We can basically copy the contents of Author.java and rename some variables. The book_id will be the one primary key, and essentially the partition key; we want the book data to be partitioned based on the partition key book_id. We are going to add some more interesting properties to our Book entity, including:
- publishedDate (a LocalDate)
- coverIds. Cover IDs are ints, and each maps to a particular image URL; we will use OpenLibrary’s Covers API to get the cover image.
- authorNames. Currently we only have author IDs, so we will get the names from the authors table. This is a list, since a book could have multiple authors.
- authorIds. This is also a list, in case a book has multiple authors and therefore multiple IDs. Element 0 in this list corresponds to element 0 in the authorNames list.
These are all the fields that we will be using. Create getters and setters for all of these properties in the Book class.
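A sketch of what the Book entity might look like with Spring Data Cassandra mapping annotations. The column names and exact types here are assumptions; match them to your own schema:

```java
import java.time.LocalDate;
import java.util.List;

import org.springframework.data.cassandra.core.mapping.Column;
import org.springframework.data.cassandra.core.mapping.PrimaryKey;
import org.springframework.data.cassandra.core.mapping.Table;

// Maps this entity to the books_by_id table; book_id is the partition key.
@Table("books_by_id")
public class Book {

    @PrimaryKey
    @Column("book_id")
    private String id;

    @Column("book_name")
    private String name;

    @Column("book_description")
    private String description;

    @Column("published_date")
    private LocalDate publishedDate;

    // Cover IDs for OpenLibrary's Covers API.
    @Column("cover_ids")
    private List<Integer> coverIds;

    // Element 0 of authorIds corresponds to element 0 of authorNames.
    @Column("author_names")
    private List<String> authorNames;

    @Column("author_ids")
    private List<String> authorIds;

    // Create getters and setters for every field; one pair shown for brevity.
    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
}
```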
Create the BookRepository
Now we are going to create the BookRepository. The BookRepository needs:
- To extend CassandraRepository
- The entity that we need to fetch, which is Book
- The type of the book’s ID, which is String
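Putting those three requirements together, the repository can be as small as this (assuming the Book entity above and a String ID):

```java
import org.springframework.data.cassandra.repository.CassandraRepository;
import org.springframework.stereotype.Repository;

// Book is the entity to fetch; String is the type of its ID.
// Spring Data generates the implementation (findById, save, etc.) for us.
@Repository
public interface BookRepository extends CassandraRepository<Book, String> {
}
```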
Construct the Book Object
Construct the Book object as a part of the
initWorks() method. For each row in the works file, we want to construct a new Book object. We are going to parse properties out of the JSON object and set them on the Book object, just like we did for the Author object.
Some parsing will need to be done to get these values out of the JSON object. For example:
- bookId: Go back and get the book ID from the JSON file. This should be fairly simple; it is basically the key with “/works/” removed.
- authorIds: Get the author object array from the JSON blob and parse the IDs one by one.
- publishedDate: Create a date formatter (dateFormat) for the publishedDate and apply it where we parse the date.
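The bookId and publishedDate parsing can be sketched in plain Java. The date pattern below is an assumption; the real dump mixes several date formats, so adjust the pattern (or add a fallback) as needed:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class WorksFieldParsing {

    // Hypothetical pattern; pick whatever matches your dump's date strings.
    private static final DateTimeFormatter dateFormat =
            DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // The book ID is the JSON key with the "/works/" prefix removed.
    public static String bookId(String key) {
        return key.replace("/works/", "");
    }

    public static LocalDate publishedDate(String raw) {
        return LocalDate.parse(raw, dateFormat);
    }

    public static void main(String[] args) {
        System.out.println(bookId("/works/OL45883W"));   // prints OL45883W
        System.out.println(publishedDate("2009-12-01")); // prints 2009-12-01
    }
}
```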
Also, to get
authorNames we need to make a call to our
AuthorRepository on Cassandra. We provide an Author ID and ask for the corresponding author name, so that we can save it into the
BookRepository. We will do this for each of the Author IDs we have.
We need to call the authorRepository to get the Author object for a given Author ID, and then map that ID to the author’s name. The
findById() call is going to go to the Cassandra database, fetch the author information for this ID, and then give us the author object.
Once we have the Author object, we are going to map it to just the name. This way it will map each Author ID to either “Unknown author” if the name cannot be found, or the name of the author that we fetched from Cassandra.
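This lookup boils down to mapping an Optional: Spring Data’s findById() returns an Optional, which we map to the name or a fallback. The pattern can be sketched with a plain Map standing in for the repository (the names here are illustrative; a real findById() call would hit Cassandra):

```java
import java.util.Map;
import java.util.Optional;

public class AuthorNameLookup {

    record Author(String id, String name) {}

    // Stand-in for authorRepository.findById(id).
    static Optional<Author> findById(Map<String, Author> repo, String id) {
        return Optional.ofNullable(repo.get(id));
    }

    // Map an author ID to a name, or "Unknown author" when nothing is found.
    static String nameFor(Map<String, Author> repo, String id) {
        return findById(repo, id).map(Author::name).orElse("Unknown author");
    }

    public static void main(String[] args) {
        Map<String, Author> repo =
                Map.of("OL1A", new Author("OL1A", "E. A. Abbott"));
        System.out.println(nameFor(repo, "OL1A")); // prints E. A. Abbott
        System.out.println(nameFor(repo, "OL9Z")); // prints Unknown author
    }
}
```

Doing this once per author ID on a book gives us the authorNames list in the same order as the authorIds list.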
Run the application
Replicate the try/catch block in
initWorks() that we had in
initAuthors() as well. We now have the code to parse the data for every book in existence and can save it to Cassandra. We will try running it with a limit of 50 records.
Note that doing the whole dataset this way isn’t recommended. If you want to load the whole dataset, we recommend using DS Bulk instead to ensure reasonable performance. Comment out
initAuthors() in the
start() method to make sure that it doesn’t run again.
So that we don’t blow away all our author data, make sure that
application.yml has this configuration:
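A hedged sketch of that configuration, using Spring Boot’s Cassandra properties: the schema action is set so tables are only created when missing rather than dropped and recreated (the keyspace name is illustrative; a value like RECREATE would wipe the existing author data):

```yaml
spring:
  data:
    cassandra:
      keyspace-name: betterreads
      schema-action: CREATE_IF_NOT_EXISTS
```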
Autowire the bookRepository with @Autowired and use it to persist the books to the database. Run your application, then check the CQL Console to see if it worked.
Now that we have book and author data in the database, we are going to switch to creating a web application that displays all this book information.
We have been looking at a data loader so far. Next, we are going to create a Spring MVC/Spring Boot application with a URL that we will use to look up books by book_id. It is going to fetch book information and display it in a nice HTML page using Thymeleaf, showing the user the cover, the title, and the description for a book.
See how we start to make that happen in the next blog post. In the meantime, check out the GitHub repository with the full code for this project.