Create a Data Marvel: Develop a Full-Stack Application with Spring and Neo4j — Part 1
My dad and I are now both developer advocates, and for the last couple of years we have put together joint presentations to submit to conferences. Our goal is to present the project together at least once. This year, we wanted to leverage both of our amazing technologies to show something easy, powerful, and fun.
My dad works with all things Spring, and I work with Neo4j. The two technologies are well-established and thriving, so our integration would pair the seamless, simple development experience of Spring with the graph data structure of Neo4j. There is also a great integration project, Spring Data Neo4j, that lets the two work together easily: it allows Spring to understand the nodes and relationships of the graph database and allows a full Spring application to connect to Neo4j and pass data back and forth.
This gave us a solid foundation to build upon. With my dad’s expertise in all things Spring and my growing knowledge of Neo4j, the only thing missing was an intriguing plot for the basis of our presentation.
Finding a data set
To me, the most fun demos are centered around an interesting data set that also showcases the technologies and their capabilities. So, first, we had to find a data set, and that took some time. There were already a few popular data sets that Neo4j often uses to demo the technology, but we wanted something fresh and new to tackle. We also knew we didn’t want something too small or simple, because showing something real-world is always more helpful and inspiring to others.
* Note: Medium-sized data sets also make it more likely that you will hit real data issues and have to work around them. The first rule of data is that it is messy and often inconsistent, especially when pulled from various sources.
Dad found that Marvel publishes a large set of its comic data to an API that developers can access. With the recent release of the latest Avengers movie in May and our love of superheroes, it easily topped the list.
* Note: You can find out more information about this API by going to developer.marvel.com.
Now, would it work? APIs can be frustrating, fraught with gaps and inconsistencies in their data. While we later found that this API was no exception, we needed to do a little research up front to see whether it provided enough data and had what we needed.
The initial look at Marvel’s API showed that they had good documentation for connecting to the various endpoints. There was substantial data to return and several connections among the entities that would leverage Neo4j nicely (relationships between entities are a main component of a graph!). I could even use their online documentation to test the endpoints with various parameters and verify the data that came back.
Marvel’s general organization of the information, documentation, and site also boded well. This was a serious enough endeavor for Marvel that they invested the time and energy to create a developer portal with docs, testable endpoints, and other info. Requiring an API key also showed that they cared who was hitting their data and how much: an exchange of their comic data for your email address and a bit about your intended use. Our goal was to use their data in a personal, non-commercial project to show the best of two technologies in a fascinating use case, so it seemed like a fair trade to us. :)
* Note: I also found that Marvel uses graph technology. One of their representatives even spoke at Neo4j’s GraphConnect conference a few years back! You can view the full recording of Peter’s presentation about the complicated data and chronology of the Marvel universe on Vimeo.
Constructing the data model
We had completed our initial evaluation of the API, and it looked promising so far. At this point, I needed to create the data model for Neo4j so that we could import the data into a sensible structure.
Since endpoints are often laid out like relational table structures (high-level categories with specific fields or columns returned), it was up to me to decide how the entities related to one another. In the Marvel data, the main entities are characters, comics, stories, events, series, and creators.
There were a couple of different approaches I could take with those entities. I could focus on one central entity and how it relates to each of the others, or I could give some entities relationships with several others. For instance, comic issues could include characters, and series could contain comic issues as well as feature certain characters.
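The second approach, with overlapping relationships, could be sketched in Cypher like this (the labels, relationship types, and sample values here are my own illustrative placeholders, not names taken from the Marvel API):

```cypher
// One possible model: a series relates both to its issues and
// to its characters, while characters also connect to issues directly
CREATE (s:Series {title: 'The Amazing Spider-Man'})
CREATE (i:ComicIssue {title: 'The Amazing Spider-Man #1'})
CREATE (c:Character {name: 'Spider-Man'})
CREATE (s)-[:CONTAINS]->(i)
CREATE (s)-[:FEATURES]->(c)
CREATE (c)-[:APPEARS_IN]->(i)
```

Every extra relationship type opens up more queries, but it is also one more thing to keep consistent during the data import.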
Neo4j allows the user to determine the best data model for the particular use case. Even if multiple users had the same business project, each party could come up with different data models, and Neo4j could support and handle each one! This allows you to build the data model that is best for your data and the usage — not based upon the structure and requirements of the database itself.
After several iterations and “whiteboard sessions”, I arrived at the data model in the image below. This model gave me some complexity with its multiple entities and relationship types, but it also gave each entity only one relationship, with one other entity. If the data became too complex, I could always ignore some of the entities.
In this model, I decided to make the Comic Issue entity the center of the data model and relate all the other entities to that central node. In my mind, the others (like creator and series) made the most logical sense connected directly to a comic. The Marvel API documentation also describes how each entity relates to a comic, and pulling a comic returns some data from each of the other entities, which further solidified my choice.
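As a sketch, the comic-centric model puts every other entity exactly one hop from the issue. Again, the labels, relationship types, and property values below are illustrative placeholders rather than the project’s final schema:

```cypher
// Comic issue as the hub: each entity has a single relationship,
// and it connects to (or from) the central ComicIssue node
CREATE (i:ComicIssue {title: 'The Avengers (1963) #1'})
CREATE (:Character {name: 'Iron Man'})-[:APPEARS_IN]->(i)
CREATE (:Creator {name: 'Stan Lee'})-[:CREATED]->(i)
CREATE (:Story {title: 'Origin of the Avengers'})-[:TOLD_IN]->(i)
CREATE (:Event {name: 'Avengers origin event'})-[:INCLUDES]->(i)
CREATE (i)-[:BELONGS_TO]->(:Series {title: 'The Avengers (1963)'})
```

The hub-and-spoke shape keeps the model simple to import and query, and dropping an entity later (if the data gets unwieldy) removes exactly one relationship type.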
What I Learned
We now have a sensible data model for our Neo4j graph database, and we can look forward to importing the data from the API into the graph!
It took some time and a few iterations to determine the best entities to include, the properties we wanted, and the relationship structure that worked best for this project. Below is a list of my key takeaways from the steps we have covered so far (API evaluation and data model creation).
- It took time to research the API structure and understand what we could or could not get from it.
- It took some thought and testing in the interactive documentation to come up with a data model that made sense, was interesting, and also didn’t overcomplicate the data.
- I learned a LOT from modeling a real data set. I wasn’t playing with a pre-existing model or something that had already been translated to a graph before. I had to step through the process, just as any developer would for a new project.
In the next posts, I will walk through the next phases of this project covering the data import and application development using Spring Data Neo4j, plus other details around the project and its current state. Stay tuned for more info!