Creating a Data Marvel: Part 10 — Lessons and Resources

Published in Neo4j Developer Blog, Mar 28, 2019

*Update*: All parts of this series are published, and related content is available.
Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8, Part 9, Part 10
Completed GitHub project (+related content)

If you have followed this series through all 10 blog posts, I commend you! Thank you for reading along as I documented our project and code. From the analysis of the Marvel API and evaluation of the project direction to building the final UI components in the webpage, this process has shown how much time and energy it takes not only to populate a database and build a fairly simple full-stack application, but also to stay focused on the goal of the project. In this post, we want to discuss three remaining topics:

Future of the project — additions, improved features, more functionality
Lessons learned — which ones stuck out the most to me
Resources — from code to foundational, where to find info

Future of the Project

From the beginning, we designed this project to be a demo for a presentation. However, as we started building the application, I think we began to realize the data could support a much more complex and intriguing application than our original intent. We obviously could not (and did not want to) build such an application for demo purposes, but it sparked plenty of ideas for continuing this project after the demo was complete.

Data Cleanup

First, we could easily spend many hours on a dedicated effort to clean up the data from the API. Several fields and values in Marvel’s API turned out to be invalid, and we had no way of knowing that until they were in our database. It might take some extra exploring in the API to find these inconsistencies and broken images, but since the data owners haven’t been perfect stewards of the data, we can improve the data integrity on our end.

If there are certain values we cannot clean up, then we may need to explore better ways to handle those on the frontend. After all, most websites already focus on failing gracefully so that the user does not see confusing error messages or ugly rendering errors. We could do the same on our webpage so that missing images fall back to a prettier default or just don’t appear.

We could also likely retrieve additional data about our Character, Creator, Event, Series, and Story entities to populate in our database for retrieval. There might be additional insight we could gain by tying some of the other entities to one another, as well, though this would quickly add complexity in our application.

Additional Endpoints

The application could also be built out to include additional endpoints for reaching more detailed information about the outlying Character, Creator, Event, Series, and Story nodes. Then, users could drill down from the webpage into deeper information about each of these nodes.

For instance, if users clicked on the Character in a ComicIssue from our main HTML page, we could send them to another webpage with more information about that character and possibly add another custom query to see which other superheroes this character appeared with the most (who they often teamed up with).

Adding functionality like this would mean more complexity for adding all the relationships and properties across them, but it would definitely help with exploration of the network and analysis of the connectedness of the data set.
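To make the "who they often teamed up with" idea a bit more concrete, here is a minimal sketch of the counting logic. It is not the project's code: in the real application this would most likely be a custom query against the graph, whereas this sketch swaps in a plain in-memory map (issue title to the characters in that issue) so the logic can be read on its own. The `TeamUps` class and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; the in-memory map stands in
// for the ComicIssue-to-Character data that lives in the graph database.
final class TeamUps {
    private TeamUps() {}

    /** Counts how many issues each other character shares with the given character. */
    static Map<String, Integer> coAppearances(Map<String, List<String>> issues, String character) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> cast : issues.values()) {
            if (!cast.contains(character)) {
                continue; // this issue does not feature the character at all
            }
            for (String other : cast) {
                if (!other.equals(character)) {
                    counts.merge(other, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Returns up to `limit` co-stars, most frequent first, ties broken alphabetically. */
    static List<String> topTeamUps(Map<String, List<String>> issues, String character, int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must not be negative");
        }
        Map<String, Integer> counts = coAppearances(issues, character);
        List<String> names = new ArrayList<>(counts.keySet());
        names.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(names.subList(0, Math.min(limit, names.size())));
    }
}
```

Calling `TeamUps.topTeamUps(issues, "Spider-Man", 5)` would return the five characters who share the most issues with Spider-Man in whatever sample data is passed in. Doing the same work inside the database would avoid pulling every issue into memory, which matters for a data set of this size.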

Making the Webpage Prettier

As mentioned in our previous blog post, the frontend page could use some refinement. We gave it a great place to start, but I think we could add some relatively minor features that would greatly improve the usability and tidiness of the interface.

only render chosen ComicIssue and its connections

To start, I want to adjust the graph rendering so that it pulls the selected ComicIssue and the relationships around it, rather than a segment of the graph each time.

I began this process when I updated the render() method to execute each time the user clicked on a ComicIssue from the list in the left pane. However, I would need to adjust the query in the application to retrieve the nodes and relationships around the particular comic that was clicked and pass those nodes/relationships to the endpoint that builds the visualization. This would be a combination of frontend and backend work, but I think it would make the application and visualization more interesting and relevant.

This was mentioned above, but we also want to handle some of the missing or bad data in a better way on the webpage. If empty results or unhelpful values return, then there isn’t much point in displaying them. While we might be able to clean some of this up in the actual database, we may not be able to handle it all. Blank or missing data can still be helpful, and we can make it valuable through better display.

As an example, a ComicIssue might not be a part of a Series or an Event. Instead of showing blanks or null values, we could possibly use a message that states the comic is not in a series or event. The user may find that odd and drill into more information about the creators or characters involved to see if there is a pattern. Or, if the values are not empty, but unknown, then perhaps the API doesn’t have that information documented, and the user could do external research to fill in those values manually.
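The idea of replacing blanks with a message can be sketched in a few lines. This is only an illustration under assumptions: the `IssueDisplay` class, its method names, and the exact message wording are hypothetical and not taken from the project, and the same logic could just as easily live in the frontend JavaScript.

```java
// Hypothetical display helper for illustration; names and messages are assumptions.
final class IssueDisplay {
    private IssueDisplay() {}

    /** Returns the trimmed value, or the fallback message when the value is null or blank. */
    static String orMessage(String value, String fallback) {
        if (value == null || value.trim().isEmpty()) {
            return fallback;
        }
        return value.trim();
    }

    /** Label for the Series a ComicIssue belongs to, with a friendly message when there is none. */
    static String seriesLabel(String seriesName) {
        String name = orMessage(seriesName, null);
        return name == null ? "This comic is not part of a series." : "Series: " + name;
    }

    /** Label for the Event a ComicIssue belongs to, with a friendly message when there is none. */
    static String eventLabel(String eventName) {
        String name = orMessage(eventName, null);
        return name == null ? "This comic is not part of an event." : "Event: " + name;
    }
}
```

With something like this in place, the page would show a readable sentence instead of an empty field, which gives the user a cue that the absence itself is information.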

One final thought on improving the webpage would be to make the list of ComicIssues in the left pane pageable. This would shorten the scroll bar and result set and allow the user to sift through search results in more manageable segments. Changing this would require some research on how to chunk the results from the database and then display them with arrows to move forward and backward through the results by certain amounts, but I think it would be a nice feature.
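As a starting point for that research, here is a minimal sketch of the chunking itself. It assumes the full result list is already in memory, which is a simplification: a real version would more likely push the paging down to the database (Cypher has `SKIP` and `LIMIT` clauses, and Spring Data offers `Pageable`). The `ComicPager` class and its methods are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; it slices an already-fetched list
// into pages for the left pane rather than paging inside the database.
final class ComicPager {
    private ComicPager() {}

    /** Returns the items on the given zero-based page, or an empty list past the end. */
    static <T> List<T> page(List<T> items, int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageNumber must be >= 0 and pageSize must be > 0");
        }
        long from = (long) pageNumber * pageSize; // long arithmetic avoids int overflow
        if (from >= items.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, (long) items.size());
        return new ArrayList<>(items.subList((int) from, to));
    }

    /** Number of pages needed to show all items, used to enable or disable the arrows. */
    static int pageCount(int totalItems, int pageSize) {
        if (totalItems < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("totalItems must be >= 0 and pageSize must be > 0");
        }
        return (int) (((long) totalItems + pageSize - 1) / pageSize);
    }
}
```

The forward and backward arrows would then just increment or decrement the page number and stop at `0` and `pageCount - 1`.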

Project Lessons Learned

If you have followed this blog series, you probably know that we included some lessons learned for each step towards the end of each post. We hope these have been helpful to you, and they have definitely been helpful for us in seeing the progress and tracking unusual developments along the way. I wanted to highlight a few of the learnings from previous posts, as well as add any others on the process as a whole.

Part 1: There is much to be learned from modeling a real data set. We can be introduced to concepts with provided examples, but those are neat and clean, with headaches removed for easy on-boarding and learning. We can practice with other people’s examples, but they have often done the work for us in cleaning up the data, creating a data model, and sharing an import script. We truly learn how to overcome when we face all these obstacles on our own with no predefined process or shortcuts to defeat the challenges. We don’t have to start from scratch, as we were given the tools and ideas for accomplishing these tasks in the other examples, but we must work out our unique situation as we go.
Part 2: Along the lines of the one above, working through a practical example helps you develop skills suited to that unique situation. When we were faced with a buggy, inconsistent API, we had to figure out what to add to our Cypher queries to reduce timeouts and still stay within the API restrictions of call limit and number of results. Sometimes, you just need to experiment with a data set hands-on to gain a deeper understanding. Of course, the implication of this is added time for experimentation. Having a deadline isn’t a bad thing, but it helps to have some built-in time for exploration to find the best possible solution.
Part 3: Data import seems to be the long pole in the tent for most of the projects I have tackled so far. I think it is easy to assume that the main result of the project (application, process, demo, etc.) will take the longest, but I have found that getting data from one place to another takes the most time and effort. After all, the structure and integrity of the data must be good or nothing else is. Factoring in data formats, the handling of missing/null values, and transformations took some time and careful planning in order to get the results we were looking for.

Part 4 and Part 5: Try to research tools and choose those that best fit the goals of the project. For us, Spring Data/Spring Boot was already familiar, but it also met the criteria of being simple and concise. Both of those qualities allowed us to meet our goal of building a live-coding demo of the application. Simple and concise code meant we could cover everything we needed with good explanation in a presentation time slot. While using Spring Data Neo4j was a bit new to us, our project team (Mark and I :D) brought the foundational skills needed to apply to SDN. He had Spring expertise, and I had Neo4j knowledge. Together, we could depend on one another and stretch both our skills for the new project.
Part 6 and Part 7: You shouldn’t reinvent the wheel, but sometimes you might have to customize the wheel a bit. :) Once we got one set of domain classes outlined to match one entity, we were able to copy/paste the same outline to our other entities and adjust for differing properties. There were plenty of code examples for creating relationships between entities, but none that were quite as tightly interconnected as our Marvel data set. It was here where we needed to go “off script” and write/test code for our particular use case and data set.

(Image: putting the final piece in the St. Louis Gateway Arch)

Part 8: When you get to the point where you are knitting together all the pieces you have built into the completed end result, try to understand why each piece fits a certain way and make informed decisions. It’s easy to get caught in the scramble to cobble it together because you know it works that way. In our project, we divided our code into controller and service for the ComicIssue entity because we felt it kept responsibilities cleanly separated and made it easier for other developers to follow which method handled which functionality. It was also tempting to simply copy/paste the D3 example code from other projects, but then I couldn’t have explained it to all of you in a blog post. Taking the time to examine each puzzle piece and know why it belongs helps you reassemble and modify the pieces for your next project(s).
Part 9: You also don’t need to over-engineer things right out of the gate. The saying “Rome wasn’t built in a day” applies to software development, too. You don’t have to have the whole project in perfect order on the first iteration. For us, we didn’t need to build the perfect webpage with all the endpoints coded and functional. We could allow the project to evolve as needs and interest arose. Enjoy the learning process and let ideas surface for future improvements.

*Disclaimer: I do realize that not every developer and every project has the leisure to expand timelines or experiment with new technologies. However, I do think it is important to tackle one small new thing in each project. After all, if you never do anything new and never allow time for better ideas, your output will never reach its full potential. Developers may not have a voice in some of the decisions, but we can still do our best to champion those thought processes. Best of luck! :)

Project Resources

We wanted to give all of you a single place to retrieve all the documentation we have used and referenced on this project. There is a lot of material here, but if you have followed along with all the previous posts, you have conquered all of these topics! Congratulate yourself on your learning and feel free to pick up additional information from the resources below as you embark on your next project!

GitHub source code
All previous posts in this blog series:
Part 1 — Marvel data set and data model
Part 2 — data import to Neo4j from API (initial pass)
Part 3 — data import (adding details)
Part 4 — Spring Data Neo4j choice and structure
Part 5 — first domain classes (Character, repository, controller)
Part 6 — rest of domain classes (Creator, Event, Series, Story)
Part 7 — tying entities together with relationships
Part 8 — endpoints, queries, data formatting, and more
Part 9 — web page and graph visualization
Follow the duo on Twitter to see what’s coming: @mkheck and @jmhreif
Download Neo4j
Spring Data Neo4j docs
Spring Data Neo4j Guide
Marvel API (data set)
Project Lombok docs
Spring/Neo4j movie application example

Of course, feel free to pull down the repository and add your own customizations or functionality to it! You can also reach out to us with questions, if needed. We would love to see more projects with Spring Data Neo4j and see how you are taking your applications to the next level. Keep building amazing things!