A JURIST Digital Scholar Reflects on Improving News Access with Technology

Michael
Sep 23, 2020


JURIST is the world’s only law school-based legal news and commentary service

Honestly, life is just so weird and interesting, you know? It’s hard to say exactly what I expected when I was accepted into JURIST’s Digital Scholar program this summer. If someone had asked me what I would be doing, I would’ve responded with some variation of “working on a project to improve news access.” Little did I know that I would do that and so much more. Sometimes we become so focused on what we’re doing that we don’t notice how much we grow and accomplish through challenges and experiences. My time as a Digital Scholar certainly exemplifies this.

I applied to the program back in June hoping to build a project using JURIST’s datasets. My project? I wanted to develop a web application to simplify how JURIST readers discover and understand news articles. This web application would let a reader search any broad topic (e.g., FCC) and have that topic broken into subtopics (e.g., 5G, net neutrality, robocalls). JURIST articles related to these subtopics would then be visualized on chronological timelines. My motivation for building this project came from my own experience using JURIST’s website. After searching “FCC”, I encountered 52 pages of news articles displayed in an unorganized manner. This led me to ask several questions. What if searched news articles could be displayed on a simple and engaging visualization? What if it were easier for readers to understand the development of any large and complex topic? My project is an attempt to answer these questions.
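To make the idea concrete, here is a minimal sketch of the data shape behind such timelines: articles grouped by subtopic, then sorted by date within each group. The headlines, dates, and the `build_timelines` helper are all placeholders I made up for illustration, not JURIST data or the application's actual code.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (headline, publication date, subtopic label).
# These are placeholders, not real JURIST articles.
articles = [
    ("Placeholder headline B", date(2019, 6, 1), "net neutrality"),
    ("Placeholder headline C", date(2019, 3, 14), "5G"),
    ("Placeholder headline A", date(2018, 1, 10), "net neutrality"),
    ("Placeholder headline D", date(2019, 5, 15), "robocalls"),
]


def build_timelines(records):
    """Group records by subtopic and order each group chronologically."""
    timelines = defaultdict(list)
    for headline, published, subtopic in records:
        timelines[subtopic].append((published, headline))
    # Sorting (date, headline) tuples orders each group by date first.
    return {topic: sorted(items) for topic, items in timelines.items()}


timelines = build_timelines(articles)
for topic, items in timelines.items():
    print(topic)
    for published, headline in items:
        print(f"  {published.isoformat()}  {headline}")
```

Each key in the resulting dictionary corresponds to one timeline in the visualization.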

Screenshot of the Web Application

The web application can be found here. The source code of the web application can be found here.

Programming the Web Application

Building the Backend

spaCy Word Embedding Model
K-Means Clustering

I started with the backend, which I built in Python using Jupyter Notebook. With a few lines of code, I parsed JURIST’s XML data files, which contained news article metadata and text. Next, I trained the spaCy word embedding model on the news article text to transform the article text into numerical vectors. I settled on spaCy as the word embedding model for my project because of its simplicity and ease of use. Other word embedding models, such as Word2Vec or GloVe, could also have been used here.
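The two steps above can be sketched roughly as follows. This is only an illustration under several assumptions: the XML element names (`article`, `title`, `date`, `body`) are invented, since the real JURIST schema isn't shown here, and the embedding step loads a pretrained spaCy pipeline (`en_core_web_md` is my guess) instead of reproducing the exact model setup used in the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the real JURIST files may be structured differently.
SAMPLE_XML = """<articles>
  <article>
    <title>Placeholder headline one</title>
    <date>2020-06-01</date>
    <body>Placeholder text about net neutrality.</body>
  </article>
  <article>
    <title>Placeholder headline two</title>
    <date>2020-06-15</date>
    <body>Placeholder text about robocalls.</body>
  </article>
</articles>"""


def parse_articles(xml_text):
    """Return one dict per <article> element with its title, date, and body."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": node.findtext("title", default="").strip(),
            "date": node.findtext("date", default="").strip(),
            "body": node.findtext("body", default="").strip(),
        }
        for node in root.iter("article")
    ]


def embed_articles(texts, model_name="en_core_web_md"):
    """Turn each text into one vector using a pretrained spaCy pipeline.

    The model name is an assumption; it must be downloaded separately.
    """
    import spacy  # imported here so the parsing step works without spaCy

    nlp = spacy.load(model_name)
    # doc.vector is spaCy's document-level vector for each processed text.
    return [doc.vector for doc in nlp.pipe(texts)]


parsed = parse_articles(SAMPLE_XML)
print([a["title"] for a in parsed])
```

With spaCy and a model installed, `embed_articles([a["body"] for a in parsed])` would return one vector per article.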

Because of spaCy, news articles that are similar in topic should have numerical vectors that are also similar. Therefore, finding a way to cluster similar numerical vectors would allow me to organize news articles by topic. To do this, I experimented with two scikit-learn clustering algorithms, DBSCAN and K-means. Although I was drawn to DBSCAN’s simplicity and potential, it proved to be very unreliable during testing. It mostly struggled to identify groups of similar vectors, even after I adjusted its parameters. K-means, on the other hand, clustered the vectors very efficiently, and it soon became obvious which clustering algorithm I should use.
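For readers curious about what K-means actually does with those vectors, here is a small pure-Python sketch of the algorithm. It is not scikit-learn's implementation (the project used scikit-learn's `KMeans`, which also picks its starting centroids more carefully); the toy 2-D vectors and the first-k initialisation are my own simplifications so the example stays short and deterministic.

```python
import math


def kmeans(vectors, k, iterations=20):
    """Cluster vectors into k groups; returns (labels, centroids)."""
    # Simplified, deterministic start: the first k vectors are the centroids.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy stand-ins for article vectors: two obvious groups in 2-D.
toy_vectors = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9)]
labels, centroids = kmeans(toy_vectors, k=2)
print(labels)
```

Articles that share a label would end up on the same timeline. Real article vectors have hundreds of dimensions, but the two steps are the same.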

Building the Frontend

HTML Code

After taking about four weeks to build out the backend, I started to focus more on the frontend of the application. Before starting this project, my frontend programming skills were non-existent. I had never heard of Flask or written a single line of HTML. Because of this, I essentially had to teach myself HTML/CSS and figure out how to translate my Python code into a working web application. Throughout this process, I encountered so many wonderful online resources ranging from W3 to Medium’s Data Science Page, all of which made learning HTML and Flask so much easier. Developing the frontend from start to finish was a continuous process of trial and error that took me around 5–6 weeks to complete. Once I was finished with both the frontend and backend, I uploaded my code to GitHub and deployed it as a web application on Heroku.
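If you're in the same position I was, this is roughly what "translating Python into a web application" looks like with Flask. It's a bare-bones sketch, not the application's real code: the `/search` route, the `q` parameter, the `search_articles` helper, and the demo data are all invented for illustration.

```python
def search_articles(articles, query):
    """Return articles whose title contains the query (case-insensitive)."""
    needle = query.strip().lower()
    if not needle:
        return []
    return [a for a in articles if needle in a["title"].lower()]


def create_app(articles):
    """Build a small Flask app exposing the search helper over HTTP."""
    # Imported here so the helper above can be used without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/search")
    def search():
        # Read the query string, e.g. /search?q=fcc
        query = request.args.get("q", "")
        return jsonify(search_articles(articles, query))

    return app


# Placeholder data standing in for parsed JURIST articles.
DEMO_ARTICLES = [
    {"title": "Placeholder FCC headline", "date": "2020-06-01"},
    {"title": "Placeholder court headline", "date": "2020-06-15"},
]

print(search_articles(DEMO_ARTICLES, "fcc"))
```

With Flask installed, `create_app(DEMO_ARTICLES).run(debug=True)` starts a local server, and visiting `/search?q=fcc` returns the matching articles as JSON. An HTML template would then take the place of the raw JSON.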

Final Thoughts

Reflections

Overall, my journey of building a web application from scratch was filled with a mix of emotions. It was rewarding, frustrating, exciting, humbling, and so much more. There were many times when adrenaline would be flowing through my veins after I reached a major milestone. There were also many times when I felt defeated after spending hours trying to fix a code error to no avail. The best analogy I have for this journey is a mountain climb. The journey up is tedious and challenging. You constantly encounter obstacles and you have to push yourself to overcome them. But once you finish and get to the top of the peak, the view and feeling of euphoria are great!

Outside of new technical skills, if there’s anything I learned from this experience, it’s to always surround yourself with rockstar people. My mentors, Xiaoli, Andy, and Cameron, were the rocket fuel that enabled me to blast off with my project idea. Through their feedback and guidance, I took my project further than I had envisioned by implementing ideas I would’ve never thought of. I owe them a tremendous amount of gratitude for their kindness and confidence in me.

The Future

This web application is far from perfect. There are numerous improvements that can be made on both the frontend and backend. For example, the spaCy word embedding model can be improved to better cluster the articles. One limitation I encountered during development was the limited memory available on Heroku. This forced me to downgrade to a smaller spaCy model. Better memory allocation and management could improve the speed and clustering performance of the application. There are also many ways to enhance the frontend of the application. For example, the timeline could show more detail about each cluster. Revamping the timeline visualization to look more modern and sleek is also something that can be done.

The goal of JURIST’s Digital Scholar Program is “to cultivate rising interdisciplinary talent at the crossroads of law, technology, and public policy” to inspire research and ideas. This mission is what strongly attracted me to the program. If you’re interested, I strongly encourage you to apply. Like seriously! You will work on interesting projects with amazing people.

Understanding the intersection of law, technology, and policy is more essential today than at any time in human history. It’s up to us to be the scholars and leaders who will drive conversations and ideas in this ongoing journey. I’m proud to have been part of this journey, even if it was just for one summer. Although my time as a Digital Scholar is coming to an end, one thing is certain: I will continue immersing myself in current events and working to contribute to society using policy and technology in small or big ways. And I hope you do too.
