How to read The Lord of the Rings in 5 minutes using data science: A Trilogy (Part 3*)

Connor Mitchell
7 min read · Dec 20, 2019


*If you haven’t read Part 1 or Part 2 yet, go do that first. Reference code here.

Part 3: Text Summarization

As part of one of your introductory data science courses, you’ve probably been exposed to something called the “PageRank” algorithm. A version of this algorithm is what Google once used to order the results returned by its search engine. The more websites that linked to a page, the higher it would rank in the returned results (based on the assumption that higher-quality pages would have more links pointing to them).

We take a similar approach for extractive text summarization; however, instead of pages we use sentences, and instead of links we use weighted edges in a graph. Why weighted, you ask? Because each edge carries a score for how similar the two sentences it connects are. In order to compute the weight of each edge, we need to introduce only one more library than we’ve already used:

The cosine_distance function is needed for comparing the word vectors of two sentences to approximate how similar those sentences are. Each sentence is represented as an n-dimensional vector, where n is the number of unique words across both sentences. Each component of a sentence’s vector thus represents one word from the list of unique words and is incremented by one for each repetition of that word in the sentence. Here’s a simple example:

Even though the word “the” occurs once in both sentences, we ignore it because it’s a “stopword”: a word that doesn’t contribute to the sentence’s meaning.
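To make the vector construction concrete, here is a minimal sketch in plain Python. The two sentences, the tiny stopword set, and the helper name `sentence_vectors` are illustrative assumptions, not the article's own example or code.

```python
# Illustrative stopword set (an assumption; a real list would be much longer).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in"}


def sentence_vectors(sent1, sent2, stopwords=STOPWORDS):
    """Build word-count vectors for two tokenized sentences.

    Returns the shared vocabulary (unique non-stopwords across both
    sentences) and one count vector per sentence, aligned to it.
    """
    words1 = [w.lower() for w in sent1]
    words2 = [w.lower() for w in sent2]

    # One vector component per unique word across both sentences.
    vocab = sorted(set(words1 + words2) - set(stopwords))
    index = {word: i for i, word in enumerate(vocab)}

    v1 = [0] * len(vocab)
    v2 = [0] * len(vocab)
    for w in words1:
        if w in index:          # stopwords are not in the index, so they are skipped
            v1[index[w]] += 1   # increment once per repetition
    for w in words2:
        if w in index:
            v2[index[w]] += 1
    return vocab, v1, v2


vocab, v1, v2 = sentence_vectors(
    ["Frodo", "carried", "the", "ring"],
    ["Sam", "carried", "the", "rope"],
)
print(vocab)  # ['carried', 'frodo', 'ring', 'rope', 'sam']
print(v1)     # [1, 1, 1, 0, 0]
print(v2)     # [1, 0, 0, 1, 1]
```

Only "carried" is shared by both vectors here; "the" appears in both sentences but never makes it into the vocabulary.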

Now if we take the cosine_distance of these vectors, it returns 1 minus the cosine similarity (~0.484), suggesting these sentences are slightly more similar than they are different. Cosine distance values closer to zero mean the sentences are more similar (since the cosine of a 0-degree angle is 1, and 1 minus 1 is 0). In the function below, we invert this so that the closer the value is to 1, the more similar the two sentences are, but this is arbitrary.
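The distance and its inversion can be sketched without any dependencies. The version below reimplements the standard formula (1 minus cosine similarity) in plain Python as a stand-in for the library function; the name `sentence_similarity`, the example vectors, and the handling of all-zero vectors are assumptions made for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        # Assumption: a sentence with no countable words is treated as
        # maximally distant rather than raising a division error.
        return 1.0
    return 1.0 - dot / norm


def sentence_similarity(u, v):
    """Invert the distance so that values near 1 mean 'more similar'."""
    return 1.0 - cosine_distance(u, v)


# Hypothetical count vectors for two sentences that share one of five words.
u = [1, 1, 1, 0, 0]
v = [1, 0, 0, 1, 1]
print(round(cosine_distance(u, v), 3))      # 0.667
print(round(sentence_similarity(u, v), 3))  # 0.333
```

Identical vectors give a distance of 0 and a similarity of 1; vectors with no words in common give a distance of 1 and a similarity of 0.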

After we’ve constructed a similarity matrix for all pairs of sentences in the dataset, we can convert it into a networkx graph and run the PageRank algorithm over it to return the top_n most dissimilar sentences:
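The matrix-then-rank step can be sketched end to end in plain Python. This is not the article's code: instead of building a networkx graph, it runs a small power-iteration PageRank directly on the similarity matrix, and the function names, the `highest_first` parameter, and the toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def similarity(tokens_a, tokens_b):
    """Cosine similarity of word-count vectors (1 minus cosine distance)."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(c * c for c in ca.values())) * math.sqrt(
        sum(c * c for c in cb.values())
    )
    return dot / norm if norm else 0.0


def build_similarity_matrix(sentences):
    """Pairwise similarity for every pair of tokenized sentences."""
    n = len(sentences)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # no self-loops
                matrix[i][j] = similarity(sentences[i], sentences[j])
    return matrix


def pagerank(matrix, damping=0.85, tol=1e-6, max_iter=100):
    """Weighted PageRank by power iteration over an edge-weight matrix."""
    n = len(matrix)
    if n == 0:
        return []
    out_weight = [sum(row) for row in matrix]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # Nodes with no edges spread their score evenly over all nodes.
        dangling = sum(scores[i] for i in range(n) if out_weight[i] == 0)
        new_scores = []
        for j in range(n):
            incoming = sum(
                scores[i] * matrix[i][j] / out_weight[i]
                for i in range(n)
                if out_weight[i] > 0
            )
            new_scores.append((1 - damping) / n + damping * (incoming + dangling / n))
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


def rank_sentences(sentences, top_n=2, highest_first=True):
    """Return top_n (score, sentence) pairs from one end of the ranking."""
    scores = pagerank(build_similarity_matrix(sentences))
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=highest_first)
    return [(scores[i], sentences[i]) for i in order[:top_n]]


# Toy tokenized sentences with stopwords already removed.
sentences = [["frodo", "ring"], ["frodo", "sam"], ["ring", "mordor"], ["gandalf"]]
for score, sentence in rank_sentences(sentences, top_n=2):
    print(round(score, 3), sentence)
```

A high score here means a sentence is strongly connected to many others in the similarity graph. Which end of the ranking to keep is left to the caller through `highest_first`.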

Why do we want the most dissimilar sentences? This is because we are assuming they are the most representative of the content in the text, since they tell the reader something the other sentences don’t. This assumption is simplistic and vulnerable to differing author writing styles, but is often used for extractive text analyses. I encourage you to think of better alternatives!

And after running our summarization functions over all the chapters, we obtain a book summary that will hopefully make sense in light of our NER model and network analysis. It’s time to tie it all together.

Reading Time

We’re ready for your 5 minutes of reading! Take a look at our final HTML summary with character annotations or simply read the plain text below:

Frodo was the only one present who had said nothing.

One Ring to find them, One Ring to bring them all and in the darkness bind them In the Land of Mordor where the Shadows lie.’_ He paused, and then said slowly in a deep voice: ‘This is the Master-ring, the One Ring to rule them all.

‘Hush!’ said Frodo.

After half an hour Pippin said: ‘I hope we have not turned too much towards the south, and are not walking longwise through this wood! It is not a very broad belt -I should have said no more than a mile at the widest and we ought to have been through it by now.’ ‘It is no good our starting to go in zig-zags,’ said Frodo.

But I don’t know quite how to begin.’ ‘I think I could help you,’ said Merry quietly, ‘by telling you some of it myself.’ ‘What do you mean?’ said Frodo, looking at him anxiously.

There was not as yet any sign of a path, and the trees seemed constantly to bar their way.

At last Frodo spoke: ‘Did you hear me calling, Master, or was it just chance that brought you at that moment?’ Tom stirred like a man shaken out of a pleasant dream.

Though Frodo looked about him on every side he saw no sign of the great stones standing like a gate, and before long they came to the northern gap and rode swiftly through, and the land fell away before them.

‘I saw him, Mr.

‘I don’t know what came over me.’ ‘I do,’ said Strider.

If that is so, we must be wary.’ ‘I wish we could feel sure that he made the marks, whatever they may mean,’ said Frodo ‘It would be a great comfort to know that he was on the way, in front of us or behind us.’ ‘Perhaps,’ said Strider.

‘By Elbereth and Lúthien the Fair,’ said Frodo with a last effort, lifting up his sword, ‘you shall have neither the Ring nor me!’ Then the leader, who was now half across the Ford, stood up menacing in his stirrups, and raised up his hand.

And yet I am not sure; it may have been better so.’ ‘I wish you would tell me what happened!’ ‘All in good time! You are not supposed to talk or worry about anything today, by Elrond’s orders.’ `But talking would stop me thinking and wondering, which are quite as tiring,’ said Frodo.

That is my belief.’ `Yet all the Elves are willing to endure this chance,’ said Glorfindel ‘if by it the power of Sauron may be broken, and the fear of his dominion be taken away for ever.’ ‘Thus we return once more to the destroying of the Ring,’ said Erestor, `and yet we come no nearer.

I must be off.’ `How long do you think I shall have here?’ said Frodo to Bilbo when Gandalf had gone.

Let us go and see what things are like now! ‘ They found the stone steps without difficulty, and Gimli sprang swiftly up them, followed by Gandalf and Frodo.

‘He cannot stand alone! ‘ cried Aragorn suddenly and ran back along the bridge.

Frodo looked and saw, still at some distance, a hill of many mighty trees, or a city of green towers: which it was he could not tell.

With Dwarf and Hobbit, Elves and Men, with mortal and immortal folk, with bird on bough and beast in den, in their own secret tongues he spoke.

Farewell to Lórien That night the Company was again summoned to the chamber of Celeborn, and there the Lord and Lady greeted them with fair words.

Frodo peering forward saw in the distance two great rocks approaching: like great pinnacles or pillars of stone they seemed.

Conclusions

Aside from being a bit disconnected, the summary we produced wasn’t too bad. Our fitted NER model identified all the characters, and thanks to our network analysis, we can tell that they are the main ones. After reading, you can infer that the story has to do with destroying an evil ring and involves a winding journey for the protagonists, during which they face many back-breaking challenges to achieving their mission. It’s a story of teamwork, friendship, and enduring hope looking into a monumental future.

Could a company pick a better founding story? While there are many more layers to Tolkien’s world, I hope you’re feeling adequately prepared for the interview. This analysis focused on characters, but as our summary (unwittingly) shows, the themes are more important.

Whether you join the team at Palantir or not, I hope you bring Frodo’s attitude to your future data science projects; that dedication will take you far.
