What Do I Philosophize Next? A Natural Language Processing Approach
It’s 2 a.m. — I’m listening to The National and reading about the philosophy of mind and body and their relation to each other. As the music quiets down to make way for Berninger’s overwhelming baritone voice, I feel a deep urge to know more. So like any well-educated, graduated-from-uni individual, I decide to Google it. Yes, I suppose I could have been more specific with my query than ‘mind and body’ so that it gave me some philosophy instead of yoga ads, but it would still be a large amount of information with no direction (= buzzkill). The problem is, the concepts are too abstract and too vague (not to mention the ephemeral nature of a train of thought past midnight…) and unless I dig through all the articles — skimming and scanning for some concrete theories to pop out — I will go to sleep disappointed. And one must never sleep disappointed; this is a rule.
Skip all the melodrama and the issue is simple: how do I ‘map’ out my reading so as to focus on a specific concept without distracting myself with unnecessary (albeit interesting) information? What I desire in this solemn moment of philosophical exploration is condensation — to be able to get a handful of key connections between theories, concepts, personas, and terms that together generate a mini scenius-network as a guideline for self-study. In the age of the internet, subtraction is value.
This was the underlying motivation for attempting an NLP (Natural Language Processing) approach to essentially make automatic high-level connections between groups of texts (articles). Truthfully, what I did isn’t really anything ‘new’ — it’s been done before for numerous applications, from creating links between news articles to creating a mind-map of references between academic papers. The machine learning and algorithmic techniques for this sort of data-mining task are also varied and range from simple word counts (which work surprisingly well) to complex statistical models. As language interpretation is no easy task, these approaches, though helpful, have yet to become reliable enough for any serious business application, but I find them handy for learning purposes (and they look, like, really cool and smart and stuff). Similar to my last NLP project, the idea is to use these technical approaches as tools and helpers for analysis, learning, and gaining insights.
The code is pretty simple, though it took some iterations to find what math built the most relevant connections (at least for this specific application).
- Create a ‘Corpus’ — This is the collection of written texts we are finding connections between. In this case, I went to the Internet Encyclopedia of Philosophy and did some Python web scraping (with BeautifulSoup) to extract the text from articles in the Mind and Cognitive Science category.
- Parts-of-Speech Tagging — I used the NLTK library for this task, which assigns each word a part-of-speech tag (e.g. noun, verb, adjective) that helps with entity extraction.
- Entity Extraction — This is the step that would otherwise require us to read through the articles and pick out ‘big ideas’, ‘entities’, or any significant nouns that stand out. I created a bunch of simple grammatical rules (which I decided indicated an entity) and used NLTK’s RegexpParser to extract entities from each group of texts. To further filter out insignificant concepts, I kept only the 50 most frequent entities for each article.
- Making Connections — I would select the specific article I was currently reading, iterate over its extracted entities, find each entity’s statistical significance, and build a connection to other articles that also listed those entities with a similar statistical significance.
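The tagging and extraction steps above can be sketched roughly as follows. The chunk grammar here is a hypothetical stand-in for the ‘bunch of simple grammatical rules’ I mentioned, and the sample sentence is hand-tagged so the sketch runs without downloading NLTK’s tagger models; in practice the tags would come from nltk.pos_tag(nltk.word_tokenize(text)).

```python
from collections import Counter
import nltk

# A hypothetical chunk grammar: an entity is an optional adjective
# followed by one or more nouns (a stand-in for my actual rules).
GRAMMAR = "ENTITY: {<JJ>?<NN.*>+}"
parser = nltk.RegexpParser(GRAMMAR)

def extract_entities(tagged_tokens, top_n=50):
    """Chunk POS-tagged tokens and keep the top_n most frequent entities."""
    tree = parser.parse(tagged_tokens)
    entities = [
        " ".join(word for word, _tag in subtree.leaves()).lower()
        for subtree in tree.subtrees()
        if subtree.label() == "ENTITY"
    ]
    return Counter(entities).most_common(top_n)

# Hand-tagged sample; normally: nltk.pos_tag(nltk.word_tokenize(text))
tagged = [
    ("the", "DT"), ("mental", "JJ"), ("states", "NNS"),
    ("depend", "VBP"), ("on", "IN"),
    ("physical", "JJ"), ("phenomena", "NNS"),
]
print(extract_entities(tagged))
# → [('mental states', 1), ('physical phenomena', 1)]
```

Keeping only the most frequent chunks is what filters the long tail of one-off noun phrases down to an article’s ‘big ideas’.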
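The connection step can be sketched as below. The significance measure (an entity’s share of all entity mentions in its article) and the similarity tolerance are assumptions on my part rather than the exact math I settled on; the toy corpus stands in for the top-50 entity tables produced in the previous step.

```python
def significance(entity, entity_counts):
    """An entity's share of all entity mentions in one article
    (a simple stand-in for 'statistical significance')."""
    total = sum(entity_counts.values())
    return entity_counts[entity] / total if entity in entity_counts else 0.0

def connections(source, corpus, tolerance=0.5):
    """Link the source article to others that mention one of its
    entities with a comparable relative frequency."""
    links = []
    src_counts = corpus[source]
    for entity in src_counts:
        s_src = significance(entity, src_counts)
        for title, counts in corpus.items():
            if title == source or entity not in counts:
                continue
            # "similar" = within a relative tolerance (an assumption)
            if abs(s_src - significance(entity, counts)) <= tolerance * s_src:
                links.append((entity, title))
    return links

# Toy corpus: article -> {entity: count}, i.e. the per-article top-50 tables
corpus = {
    "Embodied Cognition": {"action": 30, "body": 25, "mind": 20},
    "Epiphenomenalism":   {"action": 28, "volition": 22, "mind": 5},
    "Functionalism":      {"mental states": 40, "mind": 18},
}
print(connections("Embodied Cognition", corpus))
# → [('action', 'Epiphenomenalism'), ('mind', 'Functionalism')]
```

Note how ‘mind’ links Embodied Cognition to Functionalism but not to Epiphenomenalism, where it appears with a much lower relative frequency; that asymmetry is what keeps the graphs from connecting everything to everything.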
As expected, not all graphs turned out to be of value, but the majority of them had at least one insight that could potentially help guide the next read. I’ve included 5 sample graphs here which I found powerful or interesting for the connections they presented. Remember, these are all automated at the click of a button and without human guidance other than the initial categorization of articles.
Observations & Reflections
Extraction of Underlying Concepts
The first point of interest, even before looking at the connections, is how the article in question is broken down into its basic concepts. For some of these topics, it’s easy to see that the underlying concepts are neatly separated out. Figure 5.0, for example, shows Artificial Intelligence split into Machines, Things, and Thoughts — it’s almost poetic how well this encapsulates the philosophical discussion around AI. The reader can now choose to explore an underlying concept further by studying the article(s) connected to it.
Figure 3.0 also shows a somewhat poetic divide — Moral Development split into morality and reason with an accurate connection to Hume. While Morality is an obvious underlying concept, its relation with Reasoning and Rationality can also be seen as a major subject (having skimmed the article myself) — especially when we are concerned with moral ‘Development’.
Considering the math used to generate the subjects involved word counts, this sort of breakdown is not too surprising (who would have thought that the most mentioned concepts turn out to be fundamental? Shocking). Still, it takes a computer to have that level of patience.
As the motivation behind these mind maps was to be guided by interweaving connections between readings, the relevancy of these connections is a major element in determining the success of this approach. Some of these connections are obvious right away to the human eye (i.e. the title of an article contains the word), but others prove to be more subtle.
Figure 2.0 shows a connection between the main article, Embodied Cognition, and Epiphenomenalism through the term action. Now epiphenomenalism is not a term that one occasionally bumps into — you have to be looking for it. Anyone invested in self-study will know that one of the great barriers to learning in that manner is the looming intimidation, and yet elusiveness, of jargon. Just having a name for something goes a long way toward studying it. Epiphenomenalism, for example, is a view that holds actions to be purely physical phenomena — that volition (using one’s ‘will’) does not cause action. I can spend all night searching with phrases like ‘actions are not caused by volition’, ‘actions are physical phenomena’, or ‘volition is caused by action’ to try and find a concrete concept that can help me consolidate this idea, but getting to this specific jargon in the large interconnected web of Google proves to be a difficult task. And so, a connection to obscure terms helps steer self-study in the right direction, providing initial context and motivation for concepts you may otherwise never approach.
(Functionalism connected to mental states, mind, and argument is another example of the same connection relevancy.)
Perhaps the most pleasure I received from this entire exercise was stumbling upon the following hypothesis: that the way we relate ideas and concepts to each other is dictated by the context provided by the current conceptual discovery. In other words, the connections we see between two pieces of thought are not solely dependent on the two pieces themselves but heavily influenced by an existing frame of mind.
As a practical example, upon reading one article, our mind is put on a ‘quest’ of sorts — some conceptual undertaking that we pledge ourselves to carry all the way through. And this ‘quest’ acts like a teacher, building and dictating connections that appeal to its own personality, knowledge, and desire. Take the connection in Figure 4.0 between Intentionality and The Lucas-Penrose Argument about Gödel’s Theorem — connected by the word ‘true’, namely because of how much both articles are based on ‘arguments’ and proving what is ‘true’ — and look at how they are laid out in Figure 5.0. In the context of Artificial Intelligence, Intentionality connects with thought and thing while The Lucas-Penrose Argument sits on the other side with machines. And my hypothesis is that if the Artificial Intelligence article manages to start the ‘quest’ within us to explore further concepts, the cross-references and connections our minds make are of a different manner than if we had read the articles without the context of Artificial Intelligence (or in the context of another article).
Context, we all intuitively understand, plays a strong role in how we define and interpret language units in writing as well as in dialogue. But if the way we make connections is also strongly dictated by context, this, in some sense, leaves little room for escape from the ‘quest’ we have naively taken on. Aside from putting tremendous pressure on our choices for our first reads in the morning, this opens up a whole new set of intellectual pleasures that can be achieved by simply rearranging the order in which we read. Imagine the numerous permutations, mixes and matches of poems, music, and philosophical discovery — each with its own savory aftertaste, cross-referencing potential, and ability to open up multiple realms of discovery for the same human mind reading the same texts.
(…or maybe I’m exaggerating… or maybe if you had read something else earlier, you wouldn’t feel so abrasive…)
Bonus Observation: Water examples
If you paid close attention to the graphs, you may have noticed an odd connection in Figure 4.0 between Intentionality and Internalism and Externalism over the word ‘water’. Strange… right? Something gone wrong? Obviously a bug? Nope. In fact, it’s a unique relation that I may be highly motivated to follow. Water was used extensively because both articles cover a certain thought experiment that uses the analogy of water on two different earths. Having read this analogy in one article, this graph clearly indicates where else the thought experiment is discussed. (This can happen in numerous cases, including extensive use of a single metaphor, character, or even a pop-culture reference used throughout an article.)
Overall, this use of NLP has been… satisfactory. Partially because more detailed and thorough analysis is still possible — in fact, there are numerical metrics for determining the strength of techniques like this, though they are more boring than qualitative analysis — but I’ve been procrastinating writing this for so long that I felt it was time to get it out, despite it not being as complete as I wished.
Following the same theme as my last post, my motivation behind this exercise was to enhance human learning and analysis through the use of these techniques. I think there is strong potential in this line of thought — using the robotic strength of computers to condense information into knowledge is becoming more necessary by the day. There are more ideas I want to try out in this domain of building connections, but this is a decent start — more exciting stuff to come! (Possibly)