Information Retrieval with Term Frequency and TF-IDF Models

Photo by Markus Winkler on Unsplash

This article originally appeared on Lemmalytica, a blog about language, artificial intelligence, and coding.

One of the core tasks in information retrieval is searching. Anyone who deals with large amounts of text data (and that’s almost all of us) knows how difficult this seemingly simple task can be. If your search term is too broad, you may find yourself sifting through an impossible quantity of documents. And if your search term is too narrow, you could be missing out on relevant results. So how do we decide which documents are the most relevant to our search?

Search relevance is a difficult problem, and modern search engines employ highly sophisticated (and proprietary) algorithms to deal with it. We won’t delve into those algorithms, but let’s look at some simple strategies you can use in your own information retrieval applications. …
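The preview names term frequency and TF-IDF without showing how either one scores a document. Below is a minimal sketch, not the article's own code. It assumes naive lowercase whitespace tokenization, length-normalized term frequency, and the common log(N / df) form of inverse document frequency. The function names `tokenize`, `tf`, `idf`, and `rank` are hypothetical.

```python
import math


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for illustration).
    return text.lower().split()


def tf(term, doc_tokens):
    # Term frequency: occurrences of the term divided by document length.
    return doc_tokens.count(term) / len(doc_tokens) if doc_tokens else 0.0


def idf(term, corpus_tokens):
    # Inverse document frequency: log(N / df), where df is the number
    # of documents that contain the term at least once.
    df = sum(1 for tokens in corpus_tokens if term in tokens)
    return math.log(len(corpus_tokens) / df) if df else 0.0


def rank(query, docs, use_idf=True):
    # Score each document as the sum over query terms of tf * weight,
    # where weight is idf (TF-IDF model) or 1.0 (plain TF model).
    corpus_tokens = [tokenize(d) for d in docs]
    query_terms = tokenize(query)
    weights = {t: (idf(t, corpus_tokens) if use_idf else 1.0) for t in query_terms}
    scores = []
    for i, tokens in enumerate(corpus_tokens):
        score = sum(tf(t, tokens) * weights[t] for t in query_terms)
        scores.append((score, i))
    # Highest score first; ties are broken by document index.
    return sorted(scores, key=lambda s: (-s[0], s[1]))


docs = ["the the the the cat", "a mat on the floor", "the dog sat"]
print([i for _, i in rank("the mat", docs, use_idf=False)])  # TF only
print([i for _, i in rank("the mat", docs)])                 # TF-IDF
```

With plain term frequency, the first document wins simply because it repeats "the". Under TF-IDF, "the" appears in every document, so its IDF is zero. The rarer term "mat" then decides the ranking, which moves the second document to the top.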


The Benefits of Learning to Write Well

Photo by Liviu C. on Unsplash

Writing is humanity’s superpower: when done well, it informs, provokes, and entertains. Perhaps that is why blogging is so popular among programmers. We’re a naturally curious community, and sharing knowledge is an integral part of our ethos.

For the reader, the benefits of good writing are obvious. When an author takes the time to prepare a high-quality article, knowledge flows seamlessly from one mind to another. I would argue, though, that the benefits may be even greater for the writer. Writing a good technical article requires deep research, careful thought, and a significant amount of experimentation. …


A Guide to Getting Started with Academic Literature

Photo by Debby Hudson on Unsplash

This article originally appeared on Lemmalytica — a blog about language, artificial intelligence, and coding.

Natural language processing (NLP) is a complex and evolving field. Part computer science, part linguistics, part statistics — it can be a challenge deciding where to begin. Books and online courses are a great place to start, and project-based learning is always a good idea, but at some point it becomes necessary to dig deeper, and that means looking at the academic literature.

Reading academic literature is an art unto itself, and just because a paper is popular doesn’t mean it’s the right place for a beginner to start. However, there is something to be said for papers that have both withstood the test of time and been widely accepted by experts. If a paper has been consistently cited in the academic literature, it’s probably fair to say that the paper is influential. …


All the Tools You Need for Your NLP Workflow

Photo by Susan Yin on Unsplash

This article originally appeared on Lemmalytica — a blog about language, artificial intelligence, and coding.

One of the great things about using Python for natural language processing (NLP) is its large ecosystem of tools and libraries. From tokenization to machine learning to data visualization, Python has something for every NLP task in your workflow. Of course, choosing the *right* tool isn’t always easy. Every NLP library provides slightly different functionality and a slightly different implementation. The key to finding the right tool is knowing what is out there and experimenting with each option so that you understand its strengths and weaknesses. To that end, below is a list of the major NLP tools in use today. …


Using Natural Language Processing to Analyze Text

Photo by Patrick Hendry on Unsplash

The Stanford NLP Group has long been an active player in natural language processing, particularly through its well-known CoreNLP Java toolkit. Until recently, though, Stanford NLP had a much smaller presence in the Python community, which is a shame since many NLP practitioners work primarily in Python. But there’s good news! Stanford NLP’s Stanza Python library is coming into its own with the recent release of version 1.1.1!

The new Stanza version supports 66 different human languages (which is a big step forward, since NLP has long been very English-centric) and can carry out core NLP tasks like lemmatization and named entity recognition. …


The Human and Business Imperatives of Deep Work

Photo by Thomas Martinsen on Unsplash

In 2016, Cal Newport introduced a new term into the business lexicon: deep work. It’s an idea that has since taken hold of disaffected knowledge workers everywhere, due in no small part to the promise that they could finally start doing what they were hired to do — create value. More importantly, intertwined with this promise is something more nebulous — something fragile and fleeting. Dare we call it self-actualization? Anyone who has worked in a modern office knows the creeping sense that what you’re doing doesn’t really matter. It’s that unspoken but ever-present worry that your life is little more than a series of TPS reports. …


Why Reading Wisely Rather than Widely is a Better Recipe for Staying Informed

Photo by Hayden Walker on Unsplash

Much has been made of the question of what it means to be an informed citizen. We’re instructed to “read widely,” “engage in debate,” “seek out new viewpoints,” and so on. The message is clear: the more information you consume, the better informed you will be. On its face, this is reasonable and well-intentioned advice. The problem is that it’s also completely wrong. We’ve become so enamored with the availability of information that we’re forgetting to first judge its quality. For years I practiced this kind of information consumption, particularly with the daily news, only to burn out on an overload of unfiltered, inaccurate, biased, and ultimately low-quality information. …


Bro, do you even have control over your own mind?

Image via Know Your Meme

When Charlie bit his brother’s finger, little did he know that he was unleashing a virus that would burrow into the consciousness of nearly a billion YouTube viewers. Charlie created a meme without even knowing what a meme was. Or perhaps his brother, the victim of said bite, deserves credit—after all, it was his half-laughing, half-crying narration that made the phrase “Charlie bit me” famous. Or did the meme create itself—seizing an opportunity to launch upon an unsuspecting world?

The meme, not at all mindful of who Charlie was or caring much for his habit of biting fingers, spotted an opportunity. And like any self-respecting virus, it set out to replicate itself far and wide, riding on the backs of two playful brothers and humankind’s susceptibility to awwww-inducing moments. …


An introduction to functional programming

Photo: Sai Kiran Anagani/Unsplash

One of the best things about programming is that there are many ways to solve the same problem. Of course, this is also one of the most difficult things about programming. For new programmers, the seemingly endless array of design patterns, best practices, techniques, principles, and all other manner of prescriptive dogma are intimidating at best and deeply demoralizing at worst. But there is good news hiding among the panoply of architectural choices: As different as they may seem, all such patterns eventually lead back to the same foundational principles. …


Decoupling Clients from Service Construction

Photo by Ant Rozetsky on Unsplash

In object-oriented design, one of the principal aims is to produce code that is flexible, maintainable, and reusable. One way to do this is to depend on abstractions in your code rather than concretions. The more your objects know about one another’s implementations, the more dependencies there are in your system. As the number of dependencies grows, so does the potential for cascading breakage. But what happens when you have a system that requires certain objects to come from the same family? How do you ensure that any objects you instantiate are indeed from that family without hard-coding a complicated control structure? One solution to this problem is the abstract factory pattern (AFP). …
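The preview names the abstract factory pattern but stops before showing it. Here is a minimal Python sketch of the general pattern, not the article's own example. It assumes a hypothetical family of UI widgets in light and dark themes, and every class name and the `build_form` helper are illustrative. The client depends only on the abstract `WidgetFactory`, so all the objects it creates come from a single family, and no conditional logic is needed to choose them.

```python
from abc import ABC, abstractmethod
from typing import List


# Abstract products: the client only knows these interfaces.
class Button(ABC):
    @abstractmethod
    def render(self) -> str: ...


class Checkbox(ABC):
    @abstractmethod
    def render(self) -> str: ...


# Concrete products for two hypothetical families.
class LightButton(Button):
    def render(self) -> str:
        return "light button"


class LightCheckbox(Checkbox):
    def render(self) -> str:
        return "light checkbox"


class DarkButton(Button):
    def render(self) -> str:
        return "dark button"


class DarkCheckbox(Checkbox):
    def render(self) -> str:
        return "dark checkbox"


# Abstract factory: one creation method per product in the family.
class WidgetFactory(ABC):
    @abstractmethod
    def create_button(self) -> Button: ...

    @abstractmethod
    def create_checkbox(self) -> Checkbox: ...


# Each concrete factory produces only members of its own family.
class LightFactory(WidgetFactory):
    def create_button(self) -> Button:
        return LightButton()

    def create_checkbox(self) -> Checkbox:
        return LightCheckbox()


class DarkFactory(WidgetFactory):
    def create_button(self) -> Button:
        return DarkButton()

    def create_checkbox(self) -> Checkbox:
        return DarkCheckbox()


def build_form(factory: WidgetFactory) -> List[str]:
    # The client never names a concrete class, so it cannot mix families.
    return [factory.create_button().render(), factory.create_checkbox().render()]


print(build_form(LightFactory()))
print(build_form(DarkFactory()))
```

To switch themes, you pass a different factory. `build_form` itself never changes, and because the factory is the only place concrete classes are named, adding a new family means adding one factory and its products.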

About

Severin Perez

Writer | Developer
