This is probably the best time to start working on NLP. Internet connectivity and data accessibility have brought millions of applications to the market. Here we will see how to take advantage of this shift with Natural Language Processing.
Natural language refers to a language that we humans use for everyday communication, such as English, Hindi, or Portuguese. In contrast to artificial languages such as programming languages and mathematical notations, natural languages keep evolving with every generation and are therefore hard to pin down with explicit rules. Natural Language Processing (NLP) covers any kind of computer manipulation of natural language. It can be as simple as counting word frequencies to compare different writing styles, or it can involve comprehension of complete human utterances, at least to the extent of being able to respond with meaningful answers.
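As a rough illustration of the simple end of that range, the sketch below counts word frequencies in two short texts using only Python's standard library. The two sample sentences and the helper name word_frequencies are made up for this example, and the regular expression is a deliberately crude notion of a "word".

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word to how often it occurs."""
    # Crude word pattern: runs of letters and apostrophes (an assumption,
    # not a full tokenizer).
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Two invented sentences standing in for two writing styles.
formal = "The results indicate that the method is effective and the error is small."
casual = "So yeah, it works, it really works, and it barely messes up."

formal_counts = word_frequencies(formal)
casual_counts = word_frequencies(casual)

# Show the three most frequent words of each text.
print(formal_counts.most_common(3))
print(casual_counts.most_common(3))
```

Comparing which words dominate each text is one simple way to contrast writing styles; real comparisons would use much larger samples.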
Many real-world applications have emerged from intense and sustained research and development.
NLP is central to the following application areas:
- Knowledge discovery in texts
- Sentiment analysis on e-commerce websites
- Named entity extraction
These are some successful implementations of natural language processing (NLP):
- Search engines such as Google and Yahoo infer your interests from what you search for, for example that you work in tech, and show you results related to them.
- Social media feeds such as your Facebook or Twitter news feed show you content relevant to your interests. The news feed algorithm does this using natural language processing, making related ads and posts more likely to appear than others.
- Speech recognition is the ability of a machine or program to identify words and phrases in spoken language and convert them to a machine-readable format. Siri, for example, processes human language and then responds with relevant results.
- Spam filters such as Google's go beyond the usual spam filtering: they look at the content of an email to decide whether it is spam. You can also create your own rules to label, archive, delete, star, or automatically forward incoming mail.
NLP using Python
NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.
Install NLTK: pip install --user -U nltk
Install NumPy (optional): pip install --user -U numpy
Test the installation: run python and type: import nltk
Ensure that the NLTK module is installed. On the command line, check for NLTK by running the following command:
$ python -c "import nltk"
If NLTK is installed, this command will complete without error.
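In Python’s interactive environment, import the twitter_samples corpus (if it is not on your machine yet, download it first with nltk.download('twitter_samples')):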
>>> from nltk.corpus import twitter_samples
Tokenization is the act of breaking up a string of text into pieces such as words, keywords, phrases, symbols, and other elements, which are called tokens.
Tokenizing a set of sentences yields a list in which each element is itself the list of tokens of one sentence. Once we have the tokens of each sentence, we can tag them with the appropriate part-of-speech (POS) tags.
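The shape of that result can be sketched without any corpus download. The example below is a simplified stand-in that uses regular expressions from the standard library rather than NLTK's own tokenizers; the helper names split_sentences and tokenize and the sample text are assumptions made for illustration.

```python
import re


def split_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


text = "NLTK makes NLP easy. Tokenization comes first!"

# One inner list of tokens per sentence.
tokens_per_sentence = [tokenize(s) for s in split_sentences(text)]
print(tokens_per_sentence)
```

A production tokenizer handles abbreviations, contractions, URLs, and emoticons far better than these two patterns; the point here is only the list-of-lists structure.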
Want to explore more? Visit the NLTK website.
NLP using NodeJS
"natural" is a general natural language facility for Node.js. The idea is loosely based on Python's NLTK, where all algorithms are in the same package.
Installation: You can install via NPM like so:
npm install natural
If you want to install from source (available on GitHub), clone the repository and run the npm install from the source directory.
git clone https://github.com/NaturalNode/natural.git
cd natural
npm install .
Now, let us understand Stemming with an example.
In NLP, stemming is the process of reducing inflected or derived words to their base or root form, generally a written word form.
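To make that definition concrete, here is a deliberately tiny suffix-stripping sketch in Python. It is not the Porter algorithm and not the API of "natural" or NLTK; the suffix list, the minimum stem length of three letters, and the name toy_stem are all assumptions chosen for illustration.

```python
# Suffixes are tried in this order; the first one that matches is removed.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip one common English suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ["jumping", "jumped", "jumps", "jump"]:
    print(w, "->", toy_stem(w))
```

All four inputs reduce to the same stem, which is what lets a search or classifier treat them as one term. Real stemmers apply many more rules and still produce stems that are sometimes not dictionary words.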
With "natural", stemming a single word is a one-line call: natural.PorterStemmer.stem("words") returns "word".
If you want to explore more about stemming, visit the project's GitHub repository.
When using ML techniques in NLP, you should always pay attention to what information you need to feed your algorithm and how you can represent that information to get the best results.
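One common way to represent text for a learning algorithm is a bag-of-words count vector. The sketch below is a minimal standard-library version under simple assumptions: words are split on whitespace, the three sample documents are invented, and the helper names build_vocabulary and bag_of_words are hypothetical.

```python
def build_vocabulary(documents):
    """Map each distinct lowercased word to a column index, in sorted order."""
    words = sorted({w for doc in documents for w in doc.lower().split()})
    return {word: i for i, word in enumerate(words)}


def bag_of_words(document, vocabulary):
    """Return a count vector for one document over the given vocabulary."""
    vector = [0] * len(vocabulary)
    for w in document.lower().split():
        if w in vocabulary:  # words outside the vocabulary are ignored
            vector[vocabulary[w]] += 1
    return vector


# Invented sample documents.
docs = ["free prize inside", "meeting notes inside", "free free prize"]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(d, vocab) for d in docs]

print(vocab)
print(vectors)
```

Each document becomes a fixed-length list of counts, which is the kind of numeric input most ML algorithms expect. Whether raw counts, stems, or something richer works best depends on the task.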
Stay tuned for the next update!