Getting Started with Natural Language Processing (NLP) for Beginners

Uniqtech
Data Science Bootcamp


Natural Language Processing, or NLP, is a subfield of Artificial Intelligence. It analyzes human language and takes text as input. The entire text dataset used as input is called the corpus. For example, we can count how many times a word appears in the corpus; this count is called the term frequency. NLP is not supposed to be easy! But let’s try to simplify it for beginners. Follow us for more beginner-friendly articles like this. Updated October 2022.

Graduates of the liberal arts and humanities may not think programming, AI, and machine learning are for them. NLP is actually very interdisciplinary: it requires analytical, writing, and research skills, and it draws on linguistics, social science, English language and literature, the philosophy of representation, morality, transparency, justice, and more. It’s a great field in AI where everyone can shine and contribute.

Take this short text as the corpus: “Hi there! It’s good to see you. I just wanted to say hi.” If our analysis is case insensitive (‘Hi’ is treated the same as ‘hi’), the term frequency of ‘hi’ is 2, because the word appears twice in the corpus. If it is case sensitive, the term frequency (TF) of ‘Hi’ is 1, and the TF of ‘hi’ is also 1.
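The counting in this example can be sketched in a few lines of Python. This is only a minimal illustration, not a standard NLP pipeline: the function name `term_frequency`, the `case_sensitive` flag, and the simple rule for splitting text into words (runs of letters and straight apostrophes) are all assumptions made for this sketch.

```python
import re
from collections import Counter


def term_frequency(text, case_sensitive=False):
    """Count how many times each word appears in the text (the corpus)."""
    if not case_sensitive:
        # Treat 'Hi' and 'hi' as the same word.
        text = text.lower()
    # Assumed tokenization rule: a word is a run of letters or apostrophes.
    tokens = re.findall(r"[A-Za-z']+", text)
    return Counter(tokens)


corpus = "Hi there! It's good to see you. I just wanted to say hi."

insensitive = term_frequency(corpus)
sensitive = term_frequency(corpus, case_sensitive=True)

print(insensitive["hi"])                 # 2
print(sensitive["Hi"], sensitive["hi"])  # 1 1
```

Real projects usually rely on a library tokenizer rather than a regular expression, since splitting text into words gets tricky with punctuation, contractions, and other languages.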
