Can machines be taught to correctly distinguish between loans and credit cards?
Saving time is what machines do best. Imagine having the superpower to find, within seconds, the material you are interested in among a large collection of unrelated posts.
This is exactly what my project worked towards: using natural language processing (NLP) tools to correctly identify the topic to which a Reddit post belongs.
Two highly similar topics (“Loans” and “Credit Cards”) were chosen for this project, in order to build a model that could reliably differentiate even between closely related topics. The NLP step first identified the most frequent words across posts and counted their occurrences in each individual post. Following this, a logistic regression model, a Naive Bayes model, and a neural network model were trained on a random subset of the scraped data.
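The word-counting and Naive Bayes steps described above can be sketched in plain Python. This is a minimal illustration and not the project's actual code: the function names, the tiny example posts, the `max_words` cutoff, and the add-one smoothing value `alpha` are all assumptions made for the example, and the project itself may well have used a library implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a post and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(posts, max_words=1000):
    """Keep the most frequent words across all posts (cutoff is an assumption)."""
    totals = Counter(word for post in posts for word in tokenize(post))
    return [word for word, _ in totals.most_common(max_words)]


def count_vector(post, vocabulary):
    """Count how often each vocabulary word occurs in a single post."""
    counts = Counter(tokenize(post))
    return [counts[word] for word in vocabulary]


def train_naive_bayes(vectors, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    model = {}
    n_words = len(vectors[0])
    for label in set(labels):
        rows = [v for v, y in zip(vectors, labels) if y == label]
        word_totals = [sum(column) for column in zip(*rows)]
        denom = sum(word_totals) + alpha * n_words
        model[label] = {
            "log_prior": math.log(len(rows) / len(labels)),
            "log_likelihood": [math.log((c + alpha) / denom) for c in word_totals],
        }
    return model


def predict(model, vector):
    """Return the label with the highest log-posterior score."""
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            c * ll for c, ll in zip(vector, params["log_likelihood"])
        )
    return max(model, key=score)


# Hypothetical toy posts standing in for the scraped Reddit data.
posts = [
    "Need a personal loan to consolidate debt, what interest rate should I expect",
    "Student loan repayment plan and refinancing my loan",
    "Which credit card has the best cashback rewards",
    "My credit card limit was lowered after a late payment",
]
labels = ["Loans", "Loans", "Credit Cards", "Credit Cards"]

vocabulary = build_vocabulary(posts)
vectors = [count_vector(post, vocabulary) for post in posts]
model = train_naive_bayes(vectors, labels)

print(predict(model, count_vector("Should I refinance my car loan", vocabulary)))
print(predict(model, count_vector("best rewards credit card", vocabulary)))
```

Words that never appear in the training posts are simply ignored at prediction time, and the smoothing term keeps a word seen in only one topic from forcing a zero probability for the other.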
The logistic regression model differentiated between posts with an accuracy of 95%. The Naive Bayes and neural network models were not far behind, with classification accuracies close to 93%. Overall, the project was highly successful and served as a great starting point for text classification.
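For readers unfamiliar with how such an accuracy figure is obtained, the sketch below shows one common approach: hold out a random portion of the labelled posts and measure the fraction of held-out posts the model labels correctly. The helper names, the 25% hold-out fraction, the fixed seed, and the example labels are assumptions for illustration. The summary does not state how the data was actually split.

```python
import random


def train_test_split(items, labels, test_fraction=0.25, seed=42):
    """Randomly hold out a fraction of the data for evaluation."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [items[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical labels: three of the four predictions are correct.
true_labels = ["Loans", "Loans", "Credit Cards", "Credit Cards"]
predicted = ["Loans", "Credit Cards", "Credit Cards", "Credit Cards"]
print(accuracy(true_labels, predicted))  # 0.75
```

An accuracy of 95% therefore means that 95 out of every 100 evaluated posts were assigned to the correct topic.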
The resulting machine learning model could also be adapted for tasks such as post filtering and post identification. Further analysis could identify the keywords most strongly associated with each topic.
Please view these in sequence to understand the project in its totality.