Spam Filter on Comments Forum of Smartprix
Introduction to Comments Forum
At Smartprix, we believe in suggesting the right product to users at the best price. The comments forum we introduced recently has improved the user experience. Users can now interact on the Discussions forum on every product page, listing page, deal page, and comparison page. They can upvote or downvote any comment and report a comment as spam if they find it abusive.
The need for a Spam Filter
Since we launched the Discussions forum, users have engaged with it in several ways. The community has been very supportive, answering each other’s queries and helping each other out, but some users have also used abusive language on the forum.
We had two options to remove such comments from the forum:
1. Check every comment manually, which is more accurate but requires a lot of manpower.
2. Create an automatic spam-detecting filter, which might give some false positives but is cost-efficient.
We went with a mix of both options: an automatic filter flags suspicious comments, and our CRM team reviews the flagged ones. This keeps the process cost-efficient and accurate at the same time. Let’s see the implementation.
How the Spam Filter works
As soon as a user submits a comment on the Discussions forum, on any page, the comment goes through the Spam Filter and is categorized as one of the following:
1. Hard Spam
2. Soft Spam
3. Normal
Hard Spam Comments
Each comment is matched against a regex containing a list of 400 hard abusive words. If the comment contains any of these words, it is marked as hard spam and sent for approval. If our CRM team finds that the comment is not abusive in context, they approve it and it is displayed on that particular page.
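The hard spam check described above can be sketched roughly as follows. This is a minimal illustration, not our production code: the word list holds hypothetical placeholders instead of the actual 400 words, and the names HARD_WORDS, HARD_SPAM_RE, and is_hard_spam are made up for this example. It also assumes that matching is case-insensitive and limited to whole words.

```python
import re

# Hypothetical placeholder entries; the real list of about 400 words is not shown here.
HARD_WORDS = ["badword", "slur", "curse"]

# Build one alternation regex from the list. re.escape guards against special
# characters in a word, and \b keeps a listed word from matching inside a
# longer, harmless word (assumed behavior).
HARD_SPAM_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(word) for word in HARD_WORDS) + r")\b",
    re.IGNORECASE,
)


def is_hard_spam(comment: str) -> bool:
    """Return True if the comment contains any word from the hard list."""
    return HARD_SPAM_RE.search(comment) is not None
```

Compiling the whole list into a single pattern means each comment is scanned once, rather than once per word.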
Soft Spam Comments
This category contains many types of comments, including the following:
1. Comments containing email IDs
2. Comments containing phone numbers
3. Comments containing links to websites other than smartprix.com
4. Comments containing mildly abusive words
Comments are matched against the email regex, phone regex, and link regex mentioned in this gist. They are also matched against the soft words regex, which contains a list of 3000 soft abusive words. If the comment matches any of these regexes, it is marked as soft spam.
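To give an idea of what these checks look like, here is a simplified sketch. The patterns below are illustrative assumptions and are not the regexes from the gist: the phone pattern assumes 10-digit Indian mobile numbers, the link pattern only catches links that start with http://, https://, or www., and the soft word list holds hypothetical placeholders. All names (EMAIL_RE, PHONE_RE, LINK_RE, soft_spam_reasons, and so on) are made up for this example.

```python
import re
from typing import List

# Illustrative patterns only; the production regexes are not reproduced here.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# Assumes 10-digit Indian mobile numbers with an optional +91 or 0 prefix.
PHONE_RE = re.compile(r"(?<!\d)(?:\+91[\s-]?|0)?[6-9]\d{9}(?!\d)")
# Captures the host of a link that starts with http(s):// or www.
LINK_RE = re.compile(r"(?:https?://|www\.)([\w.-]+)", re.IGNORECASE)
# Hypothetical placeholders for the roughly 3000 soft words.
SOFT_WORDS = ["stupid", "idiot"]
SOFT_WORDS_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, SOFT_WORDS)) + r")\b", re.IGNORECASE
)


def has_external_link(comment: str) -> bool:
    """True if any detected link points to a host other than smartprix.com."""
    for match in LINK_RE.finditer(comment):
        host = match.group(1).lower().rstrip(".")
        if host.startswith("www."):
            host = host[4:]
        if host != "smartprix.com" and not host.endswith(".smartprix.com"):
            return True
    return False


def soft_spam_reasons(comment: str) -> List[str]:
    """List which soft spam checks the comment matches (empty if none)."""
    reasons = []
    if EMAIL_RE.search(comment):
        reasons.append("email")
    if PHONE_RE.search(comment):
        reasons.append("phone")
    if has_external_link(comment):
        reasons.append("link")
    if SOFT_WORDS_RE.search(comment):
        reasons.append("soft_word")
    return reasons
```

Returning the list of matched checks, rather than a single yes or no, would let a reviewer see why a comment was flagged. A comment counts as soft spam whenever the list is non-empty.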
Soft Spam Comments are shown on the Comments forum right away but still need final approval from the CRM team. We therefore suggest not mentioning any personal email ID, phone number, or external website link in a comment.
Normal Comments
Any comment that is not marked as Hard Spam or Soft Spam is marked as Normal. It is approved automatically and listed on the particular page instantly.
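Putting the three categories together, the routing of a comment can be sketched as below. This is a hedged outline rather than our actual implementation: SpamStatus, Moderation, and moderate are hypothetical names, and the two checks are passed in as functions so the sketch stays independent of any particular regex. It assumes that the hard check runs first, so the stricter outcome wins when a comment matches both, and that hard spam stays hidden until the CRM team approves it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class SpamStatus(Enum):
    HARD = "hard"
    SOFT = "soft"
    NORMAL = "normal"


@dataclass(frozen=True)
class Moderation:
    status: SpamStatus
    visible: bool       # shown on the page right away
    needs_review: bool  # queued for the CRM team


def moderate(
    comment: str,
    is_hard: Callable[[str], bool],
    is_soft: Callable[[str], bool],
) -> Moderation:
    """Categorize a comment and decide its visibility and review state."""
    # Assumption: the hard check runs first so it takes precedence.
    if is_hard(comment):
        return Moderation(SpamStatus.HARD, visible=False, needs_review=True)
    if is_soft(comment):
        return Moderation(SpamStatus.SOFT, visible=True, needs_review=True)
    return Moderation(SpamStatus.NORMAL, visible=True, needs_review=False)


# Usage with trivial stand-in checks (not the real regexes):
result = moderate(
    "Nice phone",
    is_hard=lambda text: "badword" in text.lower(),
    is_soft=lambda text: "@" in text,
)
print(result.status, result.visible, result.needs_review)
```

Keeping visibility and review state as separate fields reflects the fact that a soft spam comment is both visible and still awaiting review.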
Spam Filter Results
At the time of writing, the comments we have received on the forum are distributed by spam status as follows:
Hard Spam Comments: 2.58%
Soft Spam Comments: 22.91%
Normal Comments: 74.49%
There were a total of 8 false positives, that is, comments that were marked as Hard Spam but were not actually abusive.
These results suggest that the Spam Filter has been very useful in keeping abusive language off our Comments forum and in providing a much better experience to users.