Web scraping + text analytics for competitive intelligence

Manas Ranjan Kar
Published in NLP Wave
Oct 19, 2015

In this series of case studies, I will briefly showcase business problems that my team and I have worked on at Juxt Smart Mandate. These problems are mostly in the domains of natural language processing, information retrieval and machine learning. The idea is simple: to demonstrate that analytics need not always be a long-drawn-out exercise, and that short bursts of work can provide tremendous business value.

To start off — a case study in the domain of finance.

One of our clients had started a business in the UK specializing in crypto-currency. To maintain a competitive edge, he wanted answers to three major questions:

  1. What are the latest developments in the crypto-currency industry?
  2. What are my competitors doing in terms of new products, announcements and developments?
  3. What are the potential areas to gain an edge over the competitors?

While many websites and blogs cater to the industry, their sheer volume made it practically impossible to cover and assimilate the information manually. The client approached us with a problem statement: can you create an engine to extract the above information?

Using multiple natural language processing techniques, such as relation extraction, part-of-speech (POS) tagging and entity tagging, we built an engine that scrapes websites, collects relevant material, maintains a repository of relationships and presents entities in chronological order. This put the information at the client's fingertips, without the need to go through multiple websites looking for important material.
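To give a feel for the flow described above (page text, then entities, then relationships, then a chronological view), here is a minimal sketch in Python. It is not the engine we built: the regular expressions are a deliberately naive stand-in for real POS tagging, entity tagging and relation extraction, and the names `html_to_text`, `extract_entities`, `build_timeline`, the verb list and the sample pages with their company names are all made up for illustration. Fetching pages over the network is left out; the sketch assumes the HTML has already been downloaded.

```python
import re
from dataclasses import dataclass
from datetime import date
from html.parser import HTMLParser

# Naive stand-in for entity tagging: a run of capitalized words.
ENTITY = r"\b[A-Z][\w-]*(?:\s+[A-Z][\w-]*)*"
# Naive stand-in for relation extraction: <entity> <verb> <entity>.
# The verb list is a hypothetical example, not a real lexicon.
RELATION = re.compile(
    rf"(?P<subject>{ENTITY})\s+(?P<verb>launched|acquired|announced)\s+(?P<obj>{ENTITY})"
)


class TextExtractor(HTMLParser):
    """Collects visible text from a page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def extract_entities(text):
    """Return candidate entities (capitalized word runs) in order of appearance."""
    return re.findall(ENTITY, text)


@dataclass
class Relation:
    published: date
    subject: str
    verb: str
    obj: str


def build_timeline(pages):
    """Extract relations from (date, html) pairs and sort them oldest first."""
    relations = []
    for published, html in pages:
        for match in RELATION.finditer(html_to_text(html)):
            relations.append(
                Relation(published, match["subject"], match["verb"], match["obj"])
            )
    return sorted(relations, key=lambda r: r.published)


# Made-up sample pages; the companies and dates are fictional.
SAMPLE_PAGES = [
    (date(2015, 10, 2), "<html><body><p>Beta Exchange acquired Gamma Pay.</p></body></html>"),
    (date(2015, 9, 14), "<html><head><style>p{}</style></head>"
                        "<body><p>Alpha Coin launched Alpha Wallet in London.</p></body></html>"),
]

for relation in build_timeline(SAMPLE_PAGES):
    print(relation.published, relation.subject, relation.verb, relation.obj)
```

In a real system the regular expressions would give way to a trained tagger and a relation extractor, since capitalization alone misses lowercase product names and picks up ordinary sentence-initial words. The overall shape stays the same, though: clean the page, tag it, store the relationships with their dates, and sort.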

A sample intermediate output, before further processing such as relationship extraction:

Time from day 1 of the project to results: 4 weeks

