NLP: Python Data Extraction From Social Media, Emails, Documents, Webpages, RSS & Images

Clear Overview Of Python Libraries & Techniques To Fetch Textual Data From All Of The Common Sources

Published in FinTechExplained, Jun 24, 2019

The first stage of an NLP project is to extract the required textual data. This data is usually unstructured and spread across many different sources.

This article illustrates how we can extract text-based data from the most common sources.

Textual data is fundamental to NLP-based models.

Article Aim

This article will cover text extraction from the following sources:

Table From HTML Webpage
Tweets From Twitter
Statuses From Facebook
RSS Feeds
Text From Images
Text From PDF
Text From Word Documents
Text From CSV Files
Text From Excel Files
Text From Outlook Emails
Text From HTML Webpages
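Each of these sources needs its own tooling. The snippet below is a hedged illustration for three of the simpler text sources (HTML webpages, CSV files and RSS feeds). It is a minimal sketch that uses only the Python standard library (`html.parser`, `csv`, `xml.etree.ElementTree`) and is not necessarily the approach the article itself takes. The helper names `text_from_html`, `text_from_csv` and `titles_from_rss` are hypothetical. The sketch also assumes the inputs are already in memory as strings, so it does not fetch anything over the network.

```python
import csv
import io
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def text_from_html(html: str) -> str:
    """Hypothetical helper: return the visible text of an HTML string, one fragment per line."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def text_from_csv(csv_text: str, column: str) -> list[str]:
    """Hypothetical helper: return every value of one named column from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader]


def titles_from_rss(rss_xml: str) -> list[str]:
    """Hypothetical helper: return the <title> of every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("title", default="") for item in root.iter("item")]


if __name__ == "__main__":
    page = "<html><head><style>p{}</style></head><body><h1>Rates</h1><p>Up 2%</p></body></html>"
    print(text_from_html(page))  # prints "Rates" and "Up 2%" on separate lines

    news = "headline,source\nMarkets rally,Reuters\nFed holds,AP\n"
    print(text_from_csv(news, "headline"))  # ['Markets rally', 'Fed holds']

    feed = (
        '<rss version="2.0"><channel><title>Feed</title>'
        "<item><title>A</title></item><item><title>B</title></item>"
        "</channel></rss>"
    )
    print(titles_from_rss(feed))  # ['A', 'B']
```

The other sources in the list generally need third-party packages or external services, such as API clients for Twitter and Facebook, an OCR engine for images, and parsers for PDF, Word, Excel and Outlook files. They are therefore left out of this stdlib-only sketch.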

Written by Farhad Malik