NLP: Python Data Extraction From Social Media, Emails, Documents, Webpages, RSS & Images
Clear Overview Of Python Libraries & Techniques To Fetch Textual Data From All Of The Common Sources
5 min read · Jun 24, 2019
The first stage of an NLP project is to extract the required textual data. This data is usually unstructured and spread across a variety of sources.
This article illustrates how we can extract text-based data from the most common sources.
Textual data is fundamental to NLP-based models.
Article Aim
This article will cover text extraction from the following sources:
- Table From HTML Webpage
- Tweets From Twitter
- Statuses From Facebook
- RSS Feeds
- Text From Images
- Text From PDF
- Text From Word Documents
- Text From CSV Files
- Text From Excel Files
- Text From Outlook Emails
- Text From HTML Webpages