Natural Language Processing

We will perform sentiment analysis using a news API to predict the Amazon (AMZN) stock price in Python


Sentiment analysis of news has become one of the most robust ways of generating buy/sell signals for stocks in all major developed and emerging markets. The idea is simple: a cumulative sentiment score of the news articles mentioning a company’s name, brand, stock ticker, etc. can serve as a great indicator for the next day’s closing stock price.

This only works with stocks that have high trading volumes and active news coverage across major outlets. Generally speaking, constituent stocks of major market indices such as NASDAQ, Dow Jones, or S&P 500 will all satisfy these criteria.

In this article, we will discuss the steps necessary for building such a sentiment analysis pipeline for Amazon.com stock. …
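As a preview of the pipeline, here is a minimal sketch, not the article’s exact code: it pulls recent headlines mentioning Amazon from a news API and averages their VADER sentiment scores into one daily number. The newsapi.org endpoint, the API key, and the choice of VADER are all assumptions; any comparable news API and sentiment scorer would work.

import requests
from nltk.sentiment.vader import SentimentIntensityAnalyzer  # needs nltk.download('vader_lexicon') once

# Fetch recent articles mentioning Amazon; newsapi.org is one example provider
resp = requests.get(
    "https://newsapi.org/v2/everything",
    params={"q": "Amazon OR AMZN", "language": "en", "apiKey": "YOUR_API_KEY"},
)
articles = resp.json().get("articles", [])

# Average the compound sentiment of the headlines into a single daily score
analyzer = SentimentIntensityAnalyzer()
scores = [analyzer.polarity_scores(a["title"])["compound"] for a in articles]
daily_sentiment = sum(scores) / len(scores) if scores else 0.0
print(daily_sentiment)  # compare this signal against the next day's closing price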


Scrape email addresses from a particular webpage and query Specrom’s email database containing over 200 million email addresses


Extracting email addresses from websites is one of the most common ways to find email addresses of prospective clients and it powers the marketing and sales strategies of countless businesses.

If you are familiar with coding, then you probably know that we can extract email addresses using something known as regular expressions (regex). However, regexes are tricky to write and even harder to debug, and even experienced coders run into numerous issues when trying to write one for something as complicated as an email address.

However, if you are a coder and got here just to find an email regex, then look no further; the regex for extracting emails is shown below. For the rest of you, read on to find simple and free methods to extract email addresses. …
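The article’s exact regex is cut off in this preview, but a commonly used pattern (deliberately not RFC-complete) looks like this in Python:

import re

# A pragmatic email pattern; full RFC 5322 compliance needs a far longer regex
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

text = "Contact sales@example.com or support@example.org for details."
print(EMAIL_RE.findall(text))
# ['sales@example.com', 'support@example.org']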



Web scraping is the extraction of structured information from webpages. Major news outlets like CNN and CNNMoney are excellent sources for getting objective financial and stock market-related information.

You will have to select which portions of the page you want to extract. Typically, people want to extract author names, dates, titles, and full text of the news article.

If you Google “web scraping CNN”, then you will probably come across numerous articles and blog posts outlining common methods of doing it in Python. Let us go through the top methods below.

Easy Method: News Extraction API

Let me put the easiest method of doing the extraction here: simply use an API that can extract all the details of a news article, such as the full-text content, feature image URL, author, date, etc. There are numerous free-to-use APIs out there, but I recommend the News Extraction API from Algorithmia. You will need to sign up with Algorithmia, but it’s free (no credit card required), and you get 10,000 free credits, which is more than enough for thousands of API calls a month. …
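For reference, calling an algorithm through Algorithmia’s Python client follows the pattern below. The algorithm path is a placeholder, since the preview does not name the exact News Extraction API endpoint; substitute the real author/name/version from the Algorithmia catalog.

import Algorithmia

client = Algorithmia.client("YOUR_ALGORITHMIA_API_KEY")
# Placeholder path: replace with the actual news-extraction algorithm's author/name/version
algo = client.algo("author/NewsExtractor/1.0.0")
result = algo.pipe("https://money.cnn.com/some-article-url").result
print(result)  # expected: a dict with title, author, date, full text, image URL, etc.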



Yellowpages.com is an incredibly powerful way to search for local businesses in a particular city, county, or state, and it surfaces all the relevant information such as phone numbers, addresses, business names, etc. when you simply specify a search term and location in the search bar. For example, you can go to yellowpages.com and search for lawyers in Atlanta, GA, as shown below.

[Screenshot: searching for lawyers in Atlanta, GA on yellowpages.com]

We get the relevant search results with physical addresses, phone numbers, websites, etc. as shown below.

[Screenshot: search results with business names, physical addresses, phone numbers, and websites]

Hence, Yellowpages results are commonly scraped for competitor research, lead generation, and other sales, marketing, and outreach activities. …
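To give a flavor of what such a scraper looks like, here is a minimal requests + BeautifulSoup sketch. The URL parameters and CSS class names are assumptions based on the Yellowpages markup at the time of writing and are likely to change; inspect the live page before relying on them.

import requests
from bs4 import BeautifulSoup

# Search-results URL pattern and selectors are assumptions, not guaranteed stable
resp = requests.get(
    "https://www.yellowpages.com/search",
    params={"search_terms": "lawyers", "geo_location_terms": "Atlanta, GA"},
    headers={"User-Agent": "Mozilla/5.0"},
)
soup = BeautifulSoup(resp.text, "html.parser")

for card in soup.select("div.result"):
    name = card.select_one("a.business-name")
    phone = card.select_one("div.phones")
    if name:
        print(name.get_text(strip=True), "|",
              phone.get_text(strip=True) if phone else "n/a")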


[Feature image: map of store locations within 50 miles of a zip code]

Store location information for major retailers such as Walmart has increasingly become an important signal of a community, town, or city’s overall health in these challenging COVID-19 times.

It is pretty easy to get store locations for a particular zip code: simply go to the store finder site and get a list of all the locations within 50 miles on a map (feature image above).

However, what if you want each store’s name, address, phone number, city, etc. in bulk in a CSV or Excel file? In that case, you are looking at three options.

Method 1: You can simply hire a web scraping expert on Fiverr and let them worry about getting the data you need. You’ll probably spend somewhere around $15–50, depending on the level of customization you need in the scraped dataset. …


Scrape/extract job postings in bulk and save them in Excel in less than 2 minutes without any coding


Indeed.com is an extremely powerful job search engine, and it’s a perfect tool for scraping job postings for a particular city/state/zip code.

In this post, we will use absolutely no coding to extract job titles, company names, locations, salaries, summaries, and URLs from job postings and save them as a CSV file in just four steps.

Step 1: The Indeed Job Scraper API is a great option if you want to get the extracted information for free. …


Data Mining, Programming

Learn how to apply language detection and sentiment analysis to tweets

Photo by 🇨🇭 Claudio Schwarz | @purzlbaum on Unsplash

Getting Twitter data

Let’s use the Tweepy package in Python instead of handling the Twitter API directly. We will do two things with the package: authorize ourselves to use the API, and then use a cursor to access the Twitter search APIs.

Let’s go ahead and get our imports loaded.

import tweepy                    # wrapper around the Twitter API
import pandas as pd              # tabular data handling
import matplotlib.pyplot as plt  # plotting
import seaborn as sns            # statistical plot styling
import numpy as np               # numerical arrays
sns.set()                        # apply seaborn's default plot style
%matplotlib inline

Twitter authorization

To use the Twitter API, you must first register to get an API key. To get Tweepy, just install it via pip install tweepy. …
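A minimal authorization-plus-cursor sketch with Tweepy 3.x looks like the following; the four credentials are placeholders you obtain from the Twitter developer dashboard. (In Tweepy 4.x, api.search was renamed api.search_tweets.)

import tweepy

# Placeholder credentials from the Twitter developer portal
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Cursor pages through the search API so we don't handle pagination ourselves
tweets = [status.text for status in
          tweepy.Cursor(api.search, q="data science", lang="en").items(100)]
print(len(tweets))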


Photo by Jaye Haych on Unsplash

Why is model tuning so important?

Implementing machine learning models in big data distributed ecosystems, as well as on individual servers, has become extremely easy in the past few years, in no small measure thanks to high-level libraries such as sklearn in Python and deep learning interface libraries such as Keras with TensorFlow, CNTK, or Theano.

Let’s say that you want to run a nearest neighbor algorithm on a corpus of text documents to identify the documents most similar to each other. It’s only three lines of code if you use the sklearn library.

>>> from sklearn.neighbors import NearestNeighbors 
>>> import numpy as np
>>> nbrs = NearestNeighbors(n_neighbors=5, radius=1.0, algorithm='auto', leaf_size=30, metric='minkowski', p=2, metric_params=None, n_jobs=None).fit(X) …
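As a quick usage note, and assuming X is a dense 2-D feature matrix (e.g., TF-IDF vectors of the documents), querying the fitted estimator returns each document’s nearest neighbors:

>>> distances, indices = nbrs.kneighbors(X)
>>> indices[0]  # row indices of the 5 documents closest to the first document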

Photo by pictures of money on Flickr

Tim Denning wrote an excellent piece for the Better Marketing blog about a simple formula for making money on the internet, and it spurred me to write a more in-depth article on all the possible ways out there.

It may sound simple enough, but you’ll be surprised at how many times aspiring entrepreneurs have asked us some variation of the questions below.

“How do I monetize my website/app?”
or, alternatively,
“My website gets tons of traffic from the US/EU, but I don’t know how to monetize it.”
A variation on the above question is:
“Can I tell how much revenue/profit a website makes just by looking at it?” …


Learn how to use a common hierarchical clustering algorithm called agglomerative clustering to find new topic clusters in recent news articles

For data scientists, text analytics on news stories has always been pretty important, both from a learning and a practical perspective, since it gives us a bulk data corpus for training text classification, sentiment analysis, named entity recognition, and other models.

By and large, most of those models were trained on a historical news corpus that used data from the past 1–3 years of news stories. This works great in normal times; however, in the midst of the COVID-19 pandemic, it poses a serious problem for us, since news stories now have a faster turnaround cycle.

One way to fight this problem is to run text clustering algorithms on news stories collected over a short time period and identify emerging trends before they start affecting our pretrained sentiment analysis models too much. …
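Here is a minimal sketch of that idea with sklearn’s AgglomerativeClustering, using a few toy headlines as a stand-in for a real news corpus:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

docs = [
    "Stocks rally as markets reopen after lockdown",
    "New vaccine trial shows promising early results",
    "Tech shares lead broad market gains",
    "Vaccine rollout accelerates across the country",
]

# TF-IDF features; ward-linkage clustering requires a dense array
X = TfidfVectorizer(stop_words="english").fit_transform(docs).toarray()

# Each step merges the two closest clusters until n_clusters remain
labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X)
print(labels)  # e.g. [0 1 0 1]: market stories vs. vaccine stories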

About

Jay M. Patel

Cofounder/principal data scientist at Specrom Analytics (specrom.com); natural language processing and web crawling/scraping expert. Personal site: JayMPatel.com
