Make Your Own Covid-19 Real-Time Data with Google Alerts

Jumping straight to the point: I wanted to generate a real-time feed of news and know-how on the Novel Coronavirus, as it happens.
Firstly, configure Google Alerts:
Hit https://www.google.com/alerts and sign in. Input the text you’d want to configure these alerts for.

Go for exact-match keywords like coronavirus, covid-19, etc.

Once the alerts have been configured, it looks like this:

When you create an alert, click “Show options” and make sure you have the following settings in there:

In the “Deliver to” section, select your email address and then click “Create Alert”.

#ShowMeTheCode B-):
A few easy steps next to retrieve these alerts via web requests. Right-click to open the “Inspect Element” panel, as shown below, and move to the “Network” tab. Look for “XHR” and click there. An optional click on the “trash” icon clears old entries so it looks like this:

I am doing this on Mozilla Firefox; Google Chrome’s Developer Tools are similar, give or take a few nuances.

Click on any of the alerts. Say I click on “coronavirus”; I then see the Network tab as follows:

There you see it: what you otherwise have underneath “NEWS” when viewed normally is the response rendered by the “GET” request highlighted in blue.

Right-click it and look up “Copy as cURL”.

The headers, cookies, and params that you see are all packed into the cURL command we just copied for this GET request. This request fetches us the response shown.

Python guy? So am I. Oh boy, the galerts and googlealerts Python packages just don’t work. Not for me, at least. And these were easy alerts anyway!

Another online utility that comes in handy next is a “cURL to Python Requests” converter. Fire up https://curl.trillworks.com/ to transform the copied cURL request into Python Requests library code.

I am not showing the text underneath, but just copy and paste the cURL under “curl command”, and look at the section on the right, “Python requests”.

Try to match the “Cookies”, “Headers”, and “Params” sections with the ones in the “Inspect Element” Network tab mentioned before, and you’ll realise how the web request has been crafted into Python source code.

Just heed the “params” tuple in the Python requests section; you’d find it as you scroll down. The 1584868878 in there is actually the current epoch time.

Leave the other params as-is. We basically need to choose our epoch wisely, to match the current timestamp or the time of interest.

In Python, where the “time” module comes built-in, you can get the current epoch timestamp. Fire up a Python shell as I walk you through:
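A minimal sketch of that shell session (the 1584868878 value is just the one from earlier in this post; your call will return whatever “now” is):

>>> import time
>>> int(time.time())    # seconds elapsed since the Unix epoch
1584868878
>>> # sanity-check it in human-readable form (UTC)
>>> time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(1584868878))
'2020-03-22 09:21:18'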

The value you get here is the current epoch timestamp. You can verify it by inputting this epoch value at https://www.epochconverter.com/

That has at least the Google Alerts retrieval done. I am pasting the code snippet just to bring us onto the same page:

import requests
import time

cookies = {}  # your cookies under the “Python requests” section in the curl.trillworks.com tab
headers = {}  # your headers under the “Python requests” section in the curl.trillworks.com tab
params = (
    # the third element is the epoch time; 604800 seconds = a one-week window
    ('params', '[null,null,%s,604800]' % (int(time.time()))),
    ('s', '<value-for-this-under-curl.trillworks.com tab python section>'),
)
response = requests.get('https://www.google.co.in/alerts/history',
                        headers=headers, params=params, cookies=cookies)
print(response.text)

This response.text, when printed, displays a lengthy HTML document wherein one can find all those alerts. The trick is just to get up-to-date alerts by replacing the third element in ‘params’ with int(time.time()), i.e. the current epoch time.

What to do with this response next?
So far, we have automated retrieving the alerts. But this response is still HTML, which we need to parse and mine the needed attributes from. I believe mining the news source, web link, title, and text should be great to start with. Let’s do it!

Make sure you have the Python packages bs4/BeautifulSoup4 and lxml installed. You may install them as: pip install beautifulsoup4 lxml

While you may try pasting and formatting the response text in yet another online utility, https://jsonformatter.org/xml-parser, I have the following snippet to help you parse out what we want:
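In case the embedded snippet does not render for you, here is a minimal sketch of the idea. The ‘li.result’ selector and the inner tags are assumptions about the alerts page markup, not its exact structure; inspect your own response.text and adjust the selectors accordingly:

import csv
from bs4 import BeautifulSoup

# parse the HTML response fetched above with the lxml parser
soup = BeautifulSoup(response.text, 'lxml')

rows = []
# NOTE: 'li.result' and the tags below are assumed selectors;
# inspect your response.text and adjust them to the actual markup
for item in soup.select('li.result'):
    anchor = item.find('a')
    if anchor is None:
        continue
    link = anchor.get('href', '')
    title = anchor.get_text(strip=True)
    source_tag = item.find('span')
    source = source_tag.get_text(strip=True) if source_tag else ''
    text = item.get_text(' ', strip=True)
    rows.append((source, link, title, text))

# write the mined attributes into a semicolon-separated CSV
with open('alerts.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f, delimiter=';')
    writer.writerow(['source', 'link', 'title', 'text'])
    writer.writerows(rows)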

This has you writing the details of these alerts into a CSV file, semicolon (“;”) separated.

Which looks like this:

The CSV file we are getting our alerts printed into.

Quite. Easily. Done!
While this is a great deal of work to cover in a single blog, especially for Python/coding newbies to grasp, it is just the beginning, and we look forward to extending it with:

> Automating the Google Alerts fetch: a cron job or a scheduled workflow to keep fetching the alerts, either appending to or creating daily CSVs.

> De-duplication: the next time you run this, it may not only fetch newer alerts but also bring along older Google Alerts that were already processed. We can maintain the fetched/processed alerts against their title hashes and only honour the delta, i.e. the new alerts, in each iteration (see the sketch after this list); this may be the v2.0 of “retrieving Google Alerts on Covid-19/Coronavirus” with the strategy described in this blog.

> Rising above data, into insights: visualizations and text mining, as we extend the HTML parsing to the news source links too, and more. This shall make it a full-blown tool of much capability and utility, which every data-driven organization can benefit from for insights around coronavirus and the state of affairs!
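For the de-duplication idea above, a minimal sketch: keep a side file of title hashes you have already seen and skip those rows on the next run. The seen_hashes.txt filename and hashing the title alone are assumptions for illustration, building on the ‘rows’ list from the parsing snippet:

import hashlib

def title_hash(title):
    # stable fingerprint of an alert title
    return hashlib.sha256(title.encode('utf-8')).hexdigest()

# load hashes of alerts processed in earlier runs (if any)
try:
    with open('seen_hashes.txt') as f:
        seen = set(line.strip() for line in f)
except FileNotFoundError:
    seen = set()

# keep only the delta: rows whose title we have not seen before
new_rows = [row for row in rows if title_hash(row[2]) not in seen]

# remember the new hashes for the next iteration
with open('seen_hashes.txt', 'a') as f:
    for row in new_rows:
        f.write(title_hash(row[2]) + '\n')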

So stay tuned!
