Schedule Tweets with Python for a Job Search Website
In this article we are going to build code that will schedule tweets with Python for the Job Search Website we have been building with Python and Django. This is part of a series of articles that cover the stages of building a fully functioning web application with Python and Django from scratch.
In the previous two articles we discussed:
In this article we are going to combine what we did in the previous two articles and build a Twitter bot that will tweet the data we have collected from our web scraping.
Reason for this tutorial
Social media websites like Twitter and Facebook rank among the world’s top five most visited websites by traffic volume. Social media is popular with consumers and is therefore a good place to reach your potential audience.
Social media is also organised according to interests, groups and hashtags, which makes it a good place for marketing: you can target your message at a specific hashtag or audience, instead of putting up a billboard that is fishing for anyone who might come across it.
Therefore, as we launch our job search platform, it is important that we also focus on how to reach a wider audience using social media. Ideally we want to reach people on social media whom we can lead to our web application.
Re-Purpose the Web Scraping Code
We have already written code to scrape the websites we need to get job data, but we now need to re-purpose that code to scrape what we need for tweets. Using the same website, create a new file to scrape the data again for tweets:
from bs4 import BeautifulSoup
from datetime import date
import requests
import json
import config

headers = {'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'}
baseUrl = 'https://careers.vodafone.com'
url = "https://careers.vodafone.com/key/vodacom-vacancies.html?q=vodacom+vacancies"

# Read a JSON file
def readJson(filename):
    with open(filename, 'r') as fp:
        return json.load(fp)

# Write a JSON file
def writeJson(filepath, data):
    with open(filepath, 'w') as fp:
        json.dump(data, fp)

def shortenUrl(linkUrl):
    linkRequest = {
        "destination": linkUrl,
        "domain": { "fullName": "messages.careers-portal.co.za" },
        #"slashtag": "",
    }
    requestHeaders = {
        "Content-type": "application/json",
        "apikey": config.rebrand_api_key,
    }
    r = requests.post("https://api.rebrandly.com/v1/links",
                      data=json.dumps(linkRequest),
                      headers=requestHeaders)
    if r.status_code == requests.codes.ok:
        link = r.json()
        return link["shortUrl"]
    else:
        return ''

# Scan the website
def jobScan(link):
    the_job = {}
    jobUrl = '{}{}'.format(baseUrl, link['href'])
    the_job['urlLink'] = shortenUrl(jobUrl)
    job = requests.get(jobUrl, headers=headers)
    jobC = job.content
    jobSoup = BeautifulSoup(jobC, "html.parser")
    to_test = jobSoup.find_all("div", {"class": "joblayouttoken displayDTM"})
    if to_test == []:
        return None
    else:
        title = jobSoup.find_all("h1")[0].text
        the_job['title'] = title
        the_divs = jobSoup.find_all("div", {"class": "joblayouttoken displayDTM"})
        country = the_divs[1].find_all("span")[1].text
        the_job['location'] = country
        date_posted = the_divs[2].find_all("span")[1].text
        the_job['date_posted'] = date_posted
        full_part_time = the_divs[3].find_all("span")[1].text
        the_job['type'] = 'Full Time'
        contract_type = the_divs[4].find_all("span")[1].text
        the_job['contract_type'] = contract_type
        return the_job

r = requests.get(url, headers=headers)
c = r.content
soup = BeautifulSoup(c, "html.parser")

# Modified these lines here: collect the unique job links
table = []
results = soup.find_all("a", {"class": "jobTitle-link"})
[table.append(x) for x in results if x not in table]

# Run the scanner and get all the jobs
final_jobs = []
for x in table:
    job = jobScan(x)
    if job:
        final_jobs.append(job)

# Save the data in a JSON file
filepath = '/absolute-path-to-folder/vodacom.json'
writeJson(filepath, final_jobs)
URL Shortening
We have added specific code for shortening the job page URL before adding it to the dict object that gets saved in our JSON file. URL shortening is crucial for Twitter because of the character limitation: a tweet is capped at 280 characters, and the tweeting code below budgets even less than that to leave room for hashtags.
I used a service called Rebrandly (https://www.rebrandly.com). You can create a new account there and use its API key in the code for shortening the URL.
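One small piece of hardening worth considering (my own addition, not part of the original code): if the shortener returns an empty string or raises, fall back to the original URL so the tweet still carries a working link. Here `shortener` stands in for any callable like `shortenUrl` above:

```python
def safeShortUrl(linkUrl, shortener):
    # shortener is any callable such as shortenUrl above; on failure
    # (empty string or an exception) we keep the original long link.
    try:
        short = shortener(linkUrl)
    except Exception:
        short = ''
    return short or linkUrl
```

The tweet may run a little long in that case, but a long link beats a broken one.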
Saving the data
What we are also doing with this code, at the bottom of the script, is saving the data in a JSON file. This way, we can run the scraper on a regular basis to check for the latest available jobs and refresh the data saved in the file. The tweeting code then only needs to read the saved file to always get the latest data.
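That refresh step could also merge instead of overwrite, so jobs that briefly disappear from the listing page are not lost between runs. This is a sketch of my own (the `refreshJobs` helper and the choice of `urlLink` as the merge key are assumptions, not part of the original code):

```python
import json

def refreshJobs(filepath, fresh_jobs):
    # Load the previously saved jobs, if the file exists and is valid JSON.
    try:
        with open(filepath, 'r') as fp:
            saved = json.load(fp)
    except (FileNotFoundError, json.JSONDecodeError):
        saved = []
    # Keep one entry per short URL, letting freshly scraped data win.
    merged = {job['urlLink']: job for job in saved}
    for job in fresh_jobs:
        merged[job['urlLink']] = job
    with open(filepath, 'w') as fp:
        json.dump(list(merged.values()), fp)
    return list(merged.values())
```

The simple overwrite in the article is fine too; merging just matters if you tweet older postings as well.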
Creating the Tweet from JSON data
We will be creating the tweet in two stages:
- Clean the data from the website and string it together to create the tweet message
- Concatenate the tweet message with appropriate hashtags and create the Tweepy API object to send the tweet
Data processing, creating tweets
import json

def readJson(filename):
    with open(filename, 'r') as fp:
        return json.load(fp)

def grabVodacomJobs():
    statuses = []
    filepath = '/absolute-path-to-folder/vodacom.json'
    jobs = readJson(filepath)
    for job in jobs:
        status = 'Vodacom-{} {} Apply {}'.format(
            job['title'], job['contract_type'], job['urlLink']
        ).replace('\n', '').replace('  ', ' ')
        statuses.append(status)
    return statuses
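To see what a single status looks like, here is the same formatting applied to a hand-written job dict (the field values below are made up for illustration):

```python
job = {
    'title': 'Data Engineer',
    'contract_type': 'Permanent',
    'urlLink': 'https://messages.careers-portal.co.za/abc12',
}

# Same cleaning as in grabVodacomJobs: strip newlines, collapse double spaces.
status = 'Vodacom-{} {} Apply {}'.format(
    job['title'], job['contract_type'], job['urlLink']
).replace('\n', '').replace('  ', ' ')
```

This yields a single line like `Vodacom-Data Engineer Permanent Apply https://...`, short enough to leave room for two hashtags.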
Build up tweet message with hashtags
Don’t forget to import the above file into the tweeting file; we will need to access the function that loads the job data from the JSON file.
# -*- coding: utf-8 -*-
import tweepy
import config
import random
import tweet_data
import schedule
import time

def createTheAPI():
    auth = tweepy.OAuthHandler(config.api_key, config.api_secret)
    auth.set_access_token(config.access_token, config.access_token_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
    return api

vodaJobs = tweet_data.grabVodacomJobs()
hashtags = ["#hashtag1", "#hashtag2", "#hashtag3", "#hashtag4", "#hashtag5"]

def createVodacomTweet():
    # Keep drawing random jobs until one fits within the length budget
    while True:
        msg = random.choice(vodaJobs)
        tags = random.sample(hashtags, 2)
        if len(msg) < 110:
            msg += ' ' + tags[0]
            msg += ' ' + tags[1]
        if len(msg) < 160:
            return msg

def sendVodacomTweet():
    api = createTheAPI()
    tweet = createVodacomTweet()
    api.update_status(tweet)
    print('Just tweeted: {}'.format(tweet))

# The scheduling is happening below:
schedule.every().day.at("10:30").do(sendVodacomTweet)

while True:
    schedule.run_pending()
    time.sleep(1)
We are using the random module imported at the top of the file to (1) select a random tweet from the list loaded from the JSON file and (2) select a random sample of two from the hashtags list. You can make this list as long as you need; the random selection of hashtags also helps prevent sending two identical tweets back-to-back.
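Those two calls can be tried in isolation. Seeding is only here to make the demo reproducible; the actual bot should stay unseeded so every run differs:

```python
import random

random.seed(42)  # for a reproducible demo only; omit in the real bot
messages = ['job A', 'job B', 'job C']
tags = ["#hashtag1", "#hashtag2", "#hashtag3", "#hashtag4", "#hashtag5"]

msg = random.choice(messages)    # one random message from the list
picked = random.sample(tags, 2)  # two distinct hashtags, no repeats
tweet = '{} {} {}'.format(msg, picked[0], picked[1])
```

`random.sample` guarantees the two hashtags are distinct, which `random.choice` called twice would not.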
We are also using the Python schedule library to run the task on a daily schedule. There are multiple approaches that can be used, taken directly from the schedule documentation: https://pypi.org/project/schedule/
Code from the documentation:
import schedule
import time

def job():
    print("I'm working...")

schedule.every(10).seconds.do(job)
schedule.every(10).minutes.do(job)
schedule.every().hour.do(job)
schedule.every().day.at("10:30").do(job)
schedule.every(5).to(10).minutes.do(job)
schedule.every().monday.do(job)
schedule.every().wednesday.at("13:15").do(job)
schedule.every().minute.at(":17").do(job)

while True:
    schedule.run_pending()
    time.sleep(1)
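One practical detail the documentation snippet leaves out (my own addition, not from the schedule docs): if a scheduled job raises, the `while True` loop dies and no further tweets go out. A small wrapper keeps the scheduler alive:

```python
import functools

def safe_job(fn):
    # Run fn, report any exception, and keep the scheduler loop alive.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            print('Job {} failed: {}'.format(fn.__name__, exc))
    return wrapper

# usage: schedule.every().day.at("10:30").do(safe_job(sendVodacomTweet))
```

A failed tweet (network hiccup, rate limit) then just skips a day instead of stopping the bot.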
Full Video Tutorial: