Schedule Tweets with Python for a Job Search Website

Skolo Online Learning · Published in Nerd For Tech · Feb 18, 2021 · 5 min read
In this article we are going to build code that schedules tweets with Python for the job search website we have been building with Python and Django. This is part of a series of articles that covers the stages of building a fully functioning web application with Python and Django from scratch.

This article builds on the previous two articles in the series: we combine what we did there to build a Twitter bot that tweets the data we collected from our web scraping activities.

Reason for this tutorial

Social media websites like Twitter and Facebook rank in the top 5 of the world's most visited websites by traffic volume. Social media is popular with consumers and is therefore a good place to reach your potential audience.

Social media is also organised around interests, groups and hashtags, which makes it a good place for marketing: you can target your message at a specific hashtag or audience, rather than putting up a billboard and fishing for anyone who might come across it.

Therefore, as we launch our job search platform, it is important that we also focus on reaching a wider audience through social media. Ideally we want to reach people on social media whom we can lead to our web application.

Repurpose the Web Scraping Code

We have already written code to scrape the websites we need for job data; now we need to repurpose that code to scrape what we need for tweets. Using the same website, create a new file that scrapes the data again, this time for tweets:

from bs4 import BeautifulSoup
import requests
import json
import config

headers = {'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'}
baseUrl = 'https://careers.vodafone.com'
url = "https://careers.vodafone.com/key/vodacom-vacancies.html?q=vodacom+vacancies"

# Read a JSON file
def readJson(filename):
    with open(filename, 'r') as fp:
        return json.load(fp)

# Write a JSON file
def writeJson(filepath, data):
    with open(filepath, 'w') as fp:
        json.dump(data, fp)

# Shorten a URL with the Rebrandly API
def shortenUrl(linkUrl):
    linkRequest = {
        "destination": linkUrl,
        "domain": {"fullName": "messages.careers-portal.co.za"},
        # "slashtag": "",
    }
    requestHeaders = {
        "Content-type": "application/json",
        "apikey": config.rebrand_api_key,
    }
    r = requests.post("https://api.rebrandly.com/v1/links",
                      data=json.dumps(linkRequest),
                      headers=requestHeaders)
    if r.status_code == requests.codes.ok:
        link = r.json()
        return link["shortUrl"]
    else:
        return ''

# Scan a single job page and return its details as a dict
def jobScan(link):
    the_job = {}
    jobUrl = '{}{}'.format(baseUrl, link['href'])
    the_job['urlLink'] = shortenUrl(jobUrl)
    job = requests.get(jobUrl, headers=headers)
    jobSoup = BeautifulSoup(job.content, "html.parser")
    the_divs = jobSoup.find_all("div", {"class": "joblayouttoken displayDTM"})
    if the_divs == []:
        return None
    else:
        title = jobSoup.find_all("h1")[0].text
        the_job['title'] = title
        country = the_divs[1].find_all("span")[1].text
        the_job['location'] = country
        date_posted = the_divs[2].find_all("span")[1].text
        the_job['date_posted'] = date_posted
        full_part_time = the_divs[3].find_all("span")[1].text
        the_job['type'] = full_part_time
        contract_type = the_divs[4].find_all("span")[1].text
        the_job['contract_type'] = contract_type
        return the_job

# Fetch the listing page
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, "html.parser")

# Collect the unique job links
table = []
results = soup.find_all("a", {"class": "jobTitle-link"})
[table.append(x) for x in results if x not in table]

# Run the scanner and collect all the jobs
final_jobs = []
for x in table:
    job = jobScan(x)
    if job:
        final_jobs.append(job)

# Save the data in a JSON file
filepath = '/absolute-path-to-folder/vodacom.json'
writeJson(filepath, final_jobs)

URL Shortening

We have added specific code to shorten the job page URL before adding it to the dict object that gets saved in our JSON file. URL shortening matters on Twitter because of the character limit: a tweet holds at most 280 characters, and the tweeting code below budgets even more conservatively, keeping each tweet under 160 characters.

I used a service called Rebrandly (https://www.rebrandly.com): you can create a new account there and use your API key in the code above to shorten the URLs.
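To see what the shortener buys us, here is a quick standalone sketch of the character arithmetic (both URLs and the 160-character budget are illustrative; the short URL is just the shape of link Rebrandly returns):

long_url = 'https://careers.vodafone.com/key/vodacom-vacancies.html?q=vodacom+vacancies'
short_url = 'https://messages.careers-portal.co.za/abc123'  # hypothetical Rebrandly output

budget = 160  # the budget the tweeting code below enforces
print(budget - len(long_url))   # room left for the job title and hashtags
print(budget - len(short_url))  # considerably more room with the short link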

Saving the data

At the bottom of the script we also save the data in a JSON file. This way we can run the code on a regular basis to check for the latest available jobs and refresh the data saved in the file; the tweeting code then only needs to read the saved file to always get the latest data.
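For reference, one entry in the saved file ends up looking roughly like this (the keys come from jobScan above; the values here are made up):

example_job = {
    "urlLink": "https://messages.careers-portal.co.za/abc123",
    "title": "Senior Network Engineer",
    "location": "South Africa",
    "date_posted": "01-Feb-2021",
    "type": "Full Time",
    "contract_type": "Permanent",
}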

Creating the Tweet from JSON data

We will be creating the tweet in two stages:

  • Clean the data from the website and string it together to create the tweet message
  • Concatenate the tweet message with appropriate hashtags and create the Tweepy API object to send the tweet

Data processing, creating tweets

import json

# Read the scraped jobs back from the JSON file
def readJson(filename):
    with open(filename, 'r') as fp:
        return json.load(fp)

# Turn each saved job into a ready-to-send status string
def grabVodacomJobs():
    statuses = []
    filepath = '/absolute-path-to-folder/vodacom.json'
    jobs = readJson(filepath)
    for job in jobs:
        status = 'Vodacom - {} {} Apply {}'.format(
            job['title'], job['contract_type'], job['urlLink']
        ).replace('\n', '').replace('  ', ' ')
        statuses.append(status)
    return statuses
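A quick way to sanity-check the output is to print the first status. Assuming the JSON file from the scraper exists, it looks something like this (values illustrative):

statuses = grabVodacomJobs()
print(statuses[0])
# e.g. Vodacom - Senior Network Engineer Permanent Apply https://messages.careers-portal.co.za/abc123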

Build up tweet message with hashtags

Don't forget to import the file above into the tweeting file; we will need the function that reads the saved job data.

# -*- coding: utf-8 -*-
import tweepy
import config
import random
import tweet_data
import schedule
import time

# Authenticate and build the Tweepy API object
def createTheAPI():
    auth = tweepy.OAuthHandler(config.api_key, config.api_secret)
    auth.set_access_token(config.access_token, config.access_token_secret)
    # Tweepy 3.x; wait_on_rate_limit_notify was removed in Tweepy 4
    api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
    return api

vodaJobs = tweet_data.grabVodacomJobs()
hashtags = ["#hashtag1", "#hashtag2", "#hashtag3", "#hashtag4", "#hashtag5"]

def createVodacomTweet():
    while True:
        msg = random.choice(vodaJobs)
        tags = random.sample(hashtags, 2)
        # Only append the hashtags if there is room for them
        if len(msg) < 110:
            msg += ' ' + tags[0]
            msg += ' ' + tags[1]
        # Keep drawing until the tweet fits our 160-character budget
        if len(msg) < 160:
            return msg

def sendVodacomTweet():
    api = createTheAPI()
    tweet = createVodacomTweet()
    api.update_status(tweet)
    print('Just tweeted: {}'.format(tweet))

# The scheduling happens below:
schedule.every().day.at("10:30").do(sendVodacomTweet)

while True:
    schedule.run_pending()
    time.sleep(1)

We are using the random module imported at the top of the file to (1) select a random tweet from the list built from the saved job data and (2) select a random sample of two hashtags from the hashtags list. You can make this list as long as you need; the random selection of hashtags also makes it unlikely that you will send two identical tweets back-to-back.
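If random.choice and random.sample are new to you, this standalone snippet shows the pattern on its own (the statuses and tags are placeholders):

import random

statuses = ['tweet one', 'tweet two', 'tweet three']        # stand-ins for the job tweets
hashtags = ['#Jobs', '#Vacancies', '#Hiring', '#Careers']   # placeholder tags

msg = random.choice(statuses)      # one status, picked uniformly at random
tags = random.sample(hashtags, 2)  # two distinct tags, no repeats within a tweet
print('{} {} {}'.format(msg, tags[0], tags[1]))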

We are also using the Python schedule library to run the task on a daily schedule. There are multiple approaches you can take, straight from the schedule documentation: https://pypi.org/project/schedule/

Code from the documentation:

import schedule
import time

def job():
    print("I'm working...")

schedule.every(10).seconds.do(job)
schedule.every(10).minutes.do(job)
schedule.every().hour.do(job)
schedule.every().day.at("10:30").do(job)
schedule.every(5).to(10).minutes.do(job)
schedule.every().monday.do(job)
schedule.every().wednesday.at("13:15").do(job)
schedule.every().minute.at(":17").do(job)

while True:
    schedule.run_pending()
    time.sleep(1)
