While looking for a software engineering internship recently, I stumbled across a website called https://www.internsg.com/jobs/. This website showcases the latest internships available, ranging from positions like software engineer and mobile app developer to many more.
Each job posting provides information such as the job type, period, location, remuneration, description and responsibilities, as seen in the picture below.
Personally, I felt that it was a hassle to check the website daily for jobs that I was interested in, so I decided it might be a good idea to automate this process. I wanted to be notified of a job opening that I was interested in via a messaging platform, to save the time spent checking the website.
I decided that I wanted to scrape the website using Beautiful Soup 4, use Twilio to send a text to myself whenever there is a job that I might be interested in, and use crontab (Windows Task Scheduler if you're a Windows user) to run the Python script at a specific time of the day.
Beautiful Soup 4 is a Python library which is mainly used to pull data out of HTML or XML files. To install Beautiful Soup 4, open the terminal or cmd (if you're a Windows user) and run
pip install beautifulsoup4
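To get a feel for how Beautiful Soup works before touching the real site, here is a minimal sketch that parses a tiny hand-written HTML snippet (the snippet and its class name are invented for illustration):

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML snippet for illustration
html = "<div class='job'><a href='/job/1'>Copywriter</a></div>"
soup = BeautifulSoup(html, 'html.parser')

link = soup.find('a')   # grab the first <a> tag
print(link.text)        # the text inside the tag: Copywriter
print(link['href'])     # the tag's href attribute: /job/1
```

The same two ideas, finding tags and reading their text or attributes, are all we need for the rest of this project.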
Twilio is a service with a Python library that integrates sending text/WhatsApp messages into your Python script. Check https://www.twilio.com/docs/libraries/python for more details regarding Twilio. To install Twilio, open the terminal or cmd (if you're a Windows user) and run
pip install twilio
You will need to register an account at https://www.twilio.com/. After that, click on Console Dashboard >> Dashboard and copy the ACCOUNT SID and AUTH TOKEN.
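A side note: rather than pasting the ACCOUNT SID and AUTH TOKEN directly into the script, a safer pattern is to read them from environment variables so they never end up in version control. A minimal sketch, assuming you have exported two variables whose names (TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN) are my own choice, not anything Twilio requires:

```python
import os

# Read the Twilio credentials from the environment instead of hardcoding them.
# The variable names below are arbitrary; export them in your shell first, e.g.
#   export TWILIO_ACCOUNT_SID=ACxxxxxxxx
#   export TWILIO_AUTH_TOKEN=xxxxxxxx
account_sid = os.environ.get('TWILIO_ACCOUNT_SID', 'XXXXXXXXXX')
auth_token = os.environ.get('TWILIO_AUTH_TOKEN', 'XXXXXXXXXXX')
```

If you later run the script from cron, remember that cron jobs do not read your shell profile, so the variables would need to be set in the crontab itself.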
Now create a Python file called job_search.py and paste in this code.
from bs4 import BeautifulSoup
from requests import get
from datetime import datetime
from twilio.rest import Client

url = 'https://www.internsg.com/jobs/'
response = get(url)
soup = BeautifulSoup(response.text, 'html.parser')
prettyHTML = soup.prettify()
print(prettyHTML)  # html code for https://www.internsg.com/jobs/
You will roughly get this (image below) when you print prettyHTML.
Now, let’s further inspect the html code.
Inside every <div class='ast-col-sm-10'> are the company's name (Sblock Foundation Pte Ltd), the href address (https://www.internsg.com/job/sblock-foundation-pte-ltd-copywriter/) and the job title (Copywriter).
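To make the structure concrete, here is a small sketch that runs on a hand-written snippet mimicking one such block (the snippet itself is invented, including the assumption that the company name and job title each sit in an <a> tag; on the real page this comes out of prettyHTML):

```python
from bs4 import BeautifulSoup

# A hand-written stand-in for one <div class='ast-col-sm-10'> block
html = """
<div class='ast-col-sm-10'>
  <a href='https://www.internsg.com/'>Sblock Foundation Pte Ltd</a>
  <a href='https://www.internsg.com/job/sblock-foundation-pte-ltd-copywriter/'>Copywriter</a>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')
block = soup.find('div', class_='ast-col-sm-10')
links = block.find_all('a')

company = links[0].text     # first <a>: the company name
jobTitle = links[-1].text   # last <a>: the job title
jobUrl = links[-1]['href']  # the posting's href address
print(company, jobTitle, jobUrl)
```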
You will need to extract the relevant information from the html code by finding all the ast-col-sm-10 classes. Subsequently, we will check if the jobTitle matches the jobKeyword and, if so, append the url to a list.
jobKeyword = 'software'  # the kind of role I am looking for
url_links = []
jobInfos = soup.find_all('div', class_='ast-col-sm-10')
for jobInfo in jobInfos:
    # split all the <a> tags to find the job title i.e Copywriter
    link = jobInfo.find_all('a')[-1]
    # if jobKeyword matches jobTitle, save the href
    if jobKeyword.lower() in link.text.lower():
        url_links.append(link['href'])
for each_url in url_links:
    print(each_url)
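One thing to note: since the script will run once a day, the same posting can still be on the page the next day and you would be texted about it twice. Here is a minimal sketch of one way to avoid that, by persisting already-seen links to a file between runs (the helper name filter_new and the filename seen_jobs.txt are my own choices, not part of the original script):

```python
import os

SEEN_FILE = 'seen_jobs.txt'  # hypothetical file remembering past links

def filter_new(url_links):
    """Return only the links we have not texted about before."""
    seen = set()
    if os.path.exists(SEEN_FILE):
        with open(SEEN_FILE) as f:
            seen = set(line.strip() for line in f)
    new_links = [u for u in url_links if u not in seen]
    # Remember the new links for the next run
    with open(SEEN_FILE, 'a') as f:
        for u in new_links:
            f.write(u + '\n')
    return new_links
```

Calling filter_new(url_links) before building the message would make the daily text contain only postings you have not seen yet.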
Finally, with the help of Twilio, send yourself a text message (that contains all the href links) using the code below.
messageStr = ''
for each_url in url_links:
    messageStr = messageStr + '\n' + each_url
messageStr = messageStr + '\n' + str(datetime.now())[:19]

# Your Account Sid and Auth Token from twilio.com/console
account_sid = 'XXXXXXXXXX'
auth_token = 'XXXXXXXXXXX'
client = Client(account_sid, auth_token)
message = client.messages \
    .create(body=messageStr,
            from_='+XXXXXXXXXX',  # your Twilio phone number
            to='+XXXXXXXXXX')     # your own phone number
Now, we will need to set up a cron job so that Twilio will send us a text message every morning whenever there is a role related to software. To create a cron job on a Mac that executes every day at 12:00, open the terminal and type:
env EDITOR=nano crontab -e
and add the following line:
0 12 * * * /full/path/to/python /full/path/to/script.py
Thus in my case, it was
0 18 * * * /usr/bin/python /Users/automationfeed/Programming/Python/Job_Search/job_search.py
This script will run every day at 18:00. Refer to https://ole.michelsen.dk/blog/schedule-jobs-with-crontab-on-mac-osx.html, which is an extremely good guide to understanding how to schedule jobs with crontab on Mac OS X.
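For quick reference, a crontab entry has five time fields followed by the command. Reading the entry above left to right:

```shell
# field order: minute  hour  day-of-month  month  day-of-week  command
# 0 18 * * *  ->  at minute 0 of hour 18, every day of every month, any weekday
#
# A hypothetical variation: run at 08:30 on weekdays (Mon-Fri) only
# 30 8 * * 1-5 /usr/bin/python /full/path/to/script.py
```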
The full code can be found on GitHub: https://github.com/kaikiat/automation/blob/master/src/job_search.py