Automating coffee chat sign-ups using Python, Selenium, and AWS EC2

Rushi Patel
4 min read · Oct 31, 2018


Summary: I’m currently going through recruiting during my MBA program at Wharton. We have a great career center that does an amazing job bringing company representatives to campus to meet with us. One of the most common ways to engage with companies is through a coffee chat. However, sign-ups go live at 5pm, which is when I have class, so all the spots are taken before I can even log into my computer (800+ students… 10–15 spots per employer… do the math). My solution: build a script that automates the sign-up process, deploy it on the cloud, and never worry about missing sign-ups again.

The problem

To sign up for a coffee chat, we use an online portal, and it’s first come, first served. While this is fine in theory, there are 800+ students and typically only 10–15 coffee chat slots per employer. That means a mad dash the moment sign-ups open, and if you have slow internet or are terrible at clicking fast, you’re out of luck. And if, like me, you have class at 5pm (when sign-ups usually go live), there is no way to actually sign up for coffee chats, which means fewer chances to meet potential employers.

What the sign-ups look like once I can finally log in

The solution

What if I could automate the sign-up process? Fortunately, there are some excellent Python packages for web scraping that can be repurposed to automate web clicks. For this project, I chose Selenium. I had never worked with it before, but it’s fairly easy to use and has robust functionality. Using Chrome’s “Inspect element” tool, it was easy to find the buttons and input boxes needed to sign into the portal and click through the menus. “Copy XPath” is your friend here.
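As a minimal sketch of that pattern before the full script below (the URL and XPath here are placeholders, not the real portal’s, and it assumes chromedriver is on your PATH):

from selenium import webdriver

# Placeholder values: the real URL and XPath come from the portal and "Copy XPath" in DevTools
PORTAL_URL = 'https://example.com/login'
LOGIN_BUTTON_XPATH = '//*[@id="login"]/button'

driver = webdriver.Chrome()  # assumes chromedriver is on your PATH
driver.get(PORTAL_URL)  # load the page
driver.find_element_by_xpath(LOGIN_BUTTON_XPATH).click()  # click the element found via the copied XPath
driver.quit()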

Writing the script

The script itself was fairly straightforward. I spent a lot of time thinking about the best way to have it fire right at 5pm, since starting even a few seconds late would mean missing the sign-up. I ended up using the “schedule” library to start the script at 4:59pm so it could get through the log-in pages, and then used a while loop to have it wait on the sign-up page until 5:00pm before clicking through the sign-up.

from selenium import webdriver
from getpass import getpass
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import schedule

def runCareerPathSignUp():
    # Inputs
    timeToAct = 0  # in epoch (UTC) - set this to 5:00pm local time
    usr = 'Username'
    pwd = 'Password'
    event_page_URL = 'URL of coffee signup page'  # use the event page link
    timeSlot = ''  # use Inspect to find the XPath of the div that holds the button; add "/div/a" at the end if the button is not there
    chrome_path = r"/home/ubuntu/PythonScripts/chromedriver"  # path of chromedriver on the local machine

    ## Create headless Chrome instance
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('headless')
    chrome_options.add_argument('window-size=1920x1080')
    driver = webdriver.Chrome(chrome_path, chrome_options=chrome_options)

    ## Enter username and password
    driver.get('https://groups.wharton.upenn.edu/groups')
    driver.find_element_by_link_text('PennKey Login').click()
    driver.find_element_by_id('pennkey').send_keys(usr)
    driver.find_element_by_id('password').send_keys(pwd)

    ## Log in
    login_btn = driver.find_element_by_id('submit1')
    login_btn.submit()

    ## Go to CareerPath and log in
    driver.get(event_page_URL)
    driver.find_element_by_link_text('PennKey Account').click()
    driver.get(event_page_URL)

    # Wait until the actual time hits the requested time
    while time.time() < timeToAct:
        time.sleep(1)

    # Refresh once, wait a few seconds for everything to load
    driver.refresh()
    time.sleep(2)
    driver.implicitly_wait(2)

    # Find the right time slot and click it
    driver.find_element_by_xpath(timeSlot).click()

    # On the registration page, check whether the "existing resume" option is present;
    # if so, select it and pick a resume before registering, otherwise register directly
    resume_inputs = driver.find_elements_by_xpath('//*[@id="events-module"]/div/tt-edit-registration/div/div[2]/section/div/div/form/div[1]/div[3]/label/input')
    if not resume_inputs:
        driver.find_element_by_xpath('//*[@id="events-module"]/div/tt-edit-registration/div/div[2]/div[2]/a[2]').click()
    else:
        resume_inputs[0].click()
        select_fr = Select(driver.find_element_by_xpath('//*[@id="events-module"]/div/tt-edit-registration/div/div[2]/section/div/div/form/div[3]/div/select'))
        select_fr.select_by_index(1)
        driver.find_element_by_xpath('//*[@id="events-module"]/div/tt-edit-registration/div/div[2]/div[2]/a[2]').click()

    print("Finished")  # debug

# Schedule this to run every day at 4:59pm (schedule uses 24-hour time)
schedule.every().day.at("16:59").do(runCareerPathSignUp)
while True:
    schedule.run_pending()
    time.sleep(1)
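One value the script leaves for you to fill in is timeToAct, the epoch timestamp for 5:00pm local time. Here is one way you might compute it, a sketch using only the standard library (the 17:00 target is my assumption to match the 5pm sign-up time):

from datetime import datetime

# Build today's 5:00pm in local time and convert it to an epoch timestamp.
# time.time() in the script above is also an epoch value, so the comparison works directly.
target = datetime.now().replace(hour=17, minute=0, second=0, microsecond=0)
timeToAct = target.timestamp()  # epoch seconds for 5:00pm local time today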

Putting it on the cloud

I wanted the script to run automatically, without my computer needing to be on. Since I’m in class at 5pm and we have a strict no-electronics policy, I couldn’t run it locally, so I decided to deploy the script to AWS. The hard part of this project was getting it to run at a specific time. For some reason, the Linux scheduling tool (cron) was not working for me, so I ended up using “screen” and having the script run continuously in the background. I used a t2.micro instance since I did not need any major computing power and did not want to burn through my AWS credits.

Final thoughts

Overall, this was a fun project that required a lot of trial and error as I learned how to use Selenium and worked out the scheduling on AWS. I definitely want to continue learning web-scraping tools as a precursor to the fun data analytics projects I want to do, and this was a great start.

