Data Visualization and Web Scraping using Python & Python Libraries (Folium/Selenium) by Edward Wang, Bradley Turner and Rowena Singh

Edward Wang · Life’s Journey Through A Lens
Feb 4, 2018 · 4 min read

[Image: winning picture for Stonybrook HackHealth 2018]

Objective: Our hackathon project queries a database of available healthcare services within a given area and provides a visualization of the data in the form of a map with clusters. This allows city and state officials to understand which locations require more development in terms of their healthcare services.

Start off by adding these libraries to your project:

  • folium
  • jinja2
  • selenium
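
All three are available from PyPI, so assuming a standard Python setup, a single pip command installs them:

pip install folium jinja2 selenium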

Selenium — Library used for web scraping

The HRSA site allows users to query a location but doesn’t supply an API, so the best choice is to perform web scraping: extracting data from an HTML page fetched from the internet. In this case we will be using the Python library Selenium.

Selenium works by first setting up the import statements (os is also imported here, since the path-handling code below needs it):

import os
from selenium import webdriver

For this example we will be using Chrome. Its WebDriver executable, ChromeDriver, can be downloaded from the official ChromeDriver site.

After downloading it, I put the file into my project folder for quick access and grab the absolute path to that folder.

# Locate the chromedriver binary sitting next to this script
absPath, filename = os.path.split(os.path.abspath(__file__))
browser = webdriver.Chrome(executable_path=absPath + '/chromedriver')
url = 'https://datawarehouse.hrsa.gov/tools/analyzers/geo/ShortageArea.aspx'
browser.get(url)

Make sure the chromedriver executable is in place when running this code; Selenium launches it for you. When the script runs, you will see a Chrome browser pop up with the website you set as the url. It may take a couple of runs for it to pop up; if it doesn’t, look back and see what you may have missed.

For the next step it is necessary to inspect the page’s elements. On the input side, you need to find the HTML objects you will be typing into, e.g. a textarea or input field. The elements you want can be found by right-clicking and choosing ‘Inspect element.’

A helpful tool when searching for elements is the Chrome XPath plugin. When inspecting an element you can fine-tune your search for it using a query like:

//input[@id="ctl00_ctl00_MainContent_ContentPlaceHolder1_tbxAddress"]

The id can be found using the ‘Select an element in the page to inspect it’ tool, or by right-clicking the element and choosing ‘Inspect element.’ This query points to the “Street Address” input, and the match can be seen in the ‘Results’ box of the XPath tool. Because the XPath expression is wrapped in double quotes in Python, use single quotes around the id value, as seen below.

"'ctl00_ctl00_MainContent_ContentPlaceHolder1_Label2'"

Once you have located an element on the web page, you can set its value directly by using these commands:

# Locate the Street Address textbox by its id, then type into it
element = browser.find_element_by_id("ctl00_ctl00_MainContent_ContentPlaceHolder1_tbxAddress")
element.send_keys("blah")

Even when dealing with an element like a drop-down, you can use the same method to set the value.
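
Alternatively, Selenium ships a Select helper that is the idiomatic way to pick a drop-down option; a minimal sketch, assuming a hypothetical drop-down id:

from selenium.webdriver.support.ui import Select

# 'stateDropdown' is a made-up id for illustration
dropdown = Select(browser.find_element_by_id("stateDropdown"))
dropdown.select_by_visible_text("New York")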

When you run the browser you will see your input reflected on the website. To trigger the search button, just repeat the element-finding step from before and call the click() method.

browser.find_element_by_name("xpath to button").click()

Once this is completed, it’s the same as performing the past actions and saving the results inside an element to a variable. The results of the query are saved in the “redesignDiv” element. Here, instead of using XPath, we use the div’s id in the code:

browser.find_element_by_name("redesignDiv").text()

If an element can’t be found, Selenium raises NoSuchElementException; just import this exception from the Selenium library and use a try/except block to handle the case.
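
A minimal sketch of that handling:

from selenium.common.exceptions import NoSuchElementException

try:
    results = browser.find_element_by_id("redesignDiv").text
except NoSuchElementException:
    results = ""  # element not on the page; fall back to an empty result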

Folium — Data Visualization (Markers)

After grabbing our data it’s time to visualize it, and a great library for that is Folium. Folium can be used to create a map object, and in this case we are focused on displaying a choropleth-style map: a map that indicates the intensity of a given set of data. In my example I focused on the intensity of health services in a given location. There are five points of data queried from the database, including dental, mental, primary care, and MUA/P. If a location is missing any of these services, its marker color changes according to the legend below (a code sketch of the mapping follows the list):

Yes = Missing A Service, No = Not Missing A Service

  • Red [4 Yes]
  • Orange [3 Yes]
  • Pink [2 Yes]
  • Yellow [1 Yes]
  • Green [0 Yes]
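
A minimal sketch of that color mapping, using a hypothetical helper name:

def markerColor(missingCount):
    # missingCount is the number of services a location lacks (count of 'Yes' flags)
    colors = {4: 'red', 3: 'orange', 2: 'pink', 1: 'yellow', 0: 'green'}
    return colors[missingCount]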

The code for creating a simple map object is below:

import folium
import os
import pandas  # used later for reading the query results

def createMap():
    # Create a map centered on Portland, OR
    map = folium.Map(location=[45.5236, -122.6750])
    # Save the map as an HTML file next to this script
    absPath, filename = os.path.split(os.path.abspath(__file__))
    map.save(absPath + '/map.html')

createMap()

The result is an HTML file that can be opened in any web browser; for this example it shows an empty base map centered on the given coordinates.

With the map object created it’s time to add markers. The code for adding markers is:

folium.Marker([45.5244, -122.6699], popup='Title').add_to(map)

Here is how we changed the markers to have color, using CircleMarker:

# 'reader' holds the query results (latitude, longitude, street address)
folium.CircleMarker([reader["LAT"][i], reader["LON"][i]],
                    radius=5,
                    popup=reader['STREET'][i],
                    color="#3186cc").add_to(map)
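
For context, a minimal sketch of the surrounding loop, reusing the map object from the earlier snippet; the results.csv filename and its LAT/LON/STREET columns are assumptions based on the call above:

import pandas

# Hypothetical CSV of scraped results with LAT, LON, and STREET columns
reader = pandas.read_csv('results.csv')
for i in range(len(reader)):
    folium.CircleMarker([reader["LAT"][i], reader["LON"][i]],
                        radius=5,
                        popup=reader['STREET'][i],
                        color="#3186cc").add_to(map)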

The result of this looks like: [image of the map with colored circle markers]

The final result, based on the color code mentioned earlier: [image of the final color-coded map]

The final result covers New York City, but the approach could be scaled to the size of the US; for this example we broke it down to a smaller subset to fit the time available for the one-day hack. From this finding, the city appears to be properly supplied with all forms of health care.

Final Code

Here is the link to the GitHub repository.
