Website screenshot generator with Python

Recently, I had a simple task to do: build a website screenshot generator.

The idea is to enter any URL in an input field and show the generated screenshot on the same page.

I used Python/Django and the Selenium WebDriver to build the app. Selenium is a great tool for Web UI Automation.

Install Python Selenium with:

pip install selenium

Taking a screenshot using pure Python is pretty simple:

from selenium import webdriver

# Save a screenshot of spotify.com in the current directory.
DRIVER = "chromedriver"
driver = webdriver.Chrome(DRIVER)
driver.get("https://www.spotify.com")
driver.save_screenshot("my_screenshot.png")
driver.quit()

Selenium requires a driver to interface with the browser. I’m using the Chrome Driver. Download it from here and put it in your PATH so it can be accessed from anywhere.

In my case the driver is located in /usr/local/bin/chromedriver

You can either save the screenshot to disk:

driver.save_screenshot(IMAGE_PATH)

or get the screenshot as binary data:

driver.get_screenshot_as_png()

We’ll support both options. So let’s start building the app:

# Create the project
django-admin.py startproject screenshot_generator
# Create the app
django-admin.py startapp app

In the settings.py file, don’t forget to add the newly created ‘app’ to INSTALLED_APPS and set the variables that manage static and media files.

# settings.py
INSTALLED_APPS = [
    # ... Django's default apps ...
    'app',
]
STATIC_URL = '/static/'
MEDIA_ROOT = os.path.join(BASE_DIR, 'media')
MEDIA_URL = '/media/'

In the project urls.py file, include the app urls and serve the media directory.

# urls.py
from django.conf.urls import url, include
from django.conf import settings
from django.views.static import serve
from app import urls as app_urls

urlpatterns = [
    url(r'^', include(app_urls)),
    url(r'^media/(?P<path>.*)$', serve, {
        'document_root': settings.MEDIA_ROOT,
    }),
]

Inside our app code:

First, create the home template view in the urls file:

# urls.py
from django.conf.urls import url
from django.views.generic import TemplateView

urlpatterns = [
    url(r'^$', TemplateView.as_view(template_name="home.html")),
]

Then, create a simple HTML template with the form to enter the desired URL:

# home.html
<h1>Enter URL:</h1>
<form action="/get_screenshot/" method="post">
{% csrf_token %}
<input type="url" name="url" size="40" placeholder="Ex. https://www.netflix.com" required>
<input type="submit" value="Generate screenshot">
</form>

As you can see, the form posts to a get_screenshot view that will live in our views.py. This is the view that has all the magic to generate the screenshot.

Let’s run the app and go to http://localhost:8000/ to preview the home template:

python manage.py runserver

Before we start creating get_screenshot, we should add the view to the app urls file. Our final urls.py looks like this:

# urls.py
from django.conf.urls import url
from django.views.generic import TemplateView
from app import views

urlpatterns = [
    url(r'^$', TemplateView.as_view(template_name="home.html")),
    url(r'^get_screenshot', views.get_screenshot, name="get_screenshot"),
]

Finally, create the get_screenshot view in our views.py. Let’s review it step by step:

# views.py
def get_screenshot(request):
    width = 1024
    height = 768

You can pass a width and height to the driver to get a custom-sized screenshot. The defaults above apply when these params are not specified in the URL.

Take a look at the Selenium WebDriver API documentation and see all the options.

The first thing to do in our view is validate the request method and check that ‘url’ exists in the request. Also check that the URL is neither null nor empty.

if request.method == 'POST' and 'url' in request.POST:
    url = request.POST.get("url", "")
    if url is not None and url != '':
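
The check above only rejects a missing or empty value. As a hedged sketch of a stricter check, the hypothetical helper below (is_valid_target is not part of the original app) also requires an absolute http or https URL, so that values such as file paths are never handed to the browser:

```python
from urllib.parse import urlparse


def is_valid_target(url):
    """Return True if url looks like an absolute http(s) URL."""
    if not url:
        return False
    parsed = urlparse(url.strip())
    # Only web schemes are accepted; file://, javascript: and bare
    # host names (no scheme) are rejected.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_valid_target("https://www.netflix.com"))
print(is_valid_target("file:///etc/passwd"))
```

In the view, this could replace the `url != ''` test, assuming you want to refuse anything that is not a regular web address.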

Then, capture the URL params if the user specifies them:

params = urlparse.parse_qs(urlparse.urlparse(url).query)
if len(params) > 0:
    if 'w' in params: width = int(params['w'][0])
    if 'h' in params: height = int(params['h'][0])
# Ex: https://www.netflix.com/?w=800&h=600
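
Note that int() raises ValueError for a value like w=abc. A minimal sketch of a more forgiving parser follows; parse_size and the MAX_SIDE limit are assumptions made for illustration, not part of the original app:

```python
from urllib.parse import urlparse, parse_qs

DEFAULT_WIDTH = 1024
DEFAULT_HEIGHT = 768
MAX_SIDE = 4096  # hypothetical upper bound to avoid huge windows


def parse_size(url):
    """Return (width, height) read from the w and h query params."""
    params = parse_qs(urlparse(url).query)

    def read(name, default):
        try:
            value = int(params[name][0])
        except (KeyError, ValueError):
            # Param missing or not a number: fall back to the default.
            return default
        if value < 1:
            return default
        return min(value, MAX_SIDE)

    return read("w", DEFAULT_WIDTH), read("h", DEFAULT_HEIGHT)


print(parse_size("https://www.netflix.com/?w=800&h=600"))
print(parse_size("https://www.netflix.com/?w=abc"))
```

Falling back to the defaults keeps the view from failing on a malformed param; returning an error message to the user would be an equally reasonable choice.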

After that, set the right driver, get the url and set the window size:

driver = webdriver.Chrome(DRIVER)
driver.get(url)
driver.set_window_size(width, height)
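
On a server there is usually no display, and the browser should be closed even if the page fails to load. The sketch below shows one way to do that. It assumes a Selenium release whose webdriver.Chrome accepts the options keyword, a chromedriver that Selenium can locate, and a Chrome build recent enough for the --headless=new flag; chrome_arguments and take_screenshot are hypothetical names:

```python
def chrome_arguments(width, height, headless=True):
    """Build the Chrome command-line flags for a screenshot session."""
    args = ["--window-size={},{}".format(width, height)]
    if headless:
        # Assumption: recent Chrome. Older builds use plain "--headless".
        args.append("--headless=new")
    return args


def take_screenshot(url, width=1024, height=768):
    """Return a PNG of url as bytes, always closing the browser."""
    # Imported here so chrome_arguments() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in chrome_arguments(width, height):
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        # Size the window before loading so the page lays out at that size.
        driver.set_window_size(width, height)
        driver.get(url)
        return driver.get_screenshot_as_png()
    finally:
        # Runs even when get() raises, so no Chrome process is left behind.
        driver.quit()


print(chrome_arguments(800, 600))
```

Calling take_screenshot("https://www.netflix.com", 800, 600) would then return the PNG bytes, provided Chrome and its driver are installed.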

Now, check whether the user has specified the save param. This param decides whether we save the screenshot to disk or just serve it as binary data.

if 'save' in params and params['save'][0] == 'true':
    # Ex: https://www.netflix.com/?save=true

If the above is true, we’ll save the screenshot in the media directory with a name formed by joining the current timestamp and a “_image.png” string.

Pass the full path to the save_screenshot method. Also, make sure that the media directory exists:

now = str(datetime.today().timestamp())
img_dir = settings.MEDIA_ROOT
img_name = "".join([now, '_image.png'])
full_img_path = os.path.join(img_dir, img_name)
if not os.path.exists(img_dir):
    os.makedirs(img_dir)
driver.save_screenshot(full_img_path)
screenshot = open(full_img_path, "rb").read()
var_dict = {'screenshot': img_name, 'save': True}
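
The naming and directory steps can be tried without Django or Selenium. This is a small sketch in which build_image_path is a hypothetical helper and a temporary directory stands in for MEDIA_ROOT:

```python
import os
import tempfile
from datetime import datetime


def build_image_path(img_dir, suffix="_image.png"):
    """Return (name, full_path) for a new screenshot inside img_dir."""
    # exist_ok=True creates the directory only when it is missing,
    # without a separate os.path.exists() check.
    os.makedirs(img_dir, exist_ok=True)
    name = "{}{}".format(datetime.now().timestamp(), suffix)
    return name, os.path.join(img_dir, name)


# A throwaway directory stands in for settings.MEDIA_ROOT here.
media_root = os.path.join(tempfile.mkdtemp(), "media")
img_name, full_img_path = build_image_path(media_root)
print(img_name)
print(full_img_path)
```

Two requests arriving within the same microsecond would produce the same name; if that matters, a random component (for example from the uuid module) could be appended.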

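Otherwise, when save is not requested, just get the binary data with get_screenshot_as_png(), store it in our screenshot variable and base64-encode it:

screenshot = driver.get_screenshot_as_png()
# b64encode returns bytes with no line breaks, which suits a data URI.
image_64_encode = base64.b64encode(screenshot)
var_dict = {'screenshot': image_64_encode}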

In both cases, var_dict is the dictionary containing the variables needed for our home template.

Finally, quit the driver and render the home template:

# Final views.py:
from django.shortcuts import render
from django.http import HttpResponse
from django.conf import settings
from datetime import datetime
from selenium import webdriver
import base64
import os
import urllib.parse as urlparse

DRIVER = "chromedriver"


def get_screenshot(request):
    width = 1024
    height = 768
    if request.method == 'POST' and 'url' in request.POST:
        url = request.POST.get("url", "")
        if url is not None and url != '':
            # Optional w/h/save params, e.g. ?w=800&h=600&save=true
            params = urlparse.parse_qs(urlparse.urlparse(url).query)
            if len(params) > 0:
                if 'w' in params: width = int(params['w'][0])
                if 'h' in params: height = int(params['h'][0])
            driver = webdriver.Chrome(DRIVER)
            driver.get(url)
            driver.set_window_size(width, height)
            if 'save' in params and params['save'][0] == 'true':
                # Save to MEDIA_ROOT as <timestamp>_image.png
                now = str(datetime.today().timestamp())
                img_dir = settings.MEDIA_ROOT
                img_name = "".join([now, '_image.png'])
                full_img_path = os.path.join(img_dir, img_name)
                if not os.path.exists(img_dir):
                    os.makedirs(img_dir)
                driver.save_screenshot(full_img_path)
                screenshot = open(full_img_path, "rb").read()
                var_dict = {'screenshot': img_name, 'save': True}
            else:
                # Serve the PNG inline as base64-encoded bytes
                screenshot = driver.get_screenshot_as_png()
                image_64_encode = base64.b64encode(screenshot)
                var_dict = {'screenshot': image_64_encode}
            driver.quit()
            return render(request, 'home.html', var_dict)
    # Reached for non-POST requests and for an empty URL.
    return HttpResponse("Error")

Wait a second, we didn’t explain var_dict in detail and what we are passing to our home template.

When saving the screenshot to disk, we pass the image name in the ‘screenshot’ template variable along with a template variable called save.

When getting binary data (in this case an image), it’s necessary to encode the image so we can pass it in our dictionary as text. Raw binary data can’t be embedded directly in the rendered HTML.

In Python there is a module called base64 that helps us achieve this easily:

import base64
# Encode the image.
screenshot = driver.get_screenshot_as_png()
image_64_encode = base64.b64encode(screenshot)
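
To see the encoding step in isolation, here is a minimal sketch that needs no browser. The eight-byte PNG file signature stands in for a real screenshot, and png_data_uri is a hypothetical helper name:

```python
import base64

# The fixed 8-byte signature that starts every PNG file,
# used here in place of real screenshot bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_data_uri(png_bytes):
    """Return a data URI that an <img> tag can use as its src."""
    # b64encode returns bytes on a single line; decode to get a str.
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return "data:image/png;base64," + encoded


uri = png_data_uri(PNG_SIGNATURE)
print(uri)

# Round trip: decoding the payload gives back the original bytes.
payload = uri.split(",", 1)[1]
print(base64.b64decode(payload) == PNG_SIGNATURE)
```

This is also why the base64 text of every PNG screenshot begins with the same iVBORw0KGgo characters.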

For decoding, create a file app_extras.py in the dir app/templatetags/ (the directory also needs an empty __init__.py so that Django treats it as a package) and register the custom template filter decode_image.

# app_extras.py
from django import template
import base64

register = template.Library()


@register.filter()
def decode_image(encoded_image):
    return "data:image/png;base64,%s" % encoded_image.decode("utf8")

Now, update the home.html template to show the screenshot:

# Final home.html
{% load app_extras %}

<body>
<div align="center">
<h1>Enter URL:</h1>
<form action="/get_screenshot/" method="post">
{% csrf_token %}
<input type="url" name="url" size="40" placeholder="Ex. https://www.netflix.com" required>
<input type="submit" value="Generate screenshot">
</form>

{% if screenshot %}
<h2>Screenshot</h2>
{% if save %}
<img src="{{ MEDIA_URL }}{{ screenshot }}">
{% else %}
<img src="{{ screenshot|decode_image }}">
{% endif %}
{% endif %}
</div>
</body>

Let’s explain the above code:

If the screenshot variable exists, show the HTML tags. There is no need to show them if a screenshot has not been generated.

If we saved the screenshot, show it by passing the media URL plus the image name to the src attribute. It will end up as something like:

<img src="/media/1484738370.392261_image.png">

Otherwise, we show the image by passing the encoded screenshot through the template filter we created, which turns it into a data URI.

It will end up as something like:
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABAAAAAJPCAIAAABtoTFXAAAgAElEQVR4nOS9W6xtzXIe9FV1jzHn…">

Let’s run our final app to see it in action:

python manage.py runserver

As an example, I’m using the Scrapinghub website URL, specifying a width and height and saving the screenshot to disk:

https://www.scrapinghub.com/?w=800&h=600&save=true
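
Because w, h and save travel inside the target URL, the site being captured receives them as well. If that is unwanted, they can be removed before the page is loaded. The following is a sketch under that assumption; strip_options is a hypothetical helper, not part of the original app:

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

# Query params that belong to the screenshot app, not to the target site.
SCREENSHOT_OPTIONS = {"w", "h", "save"}


def strip_options(url):
    """Return url without the w, h and save query params."""
    parsed = urlparse(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parsed.query, keep_blank_values=True)
        if key not in SCREENSHOT_OPTIONS
    ]
    # Rebuild the URL with only the params the target site should see.
    return urlunparse(parsed._replace(query=urlencode(kept)))


print(strip_options("https://www.scrapinghub.com/?w=800&h=600&save=true"))
```

The view would still read the options from the original URL and pass only the stripped one to driver.get(). A site that uses w, h or save for its own purposes would lose those params, so this is a trade-off rather than a fix.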

You can find all the source code of this python-screenshot-generator on GitHub.

I hope this Python/Django example helps someone understand how to take a screenshot and show it in a web page.

Enjoy

Ronny