How to scrape code from Medium using Python

Medium is filled with code snippets. Wouldn’t it be convenient to be able to extract and save them? If you answered yes, keep reading…

Jacob Ferus
3 min read · May 14, 2022

Generated using OpenAI DALL·E 2.

Libraries

The tools I will use are Selenium, Beautiful Soup, and Requests.

Selenium is a framework for programmatically controlling a browser. It was originally made for testing web interfaces, but it has also become popular for scraping websites that load content dynamically using JavaScript.

Beautiful Soup is a library for parsing HTML and XML and extracting information of interest. Selenium can locate and read elements too, so Beautiful Soup is not strictly necessary, but I use it here as well because I find it easier to work with.
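To give a feel for what parsing with Beautiful Soup looks like, here is a minimal sketch. The HTML string is a made-up stand-in for a downloaded page, not real Medium markup, and the example.com links are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for a downloaded page.
html = (
    "<h1>Title</h1>"
    '<p>See <a href="https://example.com/a">A</a> '
    'and <a href="https://example.com/b">B</a>.</p>'
)

# "html.parser" is the parser bundled with Python, so no extra install is needed.
soup = BeautifulSoup(html, "html.parser")

# Attribute access returns the first matching tag; get_text() drops the markup.
heading = soup.h1.get_text()

# find_all() returns every matching tag; tag attributes are read like dict keys.
links = [a["href"] for a in soup.find_all("a")]
```

The same pattern (parse once, then query by tag name) is what the scraping steps rely on.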

Requests is a simple library for performing HTTP requests.

Steps

The scraping process will include the following steps:

  1. Load the site using Selenium to run the JavaScript and show all dynamically generated content (that is, the code snippets).
  2. Use Beautiful Soup to find the snippets.
  3. Extract the code snippets and save them in a dictionary.
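The three steps above can be sketched roughly as follows. This is only an illustration under a few assumptions: the function names load_rendered_html and extract_snippets are hypothetical, Chrome is assumed to be installed for Selenium, a fixed sleep is used as a crude wait, and the snippets are assumed to sit in <pre> tags, which may not match how Medium actually embeds code.

```python
from bs4 import BeautifulSoup


def load_rendered_html(url, wait_seconds=5):
    """Step 1: open the page in a real browser so its JavaScript runs."""
    # Imported here so the parsing helper below also works without a browser.
    import time
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: Chrome is available on this machine
    try:
        driver.get(url)
        time.sleep(wait_seconds)  # crude, assumed wait for dynamic content to load
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if loading fails


def extract_snippets(html):
    """Steps 2 and 3: find the code blocks and store them in a dictionary."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: each snippet lives in a <pre> tag; real pages may differ.
    blocks = soup.find_all("pre")
    return {f"snippet_{i}": pre.get_text() for i, pre in enumerate(blocks, start=1)}
```

Usage would then be something like extract_snippets(load_rendered_html(url)) for an article URL of your choice. Keeping the browser step separate from the parsing step makes the parsing easy to try on a saved HTML file first.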
