Build a web app that generates Markdown syllabuses for Udemy courses with Flask, jQuery, and PostgreSQL on Heroku

Effy.L.H
10 min read · Jan 1, 2022


Motivation

It’s easy to get tired of copying and pasting the syllabus line by line into my notes app when I’m taking courses on Udemy. So I decided to create a web application for automatically scraping syllabuses from the course webpage.

Application Stack and Workflow

Preview of our Application

  • This is a single-page web application:

Prerequisites

Languages: Python, JavaScript (jQuery), CSS, HTML

Preparations

  1. Create and activate a virtual environment named syllabus:
(syllabus) effylh@192 syllabus %

2. Install needed packages:

beautifulsoup4
Flask
Flask-Migrate
Flask-SQLAlchemy
gunicorn
psycopg2-binary
requests
urllib3

3. Create a project folder named Syllabus-Scraper, then cd into its root:

(syllabus) effylh@192 Syllabus-Scraper %

4. Create the folders we need:

(syllabus) effylh@192 Syllabus-Scraper % mkdir project
(syllabus) effylh@192 Syllabus-Scraper % cd project
(syllabus) effylh@192 project % mkdir static templates udemy

Preview of the directory structure

When I mention a specific file or folder name in this article, you can come back here and check where it’s located in the project.

Let’s get started

1. (Backend) Build a skeleton of the Flask app and run it

  • Create run.py: it will be used to start the application via flask run or python run.py
  • Create __init__.py: this file runs automatically as soon as run.py executes the import statement "from project import app"
  • Create views.py: we need to create two routes for this application
  • Furthermore, we can use a Flask Blueprint to organize those view functions, which makes it convenient to add more functionality in the next version. Now let’s make some changes to views.py:
  • Also, don’t forget to register the Blueprint in __init__.py:
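The three files above can be condensed into a single-file sketch to see how the pieces fit together. The blueprint name main, the index route at "/", and the placeholder return values are assumptions for illustration; only the "/update" route name comes from later in this article. In the project, the blueprint would live in views.py and the registration in __init__.py.

```python
from flask import Blueprint, Flask, jsonify

# views.py: group the view functions in a Blueprint (the name "main" is hypothetical).
main = Blueprint("main", __name__)


@main.route("/")
def index():
    # The real view would render templates/index.html;
    # a plain string keeps this sketch standalone.
    return "index page placeholder"


@main.route("/update", methods=["POST"])
def update():
    # Placeholder payload; the real view returns the course name and syllabus.
    return jsonify({"course": None, "syllabus": None})


# __init__.py: create the app and register the Blueprint.
app = Flask(__name__)
app.register_blueprint(main)

# run.py would then import `app` from the package and call app.run().
```

Keeping the routes on a Blueprint means a later version can add another Blueprint (for a second platform, say) without touching the existing views.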

2. (Frontend) Build HTML pages in the templates directory

  • Create base.html: every page in our application shares the basic layout defined in this HTML file
  • In the <head></head> block, we add all the resources this application needs: Google Fonts, a link to an external CSS file (which we’ll create later), and Font Awesome
  • {% block content %} {% endblock %}: the portion between these two tags can be overridden by the content defined in child templates
  • Create index.html: as a child template, it extends base.html and fills in the layout between {% block content %} and {% endblock %}
  • As a reference, here is a wireframe sketch of the index page:
  • Then, let’s fill in the wireframe with Bootstrap components and a custom CSS file:
  • Create a styles folder in the static directory, then create index.css and put it there.

The following is the code for index.css:

  • Thus, we get the result shown below (the remaining 3 tasks are indicated as well):
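As a side note on the template inheritance used in base.html and index.html, the mechanism can be tried outside Flask with Jinja2 directly (the template engine Flask uses). This is a minimal sketch with made-up template bodies, not the actual templates of this project; the page title and heading text are placeholders.

```python
from jinja2 import DictLoader, Environment

# In-memory stand-ins for templates/base.html and templates/index.html.
templates = {
    "base.html": (
        "<html><head><title>Syllabus Scraper</title></head>"
        "<body>{% block content %}{% endblock %}</body></html>"
    ),
    "index.html": (
        '{% extends "base.html" %}'
        "{% block content %}<h1>Paste a Udemy link</h1>{% endblock %}"
    ),
}

env = Environment(loader=DictLoader(templates))

# Rendering the child template fills the parent's `content` block.
html = env.get_template("index.html").render()
print(html)
```

Everything outside the content block comes from base.html, so shared resources such as fonts and stylesheets only need to be declared once.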

In the next section, we’re going to create tables and set up a local database in pgAdmin, then we’ll connect the database with our application.

ER Model

ER model stands for Entity-Relationship model, which is a good way to represent the relationships between the tables of a database.

ER model for this project
  1. Create a database called udemy_scrape in pgAdmin:

Then, we add some lines in the __init__.py in order to connect the database with the Flask app:

2. Import the needed packages:

  • SQLAlchemy
  • Migrate
  • os

3. Use app.config.from_mapping() to set configurations for the app:

  • SECRET_KEY
  • SQLALCHEMY_DATABASE_URI
  • SQLALCHEMY_TRACK_MODIFICATIONS

4. Create an SQLAlchemy object called db and initialize the Flask-Migrate object:

Note 1 (local testing): we could assign those configuration variables in the terminal every time, or, more conveniently, we could define the environment variables (such as ‘DATABASE_URL’) in a file called .env (we’ll create it soon).

Note 2 (production): after deploying all the project files to Heroku, we will set those configuration variables directly via the Heroku command-line interface.

5. Create a file called .env, then set the environment variable to the URI of the local PostgreSQL database we created earlier:

Note: for more details on the general form of a PostgreSQL database URI, see section 34.1.1.2, Connection URIs: https://www.postgresql.org/docs/14/libpq-connect.html

DATABASE_URL="postgresql://postgres:854823@localhost:5433/udemy_scrape"
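To see what each part of such a URI means, it can be split with Python's standard library. The credentials in this sketch are placeholders; substitute your own user, password, and port.

```python
from urllib.parse import urlsplit

# Placeholder credentials, not the values from your own .env file.
uri = "postgresql://postgres:my_password@localhost:5433/udemy_scrape"

parts = urlsplit(uri)
components = {
    "scheme": parts.scheme,
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
    "database": parts.path.lstrip("/"),
}
print(components)
```

Note that Flask's command-line interface reads a .env file automatically only when the python-dotenv package is installed, so that package may need to be added to the list of installed packages.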

6. As the ER diagram above indicates, let’s create a file called models.py and define two tables in it:

  • udemy_courselist: a table storing data related to the Udemy platform
  • course: a table storing data for all platforms (we plan to extend the functionality to other platforms such as Coursera and edX in the future)

7. We design a one-to-one relationship between these two tables. Here is a detailed explanation of the code:

  • The relationship is declared on UdemyCourseList under the name course, so the related Course object is available as .course
  • backref: equivalent to declaring a new property .udemy_courselist on the Course class. With it, any instance of Course can access the related UdemyCourseList object directly instead of looking it up through the udemy_course_id foreign key (for example: courseTableInstance.udemy_courselist)
  • uselist: because it’s a one-to-one relationship, we need to set uselist to False

It’s time to build interactive features

Our goal is to turn the tasks below into reality:

Task 1: Collect the submitted link from the user

Task 2: Show the Markdown result in the text area

Task 3: Click the button to check the Markdown live preview

Now, let’s create a file called app.js.

  1. Create a URI validator:

To get a valid link from the user, we need a validator (a JavaScript function) for the entered link.

We picked the URL of a random Udemy course and derived the general URI pattern:

https://www.udemy.com/course/Course-Name-Is-Separated-By-Hyphens/

Then, we use that pattern to define our validator:

Link validator
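The validator in this article lives in app.js. The same pattern check can also be sketched in Python, for example to re-validate the link on the server. The regular expression below is an assumption derived from the URL pattern above: it accepts letters, digits, hyphens, and underscores in the course slug and makes the trailing slash optional.

```python
import re

# Assumed pattern: https://www.udemy.com/course/<slug>/ with an optional trailing slash.
UDEMY_COURSE_URL = re.compile(r"^https://www\.udemy\.com/course/[A-Za-z0-9_-]+/?$")


def is_valid_udemy_link(link):
    """Return True when the link matches the assumed Udemy course URL pattern."""
    return bool(UDEMY_COURSE_URL.match(link.strip()))


print(is_valid_udemy_link("https://www.udemy.com/course/Course-Name-Is-Separated-By-Hyphens/"))
print(is_valid_udemy_link("https://example.com/course/some-course/"))
```

A stricter or looser pattern may be appropriate, for instance if links with query strings should be accepted.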

2. Use jQuery to submit the form:

jQuery is a fast, small, and feature-rich JavaScript library. It makes things like HTML document traversal and manipulation, event handling, animation, and Ajax much simpler with an easy-to-use API that works across a multitude of browsers — https://jquery.com/

The workflow is as follows: the user enters a link in the input box and clicks the Submit button, which triggers an Ajax POST request to the server (handled in views.py). The code in views.py then checks the database: if a record for the course already exists, it retrieves and returns it directly; otherwise, it calls the Udemy API to fetch the data.

Let’s take a close look at the code below:

  • First, we register a handler with $('form').on('submit', function(e){}) that runs whenever the form is submitted.
  • When the user clicks the button, the string value in the input box with the id linkInput is assigned to the variable inputValue.
  • Then, if the entered link passes the validation check, we perform an Ajax request. Otherwise, we display an error message.
  • Note: we need to wrap the whole function inside $(document).ready(), which makes the function run only once the page DOM (Document Object Model) is ready.
  • Don’t forget to call the preventDefault() method, which prevents the browser from reloading the page whenever the form is submitted.

3. Before we fill up the jQuery function, we need to get our index.html page ready:

01. Add a text area

02. Add a ‘Copy’ button

03. Add a Bootstrap collapse component

04. Add the dependencies at the end of the file: Bootstrap, showdown (a package used to display the Markdown live preview), and jQuery; then specify the URL of app.js

4. Now, let’s fill up the rest of the jQuery function:

We use a set of key/value pairs to configure the Ajax request:

  • type: set the request type
  • url: the string URL to which the request is sent.
  • data: data to be sent to the server (It can be JSON object, string, or array)

If the validation check passes:

  • $.ajax(): makes a POST request to the URL ‘/update’, which is defined in views.py
  • req.done(): once the request succeeds, the callback function is invoked

If the validation check fails:

  • display the error message

5. We also need to build a ‘factory’ that processes the link sent by app.js; the update() function defined in views.py then returns the processed data so that it can be shown on the screen.

First, let’s create a new file called factory.py. The ‘factory’ holds all the methods we need. Let’s go through the purpose of each method:

  • categorize(): we plan to extend the functionality to other platforms such as Coursera and edX in a future version, so this function inspects each link provided by the user and assigns it a platform code (1 represents Udemy, 2 represents Coursera, etc.)
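One way such a function could work is to look at the host name of the link. The sketch below is an illustration, not the project's actual categorize(): the domain table and the choice to return None for unknown hosts are assumptions, while the codes 1 and 2 come from the description above.

```python
from urllib.parse import urlsplit

# Platform codes from the article: 1 = Udemy, 2 = Coursera.
PLATFORM_CODES = {"udemy.com": 1, "coursera.org": 2}


def categorize(link):
    """Return the platform code for a course link, or None if it is not recognized."""
    host = (urlsplit(link).hostname or "").lower()
    for domain, code in PLATFORM_CODES.items():
        # Match the bare domain and any subdomain such as "www.".
        if host == domain or host.endswith("." + domain):
            return code
    return None


print(categorize("https://www.udemy.com/course/some-course/"))
```

Comparing host names rather than searching the whole string avoids misclassifying links that merely mention a platform name in their path.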

The next question is how to get the syllabus data. Writing a web page scraper is an option, but it’s not recommended here. In fact, everything we need is available through the APIs provided by Udemy; all we have to do is apply for an API client at this page: https://www.udemy.com/user/edit-api-clients/

Once Udemy approves your application, they will provide a Client Id and a Client Secret for making requests to their server and retrieving data from it:

In addition, I added a GitHub repository to my project as a submodule, which makes calling those API methods quite convenient (if you have any questions about how to add an existing GitHub repository as a submodule, please check this tutorial):

To get the data we want, we first grab the course id from the course page, then make two requests to the Udemy server to retrieve the syllabus and the name of the course:

  • getCourseID()
  • getCourseDetailsFromApi():get the name of the course
  • getCurriculumFromApi(): get the syllabus of the course
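The first of these steps can be sketched without any network access. Everything specific here is an assumption to verify against the live site and the Udemy API documentation: that the course page carries the numeric id in a data-clp-course-id attribute, and that the two endpoints have the paths shown. The helper names are hypothetical, and a regular expression is used instead of BeautifulSoup only to keep the example dependency-free.

```python
import re

# Assumption: the course page exposes the id as data-clp-course-id="<digits>".
COURSE_ID_ATTR = re.compile(r'data-clp-course-id="(\d+)"')

# Assumption: base path of the Udemy API (version 2.0) for courses.
API_BASE = "https://www.udemy.com/api-2.0/courses"


def extract_course_id(page_html):
    """Return the numeric course id found in the page HTML, or None."""
    match = COURSE_ID_ATTR.search(page_html)
    return int(match.group(1)) if match else None


def build_api_urls(course_id):
    """Return the assumed endpoints for course details and curriculum."""
    return {
        "details": f"{API_BASE}/{course_id}/",
        "curriculum": f"{API_BASE}/{course_id}/public-curriculum-items/",
    }


sample_html = '<body id="udemy" data-clp-course-id="123456"></body>'
course_id = extract_course_id(sample_html)
print(course_id, build_api_urls(course_id))
```

The actual requests would be authenticated with the Client Id and Client Secret mentioned above.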

Below is what factory.py should look like:
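As a rough illustration of the final step, turning the curriculum into Markdown, here is a small standalone sketch. It is not the project's factory.py: the function name is hypothetical, and the item shape (a list of dictionaries with _class and title keys, where _class is "chapter" for section headers) is an assumption about the API response.

```python
def curriculum_to_markdown(course_title, items):
    """Render a course title and its curriculum items as a Markdown outline."""
    lines = [f"# {course_title}"]
    for item in items:
        if item["_class"] == "chapter":
            # Chapters become second-level headings, separated by a blank line.
            lines.append("")
            lines.append(f"## {item['title']}")
        else:
            # Lectures, quizzes, and other items become list entries.
            lines.append(f"- {item['title']}")
    return "\n".join(lines)


sample_items = [
    {"_class": "chapter", "title": "Introduction"},
    {"_class": "lecture", "title": "Welcome"},
    {"_class": "lecture", "title": "Setup"},
    {"_class": "chapter", "title": "Basics"},
    {"_class": "quiz", "title": "Quiz 1"},
]
markdown = curriculum_to_markdown("Demo Course", sample_items)
print(markdown)
```

The resulting string is what would be stored in the database and shown in the text area.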

6. It’s time to run our ‘factory’ in views.py, where we get several tasks done:

01. Grab the link submitted by the user via the ajax POST request

02. Detect and categorize links according to their platforms

03. Get the course id from the web page

04. Check if the course id already exists in our database

05. If the course id does not exist, make a request to the server to get the syllabus, then add a new record to our database. If the course id already exists, retrieve and return the data from our database directly

06. Return course name and syllabus in JSON format
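The decision logic in steps 03 to 06 can be sketched without Flask or a database. In this framework-free illustration, a plain dictionary stands in for the database table and two callables stand in for the page lookup and the API requests; all names are hypothetical, and step 02 (categorizing the link) is left out for brevity.

```python
def handle_update(link, store, get_course_id, fetch_course):
    """Return the course name and syllabus, fetching only when nothing is stored.

    store: dict standing in for the database, keyed by course id.
    get_course_id: callable(link) -> course id.
    fetch_course: callable(course_id) -> (name, syllabus).
    """
    course_id = get_course_id(link)        # step 03: id from the web page
    record = store.get(course_id)          # step 04: look for an existing record
    if record is None:                     # step 05: not stored yet, fetch and save
        name, syllabus = fetch_course(course_id)
        record = {"name": name, "syllabus": syllabus}
        store[course_id] = record
    # step 06: the view would return this dictionary as JSON
    return {"course": record["name"], "syllabus": record["syllabus"]}


# Stub implementations that record how often the "API" is called.
api_calls = []


def fake_get_course_id(link):
    return 42


def fake_fetch_course(course_id):
    api_calls.append(course_id)
    return "Demo Course", "# Demo Course"


store = {}
link = "https://www.udemy.com/course/demo-course/"
first = handle_update(link, store, fake_get_course_id, fake_fetch_course)
second = handle_update(link, store, fake_get_course_id, fake_fetch_course)
print(first, len(api_calls))
```

The second call is served from the stored record, so repeated requests for the same course do not hit the API again.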

7. Next, we’re going to put the returned values on the screen. Let’s add some lines inside the req.done() callback in app.js:

In addition, showing a spinner on the button while the user waits for the results after clicking Submit is a nice touch. Let’s add one line inside the body of the if statement:

$('#submitButton').html('<span class="spinner-border spinner-border-sm" role="status" aria-hidden="true"></span>Loading...')

We also need to consider this scenario: a user submits an invalid link, the error message shows up, and then they correct the link and resubmit it. At that point, the error message should be hidden again. Let’s add one more line inside the body of the if statement:

$('#validateError').hide()

8. Then, we add functionality to the Copy button:

This is what app.js should look like:

Local Test

  • Set the FLASK_APP environment variable
(Mac) export FLASK_APP=run.py
(Win) set FLASK_APP=run.py
  • Run the database migrations for the Flask application (run the commands below one line at a time)
# create a migration repository
flask db init
# generate an initial migration
flask db migrate -m "AnyMessage"
# apply the migration to the database
flask db upgrade
  • Run the application
flask run

Now, pick a random course on the Udemy homepage, copy its link, paste it into our application, and click the ‘Submit’ button:

It works!🥳

(Option 1) Deploy to Heroku

Heroku has since discontinued its free tier. If you don’t want to use Heroku, skip this section and jump straight to the next one: Deploy to Cloud Run.

1. Preparations

heroku login -i
  • (Terminal) Generate requirements.txt automatically; this file stores information about all the dependencies of the project:
pip freeze > requirements.txt
  • To let Heroku know how to start our app, we need to create a file called ‘Procfile’, then add one line to it:
web: gunicorn run:app

run: the name of the Python file (run.py) that runs our app

app: the name of the app object, which you can find inside run.py:

2. Create a Heroku app

We can easily create a new app named ‘syllabuscraper’ on the Heroku web page:

3. Add the PostgreSQL add-on to the Heroku app

heroku addons:create heroku-postgresql:hobby-dev --app syllabuscraper

4. Set up config variables

heroku config:set FLASK_APP=run -a syllabuscraper 
heroku config:set SECRET_KEY="Any_Random_String_Keys" -a syllabuscraper

5. Deploy our project

git push heroku main

6. Create new tables on Heroku server

heroku run python
>>> from project import db
>>> db.create_all()

project: the directory that contains __init__.py

Now, we can use a simple line to check whether tables are already there:

heroku pg:info -a syllabuscraper

Finally, we can run and test our application:

heroku open -a syllabuscraper

(Option 2) Deploy to Cloud Run

Please Read:
Deploy a Flask app with Docker, Google Cloud Run, and Cloud SQL for PostgreSQL

That’s all for this tutorial. Thanks!

If you have any questions, feel free to comment.


Effy.L.H

👩🏻‍💻 Programmer:Full-Stack Dev & DataScience ⁣ 👉 github.com/monkeyapple ⁣ 🎮 3D Artist:Modeling & Texturing⁣ 👉 artstation.com/monkeyapple