How I built the backend for my Sudoku app

Published in

SLTC — Sean Learns To Code

Mar 4, 2023

In the previous post I wrote about how I built my web-based Sudoku app. The app is currently available at http://sean-sudoku.netlify.app/. The source code is available at https://github.com/seanluong/sudoku.

One issue I didn’t discuss in much detail was how to generate Sudoku puzzles, which is obviously a key requirement of the game. That is the topic of this post.

There were 2 approaches I could think of for generating puzzles: coming up with a puzzle generation algorithm or fetching puzzles from an API.

With the algorithmic approach, the idea is to write code that generates a puzzle, with the bonus requirement of being able to generate a puzzle at a given difficulty level. I didn’t find this approach very attractive because it’s a bit too algorithm-heavy, which didn’t seem to maximize my learning experience.

For the API approach, while there are plenty of websites that let people play Sudoku online, I was not able to find any API for fetching Sudoku puzzles. In the end, I decided to just scrape the puzzles from another Sudoku website (the source site), since that seemed like the most fun thing to do. There were a few challenges with this approach:

How to fetch the HTML of the source site?
How to parse the HTML to get the puzzle content?
How to rate limit requests to the source site to avoid sending an excessive number of requests to their server?

Fetching the HTML

The first roadblock I ran into was how to fetch the HTML. At first, I tried to use the Fetch API in my frontend code. Unfortunately, the source site doesn’t have a CORS policy that allows access from either my local machine (localhost) or Netlify, where I hosted my Sudoku web app.

To work around this CORS issue, I had to add a backend to my app. This works because CORS is enforced by the browser; a request made from server-side code is not subject to it. The backend serves as the intermediary between my frontend and the source site. When the frontend asks for a new puzzle, a request is sent to the backend. The backend handles the request by sending its own request to the source site to get the site’s HTML, parsing the HTML for the puzzle’s content, and returning the puzzle to the frontend in the response.
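To make the intermediary role concrete, here is a minimal TypeScript sketch of such a function, written in the Lambda-compatible handler style that Netlify Functions supports (an async handler returning `{ statusCode, body }`). It is not the app’s actual code: `makeHandler`, `fetchSourceHtml`, `extractPuzzle`, the `Puzzle` shape, and the 502 status on failure are all assumptions. The two steps are injected so the handler can be exercised without touching the network.

```typescript
// Minimal shape of a Netlify Functions response used by this sketch
// (the official types live in the @netlify/functions package).
interface FunctionResponse {
  statusCode: number;
  headers?: Record<string, string>;
  body: string;
}

type Puzzle = number[][]; // hypothetical: 9x9 grid, 0 = empty cell

// Hypothetical dependencies, injected so the handler is testable offline.
interface Deps {
  fetchSourceHtml: () => Promise<string>;
  extractPuzzle: (html: string) => Puzzle;
}

// Builds a handler that sits between the frontend and the source site.
function makeHandler(deps: Deps): () => Promise<FunctionResponse> {
  return async () => {
    try {
      const html = await deps.fetchSourceHtml(); // 1. fetch the source site's HTML
      const puzzle = deps.extractPuzzle(html); // 2. parse out the puzzle
      return {
        statusCode: 200,
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ puzzle }), // 3. send it back to the frontend
      };
    } catch (err) {
      // Upstream fetch or parsing failed: report a bad-gateway style error
      return { statusCode: 502, body: JSON.stringify({ error: String(err) }) };
    }
  };
}

// In a real function file this would be exported, e.g.
// export const handler = makeHandler({ fetchSourceHtml, extractPuzzle });
```

Keeping the network call and the parsing behind small functions also isolates the part that breaks when the source site changes.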

There were tons of options for setting up the backend. Because I only wanted to focus on coding up the backend implementation without being bothered by networking configuration, I was thinking of using serverless options such as AWS Lambda or Google Cloud Functions. While researching which one was easier to integrate with a Netlify-hosted web frontend, I realized that Netlify also provides its own serverless feature called Netlify Functions. As soon as I looked into the Netlify Functions documentation, I knew I was in the right place.

Creating a serverless backend endpoint that handles requests from a web frontend on the same Netlify site is very straightforward:

Install the Netlify CLI into my dev environment
Generate the boilerplate code for my serverless function using the command netlify functions:create fetch-puzzle
Start the dev server with the command netlify dev
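Because the function is served from the same origin as the site, the frontend can call it with a relative URL, so the browser has no cross-origin request to block. Netlify exposes a function named `fetch-puzzle` at `/.netlify/functions/fetch-puzzle`. Below is a small TypeScript sketch of the frontend side; the `PuzzleResponse` shape and the `requestPuzzle` name are assumptions, and the fetch implementation is injected (in the browser you would pass `fetch`).

```typescript
// Hypothetical response shape; the real app may return something different.
interface PuzzleResponse {
  puzzle: number[][];
}

// Minimal fetch-like type so the sketch type-checks without DOM typings;
// the browser's global fetch satisfies it.
type JsonFetch = (
  url: string,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Same-origin path of the Netlify function named "fetch-puzzle".
const PUZZLE_ENDPOINT = "/.netlify/functions/fetch-puzzle";

// Asks the backend (not the source site) for a new puzzle.
async function requestPuzzle(fetchImpl: JsonFetch): Promise<PuzzleResponse> {
  const res = await fetchImpl(PUZZLE_ENDPOINT);
  if (!res.ok) {
    throw new Error(`fetch-puzzle failed with HTTP ${res.status}`);
  }
  return (await res.json()) as PuzzleResponse;
}
```

In the browser this would be called as `requestPuzzle(fetch)`; during local development the same relative path resolves against `localhost:8888`.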

At this point, the frontend web app was running at localhost:8888 and the serverless backend endpoint was available at http://localhost:8888/.netlify/functions/fetch-puzzle. I was now able to fetch the HTML of the source site from my backend code, which I did using request-promise as follows:

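const rp = require("request-promise");

const rpOptions = {
   uri: process.env.puzzle_url, // source site URL, kept in an environment variable
   headers: {
      "User-Agent": "Request-Promise",
   },
};

// Inside the function's async handler: resolves to the page's HTML as a string
const html = await rp(rpOptions);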

Note that process.env.puzzle_url refers to an environment variable named puzzle_url that I defined in Netlify so that the URL of the source site is not exposed in the code.
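The request-promise package (like the request library it wraps) was deprecated in 2020, so a new project may prefer the `fetch` function built into Node.js 18 and later. The TypeScript sketch below shows that swap; it is an alternative, not what the app does. The function name `fetchSourceHtml`, the `TextFetch` type, and the `User-Agent` value are hypothetical, and the fetch implementation is passed in so the code can be tested with a stub.

```typescript
// Minimal fetch-like type; Node 18+'s global fetch satisfies it.
type TextFetch = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Reads the source URL from the environment (as the original code does with
// process.env.puzzle_url) and returns the page HTML as a string.
async function fetchSourceHtml(
  env: Record<string, string | undefined>,
  fetchImpl: TextFetch,
): Promise<string> {
  const url = env.puzzle_url;
  if (!url) {
    // Fail loudly instead of requesting "undefined"
    throw new Error("puzzle_url is not set");
  }
  // Hypothetical User-Agent value identifying the app
  const res = await fetchImpl(url, { headers: { "User-Agent": "sudoku-app" } });
  if (!res.ok) {
    throw new Error(`source site responded with HTTP ${res.status}`);
  }
  return res.text();
}
```

In a function running on Node 18+, `await fetchSourceHtml(process.env, fetch)` would return the page HTML, or throw if `puzzle_url` is missing or the source site answers with an error status.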

Parsing the HTML

With the HTML of the source site in hand, the next step was to parse it to get the content of the puzzle. First, I needed to convert the HTML content, which is essentially a string, into a DOM tree. This was done using a library called node-html-parser.

const { parse } = require('node-html-parser');

// Parse the HTML string into a DOM tree; `dom` is the root HTMLElement
const dom = parse(html);

According to the node-html-parser documentation, calling parse with a string containing HTML returns an object of type HTMLElement. Among other methods, this object has a querySelectorAll method that lets you query for HTML elements using CSS selectors, which is more than enough for extracting the puzzle content from the HTML.

The rest of the puzzle extraction logic is available for reference in the GitHub repository linked at the top of this post. It relies heavily on the source site’s HTML structure, so if the site changes its markup, our code will break.
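Since the real extraction code depends on the source site’s markup, the sketch below only covers the step after the query: turning the text of the 81 cells into a grid. It assumes, hypothetically, that the cells can be collected in row-major order with something like `dom.querySelectorAll(".cell").map((el) => el.text)`, where `.cell` is a made-up selector, and that empty cells have blank text. `cellsToGrid` and the 0-for-empty convention are illustrative, not taken from the app. Validating the cell count and contents makes a markup change on the source site fail with a clear error rather than producing a broken puzzle.

```typescript
// A 9x9 grid; 0 marks an empty cell (a convention assumed by this sketch).
type Grid = number[][];

// Converts the text of 81 cells, in row-major order, into a grid.
// Throws if the input does not look like a Sudoku board, so a change in
// the source site's markup surfaces as an explicit error.
function cellsToGrid(cellTexts: string[]): Grid {
  if (cellTexts.length !== 81) {
    throw new Error(`expected 81 cells, found ${cellTexts.length}`);
  }
  const grid: Grid = [];
  for (let row = 0; row < 9; row++) {
    const cells = cellTexts.slice(row * 9, row * 9 + 9).map((raw) => {
      const text = raw.trim();
      if (text === "") return 0; // empty cell
      if (!/^[1-9]$/.test(text)) {
        throw new Error(`unexpected cell content: "${raw}"`);
      }
      return Number(text);
    });
    grid.push(cells);
  }
  return grid;
}
```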

Rate limiting requests

It goes without saying that we don’t want our Sudoku app to cause load issues for the source site, so it is important to keep the number of requests we send there as low as possible while still giving users of the Sudoku app a reasonable number of puzzles to solve every day. Having only 1 puzzle per day is definitely not fun, while providing 100 or 1000 puzzles per day is probably not much different from providing 10. So I decided to let people solve up to 10 puzzles per day; if they want more, they can come back the next day.

To implement the rate limiting logic, I maintained an in-memory cache with the following structure:

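interface PuzzleCache {
  puzzles: Puzzle[];     // puzzles fetched from the source site today
  today: string;         // the day the cached puzzles belong to
  currentPuzzle: number; // index into `puzzles`, advanced to serve the next puzzle
}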

The rate limiting algorithm works like this:

Get the current day
Check if the value of `today` in the cache matches the current day
If it's a match
  Check if the number of puzzles in the cache has reached the cap
  If the cap has been reached
    Serve the next cached puzzle by increasing `currentPuzzle` by 1
  Else
    Fetch a puzzle from the source site and add it to the cache
Else
  Clear the cache
  Set the value of `today` to the current day
  Fetch a puzzle from the source site and add it to the cache

The code that implements the algorithm is available in the GitHub repository linked at the top of this post.
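The algorithm above can be sketched in TypeScript as a small class wrapping the `PuzzleCache` structure. This is an illustration under assumptions rather than the app’s implementation: `DailyPuzzleCache`, `nextPuzzle`, the `Puzzle` type, keying days by UTC date, and cycling back to the first cached puzzle once the end of the list is reached are all choices made for this sketch. The fetcher and the clock are injected so the behavior can be checked without network access.

```typescript
type Puzzle = number[][]; // hypothetical puzzle representation

interface PuzzleCache {
  puzzles: Puzzle[];
  today: string;
  currentPuzzle: number;
}

// UTC calendar day, e.g. "2023-03-04" (using UTC is an assumption).
function dayKey(date: Date): string {
  return date.toISOString().slice(0, 10);
}

class DailyPuzzleCache {
  private cache: PuzzleCache = { puzzles: [], today: "", currentPuzzle: 0 };
  private readonly cap: number; // max fetches from the source site per day
  private readonly fetchPuzzle: () => Promise<Puzzle>; // e.g. fetch + parse the source site
  private readonly now: () => Date;

  constructor(cap: number, fetchPuzzle: () => Promise<Puzzle>, now: () => Date = () => new Date()) {
    if (cap < 1) throw new Error("cap must be at least 1");
    this.cap = cap;
    this.fetchPuzzle = fetchPuzzle;
    this.now = now;
  }

  async nextPuzzle(): Promise<Puzzle> {
    const today = dayKey(this.now());
    if (this.cache.today !== today) {
      // New day: clear the cache and record the current day
      this.cache = { puzzles: [], today, currentPuzzle: 0 };
    } else if (this.cache.puzzles.length >= this.cap) {
      // Cap reached: serve the next cached puzzle, wrapping around at the end
      this.cache.currentPuzzle = (this.cache.currentPuzzle + 1) % this.cache.puzzles.length;
      return this.cache.puzzles[this.cache.currentPuzzle];
    }
    // Under the cap (or a fresh day): fetch from the source site and cache it
    const puzzle = await this.fetchPuzzle();
    this.cache.puzzles.push(puzzle);
    this.cache.currentPuzzle = this.cache.puzzles.length - 1;
    return puzzle;
  }
}
```

One caveat of this shape: `nextPuzzle` awaits the fetch before storing the result, so two requests arriving at the same moment while under the cap can both fetch, which makes the cap approximate under concurrency too.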

Because the cache is shared across all users of the Sudoku web app, we can limit the number of requests to the source site to roughly X per day. The value of X is stored in a Netlify environment variable, which is set differently in my local dev environment and in production. Note that the number of requests is not always capped at exactly X per day: the cache lives in memory, so it is lost whenever the Netlify server running the function restarts, and the next request then starts filling a fresh cache.
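Reading X from the environment could look like the TypeScript sketch below. The post does not name the variable, so `PUZZLE_DAILY_CAP` in the usage line is hypothetical, as is the `parseDailyCap` helper; the fallback of 10 mirrors the 10-puzzles-per-day limit described earlier, and rejecting non-positive or non-integer values is a defensive choice of this sketch.

```typescript
// Fallback matching the 10-puzzles-per-day limit chosen in the post.
const DEFAULT_DAILY_CAP = 10;

// Parses the daily cap from a raw environment value.
// Unset or blank -> fallback; anything that is not a positive integer -> error.
function parseDailyCap(raw: string | undefined, fallback: number = DEFAULT_DAILY_CAP): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 1) {
    throw new Error(`invalid daily cap: "${raw}"`);
  }
  return value;
}
```

For example, `parseDailyCap(process.env.PUZZLE_DAILY_CAP)` would return 10 when the variable is unset, and throw on a value such as `ten`, so a misconfigured deploy fails early instead of silently removing the limit.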


Written by Sean