Data Engineering Meets Full-Stack Development: How To Be A Full-Stack Engineer

Showcasing full-stack engineering applied to personalized learning

Published in Data Reply IT | DataTech · 11 min read · Jun 13, 2023

Data engineering is revolutionizing the way we approach learning and skill development. Finding relevant and specific information is extremely important when professionals and enthusiasts want to improve their knowledge and skills in various disciplines.

While being an expert in a specific field is always seen as a valuable asset, being able to integrate a diverse range of technologies is what really opens up new opportunities for growth and innovation.

In this article, we explore a case study demonstrating the power of a big-picture approach, combining data engineering and full-stack development, aimed at creating a personalized learning experience for the world of chess.

A little bit of background

Chess may seem like a game of strategy and intuition, but it is also a game of data. Every move can be analyzed and categorized, and studying theory and past games can give players a significant advantage.

I got into chess in 2020, thanks to the Netflix show “The Queen’s Gambit” and the Chess.com “PogChamps 3” tournament. At first I played casually and didn’t think much of the game. But after a while, I realized how much fun I was having, and how much I wanted to learn more, preferably in a structured and planned way.

I started to look for courses on chess openings. A chess opening is the sequence of first moves, usually 10 to 15, that you memorize in order to gain an advantage right from the start of the game. Opening theory is especially rich because the pieces always start in the same place, so the first moves have been studied for centuries.

I found the website Chessable.com, where many Grandmasters (the highest title a person can earn in the chess world) publish their own courses, and I bought the course “The Iron English: Botvinnik Variation”, in which Grandmaster Simon Williams teaches the English opening 1. c4. Opening theory is so complex that this single course comprises hundreds of lines, so I quickly became overwhelmed: how could I know where to start studying?

I noticed that indeed there was a way to know where to start: I could

download all the past games I played on Chess.com
find some software that could import the games
go through all the opening lines my opponents played against me
go through the opening course and find the opening lines from which to start studying

It goes without saying that this would have been tedious, time-consuming, and definitely not very fun. Instead, I quickly realized that this was an engineering problem that a web app could solve, and I’m very much used to engineering problems: I’m a Data Engineer, and my passion for programming dates back to when I was about ten years old, reading books about HTML and building simple websites in that good old 2000s fashion.

So this is how I found myself eager to build a web app that would help me study the lines of the course I bought, while also making it fun and smarter than what Chessable.com currently offers, which is presenting the lines to train in random order.

Objectives

The first objective of the project was to build a web app where

a chessboard would be displayed in the front-end
I could drag and drop the pieces
the move I made on the board would be sent to the back-end
the back-end would return a move that my opponent would play on the board against me

The second objective was to instruct the back-end on

checking whether the move I made on the board was the best move according to the chess opening course I bought
playing against me the lines of the course sorted by popularity on Chess.com, instead of presenting them in random order

Laying out the front-end: HTML and Javascript

For the chessboard to display, I integrated the library chessboard-element.js, a more compact and modern way of showing a chessboard with pure HTML compared to previous hybrid HTML + Javascript approaches such as chessboard.js.

Thanks to this choice, I had a fully-functioning chessboard in my simple front-end in a couple of days: dragging and dropping pieces followed all chess rules, reversing the board perspective from white to black and vice-versa was as easy as changing an attribute, and I then added some Javascript code that leveraged another library (chess.js) to track which moves were being made on the chessboard HTML element.

Laying out the back-end: Django

Since I’m fluent in Python, the obvious choice for the back-end was Django, the web framework for perfectionists with deadlines, an incredibly versatile and easy-to-learn tool for building web apps with very little time and effort.

Setting up a new Django project is as easy as

pip install django
django-admin startproject mysite
cd mysite
python manage.py runserver

and the back-end was already up and running.

Connecting front-end and back-end: Redis, Channels

The basic scaffolding of front-end and back-end was in place, but how to connect the two? And more importantly, how to make moves on the chessboard go back and forth between front-end and back-end as fast as possible?

Taking inspiration from this project, I found that the simplest way was to use Redis, an in-memory key-value store capable of sub-millisecond latencies. On my computer, I spun up a Docker container with Redis via docker-compose:

version: '3.1'
services:
  redis:
    image: redis:5
    ports:
      - "6379:6379"

Redis can be integrated with Django with the Python packages channels and channels-redis:

pip install channels channels-redis

channels provides the Consumer interface, which serves as the basis for implementing a WebSocket endpoint that sends and receives messages between the front-end and the back-end. For the implementation, I extended JsonWebsocketConsumer:

from channels.generic.websocket import JsonWebsocketConsumer


class SingleConsumer(JsonWebsocketConsumer):
    def connect(self):
        # Accept the incoming WebSocket connection from the front-end
        self.accept()
        self.send_json(...)

    def receive_json(self, content, **kwargs):
        """A move has been made on the front-end chessboard"""
        ...

        # Send the response back to the front-end as JSON
        self.send_json(content)

    def disconnect(self, close_code):
        ...

Channels would instantiate the Consumer whenever the front-end opened a WebSocket connection, and would pass messages through the Redis channel layer configured in Django’s settings.py:

CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {
            "hosts": [("localhost", 6379)],
        },
    },
}

The last piece of the puzzle was obviously going back to the Javascript code in the front-end Django template and adding some more code to receive the JSON message from the back-end and play the move on the chessboard.

Full-stack communication is established, what’s next?

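So far so good: I had a fully-functioning web app in which the Javascript front-end and the Django back-end exchanged JSON messages over a WebSocket, with Redis as the channel layer.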

The web-app could

display a chessboard
send my move to the back-end as a JSON message
send back a response with the move to play
play the move on the chessboard

The limitation was obvious: I could play any move I wanted, and the back-end could respond with any move, however bad.

How to make the back-end play the Chessable.com course lines against me?

Exporting the Chessable course

To export the lines of the opening course, I needed to

export all the lines of the course to files in PGN (Portable Game Notation) format
store the lines in a database
make the back-end check in the database whether the move I played in the front-end was the best one according to the course
if it was, the back-end would respond with the opponent’s move present in the course
if it wasn’t, the back-end would send a message to the front-end to rewind my move and let me retry
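The last three steps boil down to a small decision function. What follows is only a minimal sketch of that logic, not the app's actual code: the in-memory dictionary stands in for the database, and the names COURSE_MOVES, respond_to_move, and the message fields are hypothetical.

```python
# Hypothetical stand-in for the database: maps a position (FEN string) to the
# move recommended by the course and the opponent's reply, both in UCI notation.
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

COURSE_MOVES = {
    START_FEN: {"best_move": "c2c4", "opponent_reply": "e7e5"},
}


def respond_to_move(fen_before: str, played_move: str) -> dict:
    """Build the JSON-like message the back-end would send to the front-end."""
    entry = COURSE_MOVES.get(fen_before)
    if entry is None:
        # The position is not covered by the course.
        return {"action": "out_of_course"}
    if played_move != entry["best_move"]:
        # Wrong move: ask the front-end to take it back so the user can retry.
        return {"action": "rewind"}
    # Correct move: answer with the opponent's move taken from the course.
    return {"action": "play", "move": entry["opponent_reply"]}
```

In this sketch, playing c2c4 from the starting position yields a "play" message with the reply e7e5, while any other first move yields "rewind".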

Of course, Chessable doesn’t let you easily export all the lines out of fear of piracy. However, I bought and own the content, so I am legally entitled to “private copies for format shifting or backup”, a right protected by the European Parliament resolution of 27 February 2014 on private copying levies (2013/2114(INI)).

Leveraging the experience that I gained in one of my past lives as a Robotic Process Automation developer, I wrote a scraper using the Python framework Selenium, which automates web browser operations. The script stepped through each chapter in the Chessable.com interface as if it were me, recording the moves and writing them to files.

Lines database: MongoDB

What database and formats to choose for the back-end to retrieve the lines? As a Data Engineer at heart, this was the most fun part of the project.

In the beginning, I used a graph database with Neo4j, by modeling chessboard positions as nodes and moves as edges. However, I soon realized how slow writing and retrieving lines as graph trees was.

Instead, I needed a low-latency online database, capable of millisecond latencies to allow the web app to be as responsive as possible, so I opted for the document database MongoDB. I added new Docker containers to the docker-compose:

a MongoDB instance with a bind-mounted volume to persist the data locally
a Mongo Express instance to visualize and debug the data during the process

  mongo:
    image: mongo:4.4.1
    ports:
      - "27017:27017"
    volumes:
      - "C:/...:/data/db"

  mongo-express:
    depends_on:
      - mongo
    image: mongo-express:0.54.0
    ports:
      - "8081:8081"
    volumes:
      - "C:/.../wait-for-it.sh:/wait-for-it/wait-for-it.sh"
    command: sh -c '/wait-for-it/wait-for-it.sh mongo:27017 -- node app'

The format for storing the lines is tightly coupled with the nature of document-based databases: each item must be a self-contained collection of information, without needing to be joined with other items. For this reason, I constructed each item to be a snapshot of the chessboard, extended to include the previous and the following move, with the following fields

move number
move (in UCI notation)
the position before the move was made (in FEN notation)
the position after the move was made (in FEN notation)
the comment of the Grandmaster explaining why the move is the best one
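As an illustration, a document for the first move of the course, 1. c4, might look like the following. The field names are assumptions, since the exact keys are not listed here, and the comment text is a placeholder.

```python
from dataclasses import asdict, dataclass


@dataclass
class LineMove:
    """One self-contained snapshot of a course line (hypothetical field names)."""

    move_number: int
    move_uci: str    # the move in UCI notation, e.g. "c2c4"
    fen_before: str  # position before the move, in FEN notation
    fen_after: str   # position after the move, in FEN notation
    comment: str     # the Grandmaster's explanation of the move


first_move = LineMove(
    move_number=1,
    move_uci="c2c4",
    fen_before="rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    fen_after="rnbqkbnr/pppppppp/8/8/2P5/8/PP1PPPPP/RNBQKBNR b KQkq c3 0 1",
    comment="(placeholder for the course comment)",
)

# A plain dictionary is the shape a document database expects for one item.
document = asdict(first_move)
```

Because each document carries both the position before and the position after the move, the back-end can look up a position directly without joining it with any other item.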

I used the Python package python-chess to parse each file and pre-process it before ingesting it into MongoDB with pymongo.

The web app knows about course lines, what’s next?

After adding some more Javascript code in the Django template, the web app was now capable of

checking whether the move I played was the best one according to the course
letting me retry if it wasn’t
sending back to the front-end the opponent’s best move according to the course

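At this point, the architecture consisted of the HTML and Javascript front-end, the Django back-end connected to it through Channels and Redis, and MongoDB storing the course lines.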

All in all, what I had implemented was the same experience as using Chessable.com: a chessboard to interact with the course lines, the Grandmaster comments displayed on the side, and training lines presented in random order. The most important part was still missing: how could I train the lines most popular among my opponents on Chess.com? Why spend hours studying all the lines at random, given that I would probably never face a large percentage of them?

What was missing was the ability for the web app to

retrieve my games from Chess.com
create some statistics about the opening moves played by my opponents
use those statistics to sort the course lines by the popularity among my opponents, and make me study the most popular lines first

Retrieving my games from Chess.com

Games on Chess.com can be retrieved via its public REST API. The following endpoint returns the list of a player’s monthly game archives:

https://api.chess.com/pub/player/{user_name}/games/archives/

A simple Python script using requests and extracting the field pgn from the JSON response sufficed for downloading all the games in PGN format.

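import requests

url = f'https://api.chess.com/pub/player/{user_name}/games/archives/'

# The endpoint returns the list of monthly archive URLs for the player
archives = (
    requests.get(url)
    .json()
    ["archives"]
)

for archive in archives:
    # Each archive holds one month of games (helper defined elsewhere in the project)
    json_data = get_from_cache_or_download(
        archive=archive,
    )

    for game in json_data['games']:
        # output_path is assumed to be a distinct pathlib.Path for each game
        output_path.write_text(game["pgn"])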

From games to statistics about openings

Now that I had all my games in files, I needed to extract some statistics regarding the opening moves played by my opponents.

I was facing another Data Engineering challenge: I was using a document database, MongoDB, which is not the most natural tool for this kind of analysis. MongoDB does provide an aggregation pipeline, but joining the games with the course lines and ranking the results is far more naturally expressed in SQL, and forcing it into the document store would have increased the overall complexity of the web app.

Instead, I opted for a more natural approach: integrating yet another Docker container in the architecture with a more suitable relational database for aggregations. I chose a PostgreSQL instance.

  postgres-service:
    image: postgres:15.1
    ports:
      - "5432:5432"
    volumes:
      - "C:/.../pg_hba.conf:/etc/postgresql/pg_hba.conf"
      - "C:/.../initdb-user.sh:/docker-entrypoint-initdb.d/initdb-user.sh:ro"

For this new target, I created two ingestion flows with the Python package psycopg2:

a flow to ingest into PostgreSQL the downloaded games;
a flow to ingest into PostgreSQL the course lines already being ingested into MongoDB, by reusing the expensive parsing of the PGN files performed by the Python package python-chess. These course lines would eventually be joined with and sorted by the analysis performed on the games.
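To give a feel for why a relational database suits this step, here is a minimal sketch of joining course lines with game positions and sorting by occurrence count. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL so that it runs anywhere. The table and column names are hypothetical, and the short labels such as pos_a stand in for real FEN strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL instance
conn.executescript(
    """
    CREATE TABLE game_positions (game_id INTEGER, fen TEXT);
    CREATE TABLE line_positions (line_id INTEGER, fen TEXT);
    """
)
# Positions reached in three downloaded games
conn.executemany(
    "INSERT INTO game_positions VALUES (?, ?)",
    [(1, "pos_a"), (1, "pos_b"), (2, "pos_a"), (3, "pos_c")],
)
# Positions belonging to two course lines
conn.executemany(
    "INSERT INTO line_positions VALUES (?, ?)",
    [(10, "pos_a"), (10, "pos_b"), (20, "pos_c")],
)

# Join course lines with games on the position, then rank the lines
rows = conn.execute(
    """
    SELECT l.line_id, COUNT(*) AS occurrences
    FROM line_positions AS l
    JOIN game_positions AS g ON g.fen = l.fen
    GROUP BY l.line_id
    ORDER BY occurrences DESC
    """
).fetchall()
print(rows)  # [(10, 3), (20, 1)]
```

Line 10 comes first because its positions appeared three times across the games, against a single occurrence for line 20.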

Now that I had games and course lines in PostgreSQL, I needed to come up with an idea for extracting statistics about the popularity of openings among my opponents.

This was the idea. Let’s think of a course line as a sequence of positions. For each position, I could assign a numerical value based on

how many games that position occurred in;
weighting a position more if it was deeper in the line, e.g. arising after 15 moves instead of 3. The reasoning is simple: the deeper the position, the more difficult it is to recall, because many pieces are scattered around the chessboard;
weighting less the games played further in the past. This is logical: the older a game, the more my Chess.com rating differed from today’s, and so did the opponents’ mindset and preparation.

Then, the overall popularity of a single course line was the sum of all the values assigned for each position of the line.
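The scoring idea can be sketched in a few lines of Python. The exact weighting functions are not specified above, so this sketch assumes a weight that grows linearly with depth and a recency weight that halves every 180 days. Both choices, along with the function names, are assumptions made for illustration.

```python
from datetime import date

HALF_LIFE_DAYS = 180  # assumed: a game's weight halves every 180 days


def recency_weight(played_on: date, today: date) -> float:
    """Older games count less (assumed exponential decay)."""
    age_days = (today - played_on).days
    return 0.5 ** (age_days / HALF_LIFE_DAYS)


def line_popularity(line_positions, games, today):
    """Sum, over the positions of a course line, the weighted number of
    games in which each position occurred.

    line_positions: positions of the line in order (the first is the shallowest).
    games: list of (played_on, set_of_positions_reached) pairs.
    """
    score = 0.0
    for depth, position in enumerate(line_positions, start=1):
        for played_on, reached in games:
            if position in reached:
                # Deeper positions weigh more (assumed linear in depth).
                score += depth * recency_weight(played_on, today)
    return score


today = date(2023, 6, 13)
games = [
    (date(2023, 6, 13), {"pos_a", "pos_b"}),  # played today
    (date(2022, 12, 15), {"pos_a"}),          # played 180 days earlier
]
print(line_popularity(["pos_a", "pos_b"], games, today))  # 3.5
```

In the example, pos_a contributes 1.0 from today's game and 0.5 from the older one, while the deeper pos_b contributes 2.0, for a total of 3.5. Sorting the course lines by this score in descending order gives the study order.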

The final product

After this exciting journey full of challenges, here’s the final architecture of the web app:

Front-end developed with HTML, CSS, Javascript
Back-end based on the Django framework, communicating with the front-end via a Channels WebSocket consumer backed by Redis
Data architecture composed of unstructured data with MongoDB and structured data with Postgres

The original objectives have been accomplished:

a fully-functioning web app with a chessboard on which to try opening moves
a back-end capable of sorting the lines to show based on the popularity in past games
a data architecture enabling both fast user experience and offline analysis

Conclusion

This case study showcases how powerful it is to have a big-picture approach to problem-solving, combining diverse disciplines such as data engineering and full-stack development. The initial idea of personalized learning was pretty straightforward, but creating a comprehensive solution entailed planning long-term and integrating a diverse range of technologies.

As engineers, we are expected to solve complex real-world problems by connecting a wide range of systems, and personalized learning is becoming ever more important in this context and in many fields. The success of this project highlights the potential for innovative solutions when multiple areas of expertise work together towards a common goal.