Wireless Communication Protocols Using ESP32

Project implemented by members of IEEE-SPS, VIT

ABSTRACT:

This project consists of three parts:

  • Introduction to various wired and wireless communication protocols on the ESP32, used to build a universal RC transmitter/receiver pair.
  • Connecting ESP32 to Apple HomeKit.
  • On-device processing of data using ML: running pre-trained neural networks with TensorFlow Lite.

(Demonstrated by implementing speech recognition with wake-word detection.)

INTRODUCTION:

The first part allows users to control any device using a mobile phone or a custom remote, with inputs from an accelerometer, a joystick, and push buttons. It uses the ESP-NOW communication protocol, which supports up to 20 paired peers simultaneously (a minimal transmitter sketch follows).
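
As an illustration, a minimal ESP-NOW transmitter could look like the sketch below (Arduino core for the ESP32). The ControlPacket layout and the receiver's MAC address are placeholders rather than the project's actual packet format.

    // Minimal ESP-NOW transmitter sketch (Arduino core for ESP32).
    #include <WiFi.h>
    #include <esp_now.h>

    // MAC address of the receiver board (placeholder value).
    static uint8_t receiverMac[] = {0x24, 0x6F, 0x28, 0xAA, 0xBB, 0xCC};

    // Example payload carrying the remote's sensor readings.
    typedef struct {
      int16_t joystick;  // one-axis joystick reading
      int16_t accelX;    // MPU6050 accelerometer, X axis
      uint8_t buttons;   // bitmask of the four push buttons
    } ControlPacket;

    void setup() {
      WiFi.mode(WIFI_STA);             // ESP-NOW requires station mode
      esp_now_init();                  // start the ESP-NOW stack

      esp_now_peer_info_t peer = {};   // register the receiver as a peer
      memcpy(peer.peer_addr, receiverMac, 6);
      peer.channel = 0;                // use the current Wi-Fi channel
      peer.encrypt = false;
      esp_now_add_peer(&peer);
    }

    void loop() {
      ControlPacket pkt = {512, 0, 0b0001};  // dummy readings
      esp_now_send(receiverMac, (uint8_t *)&pkt, sizeof(pkt));
      delay(20);                             // ~50 packets per second
    }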

Features Implemented:

  • Mesh networking to transfer the master's characteristics to the nearest slave on failure of the master.
  • SERDES logic that (de)serializes all the sensor data into parallel 16-bit words, with the first bit as an acknowledgment and the last bit used to restart the communication. This acts as a fail-safe (one possible bit layout is sketched after this list).
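
One possible layout for such a 16-bit frame is sketched below; the exact bit positions are an assumption for illustration, since the text only fixes the first bit (acknowledgment) and the last bit (restart).

    // Illustrative 16-bit frame: bit 15 = acknowledgment flag,
    // bit 0 = restart flag, bits 14..1 = 14-bit sensor payload.
    #include <stdint.h>

    static inline uint16_t packFrame(uint16_t payload14, bool ack, bool restart) {
      return (uint16_t)((ack ? 1u : 0u) << 15)
           | (uint16_t)((payload14 & 0x3FFFu) << 1)
           | (uint16_t)(restart ? 1u : 0u);
    }

    static inline bool     frameAck(uint16_t f)     { return (f >> 15) & 1u; }
    static inline bool     frameRestart(uint16_t f) { return f & 1u; }
    static inline uint16_t framePayload(uint16_t f) { return (f >> 1) & 0x3FFFu; }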

The second part covers connecting the ESP32 to Apple HomeKit.
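
The write-up does not say which library was used for HomeKit pairing; one common route on the ESP32 is the HomeSpan Arduino library, sketched here as an assumption. This exposes the board to the Home app as a light bulb accessory.

    // Minimal HomeKit accessory using the HomeSpan library (an assumption;
    // the project may have used a different HomeKit implementation).
    #include "HomeSpan.h"

    void setup() {
      Serial.begin(115200);
      homeSpan.begin(Category::Lighting, "ESP32 Light");

      new SpanAccessory();                      // one HomeKit accessory...
        new Service::AccessoryInformation();    // ...with required metadata
          new Characteristic::Identify();
        new Service::LightBulb();               // ...exposed as a light bulb
          new Characteristic::On();             // on/off characteristic
    }

    void loop() {
      homeSpan.poll();  // service HomeKit requests
    }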

The third part covers storing the pre-trained CNN model for detecting the wake word “MARVIN” in SPIFFS, the SPI Flash File System that lives in the ESP32's onboard flash (no SD card is needed).
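
A minimal sketch of reading the model file out of SPIFFS into RAM at boot follows (Arduino core for the ESP32); the file name "/marvin_model.tflite" is a placeholder.

    // Load the .tflite model file from SPIFFS into a RAM buffer.
    #include <SPIFFS.h>

    uint8_t *modelData = nullptr;
    size_t   modelSize = 0;

    bool loadModel() {
      if (!SPIFFS.begin(true)) return false;    // mount (format if it fails)
      File f = SPIFFS.open("/marvin_model.tflite", "r");
      if (!f) return false;
      modelSize = f.size();
      modelData = (uint8_t *)malloc(modelSize); // buffer for the model bytes
      if (!modelData) { f.close(); return false; }
      f.read(modelData, modelSize);             // copy the flash file into RAM
      f.close();
      return true;
    }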

We then record segments of audio and pass them through the CNN model to detect the wake word. If it is detected, audio is captured for 3 seconds, and intent detection and execution are performed (e.g., “turn on lights” → set the light's GPIO pin high, as sketched below).
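
The execution step itself can be a simple mapping from the recognized intent to a GPIO action, as in this sketch (the intent IDs and the pin number are illustrative):

    // Map a recognized intent to a GPIO action.
    #include <Arduino.h>

    const int LIGHT_PIN = 23;  // example GPIO driving a light/relay;
                               // pinMode(LIGHT_PIN, OUTPUT) is assumed in setup()

    enum Intent { INTENT_NONE, INTENT_LIGHT_ON, INTENT_LIGHT_OFF };

    void executeIntent(Intent intent) {
      switch (intent) {
        case INTENT_LIGHT_ON:  digitalWrite(LIGHT_PIN, HIGH); break;
        case INTENT_LIGHT_OFF: digitalWrite(LIGHT_PIN, LOW);  break;
        default: break;  // unrecognized speech: do nothing
      }
    }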

All these parts are subsystems for implementing an efficient AgrIoT (IoT in agriculture) system.

All three parts of this project have been designed with large-scale deployment in mind, so they should carry over to the industrial/farm level with little or no change.

COMPONENTS/SOFTWARE USED:

RECEIVER:

  • ESP32 DevKit V3
  • TB6612FNG dual H-bridge motor driver
  • LM7805 5V regulator
  • Custom PCB for mounting all the components
  • 12V LiPo Battery

TRANSMITTER:

  • ESP32 DevKit V3
  • Push Buttons (X4)
  • One-axis joystick
  • MPU6050 Accelerometer/Gyroscope
  • 3.7V Li-Ion cell

SPEECH RECOGNITION:

  • LCD for the user interface
  • I2S microphone to capture audio at a high sample rate (a capture sketch follows this list)
  • Optional: I2S speaker to indicate wake-word detection and responses to the other intended tasks
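
A sketch of capturing samples from the I2S microphone with the ESP-IDF legacy I2S driver is below; the pin assignments and the 16 kHz sample rate are assumptions.

    // Configure I2S and read raw samples from the microphone.
    #include <driver/i2s.h>

    void i2sInit() {
      i2s_config_t cfg = {
        .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
        .sample_rate = 16000,                   // 16 kHz suffices for speech
        .bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT,
        .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
        .communication_format = I2S_COMM_FORMAT_STAND_I2S,
        .intr_alloc_flags = 0,
        .dma_buf_count = 4,
        .dma_buf_len = 256,
      };
      i2s_pin_config_t pins = {
        .bck_io_num = 26,                       // example pin mapping
        .ws_io_num = 25,
        .data_out_num = I2S_PIN_NO_CHANGE,
        .data_in_num = 33,
      };
      i2s_driver_install(I2S_NUM_0, &cfg, 0, NULL);
      i2s_set_pin(I2S_NUM_0, &pins);
    }

    // Blocking read; returns the number of samples actually captured.
    size_t readAudio(int32_t *buf, size_t samples) {
      size_t bytesRead = 0;
      i2s_read(I2S_NUM_0, buf, samples * sizeof(int32_t), &bytesRead, portMAX_DELAY);
      return bytesRead / sizeof(int32_t);
    }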

METHODOLOGY:

The first two parts of the project are relatively simple, so we will focus on the implementation of the CNN on the ESP32 using TensorFlow Lite.

We’ll need to build three components:

  • Wake word detection
  • Audio capture and Intent Recognition
  • Intent Execution

TRAINING DATA:

Our first port of call is to find some data to train a model against. We can use the Speech Commands Dataset, which contains over 100,000 audio files covering a set of 20 core command words such as “Up”, “Down”, “Yes”, and “No”, plus a set of extra words. Each sample is 1 second long.

One of these words looks like a particularly good candidate for a wake word: “Marvin”.

To augment the dataset we can also record ambient background noise (TV, radio, etc.).

A popular approach for word recognition is to translate the problem into that of image recognition.

We need to turn our audio samples into something that looks like an image; to do this we can take a spectrogram.

To get a spectrogram of an audio sample we break the sample into small sections and then perform a discrete Fourier transform on each section. This will give us the frequencies that are present in that slice of audio.

Putting these frequency slices together gives us the spectrogram of the sample.
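
Conceptually, the computation looks like the following sketch: slide a window across the audio and compute the magnitude spectrum of each window with a DFT. A real implementation would use an FFT library for speed; the window and hop sizes here are illustrative.

    // Build a spectrogram: one column of frequency magnitudes per window.
    #include <math.h>
    #include <stdint.h>

    #define WIN 256   // samples per slice
    #define HOP 128   // step between consecutive slices

    // Compute one spectrogram column: magnitudes of the first WIN/2 bins.
    void dftMagnitudes(const float *window, float *mags) {
      for (int k = 0; k < WIN / 2; k++) {
        float re = 0, im = 0;
        for (int n = 0; n < WIN; n++) {
          float angle = 2.0f * (float)M_PI * k * n / WIN;
          re += window[n] * cosf(angle);
          im -= window[n] * sinf(angle);
        }
        mags[k] = sqrtf(re * re + im * im);
      }
    }

    // Fill out[numSlices][WIN/2] from a mono audio buffer.
    void spectrogram(const float *audio, int numSamples, float out[][WIN / 2]) {
      int slice = 0;
      for (int start = 0; start + WIN <= numSamples; start += HOP, slice++) {
        dftMagnitudes(audio + start, out[slice]);
      }
    }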

MODEL TRAINING:

For our system we only want to detect the word “Marvin”, so we modify the training labels to 1 for “Marvin” and 0 for everything else, and then feed this data into TensorFlow datasets. We set up the training data to repeat forever, shuffle randomly, and come out in batches.

The model has a convolution layer followed by a max-pooling layer, then another convolution layer and max-pooling layer. The result is fed into a densely connected layer and finally to our single output neuron.

RESULT: we achieve about 95% accuracy.

We then convert this model to TensorFlow Lite format and embed it in the firmware as a C array (a sketch of the on-device inference follows).
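
On the ESP32 the converted model can then be run with TensorFlow Lite for Microcontrollers, roughly as in the sketch below. The header paths, the operator list, the arena size, and the g_model_data symbol are assumptions that depend on the TFLM version and on how the model was exported; if the model is instead loaded from SPIFFS (as in the earlier sketch), tflite::GetModel() can be pointed at that RAM buffer.

    // Run the wake-word CNN with TensorFlow Lite Micro.
    #include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
    #include "tensorflow/lite/micro/micro_interpreter.h"
    #include "tensorflow/lite/schema/schema_generated.h"
    #include "model_data.h"                // C array exported from the .tflite file

    constexpr int kArenaSize = 30 * 1024;  // scratch memory for the tensors
    static uint8_t tensor_arena[kArenaSize];
    static tflite::MicroInterpreter *interpreter = nullptr;

    void wakeWordSetup() {
      const tflite::Model *model = tflite::GetModel(g_model_data);

      // Register only the ops the CNN actually uses.
      static tflite::MicroMutableOpResolver<4> resolver;
      resolver.AddConv2D();
      resolver.AddMaxPool2D();
      resolver.AddFullyConnected();
      resolver.AddReshape();

      static tflite::MicroInterpreter staticInterpreter(
          model, resolver, tensor_arena, kArenaSize);
      interpreter = &staticInterpreter;
      interpreter->AllocateTensors();      // plan tensor memory inside the arena
    }

    // Feed one spectrogram into the model; returns the "Marvin" score.
    float detectMarvin(const float *spectrogramData, int len) {
      TfLiteTensor *input = interpreter->input(0);
      for (int i = 0; i < len; i++) input->data.f[i] = spectrogramData[i];
      interpreter->Invoke();
      return interpreter->output(0)->data.f[0];
    }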

RESULTS and DISCUSSIONS:

How well does it work?

Reasonably well. We have a very lightweight wake-word detection system: a detection pass runs in around 100 ms, and there is still room for optimization.

Accuracy is acceptable, but more training data is needed to make the system robust: it can easily be tricked into activating by words similar to “Marvin”, such as “marvelous”, “martin”, or “marlin”. More negative example words would help with this problem, or the wake word could simply be changed using a custom dataset.

CONCLUSION:

We now have the major parts of our AgrIoT system; integrating them should be close to plug and play.

Next, we have to develop a computer vision system that analyzes plant growth along with other parameters from various sensors: temperature, soil moisture, sunlight, etc.

REFERENCE:

This project was developed by Shaurya Chandra

shaurya.chandra2020@vitstudent.ac.in
