ML For Enhanced Calorie Expenditure Counting

Nov 11, 2023

Right now, I am building Sensofit: an AI personal trainer for obese individuals, paired with a smart ring designed to count calories 2x more accurately than regular PPG sensors. The core of the idea is a novel sensor that measures calorie expenditure more accurately than current systems do. In this article, I build an algorithm that turns the raw sensor data from an exercise session into a calorie estimate.

THIS IS A 2 PART ARTICLE, WITH THE 2ND PART BEING ME ACTUALLY BUILDING THE SENSOR.

Modern-day calorie expenditure counting is often inaccurate, incomplete, and unsustainable

Modern-day calorie expenditure counting can be a helpful tool for weight loss and maintenance, but it is important to be aware of the potential pitfalls and to use it realistically and sustainably.

One of the biggest problems with calorie expenditure counting is that it is often inaccurate. People tend to overestimate their calorie expenditure by an average of 50%. This is likely due to a number of factors, such as difficulty estimating the intensity and duration of exercise, and overestimating how many calories certain activities burn.

Another problem with calorie expenditure counting is that it can be incomplete. Many calorie expenditure counting devices and apps do not track all sources of calorie expenditure. One such source is non-exercise activity thermogenesis (NEAT): the calories burned through everyday activities, such as fidgeting and moving around. Another is the thermic effect of food (TEF): the energy the body spends digesting and processing food. Trackers also tend to miss adaptive thermogenesis, the decrease in calorie expenditure that occurs when the body adapts to a lower calorie intake.

Finally, calorie expenditure varies widely from person to person, even for people of the same age, weight, and activity level. This is due to a number of factors, such as genetics, muscle mass, and body composition.

Case study on modern calorie expenditure tracking: Fitbit

Fitbit devices combine your basal metabolic rate (BMR) — the rate at which you burn calories at rest to maintain vital body functions (including breathing, blood circulation, and heartbeat) — and your activity data to estimate your calories burned.

How do you get BMR?

Fitbit estimates your BMR using the Mifflin-St Jeor equation, a formula that takes into account your height, weight, sex, and age. This BMR estimate is then combined with your activity data.
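To make the equation concrete, here is a minimal sketch of the published Mifflin-St Jeor formula in Python. The function name and the example person are my own placeholders, and Fitbit's actual implementation is not public, so treat this as the textbook formula only.

```python
def mifflin_st_jeor_bmr(weight_kg: float, height_cm: float, age_years: float, sex: str) -> float:
    """Estimate basal metabolic rate in kcal/day with the Mifflin-St Jeor equation."""
    # Shared part of the equation for both sexes
    base = 10.0 * weight_kg + 6.25 * height_cm - 5.0 * age_years
    if sex == "male":
        return base + 5.0
    if sex == "female":
        return base - 161.0
    raise ValueError("sex must be 'male' or 'female'")


# Hypothetical person, used purely for illustration
bmr = mifflin_st_jeor_bmr(weight_kg=80, height_cm=180, age_years=30, sex="male")
print(f"Estimated BMR: {bmr:.0f} kcal/day")
```

A tracker would then add an activity-based estimate on top of this resting figure.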

Accuracy?

Aberystwyth University researchers found a Fitbit Charge 2 overestimated calorie burn from a walk by 53.5 percent

EVERY single wearable/device uses this method. Why?

Purely because there is not enough sensor data to make the estimate more accurate. No sensor that physically measures the calories burned before and after a workout has been brought to market yet. This is where I come in…

Piezoelectric sensors gather data and act on it

I will be using a Piezoelectric Energy Harvester (PEH) to convert the kinetic energy of human activity into usable electrical energy. Because the harvested energy depends on how the wearer moves, the output voltage of a PEH may contain information that can be used for calorie expenditure estimation (CEE).

Source, Image explaining the structure of a PEH

I first have to create a data logger that simultaneously collects the output voltage of a PEH and the acceleration signals from a 3-axis accelerometer while I perform walking and running activities. A linear regression model then estimates calorie expenditure from the PEH voltage data.
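Since the real hardware comes in part 2, here is only a rough sketch of what the logging loop could look like. Both `read_peh_voltage` and `read_accelerometer` are hypothetical stand-ins that return simulated values, and the 50 Hz sample rate and column names are assumptions of mine, not measured settings.

```python
import csv
import io
import math
import random


def read_peh_voltage(t: float) -> float:
    """Hypothetical stand-in for an ADC read of the PEH output (volts)."""
    return 1.0 + 0.5 * math.sin(2 * math.pi * 2.0 * t) + random.gauss(0, 0.01)


def read_accelerometer(t: float) -> tuple[float, float, float]:
    """Hypothetical stand-in for a 3-axis accelerometer read (in g)."""
    return (0.1 * math.sin(t), 0.1 * math.cos(t), 1.0)


def log_samples(n_samples: int, sample_rate_hz: float, out) -> None:
    """Write one row per tick so both signals share the same timestamp."""
    writer = csv.writer(out)
    writer.writerow(["time_s", "peh_voltage_v", "acc_x_g", "acc_y_g", "acc_z_g"])
    for i in range(n_samples):
        t = i / sample_rate_hz
        writer.writerow([f"{t:.3f}", f"{read_peh_voltage(t):.4f}", *read_accelerometer(t)])


# Log a few simulated samples into an in-memory buffer instead of a file
buffer = io.StringIO()
log_samples(n_samples=5, sample_rate_hz=50.0, out=buffer)
print(buffer.getvalue())
```

Reading both sensors inside the same loop iteration is what keeps the voltage and acceleration samples aligned in time.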

The findings should suggest that the output voltage of a PEH can be used as a new source for CEE. This new method has the potential to be used in wearable devices to estimate calorie expenditure without the need for accelerometers, which would reduce power consumption and extend battery life.

Data Manipulation of Fake PEH Data (Generated by GANs)

Before we get into coding, we need fake data to test the model on. Here is an example of fake raw data from a piezoelectric energy harvester (PEH) for a person running for 30 minutes:

Time (seconds), Voltage (volts), Current (milliamps)
0.00, 1.23, 4.56
0.30, 1.45, 5.67
0.60, 1.67, 6.78
0.90, 1.89, 7.89
1.20, 2.11, 8.90
1.50, 2.33, 9.01
1.80, 2.55, 10.12
2.10, 2.77, 11.23
2.40, 2.99, 12.34
2.70, 3.21, 13.45
3.00, 3.43, 14.56
...
1770.00, 3.65, 15.67
1773.00, 3.87, 16.78
1776.00, 4.09, 17.89
1779.00, 4.31, 18.90
1782.00, 4.53, 19.01
1785.00, 4.75, 20.12
1788.00, 4.97, 21.23
1791.00, 5.19, 22.34
1794.00, 5.41, 23.45
1797.00, 5.63, 24.56
1800.00, 5.85, 25.67

This data shows that the voltage and current generated by the PEH increase over time as the person runs faster. This is because the PEH is converting more mechanical energy from the person’s running into electrical energy. The data also shows that the voltage and current generated by the PEH are not constant. This is because the person’s running speed is likely to vary throughout the run.
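If you want to follow along without a GAN, a simple stand-in is a linear ramp with a bit of noise. The sketch below is not how my data was generated; the ramp endpoints, noise levels, and 3-second sampling interval are assumptions chosen to roughly resemble the sample above. It writes a `data.csv` with the plain column names `Time`, `Voltage`, and `Current`.

```python
import numpy as np
import pandas as pd

# Fixed seed so the placeholder data is reproducible
rng = np.random.default_rng(seed=0)

# One sample every 3 seconds across a 30-minute run (0 to 1800 s)
time_s = np.arange(0, 1803, 3, dtype=float)

# Assumed shape: a slow linear ramp plus small Gaussian noise (not a GAN)
voltage_v = 1.2 + (5.8 - 1.2) * time_s / 1800 + rng.normal(0, 0.05, time_s.size)
current_ma = 4.5 + (25.6 - 4.5) * time_s / 1800 + rng.normal(0, 0.2, time_s.size)

# Voltage is in volts and Current is in milliamps, as in the sample table
fake = pd.DataFrame({"Time": time_s, "Voltage": voltage_v, "Current": current_ma})
fake.to_csv("data.csv", index=False)
print(fake.head())
```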

The formula to calculate calories burned based on the PEH data is as follows:

Energy (J) = Voltage (V) * Current (A) * Time (seconds)
Calories burned (cal) = Energy (J) / 4.184

Volts times amps times seconds gives joules, so the result is divided by 4.184 to convert it to calories. For example, if the PEH generated a voltage of 1.23 V and a current of 45.6 mA over 30 seconds, then the number of calories burned would be:

Calories burned (cal) = 1.23 V * 0.0456 A * 30 seconds / 4.184 = 1.68 J / 4.184 ≈ 0.40 calories

This is just an example, and the actual number of calories burned will vary depending on the individual’s characteristics, such as their weight, height, and fitness level.
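A quick way to sanity-check the arithmetic and the units in that example: volts times amps times seconds gives joules, and one calorie is 4.184 joules. The function and constant names here are mine, added for illustration.

```python
JOULES_PER_CALORIE = 4.184  # thermochemical calorie


def electrical_energy_joules(voltage_v: float, current_a: float, seconds: float) -> float:
    """Energy delivered at constant voltage and current: E = V * I * t."""
    return voltage_v * current_a * seconds


# Same numbers as the worked example: 1.23 V, 45.6 mA (0.0456 A), 30 s
energy_j = electrical_energy_joules(voltage_v=1.23, current_a=0.0456, seconds=30)
energy_cal = energy_j / JOULES_PER_CALORIE
print(f"{energy_j:.2f} J = {energy_cal:.2f} cal")
```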

We also have to take into account the type of exercise the person is doing to get better results, using a MET (metabolic equivalent of task) table.
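For reference, the usual way a MET value is turned into calories is the approximation kcal = MET × body weight (kg) × duration (hours). Here is a small sketch of that; the MET values in the dictionary are illustrative placeholders rather than entries from a specific table, and the function name is my own.

```python
# Illustrative MET values only; look up real values in a published MET table
EXAMPLE_METS = {"walking": 3.5, "running": 9.8}


def met_calories_kcal(met: float, weight_kg: float, minutes: float) -> float:
    """Common approximation: kcal = MET * body weight (kg) * duration (hours)."""
    return met * weight_kg * (minutes / 60.0)


# Hypothetical 80 kg person running for 30 minutes
kcal = met_calories_kcal(EXAMPLE_METS["running"], weight_kg=80, minutes=30)
print(f"Estimated: {kcal:.0f} kcal")
```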

Linear regression is perfect for this application

Linear regression is a supervised machine learning algorithm that is used to predict continuous values. It is one of the most commonly used machine learning algorithms, and it is relatively easy to understand and implement.

It works by finding a linear relationship between the input features and the target variable. The input features can be anything, such as the height and weight of a person, or the temperature and humidity on a given day. The target variable is the value that you want to predict, such as a person’s body mass index (BMI), or the amount of rain that will fall on a given day.

Once the linear relationship has been found, linear regression can be used to predict the target variable for new input features. For example, if you have trained a linear regression model to predict BMI, you can use the model to predict the BMI of a new person based on their height and weight.

Linear regression is a powerful tool for prediction, but it is important to note that it is only as good as the data that it is trained on. If the data is noisy or incomplete, then the linear regression model will not be able to make accurate predictions.

Data preprocessing: The first step in linear regression is to prepare the data. This involves cleaning the data, removing any outliers, and scaling the data so that all of the features are on the same scale.
Model selection: The next step is to select a linear regression model. There are two main types of linear regression models: simple linear regression and multiple linear regression. Simple linear regression models are used to predict the target variable based on a single input feature. Multiple linear regression models are used to predict the target variable based on multiple input features.
Model training: Once a linear regression model has been selected, it needs to be trained on the training data. This involves finding the values of the model’s parameters that minimize the error between the predicted and actual values of the target variable.
Model evaluation: Once the model has been trained, it needs to be evaluated on the test data. This involves predicting the target variable for the test data and comparing the predicted values to the actual values. The model’s performance is evaluated using a metric such as the mean squared error (MSE).
Model deployment: Once the model has been evaluated and found to perform well on the test data, it can be deployed to production. This involves making the model available to users so that they can use it to predict the target variable for new input features.
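The five steps above can be sketched end to end on toy data. This is a minimal illustration, not the model used later in the article: the data is synthetic (a known line plus noise), and the scaling step is included only to show where preprocessing fits.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): y = 3x + 2 plus Gaussian noise
rng = np.random.default_rng(seed=42)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=200)

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Preprocessing (scaling) and model selection (a single-feature linear model)
model = make_pipeline(StandardScaler(), LinearRegression())

# Training: least-squares fit on the training rows only
model.fit(X_train, y_train)

# Evaluation: mean squared error on the held-out rows
mse = mean_squared_error(y_test, model.predict(X_test))
print(f"Test MSE: {mse:.3f}")

# "Deployment" here just means calling the fitted model on new input
print(model.predict(np.array([[4.0]])))
```

Putting the scaler inside the pipeline means it is fitted on the training rows only, so nothing from the test set leaks into preprocessing.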

Using Python and ML to do data analysis and augmentation

First, we import all the libraries: pandas to handle the CSV data, numpy for numerical calculations, matplotlib to plot graphs, and sklearn to do the bulk of the modeling.

# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics

Then we read our fake data from a CSV file and multiply each current value by its corresponding voltage to get a set of power values, which act as our dependent variable.

# Read the dataset from 'data.csv' file into a Pandas DataFrame
# (assumes the columns are named 'Time', 'Voltage', and 'Current')
df = pd.read_csv('data.csv')

# Create a new column 'power' in watts: volts * amps.
# 'Current' is in milliamps in the sample data, so divide by 1000.
df['power'] = df['Voltage'] * df['Current'] / 1000

# Keep a separate copy with only the 'Time' and 'power' columns
df_binary = df[['Time', 'power']].copy()

We then produce the first of our two test graphs: a plot of Time vs Power to see how the two correlate.

# Plot a scatter plot of 'Time' vs 'power' using Matplotlib
df.plot(x='Time', y='power', style='o')  
plt.title('Time vs Power')  
plt.xlabel('Time')  
plt.ylabel('Power')  
plt.show()

Now comes the fun bit. We reshape the time and power values into column arrays, split them into training and testing sets, and then train our linear regression model on the training set.

# Split the data into training and testing sets
X = df['Time'].values.reshape(-1, 1)
y = df['power'].values.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Create a Linear Regression model and train it on the training data
regressor = LinearRegression()  
regressor.fit(X_train, y_train)

# Print the intercept and slope (coefficients) of the linear regression model
#print(regressor.intercept_)
#print(regressor.coef_)

After that, we run our second test: a bar chart comparing the predicted values with the actual values for the first 25 points of the test set.

Graph generated by me(x axis — data point numbers, y axis — power) — Test 1/2

# Make predictions on the test data
y_pred = regressor.predict(X_test)

# Create a DataFrame to compare actual and predicted values
data = pd.DataFrame({'Actual': y_test.flatten(), 'Predicted': y_pred.flatten()})

# Display the first 25 rows of the comparison DataFrame as a bar chart
df1 = data.head(25)
df1.plot(kind='bar', figsize=(16, 10))
plt.grid(which='major', linestyle='-', linewidth=0.5, color='green')
plt.grid(which='minor', linestyle=':', linewidth=0.5, color='black')
plt.show()

# Create a scatter plot of the test data and the predicted values
# plt.scatter(X_test, y_test, color='gray')
# plt.plot(X_test, y_pred, color='red', linewidth=2)
# plt.show()
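The bar chart gives a visual check, but the `metrics` module imported earlier can also put numbers on the fit. Below is a small self-contained sketch; `y_true` and `y_hat` are made-up placeholder arrays standing in for `y_test` and `y_pred`, so the printed values say nothing about my actual model.

```python
import numpy as np
from sklearn import metrics

# Placeholder values standing in for y_test and y_pred
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_hat = np.array([12.0, 18.0, 33.0, 39.0])

# Average size of the errors, in the same units as the target
mae = metrics.mean_absolute_error(y_true, y_hat)

# Squared errors penalize large misses more heavily
mse = metrics.mean_squared_error(y_true, y_hat)
rmse = np.sqrt(mse)

# Share of the variance in the target that the predictions explain
r2 = metrics.r2_score(y_true, y_hat)

print(f"MAE:  {mae:.2f}")
print(f"MSE:  {mse:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"R^2:  {r2:.3f}")
```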

Now we manually input the amount of time (in minutes) that the person has worked out for. The model predicts the power directly, and then we use our equation to work out the number of calories burnt.

# Workout duration in minutes (replace with your desired value)
workout_minutes = 60

# The model was trained on 'Time' in seconds, so convert before predicting.
# Note: 60 minutes lies beyond the 30-minute sample, so this extrapolates.
new_data = pd.DataFrame({'Time': [workout_minutes * 60]})

# Extract the 'Time' value as a 1D array
new_data_1d = new_data['Time'].values

# Predict the 'power' value using the model (it expects a 2D array)
predicted_power = regressor.predict(new_data_1d.reshape(-1, 1)).flatten()

print(predicted_power)

# Energy = power * time in seconds; volts * amps * seconds gives joules,
# and 1 calorie = 4.184 joules
predicted_calories = predicted_power * new_data_1d / 4.184

# Print the result
print(f'Predicted Calories burned: {predicted_calories[0]:.2f} calories')
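The calculation above multiplies one predicted power value by the whole duration. When the sampled power itself is available, an alternative is to add up the energy sample by sample with the trapezoidal rule. This is a suggestion on top of the method in this article, not part of it, and the arrays below are placeholder values.

```python
import numpy as np

JOULES_PER_CALORIE = 4.184

# Placeholder samples: time in seconds, power in watts
time_s = np.array([0.0, 10.0, 20.0, 30.0])
power_w = np.array([0.0, 1.0, 2.0, 3.0])

# Trapezoidal rule: average adjacent power samples, weight by the time step
energy_j = np.sum((power_w[1:] + power_w[:-1]) / 2.0 * np.diff(time_s))
energy_cal = energy_j / JOULES_PER_CALORIE
print(f"{energy_j:.1f} J = {energy_cal:.2f} cal")
```

This handles power that changes during the workout, which a single power times time product cannot.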

The result of my model from 60mins of running

This will be a project that really impacts billions

Phew, enough technical parts. But… What does this even mean? What will I do differently in part 2 of this article?

By creating a really accurate model like this, one that takes MANY biomarkers and other data into account, we can accurately and efficiently guide a user down a path to weight loss/fitness much more quickly.
Instead of predicting the current and voltage of the sensor, we will use this as a framework to estimate calories directly, since we will already know the power from the ring. I will also have to make this as automated as possible, without any manual work.

This may seem like a very simple ML model, but it is the basis of what is needed. The differentiating factors are 1) the amount of quality data, and 2) the user experience.

I am super excited to get started on building the ring and making this as real as possible. I truly think that this can make a huge impact. With my recent funding for this project, I have the resources to make it happen!!!