Accurate weight reporting can seem like a minor issue, hardly worth the buzz, since physical appearance is an obvious signal. Yet people continue to spend a lot of money on applications for weight prediction and weight management. One reason is psychological: continuous weighing and reporting put additional pressure on users to adopt healthier lifestyle habits. The second reason is accuracy: studies have shown that people usually under-report their weight, which in the long run can have negative consequences.
One company decided to challenge this paradigm and designed a smart shoe insole that collects the user’s gait (movement) data and, at the end of the day, provides a weight estimate through the app. The user just needs to enter their initial weight and insert the insole; the rest is completely automatic and based on the movement data.
Now let’s dive deeper into the technical details of how we accomplish that.
Data is collected continuously during the day from several accelerometers and pressure sensors. The raw data from the multiple sensors is fed into the algorithm, where it undergoes aggregation and feature engineering. For each sensor, we created summary features such as median, min, and max, as well as signal-processing features such as the number of peaks, peak height, and prominence. In addition, interactions between sensors were taken into account by creating features such as the magnitude and the weighted average of signal values. The result is a table with one row per user per 1-minute interval.
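To make the feature engineering step more concrete, here is a minimal sketch of how one window of sensor readings could be turned into a row of features. It is not our production code: the function names, the `pressure` prefix, the toy readings, and the simple "higher than both neighbours" peak rule are all assumptions made for illustration. In practice a library routine such as `scipy.signal.find_peaks` could do the peak and prominence work instead.

```python
import math
import statistics


def find_peaks_simple(signal):
    """Return indices of samples strictly higher than both neighbours."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] > signal[i + 1]
    ]


def peak_prominence(signal, peak):
    """Height of a peak above the higher of its two surrounding valleys."""
    left_min = right_min = signal[peak]
    i = peak - 1
    # Walk left until a higher sample or the window border is reached.
    while i >= 0 and signal[i] <= signal[peak]:
        left_min = min(left_min, signal[i])
        i -= 1
    i = peak + 1
    # Walk right under the same rule.
    while i < len(signal) and signal[i] <= signal[peak]:
        right_min = min(right_min, signal[i])
        i += 1
    return signal[peak] - max(left_min, right_min)


def sensor_features(signal, name):
    """Summary and peak features for one sensor over one window."""
    peaks = find_peaks_simple(signal)
    heights = [signal[p] for p in peaks]
    prominences = [peak_prominence(signal, p) for p in peaks]
    return {
        f"{name}_median": statistics.median(signal),
        f"{name}_min": min(signal),
        f"{name}_max": max(signal),
        f"{name}_n_peaks": len(peaks),
        # Fall back to 0.0 when the window contains no peaks.
        f"{name}_peak_height_mean": statistics.fmean(heights) if heights else 0.0,
        f"{name}_prominence_mean": statistics.fmean(prominences) if prominences else 0.0,
    }


def magnitude(ax, ay, az):
    """Per-sample magnitude of a 3-axis accelerometer signal."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(ax, ay, az)]


def weighted_average(signals, weights):
    """Per-sample weighted average across several sensors."""
    total = sum(weights)
    return [
        sum(w * s for w, s in zip(weights, samples)) / total
        for samples in zip(*signals)
    ]


# Toy readings for one 1-minute window (made-up values, not real data).
window = [0, 1, 0, 2, 0, 3, 0]
row = {"user": "u1", "minute": 0, **sensor_features(window, "pressure")}
```

Each call produces one row of the per-user, per-minute table; features from `magnitude` and `weighted_average` would be passed through `sensor_features` in the same way.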
The above-mentioned approach had serious limitations, which initially threatened the existence of the whole project:
- Lack of data: the product was still in the design stage, and the team members were collecting the data themselves. Moreover, the idea was to mimic real-life use, which was very challenging because team members had to walk and label their activities at the same time. Last but not least, the small dataset limited the models we could use. With so little data, it was unreasonable to consider advanced models such as LSTMs or even boosted trees, as they would overfit and fail on new users.
- Bias: the data was very specific to the people who collected it, which limited how well the approach could generalize to a larger population.
These issues were a serious challenge. We tried every model we could, from Bayesian regression to boosted tree regression, and none of them worked. Some models worked for some people, but none generalized or even came close to the target, which was weight prediction with a maximum error of 3 kg. Meanwhile, we noticed that most of the models shared the same set of important features. This hinted that we should concentrate on the top 10 features and build models with only those. This did not work either, but strangely we again saw a similar set of top 3 important features. Finally, after a lot of experiments, and here I really mean a lot, we came up with a simple linear equation with only 1 feature. Surprisingly, this approach worked not only on the training and available test data sets but also for completely new users with different weights and physiological characteristics.
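For readers who want to try the same idea, the sketch below fits a one-feature linear equation and checks it against users the fit has never seen, using the 3 kg maximum error as the target. The feature values, user names, and weights are synthetic and chosen only for illustration, and the helper names are hypothetical. The actual feature and coefficients we ended up with are not shown here.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_user_out(data):
    """Fit on all other users, then report each held-out user's max absolute error in kg."""
    errors = {}
    for held_out in data:
        train = [pt for user, pts in data.items() if user != held_out for pt in pts]
        slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
        errors[held_out] = max(
            abs(slope * x + intercept - y) for x, y in data[held_out]
        )
    return errors


# Synthetic, made-up data: {user: [(feature_value, weight_kg), ...]}
data = {
    "user_a": [(25.0, 60.0), (25.5, 61.0)],
    "user_b": [(35.0, 80.0), (34.5, 79.0)],
    "user_c": [(30.0, 70.0), (30.5, 71.0)],
}

MAX_ERROR_KG = 3.0  # target maximum error mentioned in the text
errors = leave_one_user_out(data)
within_target = {user: err <= MAX_ERROR_KG for user, err in errors.items()}
```

Holding out whole users, rather than random rows, is what tests whether the equation carries over to a new person. With only one slope and one intercept to estimate, there is very little room to overfit a small data set.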
With just this simple formula, we were able to solve the problem of weight prediction from gait data. Our results emphasize the importance of understanding the data, using it to its full potential, and interpreting the model.