Predicting the Feed Consumption of a Swine Farm

Using Time-Series, Linear Mixed Models and Generalized Additive Mixed Models in R

Feb 22, 2022

In this blog post I will show you a particular way of predicting the feed curve of a swine farm using several different types of models. Of course, the road to a successful model is not the technique itself, but rather the validity of the research question and its potential implications.

Let's walk through the example. Since it is commercial data, I cannot share the data with you. Then again, it is always best for the learning exercise if you try to apply the code and way of working to your own data. Any dataset will do, as long as it has repeated observations.
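Because the original data cannot be shared, a simulated stand-in with the same kind of repeated-measures structure can help when following along. The sketch below is purely illustrative: the column names (`farm`, `pen`, `pen_id`, `day`, `feed`) and all numbers are invented and are not taken from the commercial dataset.

```r
# Simulated stand-in data: 3 farms x 4 pens, 60 days of feed per pen.
# All names and values here are hypothetical.
set.seed(42)
sim <- expand.grid(day = 1:60, pen = 1:4, farm = c("A", "B", "C"))
sim$pen_id <- interaction(sim$farm, sim$pen, drop = TRUE)

# Each pen gets its own offset, so observations within a pen are correlated
pen_effect <- rnorm(nlevels(sim$pen_id), mean = 0, sd = 0.2)
sim$feed <- 1 + 0.03 * sim$day + pen_effect[as.integer(sim$pen_id)] +
  rnorm(nrow(sim), sd = 0.1)

# Introduce some missing values, as real farm data would have
sim$feed[sample(nrow(sim), 30)] <- NA

str(sim)
table(sim$farm, sim$pen)
```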

So, let's start by loading the libraries and the data.

As you can see, quite a nice dataset. It is not that clean, since it is commercial data. Clean data only exist in textbook examples.

The data cover three farms, and the stables / pens within each farm.

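```r
# Visualise the share of missing values per column
DataExplorer::plot_missing(combined4)
# Drop columns that are entirely NA, then check the missingness again
df <- combined4[, colSums(is.na(combined4)) < nrow(combined4)]
DataExplorer::plot_missing(df)
# Count observations per identifier and cross-tabulate two of them
table(df$ble_id)
table(df$vbf_id)
table(df$vbf_id, df$lcn_id)
```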

Now, within each pen there are different feeding cycles. So, if we want to analyse the feed curve as a function of day-in-curve, we need to split the data further by adding another layer, which I call newField. In the end, we have data at the farm level, the pen level, and the newField (run) level.
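How newField was built is not shown here, so the following is only one possible way to derive such a run identifier. It assumes a day counter that restarts whenever a new feeding cycle begins and rows that are already in time order within each pen. The data frame `pens` and the columns `pen`, `day_in_curve` and `run_id` are hypothetical.

```r
# Hypothetical pen-level data: the day counter restarts with each feeding cycle
pens <- data.frame(
  pen          = rep(c("p1", "p2"), each = 7),
  day_in_curve = c(1, 2, 3, 1, 2, 3, 4,   # p1: two cycles
                   1, 2, 1, 2, 1, 2, 3)   # p2: three cycles
)

# Within each pen, start a new run whenever the day counter drops
pens$newField <- ave(
  pens$day_in_curve, pens$pen,
  FUN = function(d) cumsum(c(TRUE, diff(d) < 0))
)

# A run label that is unique across pens
pens$run_id <- paste(pens$pen, pens$newField, sep = "_")
table(pens$run_id)
```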

Let's proceed with the data exploration. Never think this part of the process is a waste of time! It has helped me out many times, often after building my first set of models, by helping me understand why none of them made sense. You will often find yourself going back and forth between plots and models.

My first instinct for modelling this type of data was to use time-series models. Besides the data being a time series by nature, there are several ways of creating time-series ensembles that can pick up seasonal influences. So, that is what I tried first: pick up any kind of temporal pattern.

Time-series models do not like (some actually throw a temper tantrum) when you have missing data. So, I used the Kalman method of the **imputeTS** package to fill in the blanks.
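As a minimal sketch of that imputation step, the block below fills gaps in a single simulated series with `na_kalman()` from imputeTS. The series itself and the positions of the gaps are invented, and the arguments shown are the package defaults, not necessarily the settings used for the farm data.

```r
library(imputeTS)

# Hypothetical daily feed series for one run, with a few gaps
set.seed(1)
feed <- 1 + 0.03 * (1:60) + rnorm(60, sd = 0.1)
gaps <- c(10, 11, 25, 40)
feed[gaps] <- NA

# Kalman smoothing on a structural time-series model (the package defaults)
feed_filled <- na_kalman(ts(feed), model = "StructTS", smooth = TRUE)

# Only the gaps are filled; observed values are left as they were
anyNA(feed_filled)
```

With several pens and runs, the same call would have to be applied to each series separately.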

Now, let's use that tsibble and let loose several time-series models, including an ensemble.
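The original model code is not available, so here is a hedged sketch of what fitting several models plus an ensemble on a tsibble can look like with the fable package. The data are simulated, the column names (`run`, `day`, `feed`) are hypothetical, and the two model specifications are stand-ins chosen for simplicity, not the ones used on the farm data.

```r
library(dplyr)
library(tsibble)
library(fable)

# Hypothetical data: two runs of 40 daily feed observations each
set.seed(2)
runs <- tibble(
  run  = rep(c("run_1", "run_2"), each = 40),
  day  = rep(1:40, times = 2),
  feed = 1 + 0.03 * day + rnorm(80, sd = 0.1)
)
feed_ts <- as_tsibble(runs, key = run, index = day)

# One model per run and per specification, plus a simple average as ensemble
fits <- feed_ts |>
  model(
    ets   = ETS(feed ~ error("A") + trend("A") + season("N")),
    drift = RW(feed ~ drift())
  ) |>
  mutate(ensemble = (ets + drift) / 2)

# Forecast the next 7 days for every run and every model
fc <- forecast(fits, h = 7)
fc
```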

And then THIS is what happens. To be honest, I still have not figured out why it produces NULL models, even if I ask it to only run on feed curves that have at least 30m data points included. There should also be no missing data, but it just does not work. If you try to run forecasts on this list of models it will stop immediately since the first model on the list is actually not there. It is a list with holes.

In the end, after some tweaking, I gave up and resorted to another set of models: mixed models. What I like so much about mixed models is their ability to deal with missing data (under certain assumptions) and to model longitudinal data with a nested structure. That is exactly what we have here.

Below, five models are compared. Model 4 has some serious issues despite having the lowest AIC. Choosing a model on AIC alone is not the best strategy, as overfitting may easily take place.
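To make the comparison concrete, here is a small sketch of fitting two nested-structure mixed models, one with a straight line and one with a spline on day, and comparing them by AIC. It uses nlme and simulated data. The package choice, the variable names (`pen`, `run`, `day`, `feed`) and the model formulas are assumptions for illustration, not the five models discussed here.

```r
library(nlme)     # linear mixed models; ships with R
library(splines)  # natural cubic splines

# Hypothetical data: 6 pens x 3 runs each, 50 days per run
set.seed(3)
d <- expand.grid(day = 1:50, run = factor(1:3),
                 pen = factor(paste0("pen_", 1:6)))
d$run_id <- interaction(d$pen, d$run, drop = TRUE)
pen_eff  <- rnorm(nlevels(d$pen), sd = 0.3)
run_eff  <- rnorm(nlevels(d$run_id), sd = 0.2)
# Curved feed curve plus pen and run offsets and noise
d$feed <- 0.5 + 3 * (1 - exp(-d$day / 15)) +
  pen_eff[as.integer(d$pen)] + run_eff[as.integer(d$run_id)] +
  rnorm(nrow(d), sd = 0.1)

# Runs nested within pens; ML (not REML) so that AIC is comparable
# across models with different fixed effects
m_linear <- lme(feed ~ day, random = ~ 1 | pen/run,
                data = d, method = "ML")
m_spline <- lme(feed ~ ns(day, df = 4), random = ~ 1 | pen/run,
                data = d, method = "ML")

AIC(m_linear, m_spline)
```

A lower AIC is only a starting point; residual and calibration plots remain necessary to spot a model that fits the numbers but misbehaves.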

Let's look at model 4 and model 1 more closely.

The model coefficients in a nice table below.

I already showed you one calibration plot. Let's look at several more. To be honest, calibration plots are not that handy for comparing models. It would have been better to show density plots of the residuals across models. However, by the time I got here, I had already made up my mind that Linear Mixed Models would not be the model set to use.
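A residual density comparison of that kind could be sketched as follows. To keep it short, two plain `lm()` fits on simulated data stand in for the mixed models. Everything here, including the data and the two model formulas, is hypothetical.

```r
# Hypothetical data with a curved feed curve
set.seed(4)
day  <- rep(1:50, times = 4)
feed <- 0.5 + 3 * (1 - exp(-day / 15)) + rnorm(length(day), sd = 0.1)

# Two stand-in models: a straight line and a cubic polynomial in day
m1 <- lm(feed ~ day)
m2 <- lm(feed ~ poly(day, 3))

# Kernel density of the residuals for each model
d1 <- density(residuals(m1))
d2 <- density(residuals(m2))

# Overlay both densities; a narrower peak around zero means a tighter fit
plot(d2, main = "Residual density by model", xlab = "Residual",
     xlim = range(d1$x, d2$x))
lines(d1, lty = 2)
abline(v = 0, col = "grey")
legend("topright", legend = c("cubic", "linear"), lty = c(1, 2))
```

The same approach works with `residuals()` taken from mixed-model fits.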

What I did above was add splines to linear mixed models, a procedure I have adopted numerous times across different species. Modelling feed data is not new either. But this time I wanted to try a different way of modelling, using Generalized Additive Models in a mixed-model format. In addition, I wanted to split the data into train and test sets to get a better grip on the model.
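As a rough sketch of this setup, the block below fits an additive model with run-level random effects using mgcv and evaluates it on held-out days. The package choice, the simulated data, the variable names (`run`, `day`, `cum_feed`) and the way the data are split are all assumptions for illustration.

```r
library(mgcv)  # ships with R

# Hypothetical data: 6 runs, 60 days each, cumulative feed per run
set.seed(5)
g <- expand.grid(day = 1:60, run = factor(paste0("run_", 1:6)))
run_eff <- rnorm(nlevels(g$run), sd = 0.2)
daily   <- 1 + 0.03 * g$day + run_eff[as.integer(g$run)] +
  rnorm(nrow(g), sd = 0.1)
g$cum_feed <- ave(daily, g$run, FUN = cumsum)

# Hold out every fifth day of every run as a test set
is_test <- g$day %% 5 == 0
train   <- g[!is_test, ]
test    <- g[is_test, ]

# Smooth of day plus a random intercept and a random slope per run
fit <- gam(cum_feed ~ s(day) + s(run, bs = "re") + s(run, day, bs = "re"),
           data = train, method = "REML")

# Out-of-sample error on the held-out days
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$cum_feed - pred)^2))
rmse
```

Holding out days within runs tests interpolation along known curves; predicting an entirely new run is a harder problem and would need a split by run instead. `plot(fit, pages = 1)` draws the estimated smooth functions.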

And the functions for each of the predictors included to model the cumulative feed curve.

So, the above exercise was on the total cumulative feed provided. But, what if we estimate the feed provision by component? As you can see, all the models I create are run on the individual level and predictions are then aggregated to form a cumulative feed curve. That seems to do the trick. It is not perfect, but no model is.
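The idea of modelling each component separately and then adding the predictions up can be sketched like this. The three components, their shapes and the variable names (`component`, `day`, `feed`) are simulated and hypothetical, and a single GAM per component is a simplification of the models described here.

```r
library(mgcv)

# Hypothetical data: daily feed for three components over 60 days,
# each component peaking at a different point in the cycle
set.seed(6)
comp <- expand.grid(day = 1:60, component = c("c1", "c2", "c3"))
peak <- c(c1 = 10, c2 = 30, c3 = 50)[as.character(comp$component)]
comp$feed <- 2 * exp(-((comp$day - peak) / 12)^2) +
  rnorm(nrow(comp), sd = 0.05)

# One GAM per component, each predicting daily feed from day
fits <- lapply(split(comp, comp$component), function(d) {
  gam(feed ~ s(day), data = d, method = "REML")
})

# Predict each component on a common grid, then add them up
grid  <- data.frame(day = 1:60)
preds <- sapply(fits, function(m) as.numeric(predict(m, newdata = grid)))
total_daily <- rowSums(preds)       # predicted total feed per day
total_cum   <- cumsum(total_daily)  # predicted cumulative feed curve

# Observed cumulative total for comparison
obs_cum <- cumsum(as.numeric(tapply(comp$feed, comp$day, sum)))
```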

Let's try out how things go for component 1.

On to component 2.

Now, let's add component 3 to the mix, add everything together and see how the predictions do.

Components 1 and 2 together are really not that bad!

So, this was a small example of how to use Generalized Additive Mixed Models for estimating the feed curves of a swine farm. To be continued, as this was just a very small beginning!

Written by Dr. Marc Jacobs