Predicting the Feed Consumption of a Swine Farm
Using Time Series, Linear Mixed Models, and Generalized Additive Mixed Models in R
In this blog post I will show you a particular way of predicting the feed curve of a swine farm using several different types of models. Of course, the road to a successful model lies not in the technique itself, but in the validity of the research question and its potential implications.
Let's walk through the example. Since it is commercial data, I cannot share the data with you. Then again, it is always best for the learning exercise if you try to apply the code and way of working to your own data. Any dataset will do, as long as it has repeated observations.
So, let's start by loading the libraries and the data.
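# Plot the share of missing values per column
DataExplorer::plot_missing(combined4)
# Keep only the columns that are not entirely NA
df <- combined4[, colSums(is.na(combined4)) < nrow(combined4)]
DataExplorer::plot_missing(df)
# Tabulate the identifier columns, alone and against each other
table(df$ble_id)
table(df$vbf_id)
table(df$vbf_id, df$lcn_id)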
Now, within each pen there are different feeding cycles. So, if I want to analyse the feed curve as a function of day-in-curve, I need to break the data down further by adding another layer, which I call newField. In the end, we have data on farm level, pen level, and newField (run) level.
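To make the run layer concrete, here is a minimal base R sketch of how a run identifier and a day-in-curve counter could be derived. The data frame, the `pen` and `date` columns, and the rule that a gap of more than three days starts a new cycle are all assumptions for illustration; the real data may mark cycles differently.

```r
# Hypothetical feed records: one row per pen per feeding day (not the farm data).
feed <- data.frame(
  pen  = rep(c("A", "B"), each = 6),
  date = as.Date("2021-01-01") + c(0, 1, 2, 10, 11, 12, 0, 1, 2, 3, 20, 21)
)
feed <- feed[order(feed$pen, feed$date), ]

# Assumption: a gap of more than 3 days between feedings starts a new cycle.
gap_days <- 3
new_cycle <- ave(as.numeric(feed$date), feed$pen,
                 FUN = function(d) c(1, diff(d) > gap_days))

# Running count of cycle starts within each pen gives the run number.
feed$run <- ave(new_cycle, feed$pen, FUN = cumsum)

# newField identifies a run within a pen.
feed$newField <- paste(feed$pen, feed$run, sep = "_")

# day_in_curve counts the days since the start of the run.
feed$day_in_curve <- ave(as.numeric(feed$date), feed$newField,
                         FUN = function(d) d - min(d))

print(feed)
```

With a counter like `day_in_curve`, every run starts at zero, so curves from different runs and pens can be laid on top of each other.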
Let's proceed with the data exploration. Never think this part of the process is a waste of time! It has helped me out many times, often after building my first set of models, when it helped me understand why none of them made sense. You will often find yourself going back and forth between plots and models.
My first instinct on modelling this type of data was to use time series. Besides the data being a time series by nature, there are several ways of creating time-series ensembles that can pick up seasonal influences. So, that is what I tried to do first: pick up any kind of temporal pattern.
Now, let's use that tsibble and let loose several time-series models, including an ensemble.
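As a rough illustration of the ensemble idea, the sketch below averages the forecasts of two models. It uses base R's `stats` package on simulated data rather than a tsibble, so it is a stand-in for the approach and not the models fitted on the farm data. The weekly pattern, the seasonal ARIMA orders, and the equal-weight average are all assumptions.

```r
set.seed(1)

# Simulated daily feed totals with a trend and a weekly pattern (not farm data).
n <- 70
t_idx <- seq_len(n)
feed_kg <- 100 + 0.5 * t_idx + 10 * sin(2 * pi * t_idx / 7) + rnorm(n, sd = 3)
y <- ts(feed_kg, frequency = 7)

h <- 14  # forecast horizon in days

# Model 1: seasonal ARIMA (the so-called airline model, chosen for illustration).
fit_arima <- arima(y, order = c(0, 1, 1),
                   seasonal = list(order = c(0, 1, 1), period = 7))
fc_arima <- predict(fit_arima, n.ahead = h)$pred

# Model 2: seasonal naive forecast, repeating the last observed week.
fc_snaive <- rep(tail(as.numeric(y), 7), length.out = h)

# Ensemble: equal-weight average of the two forecasts.
fc_ensemble <- (fc_arima + fc_snaive) / 2
print(round(fc_ensemble, 1))
```

Averaging is the simplest possible ensemble; weights based on hold-out accuracy would be a natural next step.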
In the end, after some tweaking, I gave up and resorted to another set of models: mixed models. What I like so much about mixed models is their ability to deal with missing data (under certain assumptions) and to model longitudinal data with a nested structure. Exactly what we have here.
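For readers who want to try this on their own data, here is a minimal sketch of a linear mixed model with runs nested in pens. It uses `nlme`, which ships with R, on simulated data; the package choice, the variable names, and the model formula are assumptions, not the models fitted in this post.

```r
library(nlme)  # ships with R

set.seed(42)

# Simulated stand-in for the farm data: 4 pens x 3 runs x 30 days.
d <- expand.grid(day = 1:30, run = 1:3, pen = paste0("pen", 1:4))
pen_eff <- rnorm(4, sd = 5)[as.integer(factor(d$pen))]
run_eff <- rnorm(12, sd = 2)[as.integer(interaction(d$pen, d$run))]
d$feed <- 20 + 1.5 * d$day + pen_eff + run_eff + rnorm(nrow(d), sd = 3)

# Drop 10% of the rows at random: the model fits unbalanced data without imputation.
d_obs <- d[-sample(nrow(d), 36), ]

# Random intercepts for pens and for runs nested within pens.
fit_lmm <- lme(feed ~ day, random = ~ 1 | pen / run, data = d_obs)

print(fixef(fit_lmm))    # fixed effects: intercept and daily slope
print(VarCorr(fit_lmm))  # variance components per level
```

Rows are dropped completely at random here, which is the friendliest missing-data scenario; real gaps in feed records may not behave that way.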
Let's look at model 4 and model 1 more closely.
The model coefficients are shown in a table below.
I already showed you one calibration plot. Let's look at several more. To be honest, calibration plots are not that handy for comparing models. It would have been better to show density plots of the residuals across models. However, by the time I got here, I had already made up my mind that Linear Mixed Models would not be the model set to use.
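A residual density comparison could look like the sketch below. To keep it short, two plain `lm` fits on simulated data stand in for the mixed models, so the models, the data, and the variable names are all assumptions; only the plotting idea carries over.

```r
library(splines)  # ships with R

set.seed(7)

# Simulated curved feed curve: 5 runs of 60 days each (not farm data).
day <- rep(1:60, times = 5)
feed <- 50 * (1 - exp(-day / 20)) + rnorm(length(day), sd = 1)

# Two competing models: a straight line and a natural spline.
fit_linear <- lm(feed ~ day)
fit_spline <- lm(feed ~ ns(day, df = 4))

dens_linear <- density(resid(fit_linear))
dens_spline <- density(resid(fit_spline))

# Overlay the residual densities; a narrower peak around zero means a better fit.
plot(dens_spline,
     xlim = range(dens_linear$x, dens_spline$x),
     main = "Residual density by model", xlab = "Residual")
lines(dens_linear, lty = 2)
legend("topright", legend = c("spline", "linear"), lty = c(1, 2))
```

Overlaying the densities puts all models on one scale, which is harder to do with a separate calibration plot per model.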
What I did above was add splines to linear mixed models, a procedure I have adopted numerous times across different species. The modelling of feed data is not new either. But this time I wanted to try a different way of modelling, using Generalized Additive Models in a Mixed Model format. In addition, I wanted to split the data into a train and a test set, to get a better grip on the model.
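Below is a minimal sketch of that setup using `mgcv`, which ships with R. It fits a smooth of day plus a random intercept per pen via `gam()` with a `bs = "re"` term, which is one of several ways to fit an additive mixed model and not necessarily the one used here. The simulated data, the variable names, and the choice to hold out the last run of each pen as the test set are all assumptions.

```r
library(mgcv)  # ships with R

set.seed(123)

# Simulated stand-in: 5 pens x 4 runs x 40 days, with a saturating feed curve.
d <- expand.grid(day = 1:40, run = 1:4, pen = factor(paste0("pen", 1:5)))
pen_eff <- rnorm(5, sd = 3)[as.integer(d$pen)]
d$feed <- 60 * (1 - exp(-d$day / 15)) + pen_eff + rnorm(nrow(d), sd = 2)

# Train/test split by run, so every test curve is completely unseen.
train <- d[d$run <= 3, ]
test  <- d[d$run == 4, ]

# Smooth effect of day plus a random intercept per pen.
fit_gamm <- gam(feed ~ s(day) + s(pen, bs = "re"),
                data = train, method = "REML")

# Evaluate on the held-out runs.
pred <- as.numeric(predict(fit_gamm, newdata = test))
rmse <- sqrt(mean((test$feed - pred)^2))
print(rmse)
```

Splitting by run rather than by row matters: with a random row split, neighbouring days of the same curve end up in both sets and the test error looks better than it should.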
So, the above exercise was on the total cumulative feed provided. But what if we estimate the feed provision by component? As you can see, all the models I create are run at the individual level, and the predictions are then aggregated to form a cumulative feed curve. That seems to do the trick. It is not perfect, but no model is.
Let's see how things go for component 1.
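The aggregation step can be sketched in a few lines of base R. The prediction values, the column names, and the reading of "aggregate" as a sum over pens followed by a running total over days are assumptions for illustration.

```r
# Hypothetical pen-level daily predictions in kg (not model output).
pred <- data.frame(
  pen       = rep(c("A", "B", "C"), each = 5),
  day       = rep(1:5, times = 3),
  pred_feed = c(1, 2, 3, 4, 5,
                2, 2, 3, 3, 4,
                1, 1, 2, 2, 3)
)

# Step 1: sum the individual predictions per day.
daily <- aggregate(pred_feed ~ day, data = pred, FUN = sum)

# Step 2: accumulate over days to get the cumulative feed curve.
daily$cum_feed <- cumsum(daily$pred_feed)

print(daily)
```

Errors made on individual pens partly cancel out in the sum, which is one reason an aggregated curve can look better than any single prediction.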
On to component 2.
Now, let's add component 3 to the mix, add everything together, and see how the predictions do.
So, this was a small example of how to use Generalized Additive Mixed Models for estimating the feed curves of a swine farm. To be continued, as this was just a very small beginning!
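Adding everything together amounts to summing the component predictions per day and comparing the result with the observed total. The numbers and names below are made up for illustration and do not come from the fitted models.

```r
# Hypothetical daily predictions per feed component, in kg (not model output).
comp1 <- c(10, 11, 12, 13, 14)
comp2 <- c(5, 5, 6, 6, 7)
comp3 <- c(1, 1, 1, 2, 2)

# Hypothetical observed total feed per day, in kg.
observed_total <- c(16, 18, 19, 20, 24)

# Total prediction is the sum of the component predictions.
pred_total <- comp1 + comp2 + comp3

# Compare on the daily scale and on the cumulative scale.
mae      <- mean(abs(pred_total - observed_total))
cum_pred <- cumsum(pred_total)
cum_obs  <- cumsum(observed_total)

print(data.frame(day = 1:5, cum_pred, cum_obs))
print(mae)
```

Checking both scales is useful because small daily errors with the same sign add up in the cumulative curve.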