A Change in Diet: Fish Food Analysis

Andrew Ingalls
Mar 24, 2022

Cavefish Morphology | credited: Alex Keene, Ph.D. at Florida State University | FAU Cavefish

1.0 Study Summary

Study Objectives: Raise A. mexicanus on an alternative feed and [1] compare the growth rate and survivability between control and treatment; [2] compare the embryo fecundity between control and treatment.

Techniques Used: Data visualization, statistical t-tests, survival analysis, correlation

Tools Used: RStudio

GitHub: Mysis-Dry Comparison

2.0 Introduction

As part of my work as a scientist, I am in charge of R&D within my facility. This suits me well: I love research, and applying it and seeing its impact is one of the coolest parts of the job.

Using model organisms for research is interesting, but comes with heavy responsibility. The #1 focus for R&D is improving the welfare of the organism, in this case, Astyanax mexicanus. Stress is a major variable when using biological samples. It can impact metabolism, sleep cycles, aggression, and many other functions. Outside of research, it is our job as the caretakers and stewards of these animals to ensure the best welfare possible.

While reading The Laboratory Zebrafish (Lawrence et al., 2015), I was struck by a powerful quote on the diet of animals, specifically fish.

The ‘optimal’ diet for laboratory fish is one that efficiently promotes definition and stability in nutritional profile, biosecurity, and maximal performance (growth, survival, and reproduction).

This got me thinking about our current feeding regime, and one food in particular: frozen Mysis shrimp. While our animals consume the food with vigor, the lack of a nutritional profile bothered me. It was quite different from our pelleted diet.

Grammar of Tables (gt) is a fantastic library for publication-class tables in R

Right out of the gate you can see how much moisture is involved with the Mysis shrimp. When dried out, it may actually be a superior feed, but the stomach can only hold so much, even if that volume is just empty water.

Not only was the nutritional profile worrying, but the act of thawing and feeding out the Mysis was a time-intensive task for the team. This makes a great candidate for R&D!

3.0 Running the Experiment

Circling back to the quote from Lawrence et al. 2015, I decided to use the three main indicators of performance as my dependent variables: Growth, Survival, and Reproduction. If I can improve any of these traits, without sacrificing the others, this is a huge gain to the lab.

While I won’t go into the full details of the experiment here (I want to show some pretty graphs before I lose you), I will provide a short summary for reference.

I split the experiment into two separate phases: Growth & Reproduction. In phase 1, I measured the growth and survival of the control group (Mysis) versus the treatment group (Gemma). In phase 2, I measured the fecundity of the control and treatment groups during breeding events. Fecundity is measured as both the total number of embryos collected and the number of “viable” embryos.

Data from phase 1 were collected roughly every 40 days from 100 to 365 days post-fertilization. After the fish turned one year old, we performed a census and sex identification. These fish were then randomized within their groups and placed into new tanks with equal numbers and ratios. Embryo collection occurred every 4 weeks, on two separate consecutive days during that breeding week.

4.0 Data Wrangling

Data Wrangling | credited: Daniel Lloyd Blunk-Fernández | Unsplash License

4.1 Ingest Data

Turning an Excel file into a data frame in R is very simple. Using the openxlsx library, we can read in the individual Excel sheets as data frames. It will also convert any dates when detectDates = TRUE is set.

Before moving any further with the data frames, I always check to ensure correct importation using head() and summary().
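As a minimal sketch of the ingest step, the example below writes a tiny stand-in workbook to a temporary file and reads it back. The sheet names, column names, and values are hypothetical placeholders, not the study's actual workbook; only read.xlsx(), detectDates, head(), and summary() come from the text above.

```r
library(openxlsx)

# Hypothetical stand-in for the study workbook: two small sheets in a temp file.
growth <- data.frame(
  Tank = c("7.C.4", "7.C.5"),
  Date = as.Date(c("2021-01-04", "2021-02-15")),
  Length_mm = c(31.2, 35.8)
)
embryos <- data.frame(
  Tank = c("7.C.4", "7.C.5"),
  Total = c(120, 95)
)
path <- tempfile(fileext = ".xlsx")
write.xlsx(list(Growth = growth, Embryos = embryos), file = path)

# Read each sheet into its own data frame. detectDates = TRUE asks openxlsx
# to convert date-formatted cells into R Date values.
df_growth  <- read.xlsx(path, sheet = "Growth", detectDates = TRUE)
df_embryos <- read.xlsx(path, sheet = "Embryos")

# Quick sanity checks on the import.
head(df_growth)
summary(df_growth)
```

With a real workbook, the same two read.xlsx() calls would simply point at the file on disk instead of the temporary path.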

4.2 Data Cleaning

Like any dataset, I found misrepresented data types, features I no longer needed for analysis, or in the case of R, factors that needed to be set.

I’ll spare most of the details of the data cleanup, except a few choice steps that may help others using R.

The first is the use of the apply functions, in my case lapply() and sapply().

lapply() takes a function, in my case factor, and applies it to each element of a list. Basically, it's an interesting version of a loop! Here I take the list of columns I want to turn into factors, then use lapply() to rewrite one column at a time within the data frame. Super handy if you have a lot of changing to do!
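A minimal sketch of that pattern, using a made-up data frame (the column names and values are assumptions for illustration):

```r
# Hypothetical example data; the real study columns may differ.
df <- data.frame(
  Tank = c("7.C.4", "7.C.5", "7.C.4"),
  Feed = c("Mysis", "Gemma", "Mysis"),
  Length_mm = c(31.2, 35.8, 30.1),
  stringsAsFactors = FALSE
)

# Columns that should become factors.
factor_cols <- c("Tank", "Feed")

# lapply() runs factor() on each selected column and returns a list,
# which is assigned back into the same columns of the data frame.
df[factor_cols] <- lapply(df[factor_cols], factor)

str(df)
```

Because a data frame is a list of columns, df[factor_cols] can be passed to lapply() directly and the numeric column is left untouched.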

sapply() also takes a function, in this case class, and applies it to a list, vector, or data frame, then simplifies the result to a vector or matrix where it can. This is really great if you want to summarize the output of a function in a readable way. It allowed me to check the class of each column at a glance.
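For instance, on a small made-up data frame (hypothetical columns, for illustration only), the class check could look like this:

```r
# Hypothetical example data with three different column types.
df <- data.frame(
  Tank = factor(c("7.C.4", "7.C.5")),
  Date = as.Date(c("2021-01-04", "2021-02-15")),
  Length_mm = c(31.2, 35.8)
)

# sapply() calls class() on every column and simplifies the result
# to a named character vector: one class per column name.
col_classes <- sapply(df, class)
print(col_classes)
```

One caveat: if a column carries more than one class (a POSIXct date-time, for example), sapply() cannot simplify to a vector and returns a list instead.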

Guru99 has great explanations and examples for the entire family of apply() functions if you want to learn more.

The second set of functions I want to show you comes from the dplyr library, which, if you aren’t familiar with it, is critical for data manipulation in R.

Here is the tidyverse resource if you want to learn more.

I use the mutate() function to update a list of old tank locations for the experiment. These tanks were moved throughout the experiment and recorded in my workbook to update later. There were two tank movement events that I mutated.

mutate() allows the user to alter, create, or delete columns. It’s a pretty great tool, to be honest. We use it here to alter the Tank column.
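Here is a small sketch of all three uses of mutate() in one call. The data frame and its columns are hypothetical and only meant to show the mechanics:

```r
library(dplyr)

# Hypothetical example data; note the stray whitespace in Tank.
df <- data.frame(
  Tank = c(" 7.C.4", "7.C.5 "),
  Length_mm = c(31.2, 35.8),
  Notes = c("moved", "ok"),
  stringsAsFactors = FALSE
)

df <- df %>%
  mutate(
    Tank = trimws(Tank),         # alter an existing column
    Length_cm = Length_mm / 10,  # create a new column
    Notes = NULL                 # delete a column by setting it to NULL
  )

df
```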

I changed the actual values using recode(). This function allows you to replace numbers, factors, or characters in a vector with another value, assuming it's the same data type. Usually, this is done by setting one value equal to another in the recode function (ex. recode(x, a = 'apple')). However, we are looking to replace an entire list of strings with another list of strings based on their position within the vector. Unfortunately, recode doesn’t play that nicely. This is where !!!setNames() comes into play. For those of you unfamiliar with the !!! splice operator, here is a great resource: https://rlang.r-lib.org/reference/splice-operator.html

This operator unpacks a list or vector into individual arguments before R evaluates the function call. It’s important to know that setNames() creates a ‘named’ vector for us. In this case, the new tank labels are the values and the old tank labels are the names assigned to each value. This is very important for understanding how !!! and recode() interact.

Using !!! splices each named value we created with setNames() into recode() as its own argument, so recode() replaces every old label (the name) with its new label (the value). For example, '7.C.4' = '7.E.3-4' is the first named value in our list, so it is placed as the second argument of the recode() function:

df_growth <- df_growth %>%
  mutate(Tank = recode(Tank, '7.C.4' = '7.E.3-4'))

The process would then continue with every named object within our list!
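Putting the pieces together, a minimal sketch of the whole pattern might look like the following. Only the '7.C.4' to '7.E.3-4' pair comes from this post; the other tank labels and the variable names old_tanks, new_tanks, and tank_lookup are hypothetical.

```r
library(dplyr)

# Hypothetical example data; '7.D.1' was never moved.
df_growth <- data.frame(
  Tank = c("7.C.4", "7.C.5", "7.D.1", "7.C.4"),
  stringsAsFactors = FALSE
)

# Old and new labels, matched by position (second pair is made up).
old_tanks <- c("7.C.4", "7.C.5")
new_tanks <- c("7.E.3-4", "7.E.5-6")

# setNames() builds a named vector: names are old labels, values are new labels.
tank_lookup <- setNames(new_tanks, old_tanks)

# !!! splices every name = value pair into recode() as a separate argument,
# as if each pair had been typed out by hand.
df_growth <- df_growth %>%
  mutate(Tank = recode(Tank, !!!tank_lookup))

df_growth$Tank
```

Labels that do not appear in the lookup are left unchanged. Note that recent dplyr releases mark recode() as superseded in favor of case_match(), although recode() still works.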

Now that all four data frames have been cleaned up, we can begin the exploration!

Come check out Part 2 where I begin the Exploratory Data Analysis.

To entice you to click Part 2, here is a sneak peek of the density distributions for each feed group by age. We are already seeing some interesting insights from this one set of graphs.

There is a similar right-tailed skew in both feed groups, but the control group (Mysis) has a set of outlier islands from six to twelve months. Additionally, the experimental group (Gemma) has a much more gradual slope for the tail. Find out how I interpret this in Part 2!

Also, if you enjoyed my first ever data-related post, please click the clap, share it on social, or even better…interact with me! I’m here to learn and share, so any and all conversation is welcome.

The views of this blog are my own and do not reflect those of my employer.