The problematic relationship between coding and bioengineering research labs

Julien Rechenmann · Published in Nerd For Tech · Jun 1, 2021

Research in biotechnology and bioengineering has allowed humans to make significant progress in understanding the human body and developing medical solutions.

As knowledge has accumulated over the years, experiments have become more complex, requiring a broader range of tools and involving data recorded from multiple sources. For example, in neuro-engineering, it is now common to combine invasive recording of neural activity (surgically inserted electrodes) with precise movement tracking (cameras) while stimulating with optogenetics (lasers targeting genetically modified neurons) and sound. While this complexity is impressive and has led to amazing discoveries, biotechnology research projects remain mostly a one-person job. Each project is imagined, developed, and managed mainly by a single researcher. Writing and reviewing publications usually relies on teamwork, but data collection and analysis do not. It is not uncommon for biotechnology researchers to build new pieces of hardware to run their experiments or to create an entire interactive dashboard to visualize the results of their analysis.

If we applied this approach to other research fields, each nuclear fusion researcher would have to assemble their own tokamak to do their research, and each astrophysics Ph.D. student would have to build and launch their own Mars rover. While that would be impressive, it would hardly be the most efficient way to do research.

The impact of this increase in complexity is the sheer range of skills required of biotech researchers. They must master programming, data analysis, statistics, machine learning, animal training and surgery, and image processing, on top of writing and presentation skills. Many of these require strong coding skills, which are extremely difficult for a graduate student to acquire. Researchers can hone their coding abilities if they wish to adapt to these new demands, but the environment doesn’t push them to do so. With the pressure to publish (the infamous “publish or perish”), they generally prefer to start collecting data and, when possible, acquire more domain-specific knowledge. As a result, coding skills are average at best, and best coding practices are rarely applied. This lack of expertise is harmful to laboratories, because as experiments grow more complex, everything researchers do is rooted in the code they write: from experiments and data acquisition to results visualization and modeling.

The coding pain

While some researchers in biotechnology have a background in software engineering, most do not. Some might have followed a course or two on programming or data analysis and are familiar with the coding mindset. Ultimately, almost none of them are familiar with software development practices at a professional level. This lack of knowledge and experience in programming has a clear impact on research: low efficiency and, sometimes, low quality.

Coding is time-consuming

As a researcher myself, I have observed that, in many laboratories, researchers spend most of their time coding. Either the amount of code to write is too much for a single person, or the researcher’s limited coding level slows down development. I have met several dedicated researchers who produce high-quality papers but whose work is nonetheless held back by their coding skills. Many other researchers will easily relate to the following stories.

Some spent more than a year developing a solution for running their experiments. Although the end result was impressive and well optimized, with few or no bugs, coding experts could have delivered the same solution in three months. The researchers could then have spent that time on other essential aspects of their work: reading more literature, better defining their hypotheses, or collecting more data.

With the tremendous amount of data that researchers can now gather with each experiment, the data analysis step often requires a very long processing time. In some cases, it can take one or several days to process a single subject’s recordings. During this period, the computer is unusable, limiting the number of tasks that can be performed. This is all because most researchers don’t have the time or knowledge to optimize code performance.
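As a rough illustration of the kind of optimization that is often missing, here is a minimal sketch in Python; the recording shape, function names, and numbers are hypothetical and not taken from any particular lab, but the pattern of replacing a pure Python loop with a vectorized NumPy call is often the single cheapest speed-up available in an analysis script.

```python
import numpy as np

# Hypothetical example: mean activity per trial from a recording array of
# shape (n_trials, n_samples). Names and sizes are illustrative only.

def mean_per_trial_loop(recording):
    """Naive version: iterates in pure Python, slow on large recordings."""
    means = []
    for trial in recording:
        total = 0.0
        for sample in trial:
            total += sample
        means.append(total / len(trial))
    return means

def mean_per_trial_vectorized(recording):
    """Vectorized version: the loop runs inside NumPy's compiled code."""
    return recording.mean(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    data = rng.poisson(lam=3.0, size=(200, 100_000)).astype(np.float64)
    # Both versions give the same result; the vectorized one is typically
    # orders of magnitude faster and frees the machine for other work sooner.
    assert np.allclose(mean_per_trial_loop(data[:5]),
                       mean_per_trial_vectorized(data[:5]))
```

The same idea, applied throughout a processing pipeline, is often the difference between an analysis that blocks a workstation for days and one that finishes overnight.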

In some cases, low coding ability makes it impossible for researchers to perform more advanced data analysis, which drastically reduces research quality and precludes possible discoveries. The creative thinking researchers have developed ends up limited by their technical skills.

During my career, I have observed several situations like these, which result in frustration and lack of motivation, and sometimes even lead brilliant researchers to quit research.

In different circumstances, I have also encountered cases where researchers with different skill sets successfully team up.

An outstanding example was the partnership between a young researcher with a biomedical background and limited coding skills, and another researcher from a field where advanced coding was required. The young researcher was very knowledgeable in her field and took the time to prepare her experiment by defining the hypotheses and the parameters she wanted to observe. The other researcher, who had a strong background in software engineering, designed the experiment’s code and the main structure of the data analysis. In the end, the young researcher was able to run the experiments smoothly and achieved excellent results. Teaming up in research can generate frustration, especially given the common idea that researchers should work entirely independently. Nevertheless, it makes research more efficient. In the above scenario, the developer certainly did not want to waste time coding something useless, which is often what happens when researchers don’t take the time to think thoroughly about the hypotheses and analyses required before starting their experiments. This pushed the experimenter to properly define the solution’s objectives and requirements in advance.

Bugs in research

The software development path, be it in research or in industry, is paved with bugs. The software industry spent decades improving its management, organization, and development techniques to deliver more reliable solutions. Most researchers are not trained to follow these guidelines, and the words “clean code” cause either fear or gales of laughter. The expressions “unit tests” or “quality assurance” are not part of the researcher’s vocabulary.

Let’s look at another situation that you might have encountered in your career. A data analysis solution was developed internally and is used every day by the entire laboratory. Over the years, generations of researchers accumulated bugs, some minor, some major, periodically leading to incoherent results. Reproducing what other researchers had done became more and more difficult. Some members decided to rewrite their own solution, introducing new bugs and cutting themselves off from their team members’ help. A few years later, every new researcher has to develop their own tool, with its own batch of bugs. It is common to find in the same lab several large chunks of code or tools serving the same purpose. The lack of shared experimental and processing pipelines means that optimization and productivity are at their lowest, and research quality suffers as a result.

Animal experimentation and clinical trials are two situations where bugs are more than just an inconvenience. The code quality in clinical trials is no different from the previously mentioned situations. Medical companies are not allowed to run clinical trials on unvalidated custom-made software, whereas academic researchers are given much more freedom, with the risks that come with it. Most of the time, clinical experiments involving an automated interaction with the subject are developed by researchers who have never coded in a professional environment. This means that, most likely, the code has not been thoroughly tested or validated, and there might not be any safeguards.

How do we get rid of those bugs? How can we be sure that the figures we generate have not been skewed by some latent error? How do we ensure the safety of subjects during clinical trials? The data science industry is still trying to answer these questions, while the software engineering world has already solved them.

What coding level to expect from researchers?

Should researchers spend more time learning how to code? Should they be well versed in object-oriented programming, clean code, optimization, GPU parallel processing, and data structure pros and cons? Should they write unit tests, do code reviews, and add automatic integration and deployment of their software solution? My passion for coding screams “Yes!”, but calling for such a high standard of coding means researchers have to spend less time developing other skill sets that are just as important, if not more so. Of course, this depends on the research you are leading. Laboratories taking a more mathematical or model-based approach would require a higher level of coding skills and methodology. However, there are some tools and best practices that should be applied in every biotech laboratory. From my experience, with a bit of self-discipline lab members will quickly learn how to use them, and this won’t take time away from their research. It will soon be the new norm in your laboratory, improving the overall research quality.

Coding Tools

A version control tool such as Git, coupled with GitHub or GitLab, is rarely used in research. It makes it easier to handle several versions of the code, track issues, collaborate on the same project, and transfer knowledge between generations of researchers. This is a must-have as soon as your lab produces a single line of code. It will also help researchers who use third-party libraries and solutions. Research is slowly moving towards open source, sharing both code and data, and platforms such as GitHub and GitLab help laboratories that would like to open up their work to other researchers around the world.

Clean code methodology

The famous clean code methodology is the Holy Book of every developer. It helps coders develop large solutions and maintain them over long periods. The method starts with the basics: naming conventions, code structure, comments, and so on. Following these guidelines produces code that is easy to read, almost like a novel. It makes bugs easier to spot, facilitates knowledge transfer within the laboratory, and generally speeds up development. This is the minimum every researcher should know. The rest of the clean code method can be reserved for more mathematical or model-based research laboratories, as it delves into object-oriented programming, unit testing, the Liskov substitution principle, and so on. For more information, I highly recommend the book Clean Code by Robert C. Martin.
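To make this concrete, here is a small before-and-after sketch in Python; the function, names, and threshold logic are invented for illustration, and the second version applies nothing more than the naming, structure, and documentation basics described above.

```python
import numpy as np

# Before: the kind of function that tends to accumulate in analysis folders.
def f(d, t):
    r = []
    for i in range(len(d)):
        if d[i] > t:
            r.append(i)
    return r

# After: the same behaviour, but the intent is readable without running it.
def find_threshold_crossings(signal: np.ndarray, threshold: float) -> np.ndarray:
    """Return the indices of samples where the signal exceeds the threshold.

    The parameters here are hypothetical; in a real pipeline the threshold
    would come from a documented calibration step, not a magic number.
    """
    return np.flatnonzero(signal > threshold)
```

Nothing in the second version is sophisticated; it is simply easier to review, reuse, and debug.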

Code review

Here comes the most controversial advice: code review. Researchers don’t want to review their peers’ code when writing their own is already a pain. We all know the result: publications are filled with figures that are often inaccurate owing to bugs in the data analysis code. The software industry addressed this issue by imposing several layers of quality control, one of which is code review. Code reviews represent a significant part of a developer’s working time, as they have to review all of their teammates’ code. This supposedly time-consuming task scares researchers, who are already short of time. Nevertheless, researchers produce far less code than software developers, so reviews will be rare and short. The advantages, in contrast, are plentiful: code reviews speed up the training of new members, drastically reduce the number of bugs in the code, and facilitate knowledge transfer among lab members.
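As a rough sketch of what a review-sized contribution could look like (the function and test below are hypothetical, not drawn from any real lab codebase), a small, self-contained analysis function accompanied by a unit test is quick to read, and a colleague can check both the logic and the expected result in a few minutes.

```python
import numpy as np

def firing_rate(spike_times: np.ndarray, duration_s: float) -> float:
    """Mean firing rate in Hz over a recording of the given duration.

    Hypothetical helper, used here only to illustrate a review-sized change.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return len(spike_times) / duration_s

def test_firing_rate():
    # Three spikes in two seconds -> 1.5 Hz; a reviewer can verify this by eye.
    assert firing_rate(np.array([0.1, 0.9, 1.7]), duration_s=2.0) == 1.5
```

Keeping changes at this granularity is what keeps reviews short, even for researchers who only code occasionally.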

Externalizing development

As a researcher, how many mice are you ready to sacrifice to test and debug your code? As a subject, would you trust code written by a graduate student with less than a year of programming experience? Development for clinical trials and animal experiments can be done in collaboration with experts. This reduces the number of animals wasted and the number of failed recordings. At first, externalizing development might seem more expensive than building the solution internally. However, experienced developers will build the same software or data analysis pipeline faster and to a consistently higher standard. In the long run, better code quality helps researchers focus on what matters, thereby reducing costs.

Conclusion

The current bioengineering and biotechnology research fields require more diverse skills than ever before. It is unrealistic to expect new students or experienced researchers to master all of them. The motto “publish or perish” pushes individual researchers to work more and more; it is high time the rest of the research infrastructure followed. Research laboratories must adapt how they lead their research projects: working in teams, setting up a project management structure, and requesting occasional help from external sources such as consultants for specific implementations or training. Hiring specialized software engineers would be too expensive for a single laboratory, but faculties could provide a software engineering team for larger or collaborative projects. This team would support researchers from the beginning of their projects. At the same time, for things to really change, research grants should include the management and administrative costs of external help in their budgets. Several institutions are already working on this issue by providing data science and engineering centers that work closely with laboratories. Some laboratories have also changed their organization to promote teamwork and make better use of the skills and knowledge of every lab member.

Julien Rechenmann
Data science consultant for research laboratories and startups.