Engineering a Pleasant Research Experience

An ever-evolving, personal collection of thoughts and tips to (hopefully) boost the efficiency, odds and morale of the product development process.

Yu Liu
15 min read · Jan 14, 2020

Introduction

Choosing science and engineering as a career means one has to be able to handle setbacks on a daily basis. I used to think this was part of the package and I just needed to accept it and keep pushing forward, because failures are the stepping stones to success. Gradually, it occurred to me that a more positive mindset may be better. Instead of passively accepting failure, a researcher should strive to proactively increase the chance of success. I also believe there are ways to increase the odds of success, reduce development costs and meet most schedules.

This blog post is a collection of useful ideas and tools that may prove helpful or even critical to the success of research projects. For some veterans, what's listed here has become second nature. For me, this post deserves frequent revisiting, reviewing and even rewriting. I hope it helps me form a clearer picture of good engineering practices, and I invite suggestions from fellow researchers.

This discussion is meant for research and development processes in biotech, diagnostics and instrumentation companies. I use research, engineering and investigation interchangeably unless defined explicitly.

Key claim

Here is the key claim of this article: the more “engineering” and the less “scientific” a research project can be made, the more likely it is to succeed.

Let me elaborate. The dichotomy between engineering and science was discussed in great detail in Michael Polanyi’s Personal Knowledge (about 10 years ago, I even attempted to translate a few pages of the book).

  • An engineering investigation is similar to finding a route on a map with a specified origin and destination. No matter how complex it may be, there is a set of rules to follow, and a solution is almost always reached sooner or later.
  • A scientific investigation will not succeed without a “jump” or discontinuity in the process. This discontinuity is often called the eureka moment: a sudden and mysterious emergence of the solution seemingly from nowhere.
Michael Polanyi (https://en.wikipedia.org/wiki/Michael_Polanyi#/media/File:Michael_Polanyi.png)

Michael Polanyi’s book was his attempt to elucidate why scientific discovery is so rare and how to cultivate the eureka moment. He defined engineering as the routine or even mundane part of research activity, in contrast to the excitement associated with a true scientific discovery. Here I am looking at the other side of the coin: we may leverage the predictability of an engineering process to increase the chance of success of a research project.

I am not suggesting we should always take the safer route in research. If we never took risks, there would never be any revolutionary products. What is critical is to manage the risk of a project by converting it into a combination of multiple engineering problems and a few critical scientific explorations.

Engineering mindset: Axiomatic Design

Axiomatic Design, proposed by Prof. Suh of MIT, provides very instructive guidance for an engineering project. Its central dogma is a chain of reasoning from functional requirements to design parameters.

Prof. Nam P. Suh (left, http://web.mit.edu/pccs/people/suh.html) and axiomatic design workflow (right, https://doi.org/10.1016/j.compind.2018.04.009).

Following this framework, the first question one has to ask is what is valuable to customers. Researchers provide solutions that cater to the functions customers actually need. For people with a technical background, it is very natural to spend time and effort solving a problem without even asking whether the problem is worth solving. This is a habit formed by years of traditional school education: students are supposed to answer whatever questions the teachers pose. Of course, a rigorous school education is vital to preparing students for the technical challenges of real-world projects, but one needs to be aware of this side effect.

This mindset also applies to everyday work. Any research activity must serve a specified purpose well aligned with the project goal. Again, people with a technical background, especially the better ones, are always eager to roll up their sleeves and start generating data. After all, this is what they enjoy and where they find satisfaction. But resources can be squandered this way.

Even if the goal is clear, how to evaluate it must also be defined before starting the job. In school, teachers take care of the evaluation. In an R&D lab, it may not be so clear who will evaluate the work and how. Sometimes one has to reach out to customers for specifications. Unfortunately, the customers may not be clear or realistic about evaluation metrics. I have had customers interested in our digital PCR instruments who had only a vague idea of their assay requirements. I have also met customers with unreasonable expectations. Engineers should take this opportunity to educate the customer and reach a consensus before spending resources fulfilling vague or unrealistic requirements.

In addition to formalizing a spec-driven engineering practice, axiomatic design also helps engineers employ other best practices. A case in point is its orthogonal design theory: it provides the mathematical foundation for why a modular design approach is desirable and encourages its adoption in both high-level and low-level design processes. What's more, it prescribes the next best practice when a completely modular design is not feasible.
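
As a toy illustration of that orthogonality idea, here is a minimal sketch (the functional requirements, design parameters and matrix entries are all made up) that classifies a design matrix as uncoupled, decoupled or coupled, the standard categories in axiomatic design:

```python
import numpy as np

# Design matrix A: rows are functional requirements (FRs), columns are
# design parameters (DPs). A[i, j] != 0 means DP j affects FR i.
# Example values are hypothetical.
A = np.array([
    [1, 0, 0],   # FR1 "dispense volume"   <- DP1 "pump stroke"
    [1, 1, 0],   # FR2 "droplet size"      <- DP1 and DP2 "nozzle diameter"
    [0, 0, 1],   # FR3 "read fluorescence" <- DP3 "detector gain"
])

def classify_design(A: np.ndarray) -> str:
    """Classify a square design matrix per the independence axiom."""
    off_diag = A - np.diag(np.diag(A))
    if not off_diag.any():
        return "uncoupled (ideal: each FR has its own DP)"
    # A decoupled design is triangular under some row/column ordering;
    # for simplicity, only the given ordering is checked here.
    if not np.triu(A, k=1).any() or not np.tril(A, k=-1).any():
        return "decoupled (solvable if DPs are fixed in the right order)"
    return "coupled (FRs interfere; consider redesigning)"

print(classify_design(A))  # -> decoupled (...)
```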

Engineering practice: Test-driven, data-driven and interaction-driven

With a clear goal and evaluation method, actual work can finally start. Here I would like to borrow some time-tested practices from the software industry.

  • Test-driven development:

I was very surprised when I first read Kent Beck's book on test-driven development almost 10 years ago. It seemed that a software developer was a slave to the test results, lacking initiative. I was not sure at that time whether test-driven development would be relevant in my field, where testing is downstream of design and prototyping.

Test-Driven Development by Kent Beck

After more exposure to product development, I changed my mind, as discussed in the previous section on axiomatic design. To reiterate, tests should be in the driver's seat of the development process. Defining a goal and its criteria should be a prerequisite to actual research activities. In the software industry, testing is normally handled by a separate team. In instrument research labs, especially at the early prototyping stage, it is usually left to the developer to decide to what extent the prototype is tested. This is a psychological trap: developers essentially fight with themselves when testing their own designs. It is borderline inhumane to expect someone to be critical of themselves all the time and find pleasure in discovering their own mistakes. To be honest, many engineers don't even bother testing their prototypes; they expect the downstream users to do it. What I am advocating here is to push testing upstream, to the same stage as development or even earlier.
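
To make the tests-first idea concrete, here is a minimal sketch using pytest; the function, units and acceptance criteria are hypothetical stand-ins for a lab-side calculation:

```python
# In real TDD the tests live in their own file and are written first,
# failing until the implementation exists. The function name, units and
# the C1*V1 = C2*V2 spec below are hypothetical examples.
import pytest

def test_twofold_dilution():
    # 100 uL of a 10 nM stock diluted to 5 nM needs 100 uL of buffer.
    assert volume_to_add(c_stock=10.0, c_target=5.0,
                         v_stock=100.0) == pytest.approx(100.0)

def test_cannot_dilute_upward():
    # The spec says concentrating is out of scope and must raise.
    with pytest.raises(ValueError):
        volume_to_add(c_stock=5.0, c_target=10.0, v_stock=100.0)

# Written second, to make the tests above pass.
def volume_to_add(c_stock: float, c_target: float, v_stock: float) -> float:
    if c_target > c_stock:
        raise ValueError("target concentration exceeds stock")
    # C1 * V1 = C2 * (V1 + V_added)  =>  V_added = V1 * (C1 / C2 - 1)
    return v_stock * (c_stock / c_target - 1.0)
```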

Another obstacle to proper testing is resources. In the software industry, automated testing frameworks allow frequent execution of tests with a few keystrokes. In experimental work, however, manual labor is often needed. Clearly defining test items still helps a lot, because it streamlines the testing process and allows the researcher to focus more on analyzing the data rather than on how to acquire them. In addition, if tests can be clearly defined, they can be delegated to junior team members, making it possible to scale up the team and its output.

  • Data-driven development:

No matter how well the plan has been devised, unexpected issues emerge from unexpected corners. Failure analysis and troubleshooting thus consume a large chunk of the resources. For people with a background in scientific investigation, the mode of operation is the seemingly endless cycle of hypothesis and testing. Indeed, this is how science is taught at school: to explain some unexpected phenomenon, a hypothesis is proposed and then validated by testing its predictions. As argued by Polanyi, coming up with the correct hypothesis is sometimes irrational: the more important the discovery turns out to be, the less likely the hypothesis arose from step-by-step reasoning. Following a similar approach, when troubleshooting a failure, an engineer tends to propose a possible root cause. If there is a logical connection between the presumed root cause and the observed failure, many engineers will set out to fix that root cause. The result often comes back negative: the fix doesn't prevent the failure. One only needs to understand Bayesian statistics to see why. Even if the failure can be explained by the malfunction of a module M, that doesn't mean M is actually the culprit; plenty of alternative modules could be responsible for the same failure. If one had to rule out every possible cause this way, it would take forever to develop a product.
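
A toy Bayesian calculation makes the point. The numbers below are made up, and for simplicity the candidate causes are assumed mutually exclusive; yet the module that best explains the failure still ends up the least likely culprit:

```python
# Three candidate root causes for the same observed failure.
# priors: how often each cause occurs at all (assumed).
# likelihoods: P(observed failure | that cause) (assumed).
priors      = {"module_M": 0.05, "reagent_lot": 0.20, "operator_step": 0.10}
likelihoods = {"module_M": 0.90, "reagent_lot": 0.40, "operator_step": 0.50}

# Bayes: P(cause | failure) is proportional to P(failure | cause) * P(cause).
joint = {c: priors[c] * likelihoods[c] for c in priors}
total = sum(joint.values())
posterior = {c: p / total for c, p in joint.items()}

for cause, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{cause:14s} P = {p:.2f}")
# reagent_lot    P = 0.46
# operator_step  P = 0.29
# module_M       P = 0.26
# module_M explains the failure best (likelihood 0.90), yet it is the
# least likely culprit because it rarely misbehaves in the first place.
```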

Since a large percentage of engineering problems can be solved with existing methodology, troubleshooting should not require the mysterious enlightenment indispensable for making a scientific breakthrough. To troubleshoot a problem, data should be the driving force. In many cases, one just needs to methodically compare the conditions of successful experiments and failed ones; the factor responsible for the failure will reveal itself.
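
In code, that comparison can be as simple as recording each run's conditions as key-value pairs and diffing them (all field names and values below are hypothetical):

```python
# Conditions of one successful and one failed run, recorded as flat
# key-value pairs.
good_run = {"instrument": "proto-2", "software": "v1.3", "reagent_lot": "A17",
            "oil_flow_ul_min": 40, "temperature_c": 25}
bad_run  = {"instrument": "proto-3", "software": "v1.4", "reagent_lot": "A17",
            "oil_flow_ul_min": 40, "temperature_c": 25}

diffs = {k: (good_run[k], bad_run.get(k))
         for k in good_run if good_run[k] != bad_run.get(k)}
print(diffs)
# {'instrument': ('proto-2', 'proto-3'), 'software': ('v1.3', 'v1.4')}
# Only two factors differ -- they become the shortlist for control
# experiments, instead of whatever root cause feels most plausible.
```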

For example, a few months ago, one of my new prototype instruments didn't work as expected. Instead of assuming a certain failure mode and trying to resolve it, I first compared the successful runs on other instruments with the failed runs on the new one. It turned out a few things had changed. Some quick control experiments suggested the software might be defective. The software developer didn't believe it at first, because the update was very minor. But data is more convincing and reliable than human perception: it was indeed a software bug. To be honest, I actually suspected another cause and had planned a more complicated experiment to study it, but I wouldn't allow myself to run it until the data pointed that way. By deferring to data, I am immune to my own bias.

Another scenario where data, rather than theory or mechanism, plays the more important role arises for practical reasons. At one of my previous companies, the choice of glue was decided after extensive screening. More than ten years have passed since the adoption of that bizarre material in DNA sequencing flow cells, and nobody can yet explain why it works. Many such empirical, data-driven engineering decisions contribute to the success of a marketable product. From science textbooks and technology articles, people tend to think a disruptive product owes its success to a few core technology innovations. That is far from the whole story. Making a marketable product requires much more than the few successful runs needed to generate data for a publication. There are so many knobs to tune and so many ways things can go wrong. In reality, no one has the luxury to investigate the exact mechanism of every part involved, especially in a fast-paced startup setting. Even at companies such as Intel, some process decisions are based entirely on data, without a clear understanding of the mechanism. In my field, too, people have been trying to understand the mechanism of water-in-oil droplet generation in microfluidic devices and predict its behavior. Instead of adopting or working out a model with predictive power, it is often more practical to collect empirical data in the setting unique to my own purpose and then do some interpolation or careful extrapolation. Recently, a machine learning method was used to predict the droplet generation process; as we all know, machine learning relies on a training data set rather than physical laws. Of course, physical laws still play critical roles, especially at a qualitative level and in the early, scientific stage of an investigation, but they cannot be expected to drive every engineering decision.
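
A sketch of what this interpolate-rather-than-model approach looks like in practice, with made-up calibration data for droplet diameter versus flow-rate ratio on one specific chip:

```python
import numpy as np

# Made-up calibration runs on one specific chip: droplet diameter (um)
# measured at several oil/aqueous flow-rate ratios.
ratio_measured = np.array([2.0, 3.0, 4.0, 6.0, 8.0])
diam_measured  = np.array([95.0, 82.0, 74.0, 63.0, 57.0])

def predict_diameter(ratio: float) -> float:
    """Interpolate within the measured range; no physical model required."""
    if not ratio_measured.min() <= ratio <= ratio_measured.max():
        raise ValueError("outside calibrated range -- extrapolate with care")
    return float(np.interp(ratio, ratio_measured, diam_measured))

print(predict_diameter(5.0))  # ~68.5 um, valid only for this chip and oil
```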

Although conceptually straightforward, it is not always easy to follow the direction of the data. We all aspire to be great minds who can make wise predictions. In meetings, people may start heated arguments and defend their own reasoning. It can become a finger-pointing game between teams. “The product doesn't work if the stage is not flat enough, so that's a mechanical problem rather than my biochemistry problem.” Sounds familiar? To cultivate a more productive culture, I believe every team member should respect the data rather than their own biased hypothesis.

  • Interaction-driven development:

Interaction-driven development is a fairly new name in software engineering for a practice with a long history, going back to the legendary Donald Knuth. Recently, the developer of fastai, Jeremy Howard, started to promote this practice with a tool, nbdev, specifically designed around the idea. Interaction-driven development adds a perturbation step to the traditional test-build cycle. A perturbation is essentially tuning some knob in the design in the hope of getting some response in the test result. Based on that result, we decide what to perturb, and by how much, in the ensuing steps. It might sound very similar to traditional DOE, which is still relevant. What Jeremy emphasizes is that in many engineering projects, it is hard to design the whole suite of experiments up front, execute them, analyze the data and draw a conclusion. Instead, one has to take one small step at a time, analyze the result and choose the next step based on it. This is a highly interactive and iterative process.
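
Here is a schematic of that perturb-measure-decide loop; run_experiment is a placeholder for whatever manual or automated test actually produces the response, and the response curve is made up:

```python
# Schematic perturb-measure-decide loop. `run_experiment` stands in for
# whatever (possibly manual) test yields a measurable response.
def run_experiment(knob: float) -> float:
    # Placeholder response with an optimum near knob = 7 (made up).
    return -(knob - 7.0) ** 2

knob, step = 5.0, 1.0
best = run_experiment(knob)
for _ in range(10):                     # each pass = one small iteration
    trial = run_experiment(knob + step)
    if trial > best:                    # perturbation helped: keep it
        knob, best = knob + step, trial
    else:                               # it hurt: reverse and shrink the step
        step *= -0.5
print(round(knob, 2), round(best, 3))   # lands at knob = 7.0, the optimum
```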

On a different dimension, we may need to explore different perturbations, and thus multiple interactions, simultaneously. There are mathematical reasons why this is desirable. Imagine we have three proposals to solve a problem, and unfortunately the chance of success of each is only 50%. It's hard to choose one, because they are equally bad. But if we can afford to pursue all three, the chance of at least one succeeding is 1 − (0.5)³ = 87.5%. In other words, it's very likely we can find a path forward among the three possibilities. People have known this for a long time, but it is still striking to see the numbers. Keeping too many options alive, however, may not be manageable, especially when each option leads to a few further options and the number of combinations grows exponentially. At the initial stage of a project, such divergence may be beneficial and necessary, but it has to converge for the project to finish.

Exploring different options to a problem simultaneously (diverging) before settling on one solution (converging).
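
Generalizing the arithmetic above: with n independent attempts that each succeed with probability p, the chance of at least one success is 1 − (1 − p)ⁿ:

```python
def p_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n independent attempts succeeds."""
    return 1.0 - (1.0 - p) ** n

for n in (1, 2, 3, 5):
    print(n, round(p_at_least_one(0.5, n), 3))
# 1 0.5
# 2 0.75
# 3 0.875
# 5 0.969
```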

Engineering mindset — map routing

All the engineering practices discussed above reflect an engineering mindset. Borrowing Polanyi's analogy, engineering practice is akin to route finding on a map. It then follows that one has to be very clear about where the destination is. It is fine to pivot when necessary, as long as the goal remains well defined. When planning a route on a map, we always keep the destination in mind and avoid unwarranted detours. When facing an obstacle, we analyze the environment and take incremental steps to remove or circumvent it. Sometimes we need to send out scouting teams to explore different options, but we always want to make a timely yet informed choice between routes, to control how divergent our effort becomes.

Enabling tools

Living in Silicon Valley, people are puzzled by how certain industries enjoy exponential growth and ample opportunities while others don't. Though all engineering in nature, some fields follow better practices than others. It is not because people in certain fields are smarter, but because they have access and motivation to use proper tools. Without appropriate tools, only a small percentage of very talented people can follow these practices, and a company will find it very hard to scale them. With better tools, more people are able to contribute; more job opportunities and a larger candidate pool then follow.

I started this post with Axiomatic Design. There appear to be commercial software packages available to enforce the design rules, but for most people Excel is sufficient. The key is to follow the principles.

To manage the test-build cycles of a project, one can borrow existing tools from the software industry. I personally use Redmine; one can download a prepackaged version that installs on a PC. It provides a web interface, so experimental notes can be taken remotely. Redmine also has a very powerful search function, which means that even if records were not curated under strict rules, relevant entries can still be retrieved rapidly. At my company, Zentao is used to plan research activity and manage tests. Project management software from the IT industry can be very sophisticated, but the learning curve is gentle since we only need a small subset of its functionality.

Redmine issue tracking (left) and Zentao project management (right)

To facilitate data-driven development, data organization, retrieval and analysis are very important. Decades of practice in the software industry have proved that a well-designed relational database can handle very sophisticated datasets. But in a research setting, the data are extremely heterogeneous and unpredictable. They may be numbers, text, images or even proprietary binary data from commercial instruments. Adding to the complexity, we cannot confidently list in advance all the critical data that need careful curation; sometimes we have to redo experiments because we didn't realize some factor was important. In some cases, this problem can be solved by collecting all available data, although organizing and making sense of them is possible but not trivial. In a small company, there is usually no infrastructure maintained by a dedicated group to facilitate proper data curation. Still, Excel can be used as a poor man's database. The key is to adhere to the basic principles of a relational database: each column is a variable and each row is an experiment record. The filter function can then be used to quickly retrieve results from certain experimental conditions for direct comparison, and pivot tables are another useful way to gain insight into the data. If one needs more advanced analytics tools, such as Python, organizing data this way also saves a lot of effort on data cleaning.
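
Once the sheet follows that one-row-per-experiment convention, loading it into Python makes the filter and pivot-table operations one-liners. The columns below are hypothetical, and in practice the DataFrame would come from pd.read_excel:

```python
import pandas as pd

# A tidy experiment log: one row per run, one column per variable.
# (In practice this would come from pd.read_excel("log.xlsx").)
log = pd.DataFrame({
    "run_id":      [1, 2, 3, 4, 5, 6],
    "reagent_lot": ["A17", "A17", "B02", "B02", "A17", "B02"],
    "temp_c":      [25, 30, 25, 30, 25, 30],
    "yield_pct":   [91.0, 88.5, 76.2, 74.9, 90.3, 75.5],
})

# Excel-filter equivalent: pull all runs under one condition.
print(log[log["reagent_lot"] == "A17"])

# Pivot-table equivalent: mean yield by reagent lot and temperature.
print(log.pivot_table(values="yield_pct", index="reagent_lot",
                      columns="temp_c", aggfunc="mean"))
```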

Excel is definitely not suited to even medium-sized datasets. One can upgrade to easy-to-use database solutions: I personally know two biotech companies using FileMaker for their in-house databases, and with the advent of cloud computing, many providers offer online databases, Airtable being one of them. Thanks to their effort, you don't need to be an SQL guru to manage a database. A further upgrade would be a dedicated knowledge base, data management and project tracking hub; in the biotech industry, Benchling has many adopters. The perennial problem with commercial lab information systems is their lack of flexibility to suit different companies' needs.

A framework for R&D

Another software engineering concept worth borrowing is the framework. At a high level, a framework provides an abstract architecture for most software projects. On one hand, it is abstract enough to cover most applications; on the other hand, it is a concrete guide that helps navigate the development process. Is there a framework for experimental research?

The answer, I believe, is yes. Research activities are driven by a specified goal and consist of iterations of experiment design, execution and analysis. Multiple sets of experiments are then examined together to reach a conclusion. This is the abstraction behind, and the guide for, most if not all research activities. The next question is whether there are tools available to enforce the framework.

A framework for research (top) and its requirements for the tool

In the figure above, I listed some user requirements for an imaginary tool enforcing the research framework. In essence, it is a tool to systematically organize and analyze experimental data. Some of the features are there to ensure data are recorded properly and consistently, for the benefit of future retrieval and scrutiny. I am not aware of any particular software providing all of these functions, although common tools, including Excel, can be repurposed. The following figure shows the schema of the tables involved in the framework.

Schema of the tables involved in the research framework
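
To give a flavor of such a schema, here is a minimal sketch as Python dataclasses; the exact tables and fields are illustrative assumptions, not a transcription of the figure:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative record types only; real field lists would come from the
# framework's requirements, not from this sketch.

@dataclass
class Goal:
    goal_id: int
    description: str        # what success looks like
    metric: str             # how success will be evaluated

@dataclass
class Experiment:
    exp_id: int
    goal_id: int            # foreign key -> Goal: every run serves a goal
    design: str             # factors and levels being varied
    date: str
    operator: str

@dataclass
class Result:
    result_id: int
    exp_id: int             # foreign key -> Experiment
    raw_data_path: str      # images and binary files stay on disk
    summary_metric: Optional[float]  # the number the goal is judged by
    conclusion: str
```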

If research is performed with this underlying framework in mind, one gets used to wearing different hats at different stages. At the beginning, the researcher serves as a project manager, driving the specification of requirements and tests. After one design iteration, the researcher becomes a test engineer trying to find potential flaws in the design. With this fluid transition in mindset, guided by a clearly defined framework and supported by lab informatics tools, research can be done with high efficiency and low cost.

Research should be an enjoyable journey if it can be “engineered” to follow proven practices from different industries. Imagine that all customer requirements are covered by the design, every decision in development is supported by test results, and relevant data generated long ago by another researcher can be quickly retrieved for review. Not only does this offer a pleasant work experience, it also builds the confidence of researchers.

The biggest challenge in getting there is the lack of proper tools. If running tests involves complicated operations and unstable instruments, people will hesitate to run them often. If data are buried in stacks of lab notebooks full of sticky notes and handwriting, it is hard to gain insight.

Still, existing tools can be adopted or repurposed to boost the efficiency and success rate of a research field. Over time, I believe, more tools will become available and see widespread use in my field, contributing to its long-awaited take-off.
