Of Mice and Salesmen

William L. Weaver
Published in TL;DR Innovation
4 min read · Feb 16, 2018

Biological Algorithms for Data Analysis

Happy New Year and welcome to this column sporting the new title “Data Acquisition and Analysis.” Well, it’s not exactly a brand-new title. The old title simply sprouted a tail of “and Analysis.” The column has evolved in response to advances in the measurement and automation community.

Photo by Kapa65 on Pixabay

In June of 2000, I wrote about data acquisition evolving into “informatics,” a catchphrase currently contending for top billing in the Hackneyed Quarterly. Rival laboratory information management systems (LIMS), struggling through adolescence, are beginning to develop the industry standards exhibited by more adult technologies. LIMS permit us to collect, archive, search and recall oceans of values from disparate instruments distributed across all four corners of the earth, serving as a lifeboat to laboratories drowning in their own data. Their prominence and pervasiveness are growing rapidly.

However, confusion arises when we define “growth.” The business world is currently ensnared by the biological “plant” definition of growth meaning “larger.” Simply adding 10 pounds of holiday corpulence does not mean I have grown as a person. The market makes the identical error when it punishes the stock of a company for not achieving its projected increase in sales volume.

Unless the sole purpose of a laboratory is to deliver diagnostic testing results, simply boosting the number of terabytes in its databases does not signify real growth. True success should be measured in terms of growth meaning “maturity, ability, and understanding.” The current malaise in the information technology (IT) sector is a product of its own success at the “larger, faster” game. Corporations have spent a king’s ransom to recall instantly decade-old customer purchases and inventory records. All the while, the game has matured from simply knowing these facts and figures into a palpable need for the data interpretation necessary to leverage them for the enhanced provision of goods and services. The name of the game has evolved from “who, what, where and when” into “how and why.”

Although the biological plant-growth definition is inadequate, we can look to other areas of biology for more appropriate solutions. The concept of natural evolution is an excellent paradigm for growth. The same techniques used by scientists for the production of genetically modified smarter, stronger and more disease-resistant mice can be adapted to develop superior data analysis methods.

One such analysis problem elegantly solved by these “genetic algorithms” (GAs) is the traveling salesman problem (TSP), wherein the optimal solution is the shortest round-trip route through a list of cities, each of which must be visited exactly once. In the GA approach to the TSP, a population of possible routes is generated randomly, and each route, with its ordered list of cities, is treated as a “chromosome.” The total round-trip distance of each chromosome is calculated, and those having shorter routes are considered to have greater fitness. The fittest chromosomes are “mated” by crossbreeding the best features of each solution to produce subsequent generations. Random mutations applied to the chromosomes keep the population diverse and suppress the weak features that result from successive inbreeding among the elite solutions. Unlike true biological genetics, new generations can be spawned in an instant, producing a good, often near-optimal, solution in short order.
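The GA loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a tuned solver: the function names, the ordered-crossover scheme, the swap mutation, and every parameter value (population size, elite count, mutation rate) are assumptions chosen for brevity.

```python
import math
import random

def route_length(route, cities):
    """Round-trip distance of a route, given as a list of city indices."""
    return sum(
        math.dist(cities[route[i]], cities[route[(i + 1) % len(route)]])
        for i in range(len(route))
    )

def ordered_crossover(parent_a, parent_b, rng):
    """Keep a slice of parent_a in place; fill the rest in parent_b's order."""
    i, j = sorted(rng.sample(range(len(parent_a)), 2))
    kept = parent_a[i:j + 1]
    kept_set = set(kept)
    rest = [city for city in parent_b if city not in kept_set]
    return rest[:i] + kept + rest[i:]

def mutate(route, rng, rate=0.1):
    """With probability `rate`, swap two cities to keep the population diverse."""
    route = route[:]
    if rng.random() < rate:
        a, b = rng.sample(range(len(route)), 2)
        route[a], route[b] = route[b], route[a]
    return route

def evolve(cities, pop_size=60, generations=200, elite=10, seed=0):
    """Evolve a population of routes and return the shortest one found."""
    rng = random.Random(seed)
    n = len(cities)
    # Each chromosome is a random permutation of the city indices.
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        # Shorter routes are fitter; the elite survive and become parents.
        population.sort(key=lambda r: route_length(r, cities))
        parents = population[:elite]
        children = []
        while len(children) < pop_size - elite:
            mother, father = rng.sample(parents, 2)
            children.append(mutate(ordered_crossover(mother, father, rng), rng))
        population = parents + children
    return min(population, key=lambda r: route_length(r, cities))

# Hypothetical usage: four cities at the corners of a unit square.
cities = [(0, 0), (0, 1), (1, 1), (1, 0)]
best = evolve(cities, generations=50)
print(best, route_length(best, cities))
```

Because the elite routes are carried over unchanged, the best route in the population never gets worse from one generation to the next. Nothing in the sketch guarantees that the true optimum is reached on larger problems.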

A second biological paradigm for growth is learning. Artificial neural networks (ANNs) model our current understanding of the physiology of neurons as a network of summing amplifiers connected by weighted inputs. During the process of supervised learning, the weights of the ANN are adjusted until the desired outputs are produced in response to known inputs. Like the optimal city route of the TSP, the most appropriate pattern of weights is not known a priori. The trained ANN can be used to make intelligent decisions or “dissected” to study the solution it discovered. An emerging ANN technique is “unsupervised learning,” in which the ANN is given a database of raw values as input and simply taught to recognize correlations among them. In this way, new relationships are discovered and presented to researchers for further investigation.
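Supervised learning can be illustrated with the smallest possible network: a single "summing amplifier" with two weighted inputs and a bias. The sketch below uses the classic perceptron update rule. The function names, the learning rate, the epoch count, and the logical-AND training set are all assumptions made for the example, not anything prescribed by the column.

```python
def step(total):
    """Threshold activation: the neuron fires only for a positive sum."""
    return 1 if total > 0 else 0

def predict(weights, bias, inputs):
    """Weighted sum of the inputs plus a bias, passed through the threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_neuron(samples, epochs=20, learning_rate=1.0):
    """Adjust the weights until known inputs yield the desired outputs."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs   # the right weights are not known a priori
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # Nudge each weight in the direction that reduces the error.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Hypothetical training set: known inputs paired with desired outputs (AND).
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_neuron(and_gate)
for inputs, target in and_gate:
    print(inputs, "->", predict(weights, bias, inputs), "expected", target)
```

Printing `weights` and `bias` after training is a small-scale version of "dissecting" the network to study the solution it found. A single neuron can only separate classes with a straight line, so harder patterns need multiple layers and a different training rule.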

A third growth paradigm is societal cooperation, currently modeled as “swarm intelligence” (SI). A colony of simple, independent algorithmic units programmed to mimic the behavior of communal organisms such as insects, birds or bacteria is released into a database where the units cooperatively navigate the data to discover emergent global patterns. As with ANNs, the desired analysis patterns can be pre-programmed or discovered autonomously. Biological computing paradigms such as these are products of artificial intelligence (AI), a research area dominated by the academic, defense and gaming communities. Laboratory informatics and business IT analysis needs may be the killer applications sought by AI to propel it into the mainstream.
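One widely used swarm technique is particle swarm optimization, loosely modeled on flocking birds. In the sketch below, each particle is a simple unit that remembers its own best position and is also pulled toward the best position found by the whole swarm. As a stand-in for "navigating a database," the swarm searches for the minimum of a hypothetical two-variable function. The function names, the coefficients, and the objective are assumptions for illustration only.

```python
import random

def particle_swarm(objective, bounds, n_particles=30, iterations=200,
                   inertia=0.7, c_self=1.5, c_swarm=1.5, seed=0):
    """Minimize `objective` over the box `bounds` with a simple particle swarm."""
    rng = random.Random(seed)
    dims = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dims for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [objective(p) for p in positions]
    leader = min(range(n_particles), key=personal_score.__getitem__)
    swarm_best = personal_best[leader][:]
    swarm_score = personal_score[leader]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dims):
                r1, r2 = rng.random(), rng.random()
                # Blend momentum, the particle's own memory, and the swarm's.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_self * r1 * (personal_best[i][d] - positions[i][d])
                    + c_swarm * r2 * (swarm_best[d] - positions[i][d])
                )
                positions[i][d] += velocities[i][d]
            score = objective(positions[i])
            if score < personal_score[i]:
                personal_best[i] = positions[i][:]
                personal_score[i] = score
                if score < swarm_score:
                    # Share the discovery with every other particle.
                    swarm_best = positions[i][:]
                    swarm_score = score
    return swarm_best, swarm_score

def bowl(point):
    """Hypothetical objective: a smooth bowl with its minimum at (3, -1)."""
    x, y = point
    return (x - 3) ** 2 + (y + 1) ** 2

best_point, best_value = particle_swarm(bowl, bounds=[(-10, 10), (-10, 10)])
print(best_point, best_value)
```

No single particle knows where the minimum lies. The pattern emerges from many simple units sharing what each has found, which is the cooperative behavior described above.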

This material originally appeared as a Contributed Editorial in Scientific Computing and Instrumentation 21:2 January 2004, pg. 16.

William L. Weaver is an Associate Professor in the Department of Integrated Science, Business, and Technology at La Salle University in Philadelphia, PA USA. He holds a B.S. Degree with Double Majors in Chemistry and Physics and earned his Ph.D. in Analytical Chemistry with expertise in Ultrafast LASER Spectroscopy. He teaches, writes, and speaks on the application of Systems Thinking to the development of New Products and Innovation.

