Spelunking v. Thunking

William L. Weaver
Published in TL;DR Innovation
Feb 2, 2018

Acquiring Data from Acquired Data

While working on his Ph.D. thesis at Yale's Artificial Intelligence Laboratory in the mid-1970s, James Meehan developed a LISP program capable of generating fables in the style of Aesop from a database of facts and character-interaction rules. The program, called TALE-SPIN, was set to the task of creating the fable of the Fox and the Crow, wherein a smooth-talking fox is able to swindle a piece of cheese away from a vain crow. During an initial run, TALE-SPIN produced the following fable: "Once upon a time there was a dishonest fox and a vain crow. One day the crow was sitting in his tree holding a piece of cheese in its beak. The crow became hungry and swallowed the cheese. The End." Though considered an inappropriate output at the time, this result is an example of artificial intelligence at its best: the construction of an unexpected, but otherwise correct, pattern or natural conclusion gleaned from a database of known values.


Thunking is a term used to describe the down-conversion of 32-bit data into a 16-bit representation suitable for submission to legacy 16-bit functions. It can also be used to describe the function of analog-to-digital converters (ADCs), which is to "thunk" an analog signal's infinite resolution down to a finite-bit digital representation. The prolific deployment of sensors and ADCs throughout instruments, processes, and entire enterprises has resulted in the collection, storage, and management of veritable mountains of data. The data acquisition (DAQ) community is currently smothering under the weight of its own success.
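A minimal sketch of both senses of the term, in Python; the function names and the 12-bit, 5-volt converter parameters are invented for illustration, not drawn from any particular hardware:

```python
def thunk_32_to_16(value: int) -> int:
    """Illustrative 'thunk': keep only the low 16 bits of a 32-bit integer."""
    return value & 0xFFFF

def adc_sample(voltage: float, full_scale: float = 5.0, bits: int = 12) -> int:
    """Quantize a continuous voltage to an n-bit code, as an ADC would."""
    levels = 2 ** bits
    code = int(voltage / full_scale * (levels - 1))
    return max(0, min(levels - 1, code))  # clamp to the converter's range

print(hex(thunk_32_to_16(0x0001ABCD)))  # -> 0xabcd; the high 16 bits are lost
print(adc_sample(2.5))                   # mid-scale code on a 12-bit, 5 V converter
```

In both cases information is discarded by design, which is exactly why the resulting mountains of finite-bit data invite further analysis.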

Developments in data access technology during the 1980s, including relational database management systems (RDBMS), structured query language (SQL), and open database connectivity (ODBC), facilitated orderly data storage and retrieval from large databases. These necessary and valuable tools permit a customer to ask an attendant behind a terminal at the local Home Depot a question such as, "In which aisle can I find 1-inch paintbrushes, and how much do they cost?"
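Behind the attendant's terminal, such a question reduces to a single query. The sketch below uses an in-memory SQLite database; the table schema, item names, and prices are hypothetical stand-ins:

```python
import sqlite3

# Hypothetical inventory table; the schema is illustrative, not Home Depot's.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE inventory (item TEXT, size TEXT, aisle INTEGER, price REAL)")
con.execute("INSERT INTO inventory VALUES ('paintbrush', '1-inch', 12, 3.49)")

# The customer's question, restated as SQL.
row = con.execute(
    "SELECT aisle, price FROM inventory WHERE item = 'paintbrush' AND size = '1-inch'"
).fetchone()
print(f"Aisle {row[0]}, ${row[1]:.2f}")
```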

The 1990s brought advances in data warehousing, decision support, and online analytic processing (OLAP), whose mission, according to the industry’s OLAP Council, is to “slice, dice, or rotate” data into any view requested by the analyst. This affords the Home Depot purchasing manager the ability to plot historic paintbrush inventory as a function of time and geographic store location to assist in the determination of next month’s order from the supplier.
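A rough sketch of that "rotation" using pandas; the sales figures, month labels, and region names are invented for illustration:

```python
import pandas as pd

# Hypothetical flat sales records, as they might arrive from the RDBMS.
sales = pd.DataFrame({
    "month":   ["2001-03", "2001-03", "2001-04", "2001-04"],
    "region":  ["Northeast", "Midwest", "Northeast", "Midwest"],
    "brushes": [120, 95, 140, 88],
})

# Pivot the flat records into a month-by-region view, as an OLAP cube would.
cube = sales.pivot_table(values="brushes", index="month", columns="region", aggfunc="sum")
print(cube)
```

The same records can be re-pivoted on any pair of dimensions, which is all "slice, dice, or rotate" amounts to.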

The next step in this evolution is the development of algorithms capable of autonomously searching or "mining" the databases for patterns and trends that have not been considered by the analyst. For example, an intelligent algorithm may inform the Home Depot manager: "73% of paintbrush purchases were accompanied by the purchase of masking tape. Consider displaying this item in close proximity to the paintbrushes." Data mining is the "killer application" long sought by an artificial intelligence (AI) community that has been quietly developing techniques that heretofore have enjoyed little popular fanfare in the business, manufacturing, and scientific communities.
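A figure like that 73% is the *confidence* of an association rule: the fraction of paintbrush baskets that also contain masking tape. A minimal sketch over a toy set of transactions (the baskets below are invented, so the numbers will not match the article's):

```python
# Toy market baskets; each set is one customer's purchase.
transactions = [
    {"paintbrush", "masking tape", "roller"},
    {"paintbrush", "masking tape"},
    {"paintbrush", "drop cloth"},
    {"masking tape"},
]

with_brush = [t for t in transactions if "paintbrush" in t]
both = [t for t in with_brush if "masking tape" in t]

support = len(both) / len(transactions)   # share of all baskets where the rule holds
confidence = len(both) / len(with_brush)  # share of brush baskets that also add tape
print(f"support={support:.2f}, confidence={confidence:.2f}")
```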

Data mining integrates seamlessly with RDBMS and OLAP servers to produce the desired analyses. Initial mining algorithms are based on AI techniques such as decision trees, clustering, neural networks, and genetic algorithms. Decision trees are branched structures representing sets of decisions used to generate rules for the classification of new, unclassified data. Clustering is an expectation method that uses iterative refinement to group data into neighborhoods exhibiting similar, predictable characteristics. Neural networks are non-linear predictive models that learn through training and resemble biological neural networks in structure.
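As one concrete instance of that iterative refinement, here is a bare-bones k-means clustering sketch on one-dimensional data; the data points and parameters are invented, and real mining tools use far more robust variants:

```python
import random

def kmeans(points, k, iters=20):
    """Minimal k-means: alternate assignment and refinement of k centers."""
    centers = random.sample(points, k)
    for _ in range(iters):
        # Assignment: each point joins the neighborhood of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Refinement: move each center to the mean of its neighborhood.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
centers, clusters = kmeans(data, k=2)
print(centers)  # two neighborhood centers, near 1.0 and 5.1
```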

Genetic algorithms are optimization techniques based on the concepts of evolution that utilize the processes of genetic combination, mutation, and natural selection. As the demand and funding for data mining algorithm development increases, the strengths of each technique may hybridize or lead to the exploration of lesser-known AI research.
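A minimal sketch of those three processes at work; the toy fitness function (count the 1 bits in a genome) and all parameters are illustrative assumptions:

```python
import random

def fitness(genome):
    return sum(genome)  # toy objective: maximize the number of 1 bits

def evolve(pop_size=20, length=16, generations=40):
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]        # natural selection
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, length)
            child = a[:cut] + b[cut:]           # genetic combination (crossover)
            if random.random() < 0.1:           # mutation
                i = random.randrange(length)
                child[i] ^= 1
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

print(evolve())  # typically converges to a genome of all (or nearly all) 1s
```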

The Human Genome Project (HGP) has, with deservedly great fanfare, completed a "working draft" of the approximately 30,000 genes in human DNA and has set to the task of sequencing the 3 billion chemical base pairs it contains. The instrumental analysis and DAQ technology utilized by the HGP have made this goal a reality; however, the burden of leveraging these data into useful information, including the causes of and cures for cancer and genetic disease, sits squarely on the shoulders of nascent data mining technology. On a less grandiose scale, it is easy to envision the AI component of a not-so-distant HPLC system greeting me with, "Bill, I have detected a 1.4% increase in peak tailing in the week leading up to the restocking of solvent over the past quarter. You may wish to verify the purity of the solvent supply or consider on-site purification."

This material originally appeared as a Contributed Editorial in Scientific Computing and Instrumentation 18:6 May 2001, pg. 16.

William L. Weaver is an Associate Professor in the Department of Integrated Science, Business, and Technology at La Salle University in Philadelphia, PA, USA. He holds a B.S. degree with double majors in Chemistry and Physics and earned his Ph.D. in Analytical Chemistry with expertise in ultrafast laser spectroscopy. He teaches, writes, and speaks on the application of Systems Thinking to the development of new products and innovation.
