Model and Data Fusion

Thoughts on the efficient frontier for analysis.

Edmund W. Schuster
May 31, 2020

The problem of effectively fusing data and math models into real-world applications has been around for decades. Without a solution, the tremendous advances in computing power and math modeling are of lesser value.

I view the inability to quickly match models to the proper data and applications as the major constraint on the field of data science. A recent survey gives painful evidence of the lack of practical artificial intelligence (A.I.) applications, a subset of all models. The biggest barriers “are leaders who don’t appreciate its value, and the difficulty of finding business problems in these firms for which AI might be useful.”[1]

The number of practical, valuable A.I. applications is far lower than investors believe, judging by the market capitalization of startups. The true path lies in driving the modeling process toward the efficient frontier. In this way, rational modeling applications will thrive. There will be less hype and more success. Fusing models and data over networks is the way forward.

A further issue is the growth in data volume. This first arose during the dot-com boom. At Smart World 2004, Sunil Gupta of SAP paraphrased Samuel Taylor Coleridge by saying, “data, data everywhere but not a byte to use.”[2] During that period, data grew by as much as 40–60% at many organizations.[3] In 2004 alone, shipments of data storage devices equaled four times the space needed to store every word ever spoken in the entire course of human history.[4]

The Fourth Industrial Revolution began circa 2015, and with it the need for data and model integration will become even more critical. However, despite advances in A.I., machine learning (ML), and other technologies, total factor productivity growth in the United States has been stagnant since 2005.[5] This contrasts with the productivity gains that historically followed advances such as electricity, transportation, mass production, and computing.

Some argue that the boom has yet to come.[6] Evidence exists in this regard.[7] However, the future remains highly uncertain.

The premise is that large data sets from stationary processes hold useful patterns that can aid decision-making across the board, and that every business benefits from A.I. and ML as general-purpose technologies. These patterns, if they exist, cannot be identified without modeling. This is the basic philosophy.

The drawback is the need to access massive amounts of data. Only companies on the scale of the Fortune 500 are big enough to have access to that much data. Likewise, when conditions change, new data are needed, and collecting them can take a long time. Without new data, bad decisions follow. For example,

“Many pre-pandemic models for many business functions are no longer useful; some might even point businesses in the wrong direction.”[8]

Such shifts are hard to predict, and they make the financial return far less certain.

It takes courage to point out to a wider audience what the modeling community has long known, especially in the context of A.I. and ML. An apt assessment is as follows:

“Top algorithms are left flat-footed when data they’re trained on no longer represents the world we live in.”[9]

— Gary Marcus, New York University

The same is true for all learning.
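A toy example makes the point about stale training data concrete. The sketch below is hypothetical: the weekly demand figures, the size of the regime change, and the function names fit_line and mean_abs_error are invented for illustration, and a straight-line least-squares fit stands in for any learned model. The model matches the pre-change data perfectly and then misses badly once the underlying process shifts.

```python
# Hypothetical illustration: a model fit on one regime fails after the process changes.
# All data below are invented; a least-squares line stands in for any learned model.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def mean_abs_error(model, xs, ys):
    """Average absolute gap between observed ys and the line's predictions."""
    a, b = model
    return sum(abs(y - (a + b * x)) for x, y in zip(xs, ys)) / len(xs)

# Assumed "before" regime: weekly demand grows steadily as 100 + 5 * week.
pre_weeks = list(range(1, 21))
pre_demand = [100 + 5 * w for w in pre_weeks]

# Assumed "after" regime: demand collapses and the trend flattens to 40 + week.
post_weeks = list(range(21, 31))
post_demand = [40 + w for w in post_weeks]

model = fit_line(pre_weeks, pre_demand)

if __name__ == "__main__":
    print("fitted:", model)  # fitted: (100.0, 5.0)
    print("error before change:", mean_abs_error(model, pre_weeks, pre_demand))  # 0.0
    print("error after change:", mean_abs_error(model, post_weeks, post_demand))  # 162.0
```

Nothing in the fitted model signals that the world has changed; only fresh data from the new regime reveal the error, which is the dependency the quotations above describe.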

Compounding matters, large amounts of unstructured data exist that are extremely hard to organize with current computer-based approaches. Unstructured data include images, text, emails, and engineering designs. In all these cases, representing an object requires more than a serial number stored in a database or a URL. Only by describing these objects with words in a machine-understandable way can those descriptions become useful for search, organization, and analysis.

Dealing with the increasing volumes of structured and unstructured data will require new standards and information architectures. Integration and communication between hardware, software, and business entities must improve. This becomes important as companies look to overcome the barriers that limit the seamless transfer of data, both within and outside the firm.

It is always important to keep the basics in mind. They offer a sound footing in any engineering discipline.

Math models are simplified representations of the characteristics of the real world judged to be important.[10] Models highlight the facts and interests at hand. They depict only part of reality. Some go as far as to say that the human thought process is itself a specialized model of the real world.[11]

Math models are especially useful in making sense of complex situations. Beyond finding prominent issues and serving as an aid to communication, models supply the greatest value in suggesting explanations for observed events.

Though math models are extremely useful in providing insight, model building is often unproductive, because development seldom follows a linear path,[12][13] and because separate natural, mathematical, and computer representations are needed for managers, model builders, and computer programmers.[14] These separate representations require detailed interfacing, which tends to inhibit the seamless sharing of models within a network. As a result, implementing math models is complex and time consuming, and it requires advanced technical capabilities and infrastructure.

Specialists often develop models internally within business organizations or academia. The work is application specific, and the same model-building techniques must be reinvented for each new situation. Though internal development can lead to significant breakthroughs, this approach depends on trial and error to find what works in practice, combined with intuition and extensive knowledge of the technical literature. The primary motivation behind M Language, which began as a research project in the MIT Data Center program, was to make modeling less of a custom task.

What industry needs is a computer language to describe and share models across the Internet and to make data interoperable, increasing the clockspeed[15] of modeling. This is a better direction for data science than a single-minded obsession with A.I. and ML. Like other models, these approaches are powerful in specialized applications. However, they require substantial amounts of data, stable patterns are not certain to exist, and if something changes, all earlier data become useless.

REFERENCES

[1] Mims, C., 2020. “AI isn’t magical and won’t help you reopen your business.” The Wall Street Journal, May 20.

[2] Gupta, S., 2004. “Empowering a consumer driven demand chain with global data synchronization.” Smart World 2004: Semantic Modeling, Cambridge, MA, December 8.

[3] Park, A., 2004. “Can EMC find growth beyond hardware?” BusinessWeek, November 1.

[4] Lyons, D., 2004. “Too much data.” Forbes, December 13.

[5] Gordon, R.J., 2017. The Rise and Fall of American Growth: The U.S. Standard of Living since the Civil War (The Princeton Economic History of the Western World), p. 628. Princeton University Press. Kindle Edition.

[6] Brynjolfsson, E. and A. McAfee, 2011. Race Against the Machine. Digital Frontier Press.

[7] Brynjolfsson, E., D. Rock, and C. Syverson, 2018. “The productivity J-curve: how intangibles complement general purpose technologies.” NBER Working Paper No. 25148.

[8] Mims, C., 2020. See ref. [1].

[9] Mims, C., 2020. See ref. [1].

[10] Attributed to Professor Gregor M. Reinhard of Gannon University, GH-501 Public Policy Process, taught in 1984.

[11] Forrester, J.W., 1961. Industrial Dynamics, Waltham, MA: Pegasus Communications.

[12] Willemain, T.R., 1994. Insights on modeling from a dozen experts. Operations Research 42:2, pp. 213–222.

[13] Willemain, T.R., 1995. Model formulation: what experts think about and when. Operations Research 43:6, pp. 916–932.

[14] Geoffrion, A.M., 1987. An introduction to structured modeling. Management Science 33:5, pp. 547–588.

[15] Fine, C.H., 1998. Clockspeed. Reading, MA: Perseus Books.


Edmund W. Schuster

Fabric for Dispersed Knowledge (TM): dedicated to the best in analysis and insight, schuster.us.com.