Data Science: A Retrospective

Anyone familiar with Data Science knows that statistics, and the models derived from it, are deeply rooted in the field. Data Science grew out of statistics and has continued to evolve ever since, incorporating emerging practices and methodologies including, but not limited to, the Internet of Things, DataOps, Machine Learning, and even Artificial Intelligence.

When vast quantities of data first became readily available, businesses began to collect and store it extensively. This flood of new information is what we have come to call Big Data. When businesses sought to enhance decision making and drive revenue, they called upon the emerging discipline of Data Science to put this Big Data to use. As corporations welcomed this revolution, many other domains such as medicine, the social sciences, and engineering followed suit.

As we find ourselves immersed in this new and exciting domain, I think it is important to give the class a brief but thorough history of the field, for as the noted scientist Carl Sagan once said: “You have to know the past to understand the present.”

The timeline covers the field’s humble beginnings up to the point where it became widely acknowledged as one of the most desirable professions in the current job market.

The Timeline:

When relaying the history of Data Science, historians usually begin with the year 1962, when mathematician John W. Tukey published his seminal paper “The Future of Data Analysis”. There he correctly predicted a paradigm shift in statistics, at a time when the field was beginning to merge with the budding capabilities of computers. For the first time in human history, statistical results could be produced in mere hours, as opposed to the days or even weeks they would take if done manually. In Tukey’s words: “For a long time, I thought I was a statistician, interested in inferences from the particular to the general. But as I have watched mathematical statistics evolve, I have had cause to wonder and doubt… I have come to feel that my central interest is in data analysis… Data analysis, and the parts of statistics which adhere to it, must…take on the characteristics of science rather than those of mathematics… data analysis is intrinsically an empirical science…” (Tukey, 1962).

Yet his vision materialized slowly. The next significant development came more than a decade later, in 1975, when Peter Naur published his “Concise Survey of Computer Methods”, a survey of the data processing and analysis methods then in use across a wide range of fields. In this text, the term “Data Science” was used frequently and freely for the first time to refer to this emerging discipline (Naur, 1975).

The first tangible manifestation of Data Science came in 1977, when the International Association for Statistical Computing (IASC) was established as a unit of the International Statistical Institute (ISI). Its mission was to forge a link between traditional statistical procedure, contemporary computer capabilities, and the knowledge of experts in various domains in order to convert harvested data into practical, applicable knowledge and information.

With a growing need to equip budding Data Scientists with the technical know-how to meet the market’s demands, Gregory Piatetsky-Shapiro organized and chaired the first Knowledge Discovery in Databases (KDD) workshop in 1989. More followed in 1991, 1993, and 1994, and in 1995 it finally became the annual ACM SIGKDD Conference on Knowledge Discovery and Data Mining and an official Special Interest Group (SIG) within the Association for Computing Machinery (ACM), the world’s largest scientific and educational computing society. This forum is now widely considered the most influential and significant in data mining and knowledge discovery research (Schauerte, 2014).

The momentum the field had been building for decades finally allowed it to break through into the mainstream in 1994, when the Business Week cover story “Database Marketing” brought this emerging trend to the public consciousness while it was still in the nascent stage of its eventual widespread industry use (Berry, 1994).

The term “Data Science” was unquestionably in heavy use by the mid-90s, but it was used interchangeably with others such as “Datalogy”. That changed in 1996, when the International Federation of Classification Societies (IFCS) held its biennial conference in Kobe, Japan, and for the first time included the term Data Science in the title of the conference (“Data science, classification, and related methods”), officially recognizing the term (Hayashi, 1998).

In 2001, William S. Cleveland laid out the first definite plan to train future Data Scientists and meet the needs of the budding market. The plan, titled “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics”, described how university departments could provide the technical experience students need to become proficient data analysts. It specified six areas of study for university departments to pursue alongside their developing research. The plan was addressed not only to academia but also to corporate and government research organizations (Cleveland, 2001).

The Committee on Data for Science and Technology (CODATA) of the International Council for Science then began publishing the Data Science Journal in 2002. The publication was the first to focus exclusively on data-related issues such as the effective publication of data systems, legal issues, and practical data applications.

In January 2009, the Interagency Working Group on Digital Data published a report to the Committee on Science of the National Science and Technology Council titled Harnessing the Power of Digital Data for Science and Society. The report called for the national promotion and recognition of Data Science as a legitimate discipline, one it deemed necessary for the future of the scientific enterprise. “The nation needs to identify and promote the emergence of new disciplines and specialists expert in addressing the complex and dynamic challenges of digital preservation, sustained access, reuse and repurposing of data. Many disciplines are seeing the emergence of a new type of data science and management expert, accomplished in the computer, information, and data sciences arenas and in another domain science. These individuals are key to the current and future success of the scientific enterprise. However, these individuals often receive little recognition for their contributions and have limited career paths.” (“Harnessing the Power of Digital Data for Science and Society”, 2009)

Data Science received the greatest boost in its history in 2011, when job listings for Data Scientists increased by a monumental 15,000%, validating the profession’s significance in the modern world (Kelly, 2013).

Finally, in October 2012, Thomas H. Davenport and D.J. Patil published “Data Scientist: The Sexiest Job of the 21st Century” in the Harvard Business Review, acknowledging the journey Data Science had made from a niche, developing academic field to one of the most coveted disciplines of the new millennium (Davenport and Patil, 2012).

References:

Cleveland, William S. “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.” International Statistical Review / Revue Internationale de Statistique, vol. 69, no. 1, 2001, p. 21, doi:10.2307/1403527.

Davenport, Thomas H., and D.J. Patil. “Data Scientist: The Sexiest Job of the 21st Century.” Harvard Business Review, 26 May 2017, hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century.

“Harnessing the Power of Digital Data for Science and Society: Report of the Interagency Working Group on Digital Data to the Committee on Science of the National Science and Technology Council.” NCO NITRD, Jan. 2009, catalog.data.gov/dataset/harnessing-the-power-of-digital-data-for-science-and-society-report-of-the-interagency-wor.

Hayashi, Chikio. “Data science, classification, and related methods: proceedings of the Fifth Conference of the International Federation of Classification Societies (IFCS-96)”, Kobe, Japan, March 27–30, 1996. Springer-Verlag, 1998.

Berry, Jonathan. “Database Marketing.” Bloomberg.com, Bloomberg, 5 Sept. 1994, www.bloomberg.com/news/articles/1994-09-04/database-marketing.

Kelly, Meghan. “Data scientists needed: Why this career is exploding right now.” VentureBeat, VentureBeat, 11 Nov. 2013, venturebeat.com/2013/11/11/data-scientists-needed/.

Naur, Peter. “Concise survey of computer methods. New York: Petrocelli Books, 397 p. (1975).” Journal of the American Society for Information Science, vol. 27, no. 2, 1976, pp. 125–126, doi:10.1002/asi.4630270213.

Schauerte, Boris. “Conference Ranks.” www.conferenceranks.com/visualization/msar2014.html?field=Data Mining&visualization=Bars.

Tukey, John W. “The Future of Data Analysis.” The Annals of Mathematical Statistics, vol. 33, no. 1, 1962, pp. 1–67, doi:10.1214/aoms/1177704711.