Data Collection: An Analysis of Primary and Secondary Data Collection

Winston Jian
Published in CISS AL Big Data
Dec 15, 2021
Difference Between Primary Data and Secondary Data (Javatpoint, 2021)

“In the age of information, ignorance is a choice.” — Donny Miller

Bytes, kilobytes, megabytes, gigabytes, terabytes, petabytes, exabytes, zettabytes, yottabytes… Each unit is 1,000 times larger than the one before it. Data in the modern world has grown so tremendous in volume that individuals can no longer comprehend its magnitude, let alone process it.
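To make the factor-of-1000 scaling concrete, here is a minimal Python sketch that expresses a raw byte count in the largest fitting unit. The names `UNITS`, `unit_size`, and `human_readable` are illustrative choices, and the sketch assumes the decimal (SI) convention described above rather than the binary convention of 1024.

```python
# Decimal (SI) units: each one is 1000 times the previous one.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_size(unit: str) -> int:
    """Return the number of bytes in one `unit` (for example, 1 MB = 1000**2 bytes)."""
    return 1000 ** UNITS.index(unit)


def human_readable(num_bytes: float) -> str:
    """Express a byte count in the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the largest unit we know.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


if __name__ == "__main__":
    print(human_readable(1500))   # a small file
    print(unit_size("YB"))        # bytes in one yottabyte
```

One yottabyte is 1000 raised to the eighth power, or 10^24 bytes, which gives a sense of why such volumes are hard to picture.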

However, data has only become more valuable over the years as data scientists find ways to analyze it. Like the universe, which is expanding at an accelerating rate, data is accumulating exponentially. And like the universe, full of unknown matter and properties, data is messy and full of erroneous measurements. Nevertheless, just as we have endeavored to explore outer space, the Milky Way galaxy, and the universe far beyond, data scientists have developed intricate methods to gather data, discovering patterns and order amid apparent chaos.

Although considered an early stage of data analytics, data collection is a complex process in which researchers collect information from all available sources to answer their research question. In doing so, they identify the type and source of data they consider best suited to addressing their research problem and evaluating their outcomes. Data collection is never an easy task: it challenges the researcher to find suitable datasets that cover all the parameters of the research project. For example, modeling solar photovoltaic cells would require multiple variables such as location, date, time, latitude, longitude, altitude, month, hour, season, humidity, ambient temperature, and the power output of the solar panels.
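As a rough illustration of how the solar photovoltaic variables listed above might be organized, the Python sketch below defines one observation record. The class name `PVObservation`, the field names, and the units are hypothetical. The season rule (meteorological seasons, flipped for the southern hemisphere) is also an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PVObservation:
    """One hypothetical measurement for a solar photovoltaic model."""
    timestamp: datetime       # date and time of the reading
    latitude: float           # degrees, negative south of the equator
    longitude: float          # degrees
    altitude_m: float         # metres above sea level (assumed unit)
    humidity_pct: float       # relative humidity in percent (assumed unit)
    ambient_temp_c: float     # degrees Celsius (assumed unit)
    power_output_w: float     # panel output in watts (assumed unit)

    @property
    def month(self) -> int:
        return self.timestamp.month

    @property
    def hour(self) -> int:
        return self.timestamp.hour

    @property
    def season(self) -> str:
        # Assumption: meteorological seasons (Dec-Feb is winter in the north).
        names = ["winter", "spring", "summer", "autumn"]
        index = (self.timestamp.month % 12) // 3
        if self.latitude < 0:
            index = (index + 2) % 4  # seasons are reversed in the southern hemisphere
        return names[index]


if __name__ == "__main__":
    obs = PVObservation(datetime(2021, 7, 15, 13, 30), 31.2, 121.5, 4.0, 60.0, 30.0, 250.0)
    print(obs.month, obs.hour, obs.season)
```

Deriving month, hour, and season from a single timestamp avoids storing values that could contradict one another, which is one small way a primary dataset can be kept consistent.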

Because data collection is so complex, researchers distinguish two major ways to collect data: primary source data collection and secondary source data collection.

In primary data collection, researchers define variables that fit their research question directly. This first-hand data is unique to the study because no one else has conducted the same research, and it covers every variable the researchers need for their analysis. Researchers may collect primary data through surveys, key informant interviews, observation of program implementation, or even large-scale collection from sources such as social media posts and websites. Secondary data collection, in contrast, relies on the work of previous researchers. Using publicly available datasets, researchers may advance the research in a specific field or provide new insights into a dataset by analyzing a different facet of it. These datasets may come from databases and data archives such as Data.gov and Kaggle.

Screenshot of Kaggle.com (Kaggle, 2021)

Nevertheless, both primary and secondary data collection have limitations, and each offers different advantages and disadvantages. Whereas primary data collection is costly and time-consuming, secondary data collection may lack originality.

When researchers test multiple variables, primary sources may strengthen the research because the researchers themselves define all the variables. In contrast, secondary data sources may lack clear documentation of what each variable means.

When the researcher attempts to study a large population or a long period of time, however, secondary data often comes from studies that employed sophisticated research methods, and it contains large numbers of data points that show change over time. In contrast, primary data collection may lack the methodological sophistication or the time frame needed for time-series analysis.

Secondary data sources also tend to be more complex and intricate than primary datasets. When researchers develop their own variables and conduct experiments for primary research, they gather structured or semi-structured data. The former is data that can be stored, processed, and accessed in a fixed format; the latter has some organizing structure, such as tags or keys, but does not conform to a fixed format. When researchers gather secondary data, the most common type they find is semi-structured data, which requires extensive and laborious data wrangling to organize into a usable dataset. They may also encounter datasets in which the meanings of the rows and columns are not well defined, causing confusion and uncertainty.
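The kind of data wrangling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample records, the field names, and the helper functions `flatten` and `to_table` are invented for the example, and real secondary datasets usually need far more cleaning than this.

```python
import json


def flatten(record: dict, prefix: str = "") -> dict:
    """Turn nested keys into dotted column names, e.g. {"site": {"lat": 1}} -> {"site.lat": 1}."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def to_table(records: list) -> tuple:
    """Build a fixed set of columns and one row per record; missing fields become None."""
    rows = [flatten(record) for record in records]
    columns = sorted({column for row in rows for column in row})
    table = [[row.get(column) for column in columns] for row in rows]
    return columns, table


# Hypothetical semi-structured input: the two records do not share the same fields.
raw = """[
  {"id": 1, "site": {"lat": 31.2, "lon": 121.5}, "temp_c": 30.0},
  {"id": 2, "site": {"lat": 40.7}, "humidity": 55}
]"""

columns, table = to_table(json.loads(raw))

if __name__ == "__main__":
    print(columns)
    for row in table:
        print(row)
```

The `None` entries mark the gaps that appear once irregular records are forced into a fixed set of columns. Deciding how to handle those gaps is part of the wrangling effort.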

Ultimately, both primary and secondary data research pose challenges, and each presents certain benefits and potential setbacks. A common strategy is to let secondary data collection and analysis form the basis of the preliminary analysis. Like a literature review, this step allows researchers to build on what has already been done. Secondary data carries its own difficulties, however. For instance, medical records may consist of data across time that is incomprehensibly detailed. Consequently, fields such as medicine often face the dilemma of abundant data that lacks the organization and structure needed for secondary analysis.

Primary and secondary analysis have stood the test of time as different yet equally valuable forms of research. Some researchers prefer primary analysis for the originality and specificity of primary data, some prefer secondary analysis for the sophistication and cost-effectiveness of secondary data, and some prefer to merge the two to get the best of both. Regardless, any research must find its balance between originality and sophistication, variable specificity and topic generalization, and data quantity and structure. In the age of information, researchers may gain invaluable insights from data, but a great challenge lies in the first step: making sense of chaos through data collection.

References

Business Jargons. (2016, July 9). What are secondary data collection methods? Business Jargons. Retrieved October 14, 2021, from https://businessjargons.com/secondary-data-collection-methods.html.

Business Jargons. (2016, July 9). What is data collection? definition and meaning. Business Jargons. Retrieved October 14, 2021, from https://businessjargons.com/data-collection.html.

Formplus Blog. (2020, January 15). What is secondary data? + [examples, sources, & analysis]. Formplus. Retrieved October 14, 2021, from https://www.formpl.us/blog/secondary-data.

Question Pro. (2021, July 19). Secondary research- definition, methods and examples. QuestionPro. Retrieved October 14, 2021, from https://www.questionpro.com/blog/secondary-research/.
