The Different Types of Data: Primary and Secondary

Yujun Jung
Published in CISS AL Big Data
Oct 19, 2023
Fig. 1: Data Collection (https://www.geeksforgeeks.org/what-is-data-collection-methods-of-collecting-data/).

As the name implies, Big Data analysis requires enormous amounts of data. Something that may appear intuitive to humans, like recommending the next video on YouTube, is a feat that ‘The Algorithm’ achieves using video history data from millions of users. Without the necessary data, we wouldn’t be able to start on the ‘analysis’ part and get actual insights. What types of data should we use, and where should we get them? How can we collect all that data? How do we tell good, accurate data from bad?

Fig. 2: Primary Data VS Secondary Data (https://www.javatpoint.com/primary-data-vs-secondary-data).

Data can be categorized by type, such as text, audio, video, numbers, and images, but it can also be split into two broader categories: primary and secondary data, as seen in Fig. 2. Primary data is data that is collected directly by the researchers. For example, when we run an experiment to test a hypothesis, we record the data ourselves, whether it is a high school physics experiment or a simulated plane wing with a million data points. Secondary data, on the other hand, is data that was already collected and processed by another entity for its own purposes and then shared later. It can include qualitative sources such as books, articles, and research papers, as well as quantitative sources such as government censuses, weather data, and sales records. The same data can be primary or secondary depending on who uses it and why. When YouTube collects video history records for advertisement placement on its own website, that data can be considered primary, but if a different party later uses it for other purposes, it can be considered secondary.

For Big Data applications, we could generate data ourselves, or use data that is already out there or is being created in real time by countless sources. If we want to collect primary data, we can use methods such as interviews, surveys, experiments, online tracking, and directly tapping into live data. The advantage is that the data can be more specific, more accurate (assuming, of course, that the collection process itself is accurate), and more up to date. However, we may not have the resources to conduct large-scale interviews and surveys or to deploy sensors en masse, the topic may be unsuitable for experimentation, and we may not have the authority to access live information. If we want to collect secondary data, we can simply search for a topic online or use web crawling programs to automate the process. This takes less time and fewer resources, and it gives old data a new purpose by generating new insights from it. However, the authenticity of some secondary data can be harder to verify, and the data may be outdated compared with fresh primary data.
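
To make the web crawling idea more concrete, here is a minimal sketch of one step a crawler repeats on every page: pulling out the links it could follow next. It assumes the page has already been downloaded as text and uses only Python's standard library. The sample page, the example.org addresses, and the names LinkCollector and extract_links are made up for illustration. A real crawler would also need to download pages, respect each site's terms and robots.txt, and avoid visiting the same page twice.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from the <a href="..."> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in the given HTML text, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content; a real crawler would download this text.
SAMPLE_PAGE = """
<html><body>
  <a href="/datasets/weather.csv">Weather</a>
  <a href="https://example.org/census.csv">Census</a>
  <p>No link here</p>
</body></html>
"""

links = extract_links(SAMPLE_PAGE, "https://example.org/catalog")
print(links)
```

Each link found this way can be added to a list of pages to visit, which is how a crawler works its way through a site without a person clicking through it.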

Fig. 3: data.gov, an open dataset site managed by the U.S. government.

The quality of the collected data directly determines the quality and validity of the result. Even terabytes of data are of little use if the data is inaccurate, because it can lead to nonsensical correlations or, worse, completely misleading results caused by biases in the data set. For primary data collection, we as the collectors should set up the process so that it introduces as little error and bias as possible. For example, a survey can accidentally lean towards one side depending on how its questions are phrased. For secondary data collection, we cannot alter the data itself (that would be data manipulation, and it would defeat the point of using secondary data), so the only option is to make sure that the source is valid and that the data was collected during a suitable time period. The easiest way is to use public databases collected and maintained by reputable organizations and governments, such as data.gov (Fig. 3).
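
As a rough illustration of checking whether a data set fits our purpose, the sketch below screens records by collection date and by whether each value is present and plausible. Nothing is altered: records are only sorted into those that pass and those that are flagged, so we can judge whether the source is usable. The field names (collected_on, value), the temperature readings, and the limits are all hypothetical, and real checks would depend on the data set.

```python
from datetime import date


def screen_records(records, start, end, min_value, max_value):
    """Split records into (passed, flagged) using simple validity checks."""
    passed, flagged = [], []
    for record in records:
        value = record.get("value")
        collected = record.get("collected_on")
        # A record passes only if it falls in the period we care about
        # and its value is present and within a plausible range.
        in_period = collected is not None and start <= collected <= end
        in_range = value is not None and min_value <= value <= max_value
        if in_period and in_range:
            passed.append(record)
        else:
            flagged.append(record)
    return passed, flagged


# Hypothetical temperature readings in degrees Celsius.
readings = [
    {"collected_on": date(2023, 5, 1), "value": 21.5},
    {"collected_on": date(2015, 5, 1), "value": 20.0},   # outside the period
    {"collected_on": date(2023, 6, 1), "value": None},   # missing value
    {"collected_on": date(2023, 7, 1), "value": 999.0},  # implausible value
]

passed, flagged = screen_records(
    readings, date(2023, 1, 1), date(2023, 12, 31), -60.0, 60.0
)
print(len(passed), "passed,", len(flagged), "flagged")
```

If a large share of the records ends up flagged, that is a sign the source may not be suitable for the question being asked.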

Works Cited

Longe, Busayo. “What Is Primary Data? + [Examples & Collection Methods].” Formplus, www.formpl.us/blog/primary-data

Longe, Busayo. “What Is Secondary Data? + [Examples, Sources, & Analysis].” Formplus, www.formpl.us/blog/secondary-data

Cote, Catherine. “7 Data Collection Methods in Business Analytics.” Harvard Business School Online, 2 Dec. 2021, online.hbs.edu/blog/post/data-collection-methods
