Primary vs. Secondary Sources in Data Collection

Madelyn Tran
Published in CISS AL Big Data
5 min read · Nov 2, 2023

Edward Tufte, a Yale professor emeritus of political science, statistics, and computer science, once said, “There are only two industries that call their customers ‘users’: illegal drugs and software.” In a world in which data has become the new oil, it’s not hard to see why Tufte made the comparison. In Big Data, data is information, and that information holds invaluable insights into the world around us. Hence, many companies mine data from wherever they can, whether from sensors, experiments, or even their customers.

Fig 1: Data collection (GeeksforGeeks, 2023).

In these examples, companies are participating in data collection: the gathering of both qualitative and quantitative data in order to test hypotheses and gain insights, as shown in Figure 1 (Business Jargons, n.d). Combined with Big Data, data collection gives companies and researchers opportunities to gain insights into the world around them that would never have been possible otherwise. One of the defining characteristics of Big Data is the idea of n = all: rather than working from a sample, Big Data aims to use all of the available data, a volume too large for humans or everyday technology to process. This is where data collection really shines, as it is the source of all the information that Big Data draws upon for its insights.

Fig 2: Siri (Kelly, 2022).

Given the value of data collection, it’s no wonder that corporations such as Google invest so much money in it. However, this comes at a cost. Although data is all around us, the sheer volume collected drives expenditures up sharply. Data collection also carries consequences more serious than its operating costs, chief among them privacy concerns. For instance, when you make a request to Siri, as seen in Figure 2, the processed transcripts of your requests are sent to Apple (Apple, 2023). Despite these concerns, the benefits data collection holds for society outweigh its costs.

Fig 3: Data collection methods (Semler, 2021).

There are two types of data collection: primary and secondary. In primary data collection, researchers or companies gather the data themselves. Secondary data collection, on the other hand, draws upon data that others have already collected. These differences are illustrated in Figure 3. Both options have their own advantages, disadvantages, and collection methods.

Primary data collection is far more labor- and cost-intensive than secondary collection. Its methods typically include interviews, surveys, and experiments, but it isn’t limited to these (World Bank, n.d). For instance, with datafication, the process of collecting and transforming information around us into quantifiable data (Mapsted, 2023), almost anything around us can be turned into usable data. This seems to offer endless possibilities, but given real-world constraints, that’s not the case. Primary data collection is expensive for companies and individuals alike: gathering the data requires considerable resources, and processing it can require costly computing hardware. In return, primary data collection lets researchers design their data sets around the categories they need to answer their questions. Because they run the collection process themselves, researchers also get a big-picture view of what the data will show and, therefore, a better understanding of the data and its insights.

Secondary data collection is much less costly than primary data collection, but it comes with its own advantages and disadvantages. It consists of gathering data that others have already collected, typically from public sources such as government or organizational data. As a result, researchers often can’t access the raw data and have only the final product published by those organizations. This makes it harder to tailor the data to a research question without discarding data points, which goes against the purpose of Big Data. Despite these disadvantages, secondary collection holds clear advantages. Since the data is already collected and organized, it’s far less costly and labor-intensive than primary data collection. Researchers therefore take on far less risk when committing to a topic that may turn out to yield no insights.
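To make the trade-off concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the survey records, the field names, the published regional averages, and the helper `average_by`. It only aims to show that raw primary records can be regrouped however a researcher likes, while a published secondary summary is fixed to whatever breakdown the original collector chose.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical primary data: raw survey responses gathered by the researcher.
primary_responses = [
    {"age": 19, "region": "north", "hours_online": 6.0},
    {"age": 34, "region": "north", "hours_online": 3.0},
    {"age": 22, "region": "south", "hours_online": 5.0},
    {"age": 41, "region": "south", "hours_online": 2.0},
]

# Hypothetical secondary data: only a published per-region average is available.
secondary_summary = {"north": 4.5, "south": 3.5}


def average_by(records, key_fn, value_field):
    """Group raw records by key_fn and average value_field within each group."""
    groups = defaultdict(list)
    for record in records:
        groups[key_fn(record)].append(record[value_field])
    return {key: mean(values) for key, values in groups.items()}


# With raw records, the researcher can pick any grouping, such as age bands.
by_age_band = average_by(
    primary_responses,
    lambda r: "under_30" if r["age"] < 30 else "30_plus",
    "hours_online",
)

# Grouping the same raw records by region reproduces the published summary,
# but the summary alone could never be regrouped by age.
by_region = average_by(primary_responses, lambda r: r["region"], "hours_online")

print(by_age_band)
print(by_region == secondary_summary)
```

In this toy setup the age-band question can only be answered from the primary records, since the secondary summary no longer contains ages at all.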

Overall, data collection offers different methods suited to different uses. When conducting research for Big Data projects, researchers may therefore combine primary and secondary methods to assemble a comprehensive data set for their project.

References

Bhat, A. (2023, August 8). Secondary research: Definition, methods & examples. QuestionPro. https://www.questionpro.com/blog/secondary-research/

GeeksforGeeks. (2023, July 14). What is data collection?: Methods of collecting data. GeeksforGeeks. https://www.geeksforgeeks.org/what-is-data-collection-methods-of-collecting-data/

Kelly, S. M. (2022, November 7). Why Apple may be working on a “hey Siri” change. CNN Business. https://www.cnn.com/2022/11/07/tech/apple-hey-siri-change-trnd/index.html

Apple. (n.d.). Legal - ask Siri, dictation & privacy. Apple Legal. https://www.apple.com/legal/privacy/data/en/ask-siri-dictation/

M, M. (2016a, July 9). What are secondary data collection methods? Business Jargons. https://businessjargons.com/secondary-data-collection-methods.html

M, M. (2016b, July 9). What is data collection? definition and meaning. Business Jargons. https://businessjargons.com/data-collection.html

Semler, E. (2021, December 10). Exploring data collection and its forms. Medium. https://medium.com/ciss-al-big-data/exploring-data-collection-and-its-forms-f97a50c180e9

Mapsted. (n.d.). What is datafication? Understanding the concept and its impact. Mapsted Blog. https://mapsted.com/blog/what-is-datafication
