Building the Foundation: Data Collection within the Supreme Court

Cathy Tu
Published in CISS AL Big Data
Oct 24, 2023

Let’s take a look at the graphic in Figure 1 below. To build a solid house, you must have the fundamental materials like bricks and mortar. Likewise, when embarking on the journey of Big Data analysis, a crucial prerequisite is an effective and high-quality dataset. That is where the significance of data collection comes in.

Fig. 1: Data collection makes up the foundation of Big Data analysis (GM Vector, 2022).

Defining Big Data Collection

Before exploring data collection in specific fields of study, let’s discuss the distinction between statistical data collection, which is a more familiar form of data collection, and Big Data collection. In statistics, a common approach to data collection is random sampling, which enables researchers to create representative samples (“Simple random sample,” 2023). This often involves obtaining information from a defined source like a structured survey or an experiment. On the other hand, Big Data collection is a method of gathering and measuring vast amounts of information from diverse sources to obtain a comprehensive and accurate picture (Pratt, 2022).
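To make the contrast concrete, here is a toy Python sketch (all data hypothetical): a simple random sample draws from one well-defined population, while a Big Data approach pools every record it can find from heterogeneous sources, even when their schemas differ.

```python
import random

# Statistical approach: draw a representative simple random sample
# from a well-defined population (here, hypothetical respondent IDs).
population = list(range(1000))
sample = random.sample(population, 50)  # n = 50, chosen uniformly at random

# Big Data approach: pool every record from several heterogeneous
# sources; note the two sources don't even share the same fields.
source_a = [{"id": 1, "vote": "yes"}]            # e.g., a case database
source_b = [{"id": 2, "supports_rights": True}]  # e.g., a biography archive
pooled = source_a + source_b                     # "n = all", messiness included

print(len(sample), len(pooled))
```

The point of the sketch is only the shape of the two workflows: sampling narrows a known population down, while pooling stitches unlike sources together.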

The data in Big Data collection may not be as organized and precise as it would typically be for statistical analyses. However, this “messiness” is embraced in Big Data. To extend the analogy used at the beginning of this article, when constructing a brick house, we observe that the walls consist of bricks with different appearances, each possessing unique jagged lines. Similarly, the data collected for Big Data analysis may not neatly conform to a uniform structure, as research may involve diverse aspects or units of measurement.

Variables in Big Data

Still, statistics and Big Data share one structural feature: both involve independent and dependent variables. Figure 2 illustrates the key distinction between the two types. Let’s define them:

  1. Independent variables: An independent variable is “a variable that is changed or controlled in a [study] in order to test the effects it will have on the dependent variable” (Data Science Team, 2019). In statistics, each relationship is typically examined with a single independent variable, and as many other factors as possible are held constant so that the relationship between the independent and dependent variables is not contaminated by unconsidered variables. In Big Data, however, because the sample is effectively “n = all,” such constants often cannot be maintained.
  2. Dependent variables: As the name suggests, a dependent variable is a variable “whose [value] depends on [that of the] independent variable. It is the variable that is being tested and measured,” rather than manipulated by the researcher (Singh, 2020).
Fig. 2: Independent vs. dependent variables (Craiker, 2022).

Let’s take my project as an example: In my study, I aim to explore how the gender demographics of the Supreme Court of the United States (SCOTUS) justices’ offspring play a role in the justices’ voting and decision-making in gender-relevant cases. My data will span the years 1970 to 1990, as many major SCOTUS decisions on women’s rights were established during that period (“Timeline of Major Supreme Court Decisions on Women’s Rights,” 2023). The variables to be collected include the nature of each SCOTUS case, e.g., whether or not it was gender-relevant and which specific gender issues it was associated with; the names of the nine justices voting on each gender-relevant case; the genders of the justices themselves; how each justice voted on each gender-relevant case; the number of sons each justice had; and the number of daughters each justice had.
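One natural way to organize the variables listed above is one record per (case, justice) pair. The sketch below is a hypothetical schema, not an existing dataset; the field names are illustrative, and the vote and offspring values are placeholders to be filled in from secondary sources.

```python
# Hypothetical schema: one record per justice per gender-relevant case.
# Reed v. Reed (1971) is a real gender-relevant case from the study period,
# but the vote and offspring values below are placeholders, not findings.
record = {
    "case_name": "Reed v. Reed",
    "gender_relevant": True,            # nature of the case
    "gender_issue": "equal protection", # specific gender issue involved
    "justice": "Harry Blackmun",
    "justice_gender": "M",
    "vote": None,                       # dependent variable, to be filled in
    "num_sons": None,                   # independent variable, to be filled in
    "num_daughters": None,              # independent variable, to be filled in
}
```

A flat, per-pair layout like this keeps the independent variables (offspring counts) and the dependent variable (the vote) side by side on every row, which simplifies later analysis.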

Although the key independent variable of the study is the number of sons and daughters each justice has, it is impossible to ensure that all the justices share the exact same background in other respects, such as education level, financial background, race, and age. We call this intersectionality. Legal scholar Kimberlé Crenshaw defines intersectionality as “a metaphor for understanding the ways that multiple forms of inequality or disadvantage sometimes compound themselves and create obstacles that often are not understood among conventional ways of thinking” (Crenshaw, 1989). This is portrayed in Figure 3. Applying intersectionality to this study means acknowledging that every justice’s identity is multidimensional.

Fig. 3: Intersectionality of a human (“What does intersectional feminism actually mean,” 2018).

The dependent variable of the study is how each justice votes on each gender-relevant case. This would be associated with the messiness of Big Data because “how supportive of women’s rights” each decision was would be qualitative, not quantitative. There is no number that can be plastered on such findings.

The intersectional and messy nature of this project’s data makes it especially suitable for Big Data rather than statistics.

Types of Data Sources

Fig. 4: Primary data collection through surveys and questionnaires (Mahmutovic, 2021).

Data collection comes in two forms:

  1. Primary data: Primary data refers to raw data gathered by the researchers themselves. Survey data would be a classic example, as shown in Figure 4, as researchers often design questionnaires for participants of the study to fill out (Longe, 2020). In the example in Figure 5, the original piece of artwork would be the primary source.
  2. Secondary data: Secondary data, on the other hand, refers to “data that has already been collected through primary sources and made readily available for researchers to use for their own research” (Longe, 2020). In the example in Figure 5, the analysis or discussion of the piece of artwork in the book would be a secondary source.
Fig. 5: Example of primary vs. secondary sources (“Primary vs. secondary sources,” 2023).

The information necessary for my project will rely entirely on secondary data, as the SCOTUS justices and decisions all come from past real-world cases. Unfortunately, no readily available dataset provides all the needed information in one place, so information must be collected from a variety of sources to form a full picture and compiled manually into an Excel spreadsheet.
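The compilation step described above can be sketched in Python with the standard library. The rows, file name, and vote values here are placeholders for hand-collected data, not actual records; the resulting CSV file opens directly in Excel.

```python
import csv

# Hypothetical rows compiled by hand from several secondary sources.
# Vote values are placeholders ("TBD"), not actual voting records.
rows = [
    {"case_name": "Reed v. Reed", "justice": "Harry Blackmun", "vote": "TBD"},
    {"case_name": "Frontiero v. Richardson", "justice": "Harry Blackmun", "vote": "TBD"},
]

# Write one row per (case, justice) pair to a CSV file that Excel
# can open directly; a fixed header keeps the columns consistent
# no matter which source a row came from.
with open("scotus_votes.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["case_name", "justice", "vote"])
    writer.writeheader()
    writer.writerows(rows)
```

Keeping the column order fixed in `fieldnames` means every source contributes rows in the same shape, which makes the manual cross-checking step easier.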

In conclusion, data collection is a crucial step in Big Data analysis, providing the foundation for meaningful insights and conclusions. While statistical data collection relies on random sampling and structured instruments, Big Data collection gathers vast amounts of information from diverse sources to create a comprehensive and accurate picture, and every Big Data project involves independent and dependent variables. My project aims to explore the influence of the gender demographics of SCOTUS justices’ offspring on their decision-making in gender-relevant cases between 1970 and 1990, a period when significant SCOTUS decisions on women’s rights were established. Secondary data from various sources will be compiled, covering the nature of the cases, the justices involved, their voting records, and the number of sons and daughters each justice had, with thorough cross-checking across sources to ensure accuracy and completeness.

References

About Oyez. (2023). Oyez. https://www.oyez.org/about

Craiker, K. N. (2022). Dependent variable: Definition and examples. ProWritingAid. https://prowritingaid.com/dependent-variable

Data Science Team. (2019, December 29). Independent and dependent variables. Data Science. https://datascience.eu/mathematics-statistics/what-is-the-difference-between-independent-and-dependent-variables/

GM Vector. (2022, July 26). Construction worker laying bricks for construction engineers day concept flat vector illustration isolated. Freepik. https://www.freepik.com/premium-vector/construction-worker-laying-bricks-construction-engineers-day-concept-flat-vector-illustration-isolated_29883390.htm

Longe, B. (2020, July 1). What is primary data? + [Examples & collection methods]. Formplus. https://www.formpl.us/blog/primary-data

Longe, B. (2020, January 15). What is secondary data? + [Examples, sources, & analysis]. Formplus. https://www.formpl.us/blog/secondary-data

Mahmutovic, J. (2021, February 26). 3 different types of data collection: Survey vs questionnaire vs poll. SurveyLegend. https://www.surveylegend.com/customer-insight/survey-questionnaire-poll/

1970–1971 term. (2023). Oyez. https://www.oyez.org/cases/1970

Pratt, M. K. (2022). How big data collection works: Process, challenges, techniques. Data Management; TechTarget. https://www.techtarget.com/searchdatamanagement/feature/Big-data-collection-processes-challenges-and-best-practices

Primary vs. secondary sources: how to distinguish them. (2023). Paperpile. https://paperpile.com/g/primary-vs-secondary-sources/

Simple random sample: advantages and disadvantages. (2023). Investopedia. https://www.investopedia.com/ask/answers/042815/what-are-disadvantages-using-simple-random-sample-approximate-larger-population.asp

Singh, V. (2020). Difference Between independent and dependent variables. Shiksha Online. https://www.shiksha.com/online-courses/articles/difference-between-independent-and-dependent-variables/

Timeline of major Supreme Court decisions on women’s rights. (2023). American Civil Liberties Union.

What does intersectional feminism actually mean? (2018, May 11). International Women’s Development Agency. https://iwda.org.au/what-does-intersectional-feminism-actually-mean/
