Longitudinal Studies

We recently launched the second wave of a study internally named the New Account Creator Longitudinal Study, or “Project NAC.” We’ve been following roughly 3,200 new users since September 2015, checking in periodically to ask:

  • How they use or plan to use GitHub (purpose for signing up).
  • What their biggest challenge at the moment is (technical or social).
  • How they are/aren’t solving that challenge.
  • How and where they discover and learn about new developer tools.
  • What considerations shape their Git/GitHub workflow.

A single picture vs. a baby book

Longitudinal studies are complex and, at times, difficult to complete. A person in the cohort may respond to survey #1 of the series but not #2, and then pick back up at #3 or disappear entirely. As you can imagine, this makes assembling the data a messy process, and we feel a sense of accomplishment each time we get the data back into order! For all the work, longitudinal studies are worth the effort: they yield incredibly rich data on experiences and processes that evolve over time, where a cross-sectional view captures only a single moment (think a single picture vs. a baby book).

This post discusses longitudinal studies, and we hope you’ll learn more about:

  • The difference between a longitudinal and cross-sectional study.
  • Why longitudinal studies are valuable and worth the investment.
  • How a longitudinal study is put together (why are there so few of them?!).

Newcomers & git for version control

We know from our studies (and even personal experience) that it takes time for new users to gain confidence with git and to learn and integrate GitHub into their workflows. We also know that, of the people who sign up for a GitHub account, we lose more along the way than succeed. The NAC study is an effort to help us understand the current journey from sign-up to established “tenured” user. TL;DR: our community of users is quickly changing and this is the best way for us to study the change, but it’s also slow-going.

What is a longitudinal study?

Longitudinal studies are a type of observational research study where subjects are observed over a long period and data is repeatedly collected at intervals. Unlike in experimental designs (variance testing), in observational studies the researcher does not control subjects’ exposure to any particular treatment. Instead, observational studies investigate natural relationships as they occur, without any interference from the researcher.

We tend to opt for cross-sectional studies to help us move quickly. Studying people over time is a special treat for researchers (my Halloween candy). If you’ve ever been sucked in by a “where are they now” magazine headline, you know the intuitive appeal of finding out how things turned out. The enduring popularity of the UK-based Up series, the profound insights from the Harvard Grant Study (following a cohort of 268 men for 75 years as they aged into their 90s), and the critical policy findings from the Panel Study of Income Dynamics demonstrate the cultural and scientific power of this type of study.

However, these studies require an investment of time, and researchers who are highly patient, deliberate, and willing to age alongside their subjects.

Understanding Cross-sectional vs. Longitudinal Studies

Observational studies may be either cross-sectional or longitudinal. Our research team makes use of both of these research designs, which are useful for studying different types of relationships.

Cross-sectional studies are (relatively) straightforward: choose a population to study, create a survey instrument, select and recruit a representative sample, collect and analyze the data. An example of this would be our annual GitHub Tools & Workflows survey.

In contrast, longitudinal studies are complex and resource intensive: go through all of the steps above for each of multiple waves of data collection.

Among longitudinal studies, there are two general types:

- Cohort studies select a population based on some shared experience (e.g. birth year or exposure to a vaccine), and repeatedly sample from this population to study how the experience impacts the subjects over time.

- Panel studies begin with a single sample and study that same sample over time at repeated intervals. This is a particularly resource intensive method, because the same subjects must be located again, contacted, and convinced to continue participating in the study. At each wave, the recontact rate is typically around 10%.


In addition to the difficulty of collecting the data, analysis of panel studies is particularly complicated because repeated observations of the same individual violate assumptions about the independence of observations that are necessary for many statistical methods, so this type of data requires specialized models. Additionally, attrition rates between waves must be examined for systematic bias, since higher attrition rates among certain types of respondents will introduce bias into the final data set.
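As a rough illustration of the attrition check described above, the sketch below compares wave-to-wave retention across respondent groups. The user ids, group labels, and wave sets are all made-up placeholders, not study data; a large gap between groups would be a hint (not proof) that attrition is systematic.

```python
from collections import defaultdict

# Hypothetical data: user id -> respondent type, plus the set of ids that
# answered each wave. None of these values come from the actual study.
group_of = {
    "u1": "active", "u2": "active", "u3": "active", "u4": "active",
    "u5": "inactive", "u6": "inactive", "u7": "inactive", "u8": "inactive",
}
wave1 = {"u1", "u2", "u3", "u4", "u5", "u6", "u7", "u8"}
wave2 = {"u1", "u2", "u3", "u5"}


def retention_by_group(group_of, earlier, later):
    """Share of earlier-wave respondents in each group who also answered the later wave."""
    answered = defaultdict(int)
    retained = defaultdict(int)
    for user in earlier:
        group = group_of[user]
        answered[group] += 1
        if user in later:
            retained[group] += 1
    return {group: retained[group] / answered[group] for group in answered}


rates = retention_by_group(group_of, wave1, wave2)
for group in sorted(rates):
    print(f"{group}: {rates[group]:.0%} retained")
```

With this toy data, 75% of the "active" group and 25% of the "inactive" group are retained, which is the kind of imbalance that would call for a closer look before pooling the waves.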

GitHub’s Project “NAC”

The New Account Creators study is a hybrid cohort panel study of new account creators. The study population includes (non-spammy, email-verified) users who created their accounts within 90 days of the study’s launch in September 2015. We evenly sampled users who had been active on GitHub since opening their account (whom we call “Creators”) and those who had not yet taken any action on the site (“Explorers”).
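One way to picture the even sampling described above is to draw the same number of accounts from each group, regardless of how large each group is. This is only a sketch under assumptions: the account ids, group sizes, and the `sample_evenly` helper are hypothetical and do not describe how the actual sample was drawn.

```python
import random


def sample_evenly(strata, per_stratum, seed=0):
    """Draw the same number of accounts from each stratum, without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return {name: rng.sample(ids, per_stratum) for name, ids in strata.items()}


# Hypothetical populations of different sizes (placeholder ids, not real accounts).
strata = {
    "Creators": [f"creator-{i}" for i in range(1000)],
    "Explorers": [f"explorer-{i}" for i in range(5000)],
}

sample = sample_evenly(strata, per_stratum=100)
for name, ids in sample.items():
    print(name, len(ids))
```

Sampling evenly rather than proportionally means the smaller group is not swamped by the larger one, at the cost of needing weights if the goal is to estimate figures for the whole population.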

From our initial sample of 90,000 accounts, we received about 3,200 responses. Last week, we recontacted most of the initial sample. Of the respondents to the first wave, between 12% and 19% participated in the second wave, joined by over 2,300 users who had not responded to the first wave.
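To put those figures in proportion, here is the back-of-the-envelope arithmetic. The inputs are the rounded numbers quoted above ("about 3,200", "over 2,300"), so the results are approximations rather than exact study counts.

```python
# Rounded figures quoted in the text; all are approximate.
initial_sample = 90_000
wave1_responses = 3_200
wave2_new_respondents = 2_300  # "over 2,300", so treat this as a lower bound

# Share of the initial sample that answered the first wave.
wave1_rate = wave1_responses / initial_sample

# 12% to 19% of first-wave respondents came back for the second wave.
repeat_low = wave1_responses * 12 // 100
repeat_high = wave1_responses * 19 // 100

print(f"Wave 1 response rate: {wave1_rate:.1%}")               # 3.6%
print(f"Repeat respondents: {repeat_low} to {repeat_high}")    # 384 to 608
print(
    "Wave 2 respondents: roughly "
    f"{repeat_low + wave2_new_respondents} to {repeat_high + wave2_new_respondents}"
)                                                               # 2684 to 2908
```

In other words, only a few hundred people in the second wave can be compared directly against their own first-wave answers, which is why retention matters so much for a panel design.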

We’re looking at new users in three phases over the next 12 months:

  1. The beginning (inception): When newcomers sign up, poke around, & experiment. It’s harder to find them after they leave (we rely upon email outreach).
  2. The messy-but-sticky middle: When newcomers are regularly active in GitHub; this is where the workflows & workarounds happen (they imprint onto & are imprinted by the product experience).
  3. The end: When newcomers have abandoned the product; GitHub “inactives.”

Longitudinal studies are rare in our industry (both our apps and our efforts are babies themselves), but they are essential if we want to assemble and learn about human lives, such as the effect that an application like GitHub has on skills advancement and career narratives.

This article was written as a partnership. Research is better when you get to discover new things about the world with someone who has a different perspective — someone who is willing to challenge you. I’m grateful to the people I worked with at GitHub. We were a small-but-mighty team.