Decoding the NYC School Admission Lottery Numbers

Amelie Marian
Algorithms in the Wild
13 min read · Jun 7, 2021

This is the first part of a series of posts on NYC HS admissions. You can read the following parts at:
Part 2. Gaining Insights from the NYC School Admission Lottery Numbers
Part 3. NYC High School Chances of Admission Predictions
Results. Results from the 2022, 2023, and 2024 NYC School Admission Lottery Surveys.
You can also read a more in-depth account of the study from a paper I presented at the 2023 ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT’23):
Algorithmic Transparency and Accountability through Crowdsourcing: A Study of the NYC School Admission Lottery, or watch the conference presentation.

The NYC Department of Education uses lotteries to assign students to schools but is reluctant to provide much information on how the lottery numbers are drawn, or on the odds of being assigned to various schools. Using data crowdsourced from families, I try to shed some light on the process.

This year (2021 admission season), the NYC Department of Education (DOE) moved to a lottery-based approach for admissions to all middle schools and some selective high schools that were previously using academic screens and auditions. Lotteries are not new in NYC school admissions: the DOE has used them for years for pre-K and Kindergarten admissions, and to break ties when there are more qualified applicants than seats at a school. But the widespread use of lotteries this year has raised multiple questions from parents: How are the lottery numbers drawn? Is it possible to see my student’s lottery number? What are the odds of gaining admission to my preferred school?

Incredibly, at first, the DOE declined to provide families with any information on their lottery numbers, or many details about the process. This is all the more surprising given Mayor De Blasio’s stated commitment to transparency and accountability in the city’s automated decision systems, of which the yearly school admission matching algorithm is a prime example. (Mayor De Blasio’s efforts to improve algorithmic transparency ultimately failed.) The DOE first told parents that the numbers were “truly random,” but could not be shared with families because they consisted of long strings of numbers and letters that families could not understand. In addition, the DOE claimed that the numbers would not give information to families as to their relative chances in their preferred schools, as these chances depend on the choices of other applicants. It wasn’t until a parent group launched a campaign asking parents to request their student’s lottery numbers under the Freedom of Information Law (FOIL) that the DOE relented and agreed to provide the lottery numbers upon request. As it turns out, you are legally entitled to see any information that is used to decide your child’s admission to public schools.

In this post, I will explain how the lottery works using crowdsourced data provided by parents who shared their student’s lottery number, applied school choices, and outcomes, with me through confidential surveys.

The NYC school matching algorithm (part 1): the general algorithm

The lottery numbers are just one part of the NYC school assignment system, which uses a student-proposing deferred acceptance algorithm, the “NYC School Matching Algorithm,” designed by a team of renowned economists, including a Nobel prize winner, almost 20 years ago.

The NYC school matching algorithm (part 2): how the matching works with lotteries, priorities, and set-asides

The algorithm (which I describe in detail in the two linked videos) optimizes the outcome for the students based on its inputs: students’ choices, schools’ rankings of students, and system priorities (zones, continuing students,…). It has been tweaked over the years to include more system priorities, such as set-asides for low-income students and various admission priorities.

One number per student

Each student is assigned a single random lottery number. For schools that admit students through total or partial lotteries, this number determines the student’s admission priority. For schools that admit students through priority groupings, or batches, it breaks ties when more students in a batch want a seat than the school has available for that batch.
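As a rough illustration of the paragraph above, here is a minimal sketch of how a single school might order its applicants: first by priority group, then by lottery number. The `Applicant` class, the group numbering (1 is the highest priority), and the sample numbers are hypothetical; the DOE's actual implementation is not public.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Applicant:
    name: str
    priority_group: int  # hypothetical convention: 1 is the highest-priority group
    lottery: str         # lowercase hexadecimal lottery number (first 8 characters here)


def rank_applicants(applicants):
    """Order applicants by priority group, breaking ties with the lottery number.

    Lowercase hexadecimal strings of equal length sort in the same order
    as the numbers they represent, so a plain string comparison is enough.
    """
    return sorted(applicants, key=lambda a: (a.priority_group, a.lottery))


def offers(applicants, seats):
    """Return the applicants who would be offered one of `seats` seats."""
    return rank_applicants(applicants)[:seats]


# Made-up applicants for illustration only.
applicants = [
    Applicant("A", 2, "1f5124de"),
    Applicant("B", 1, "fa8058a5"),
    Applicant("C", 1, "9cf4f694"),
    Applicant("D", 2, "6ba829b3"),
]
admitted = [a.name for a in offers(applicants, seats=3)]
print(admitted)
```

In this toy example, a student in the first priority group with a poor lottery number still ranks ahead of a student in the second group with a good one, which is why the lottery number alone does not tell the whole story at schools with priority groups.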

The decision to use the same lottery number for all the schools, rather than having a separate lottery number for each school, is one that often puzzles and infuriates parents who believe the system is unfair to students with an unlucky draw and that students would have a better chance if they could draw one number per school.

Discussion on the differences between a single lottery number or one per school.

In fact, using a single lottery number for all schools is something that the DOE got right. Counterintuitively, the literature shows that using the same number for all schools does not penalize students; instead, it slightly increases their chances of matching to their top choice. In a single-number system, students with a good lottery number are more likely to be assigned to their preferred school. If schools hold separate lotteries, students have to draw a good lottery number at their preferred school specifically, at lower odds, to get their top school. The chance of not getting an offer is roughly the same, as illustrated in the framed example.
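For readers who want to experiment with this comparison, the following is a toy sketch, not the DOE's code. It runs a textbook student-proposing deferred acceptance on random preferences, once with a single lottery number per student and once with a separate draw per school. All parameters (300 students, 6 schools, 40 seats each, lists of 4 schools, uniformly random preferences) are assumptions chosen for illustration, and priority groups and set-asides are ignored.

```python
import random


def deferred_acceptance(prefs, capacity, score):
    """Student-proposing deferred acceptance.

    prefs[s]    -- list of schools in student s's order of preference
    capacity[c] -- number of seats at school c
    score[s][c] -- student s's lottery value at school c (lower is better)
    Returns a dict mapping each matched student to a school.
    """
    next_choice = {s: 0 for s in prefs}  # index of the next school to propose to
    held = {c: [] for c in capacity}     # students tentatively held by each school
    free = list(prefs)                   # students who still need to propose
    while free:
        s = free.pop()
        if next_choice[s] >= len(prefs[s]):
            continue                     # list exhausted: the student stays unmatched
        c = prefs[s][next_choice[s]]
        next_choice[s] += 1
        held[c].append(s)
        if len(held[c]) > capacity[c]:
            # The school rejects the held student with the worst lottery value.
            worst = max(held[c], key=lambda t: score[t][c])
            held[c].remove(worst)
            free.append(worst)
    return {s: c for c, students in held.items() for s in students}


def simulate(single_number, n_students=300, n_schools=6, seats=40, list_len=4, seed=1):
    """Run one toy match; preferences are identical for both lottery designs."""
    rng = random.Random(seed)
    schools = list(range(n_schools))
    capacity = {c: seats for c in schools}
    prefs = {s: rng.sample(schools, list_len) for s in range(n_students)}
    if single_number:
        draw = {s: rng.random() for s in prefs}
        score = {s: {c: draw[s] for c in schools} for s in prefs}
    else:
        score = {s: {c: rng.random() for c in schools} for s in prefs}
    match = deferred_acceptance(prefs, capacity, score)
    first_choice = sum(1 for s, c in match.items() if prefs[s][0] == c)
    return {"first_choice": first_choice,
            "unmatched": n_students - len(match),
            "match": match}


for single in (True, False):
    result = simulate(single)
    label = "single number" if single else "one number per school"
    print(label, result["first_choice"], result["unmatched"])
```

The counts from one run depend on the seed and on these toy parameters, so a single run should not be over-interpreted; averaging over many seeds and using more realistic, correlated preferences would give a fairer picture.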

Families can now request their lottery number by simple email to the DOE; replies usually take about a week. Some families received information from the DOE to help them interpret their numbers, but some just received, without any explanation, a lottery number that looked like:

6ba829b3-fa99-4752-a931-2119fb0c1fea

This is a hexadecimal number with 32 characters. Hexadecimal numbers are base 16: they use the digits 0–9, then the letters a–f. They are often used in programming because computers encode everything in binary (bits are base 2: 0 or 1), and 16 is a power of 2, so a hexadecimal character can be represented in exactly 4 bits. Hexadecimals can be converted to decimals easily, but there is no need to do so to understand and compare lottery numbers.
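To make the last point concrete, here is a small sketch showing both the conversion to a decimal integer and why it is not needed for comparisons. The helper name `hex_value` and the second lottery number are made up for illustration.

```python
def hex_value(lottery_number: str) -> int:
    """Convert a lottery number to an integer, ignoring the hyphens."""
    return int(lottery_number.replace("-", ""), 16)


a = "6ba829b3-fa99-4752-a931-2119fb0c1fea"
b = "9cf4f694-0000-4000-8000-000000000000"  # hypothetical second number

# Comparing the integers and comparing the raw strings give the same answer,
# because both strings are lowercase and have hyphens in the same positions.
same_order = (hex_value(a) < hex_value(b)) == (a < b)

# A single hexadecimal character is a value from 0 to 15, i.e. 4 bits.
first_char_value = int(a[0], 16)
print(same_order, first_char_value)
```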

How to decipher the Hexadecimal Lottery Numbers

tl;dr: To get a rough idea of how “good” your lottery number is, just look at the first character of the hexadecimal string. If it is a number, your lottery number is in roughly the first two-thirds of all ordered lottery numbers, the lower the better. If it is a letter, your lottery number is in the last third. A lottery number starting with 0–3 is in the first quartile, one starting with c-f in the last.

These 32-character numbers are in a format called UUID and are likely generated using a random number generator (see below for details) that creates uniformly distributed numbers. The numbers are compared left to right, in increasing order: from 0 to f (0–9 then a-f). This means that the first character is enough to give you a rough idea of how good your number is: a lottery number that starts with 0 is in the first 1/16th (6.25%), one that starts with f in the last 1/16th.

First two characters of lottery number to percentile

To differentiate further, we can look at the first two characters: a lottery number that starts with ‘00’ is in the first 256th (0.4%), ‘01’ the 2nd 256th, and so on. The first two characters are sufficient to identify where your number is expected to stand in comparison to other numbers, with a 0.4% precision. The attached table shows how to convert a lottery number’s first two characters to a percentile.
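The conversion described above can be sketched in a few lines. The function name `prefix_percentile` is hypothetical, and the sketch assumes the same convention as the text, where a number starting with ‘00’ maps to 0.4%, that is, to the upper edge of its 1/256th slice.

```python
def prefix_percentile(lottery_number: str, n_chars: int = 2) -> float:
    """Upper edge (in percent) of the slice that the first n_chars fall in."""
    prefix = lottery_number.replace("-", "").lower()[:n_chars]
    return (int(prefix, 16) + 1) / 16 ** n_chars * 100


for number in ("00", "1f5124de", "9cf4f694", "fa8058a5"):
    print(number, f"{prefix_percentile(number):.1f}")
```

Passing `n_chars=1` gives the coarser first-character estimate, in steps of 6.25%.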

So why are the numbers so long? As mentioned above, they are Version 4 UUIDs (Universally Unique Identifiers); you can identify the version by looking at the first character of the third block of characters, which is the 13th character. UUID V4 is used to generate random unique identifiers: there are 2¹²² (5.3 undecillion) possible random version-4 UUIDs. Several UUID V4 generators are available, and it makes sense for the programmers of the NYC DOE lottery to have used an existing, well-tested random number generator library function, such as the Python one used to generate 10 UUIDs in the example below.

Randomly generated UUID V4 numbers
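For readers who want to try this themselves, here is a minimal sketch using `uuid.uuid4()` from Python's standard library. Whether the DOE uses Python, or this particular function, is not known; this only shows what version-4 UUIDs look like and where the version digit sits.

```python
import uuid

# Generate 10 random version-4 UUIDs as lowercase strings.
numbers = [str(uuid.uuid4()) for _ in range(10)]

# Sorting the strings orders them the way lottery numbers are compared.
for n in sorted(numbers):
    print(n)

# The 13th hexadecimal character (first character of the third block)
# encodes the UUID version, which is 4 for randomly generated UUIDs.
versions = {n.replace("-", "")[12] for n in numbers}
print(versions)
```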

What does not make as much sense is for the DOE to provide the full numbers to families. For a decision process to be transparent and accountable, it needs to be simply explainable. The 32-character numbers look unnecessarily complex because they are. Most of the characters are just noise and have no impact on the student’s admission outcome; only the first 8 characters will ever be used. It would be much easier for families to understand their chances if the DOE were providing them with information in this format:

Your lottery number is ‘fa8058a5’, it is in the 98th percentile
Your lottery number is ‘9cf4f694’, it is in the 61.3th percentile
Your lottery number is ‘1f5124de’, it is in the 12.5th percentile

Some families who requested their lottery numbers by email did receive a detailed explanation with percentile information, along with the numbers. Many just received the 32-character lottery number without additional details. The fact that the lottery numbers and their explanations are not given to all applicants, but only upon request, creates another source of inequity, where some in-the-know families are given more information than others. There is no reason why the lottery number information shouldn’t be available to all on the MySchools portal.
***update: for the 2022 admission season, the DOE has provided families with their lottery number on their MySchools account***

Are the first 8 characters really enough? Each cohort of applicants has historically numbered between 60,000 and 80,000 students. In practice, the implementation will never go past the first eight characters to compare students' lottery numbers: the first 8 characters of the lottery number provide 16⁸, or over 4 billion, possible combinations. Interestingly, similar to the birthday paradox, there is actually a roughly 50% probability that, citywide, two students in a given year will share the same first 8 characters. However, the numbers are not used as identifiers but as tie-breakers, so one duplicate every other year seems acceptable and unlikely to have any real impact (the implementation can still use the longer numbers if preferred). If it isn’t acceptable, then the first 12 characters would give a 99.999% probability that there are no duplicates.
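The duplicate estimate above can be checked with the standard birthday-problem approximation, 1 − exp(−n(n−1)/(2N)), where n is the number of students and N the number of possible prefixes. The function name `p_duplicate` is hypothetical, and the sketch assumes the prefixes are uniformly distributed.

```python
import math


def p_duplicate(n_students: int, n_chars: int) -> float:
    """Approximate probability that at least two students share a prefix."""
    n_values = 16 ** n_chars  # number of possible prefixes of this length
    exponent = n_students * (n_students - 1) / (2 * n_values)
    return -math.expm1(-exponent)  # equals 1 - exp(-exponent), stable for tiny values


for n in (60_000, 80_000):
    print(n, f"{p_duplicate(n, 8):.3f}", f"{p_duplicate(n, 12):.2e}")
```

Under these assumptions the probability of at least one shared 8-character prefix comes out at roughly 34% for 60,000 students and roughly 53% for 80,000, in line with the ballpark figure in the text. With 12 characters it drops to about one in 100,000.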

The truth is that families don’t care much about the actual numbers; rather, they want an idea of their student’s chances, and guarantees that the numbers were generated fairly. The use of overly long and opaque numbers raises more questions than it answers: parents on internet boards are convinced that the DOE is tipping the scales by favoring students from some schools, or demographics, over others, or that the numbers encode all types of information used in the match. They use anecdotal data to confirm their fears. The lack of transparency is the main cause of mistrust. If the DOE had clearly stated how the numbers were generated (maybe sharing which library function is used), and explained how the numbers are processed from the beginning, families would have more trust in the system.

So, what are the odds?

One of the reasons the DOE originally gave for refusing to share lottery numbers was that, on their own, the lottery numbers are not very informative because the chances of gaining admission at a school depend on the lottery numbers of the other students applying to the school. That is not correct: statistically, the distribution of the lottery numbers of all applicants to any school will follow the same uniform distribution as the citywide pool, so 1/16th of all applicants at a school are expected to have a lottery number starting with ‘0’, etc. The chances of gaining admission to a lottery-based school depend on the number of available seats, the number of applicants, and how they ranked the school. The interplay between students’ choice rankings and selectivity of schools is something that has repeatedly not been made clear by the DOE or education reporters: a school that is ranked 1st by a set of applicants will have different odds of admission from one that is ranked 12th by the same set of applicants, due to the mechanism of the matching algorithm. How far each school will go down its list is not as simple as dividing the number of seats by the number of applicants, and depends on how applicants ranked the school.

To estimate the odds of being admitted to various MS and HS in the 2021 admission cycle, I am running two crowdsourcing surveys (MS survey, HS survey). The project is still in the data-gathering phase, and there are some delays as participants wait for the DOE to fulfill their lottery number request. I will report some preliminary data analysis below.

Methodology

Families were asked to enter their lottery number, the school to which they matched, and the schools they ranked higher than their match. For HS, they were also given the option to provide their rank at each school (available in the HS offer letter). They were also asked to enter information on which priority group their student qualified for.

For lottery-based schools (all MS and some HS), I identified the “worst” lottery number that received an offer to the school, and the “best” that didn’t. (To preserve privacy, I only report the first two characters of lottery numbers.) I then computed the corresponding percentile to get a lower and upper bound of the odds of matching to a given school. (Note that these odds represent your chances of matching to a given school or to a school you ranked higher on your list.)
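The bounding step can be sketched as follows. This is a minimal reading of the method described above, not the exact analysis code: the function name `odds_bounds` and the survey responses are made up, and the sketch assumes all responses come from students in the same priority group at one school. Prefixes are converted to percentiles with the same convention as the two-character table (‘00’ maps to 0.4%), so each bound is only accurate to about 0.4 percentage points.

```python
def odds_bounds(responses):
    """Bound the odds of an offer from (two-character prefix, admitted) pairs.

    Returns (lower, upper) in percent. A bound is None when no response
    of the corresponding kind (admitted or not admitted) is available.
    """
    admitted = [int(prefix, 16) for prefix, got_offer in responses if got_offer]
    rejected = [int(prefix, 16) for prefix, got_offer in responses if not got_offer]
    # Worst (highest) number that still received an offer: lower bound on the odds.
    lower = (max(admitted) + 1) / 256 * 100 if admitted else None
    # Best (lowest) number that did not receive an offer: upper bound on the odds.
    upper = (min(rejected) + 1) / 256 * 100 if rejected else None
    return lower, upper


# Hypothetical survey responses for one school.
survey = [("0a", True), ("2f", True), ("41", False), ("c3", False), ("17", True)]
lower, upper = odds_bounds(survey)
print(lower, upper)
```

With few responses the two bounds can be far apart, which is why the estimated ranges narrow as more families contribute data.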

MS Survey Preliminary Results

*** Selected results, updated 06/08/21***
***update 02/27/22: additional results in follow-up post***

Lottery odds for selected MS — 2021 admission cycle

The first table shows results for selected middle schools, for students who do not qualify for any set-aside. The estimated odds ranges can be quite large; they will be refined as more people participate in the surveys. There were a few surprising discoveries: in D2, the odds of getting accepted to Clinton, a popular MS, are at least 18.8%, possibly higher. In D3, the odds of being accepted to Booker T. Washington are higher than 85%.

HS Survey Preliminary Results

*** Selected results, updated 06/08/21***
***update 02/27/22: additional results in follow-up post***

Lottery odds for selected HS — 2021 admission cycle

The second table shows results for a few selected high schools that ran lotteries to select students that met a GPA threshold. Many of these schools also had seats set aside for low-income (identified as FRL) applicants, and students in this group could therefore be admitted with a different lottery cutoff. I report both cutoffs when data is available.

Note that the odds of getting into schools are not independent events: a (non-FRL) student applying to Eleanor Roosevelt, Lab, Baruch, and Columbia Secondary has at most (and probably less than) a 16.8% chance of gaining admission to any one of the four schools; which one will depend on how the student ordered the schools on their application and on the actual cutoffs. Odds for FRL students are better, with at least a 25% chance of gaining admission at Lab, but still far from a guarantee (less than 33%).
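A short arithmetic sketch shows why the odds do not add up across schools. Because one lottery number is used everywhere, a student gets an offer from at least one school exactly when the number falls below the most generous cutoff among the schools listed. The cutoff values below are hypothetical, and the independent-lottery figure is shown only for contrast: with separate lotteries the cutoffs themselves would be different, and the sketch ignores priority groups and set-asides.

```python
def chance_single_number(cutoffs):
    """Chance of at least one offer when one lottery number is used everywhere."""
    return max(cutoffs)


def chance_if_independent(cutoffs):
    """Chance of at least one offer if each school drew its own number."""
    miss_all = 1.0
    for cutoff in cutoffs:
        miss_all *= 1 - cutoff
    return 1 - miss_all


# Hypothetical cutoffs, as fractions of the lottery range, for four schools.
cutoffs = [0.168, 0.10, 0.12, 0.15]
print(chance_single_number(cutoffs), round(chance_if_independent(cutoffs), 3))
```

Treating the four schools as independent draws would suggest a chance of about 44% with these made-up cutoffs, far above the 16.8% that the single lottery number actually allows.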

Rank cutoffs (rounded) for selected HS — 2021 admission cycle

The third table gives the lowest accepted rank (rounded to the lower tenth to preserve privacy) and the highest rejected rank (rounded to the higher tenth) for some selected HS that admitted students based on a composite score. The corresponding composite score cutoffs for many of these schools are very high: at least 97 or 98 for Nest, Clinton, Millennium Manhattan, or Hunter Science.

Transparency

The HS match this year was reported to have a large number of students not getting matched to any school on their list, especially in one Manhattan district (D2). This was in large part due to diversity- and pandemic-driven changes in admission policy, which greatly reduced the odds of admissions to a number of schools for these students. The DOE did not clearly communicate to families how much the odds had decreased. This, coupled with the difficulties of researching new schools during a pandemic, meant that D2 students continued applying to the same set of schools as their counterparts had in the past. The outcome was predictable: fewer D2 students getting admitted to former D2 priority schools, resulting in more applicants for the Manhattan screened schools, whose cutoffs would then become higher, and more students (from multiple districts) not getting any of their choices.

The HS admission process in NYC is complex, but it could be less stress-inducing if steps were taken to make it less opaque:

  • Clearly communicate the lottery odds and grade cutoffs: while these will vary from year to year, historical data can help families assess their chances. The limited data available in the school directory is given without context and is often misleading. The MySchools student portal could provide, for each school on a student’s list, the odds for that specific student to get an offer based on historical patterns; if the admission policy has been changed this should be explained as well. If providing this information for each of the 12 schools on the student’s list is too cumbersome, then at the very least students should be given the estimated probability of them not receiving a match to any school on their list, and given the option to update their list to improve their odds.
  • Provide families with their lottery numbers: at the very least all families should be given their lottery numbers in their offer letters. A better policy would be to provide the lottery numbers at the time of application to help families focus their research on schools to which they have a chance of getting assigned. Researching schools is a time- and resource-consuming process that taxes both families and individual schools (who have to run countless open houses and field questions), and favors students who have the social capital to navigate the complex process. By providing families with their lottery draw ahead of time, families and guidance counselors can help students research and select schools for which they have a reasonable chance of gaining admission. ***update: for the 2022 admission season, the DOE has provided families with their lottery number on their MySchools account***
  • Show the default school assignment at the time of the application: one of the most inequitable policies in NYC MS and HS school admission is that some students have a guaranteed spot at a school through continuing student priority or zoning, while others may be randomly assigned to any school in the system if they do not match to any school on their list. Establishing a known default school for each student would help a great deal in removing stress and uncertainty.

Thank you to all who participated in the surveys.


Published in Algorithms in the Wild

Advocating for transparent and accountable decision processes. Studying the behaviors of algorithms when they get out of the minds of designers and take on a life of their own.

Written by Amelie Marian

CS Professor at Rutgers — I like to explain algorithms and advocate for accountable decision processes.
