Gaining Insights from the NYC School Admission Lottery Numbers

Published in Algorithms in the Wild · 20 min read · Feb 27, 2022

*** Results for the 2022 admission cycle are published here. Results for the 2023 admission cycle are published here. Results for the 2024 admission cycle are published here.***

***This is a follow-up post to “Decoding the NYC School Admission Lottery Numbers.” If you want to learn about how students are matched to schools, and how to understand the long hexadecimal random numbers that the NYC DOE assigns to students, you should start by reading the first post. This follow-up post gives more details on how to use that information when applying to schools. This post was written in early 2022 and discusses the 2021 cycle results as well as the 2022 admission cycle. The NYC DOE has once again changed the screened schools’ admission rules for the 2023 cycle, which will have an impact on admission odds at each school.***

Last Spring I wrote a post to explain the NYC school matching algorithm and how to interpret the lottery numbers that the NYC Department of Education uses to assign students to schools; I also provided preliminary results from a crowdsourcing experiment to understand the odds of admission at various schools.

Since then there have been several changes to the admission process, which impact the odds of admission. In addition, after requiring parents to jump through hoops to access their child’s lottery number via Freedom of Information Law (FOIL) requests, a campaign by a parent advocacy group, and resolutions from several elected parent councils, the DOE finally relented and as of mid-February 2022 has included students’ random lottery numbers in the MySchools application portal (in the student’s profile).

In this post, I will try to answer some questions I have received from parents on the lottery process and share information that has come to light since my last post. I will also provide more results from last year’s crowdsourcing experiment.

Before I proceed, I want to note that I am not affiliated with the NYC DOE and that the following is based on my understanding of the process and on data provided by parents. I encourage you to discuss your application with your guidance counselor to identify the best strategy for your student.

Why is it useful to know my student’s lottery number?

This is the most common question: if the matching algorithm is strategy-proof, as the DOE says and as I explained in my videos, why does it matter to know your student’s lottery number? Shouldn’t you just list the schools you are interested in, in your true order of preference?

The truth is, the matching process is not completely strategy-proof: while your best strategy is always to order the 12 schools in your list in your true order of preference, there is some strategizing when it comes to deciding which schools to include in your list of 12 choices. Recent changes in admissions have made lottery numbers the primary factor for admissions to MS and most HS. Knowing your lottery number is helpful for several reasons:

Researching schools is time- and resource-consuming. Parents and students have to attend numerous open houses (virtual or, in pre-COVID times, in-person), learn about the course offerings of the different schools, understand the schools’ admission priorities, and figure out whether each school requires additional admission material (auditions, essays). They have to do this for at least 12 schools, and likely more, to build their list of 12 choices. On the school side, hosting multiple open houses requires a significant time investment from school staff. When tours were held in person, some popular citywide MS were hosting several tours in the Fall for only a handful of seats available for non-continuing students. Knowing their lottery numbers allows families to focus on likely matches and not waste time researching and comparing highly-selective schools if their random number is unlikely to give them access to these schools.
Avoid being unmatched. Last year various changes had a significant impact on the chances of admission at various high schools. In particular, the move to a lottery process for several popular schools, coupled with the removal of some geographical priorities and the lack of clear communication about the changes and their impacts, resulted in a large percentage of HS applicants not receiving an offer for any school on their list of 12 choices. Such outcomes are devastating for families: students who do not have another HS placement (e.g., Specialized HS) are assigned by the DOE to a school that rarely matches their expectations. This was particularly true for one Manhattan district (D2), whose students lost priority access to several schools: because the choice behavior of applicants did not change, many students with bad lottery numbers ended up unmatched.
A parent shared with me the answer to a FOIL request for the results of the HS match in the 2021 application season, detailing the number of students who received an offer for one of their top-3, top-5, top-10 choices, or any choice on their list, citywide and by district. The results are shown in the image below. The bolded column shows the percentage of applicants who received an offer in the main round; the last column includes offers from the Specialized HS process in the percentage. As discussed above, students in D2 received the worst outcome, with 18% being unmatched, but many districts also had a high rate of unmatched students in the main HS application round (12% in D3, 11% in D22, 9% in D15, D21, D24, D25, D26), and 7% of applicants were unmatched citywide.

Percentage of NYC HS applicants receiving an offer from their ranked list for the 2021 admission cycle (data received via a FOIL request)

Managing expectations. Lastly, and perhaps most importantly, we are talking about children and teenagers, for whom the process can be extremely stressful. Receiving the news that you did not match to a school that you had your heart set on while your friends did, or worse, that you did not get any match, can be hard, especially for MS applicants who are only 10 or 11 years old. Knowing the lottery numbers in advance can help parents steer their children’s hopes towards schools that they have a reasonable chance of getting matched to.
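
To make the risk of going unmatched concrete, here is a minimal sketch of a student-proposing deferred-acceptance match driven by a single lottery number. It is a toy model under strong assumptions, not the DOE's implementation: it ignores admission priorities, set-asides, and screens, and the school names, seat counts, and lottery values are all hypothetical. It only illustrates how a student with a bad number who lists nothing but high-demand schools can end up with no offer, while adding an average-demand school produces a match.

```python
def deferred_acceptance(preferences, capacities, lottery):
    """Toy student-proposing deferred acceptance (no priorities, one lottery).

    preferences: {student: [school, ...]} in true order of preference
    capacities:  {school: number of seats}
    lottery:     {student: number}, lower is better (assumed convention)
    Returns {student: school or None}.
    """
    next_choice = {student: 0 for student in preferences}
    held = {school: [] for school in capacities}
    free = list(preferences)
    while free:
        student = free.pop()
        ranked = preferences[student]
        if next_choice[student] >= len(ranked):
            continue  # list exhausted: the student stays unmatched
        school = ranked[next_choice[student]]
        next_choice[student] += 1
        held[school].append(student)
        held[school].sort(key=lambda s: lottery[s])
        if len(held[school]) > capacities[school]:
            # Over capacity: release the student with the worst number.
            free.append(held[school].pop())
    match = {student: None for student in preferences}
    for school, students in held.items():
        for student in students:
            match[student] = school
    return match


# Hypothetical data: A and B are high-demand, C is average-demand.
capacities = {"A": 1, "B": 1, "C": 1}
lottery = {"s1": 0.10, "s2": 0.50, "s3": 0.90}
short_list = {"s1": ["A", "B"], "s2": ["A", "B"], "s3": ["A", "B"]}
longer_list = dict(short_list, s3=["A", "B", "C"])

print(deferred_acceptance(short_list, capacities, lottery))
print(deferred_acceptance(longer_list, capacities, lottery))
```

In the first run the student with the worst number (s3) receives no offer; in the second, listing school C gives s3 a match without changing anyone else's outcome.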

What can be learned from the random lottery number?

This is the first question that parents have when seeing their lottery number. I provided an explanation, along with a table to translate the long hexadecimal numbers to percentiles in my previous post. I am including an easier-to-read copy of the table below. You can download a pdf version here.

Theoretical Percentiles of the first two characters of the lottery numbers
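
For readers who prefer to compute the conversion themselves, here is a minimal sketch. It assumes that the leading hexadecimal characters are uniformly distributed and that lower values are better; the function name is hypothetical, and the rounding convention used in the table may differ slightly from the range returned here.

```python
def prefix_to_percentile_range(prefix):
    """Return (low, high) theoretical percentile bounds for a hex prefix.

    Assumes uniformly distributed numbers where lower is better, so a
    two-character prefix splits applicants into 256 equal buckets.
    """
    value = int(prefix, 16)
    buckets = 16 ** len(prefix)
    return 100 * value / buckets, 100 * (value + 1) / buckets


low, high = prefix_to_percentile_range("3a")
print(f"'3a' falls between the {low:.1f}th and {high:.1f}th percentile")
```

The same function works with longer prefixes (for example three characters) if more precision is needed.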

Note that the table provides the theoretical percentiles, i.e., the expected percentile based on a uniform distribution of random numbers. Last year, after the match, the DOE provided applicants who FOILed their numbers with the actual percentiles, which matched the theoretical ones. This year, the DOE has not provided that information, stating:

Until the application has closed and we know the full pool of applicants, the relative rank of this lottery number cannot be calculated.

This is technically correct. The random numbers are drawn from a uniform distribution and random samples may differ from the expected values. However, sample percentiles are known to be asymptotically normally distributed around theoretical percentiles, with a variance that shrinks as the sample size grows. For large samples (such as lottery numbers for all applicants to MS or HS), this means the sample percentiles will be very close to the theoretical percentiles given above. In fact, for a sample of 60,000 numbers (similar to a cohort of HS or MS applicants), the sample percentiles will almost always be nearly identical to the theoretical percentiles provided in the table above. If you are interested in a more in-depth analysis, I ran simulations for various sample sizes and provide details at the end of this post.
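
A back-of-the-envelope way to see how small the deviation is, assuming the lottery numbers are independent uniform draws: let p be the theoretical share of numbers at or below a given value, and let p̂ₙ be the share actually observed among n applicants. The symbols are introduced here only for illustration.

```latex
n\,\hat{p}_n \sim \mathrm{Binomial}(n, p), \qquad
\mathbb{E}[\hat{p}_n] = p, \qquad
\mathrm{SD}(\hat{p}_n) = \sqrt{\frac{p(1-p)}{n}} \;\le\; \frac{1}{2\sqrt{n}}
```

For n = 60,000 the bound on the standard deviation is about 0.2 percentage points; for n = 600 it is about 2 percentage points.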

What does that mean in terms of chances of admission?

Lottery numbers are just one part of the matching process. The chances of admission at a school depend on the number of seats available, the number of applicants to the school, their admission priorities, and how they ranked the school. The DOE provides only the number of applicants and seats. Without details on applicants per priority group and on how applicants ranked the school, these figures can hide large variations in the chances of admission between schools with the same number of applicants per seat (more on this below).

In my previous post, I provided some preliminary results showing the odds of admissions for some schools based on crowdsourced data in the 2021 admission cycle. Additional results are provided at the end of this post. The data is only partial, as it relied on parent-provided information, and only covers a small subset of the schools. But it shows that it is possible to compile that information and gain some insights about the odds of admissions at all schools. Of course, the odds of admissions may change year to year, but some patterns are likely to continue, especially for MS admission where the process has not changed. Several factors will however impact the odds in both directions:

Decrease in enrollment. There are several reports that show that the number of students enrolled in DOE schools went down. This will likely result in fewer applications and may improve the odds of admission at popular schools.
New admission priorities. MS admissions now include sibling priority (only for students with 6th-grade siblings), and several schools have added or increased Diversity in Admissions (DIA) set-asides. Such changes in priority will obviously improve the odds for students within these priority groups, and decrease the number of seats (and therefore odds of admissions) available to students who are not part of these groups.
Major changes in screened HS admissions. Last year, screened schools either ranked students in a full order based on grades and scores (lottery numbers had no impact except as a tie-breaker), or grouped students who met some grade and score threshold and accepted qualified students in lottery order. This year, all screened HS (except a few schools that are allowed to require essay and audition materials) are using the group method. The grouping, done centrally by the DOE, has thresholds that are lower than last year’s, so more students will qualify, which will decrease the odds of admission. In addition, more schools are using a lottery approach, which may also have some effect on admission odds. ***Oct 2022 update: the screened school process was once again changed for the upcoming 2023 admission cycle. The grouping, now called “tiers,” results in smaller groups of students. It is likely that for students in tier 1, odds of admission will be better than in Spring 2022, but it is impossible to know by how much.***

So last year’s odds of admissions are informative, but should be considered in context: things will change this year. That said, having an idea of how things panned out last year can give a good intuition of your student’s chances at various schools based on their lottery number.

Why is the DOE not sharing historical odds of admissions?

The DOE should make the historical cutoff information available. It does not.

Last September, I filed a FOIL request to receive all school cutoffs from the DOE. Unfortunately, the DOE declined my request, stating (bolding mine):

Please be advised that the DOE’s Office of Student Enrollment (OSE) has informed the undersigned that a compilation of such data does not currently exist, and responding to your request would involve more than a simple extraction of data from a single computer storage system. Rather, it would require matching records across more than one computer storage system, and require extensive coding and programming, which the FOIL does not obligate an agency to do. Thus, your requests are denied because FOIL does not require more than “reasonable effort” in the retrieval or extraction of records or data.

First, the claim that compiling the information would “require extensive coding and programming” and “matching records across more than one computer storage system” is ludicrous. By necessity, the matching process has all the data in one system, and that data should be easy to retrieve. But what is truly mind-boggling is that the FOIL denial implies that the DOE is not compiling the odds of admission (cutoffs) for every school. This is concerning for several reasons:

These school cutoffs should be logged as a by-product of the matching verification process. How can the DOE (or its third-party vendor) ensure there were no errors in the match without such verification? Errors in the match have been reported in past years, yet the DOE has always declined to provide any information on their validation and verification processes.
How can the DOE train its admission staff and guidance counselors to offer accurate and useful advice to students on their chances of admission if they do not have that information?
How can the DOE assess the impacts of changes in admission policy without understanding their outcomes on admission odds?

It is also possible that the DOE has the information but denied my request based on a technicality. But in that case, why wouldn’t the DOE be transparent about the school cutoffs, which should be publicly available?

What is the best way to list schools?

As part of the advice on how to strategize your school choices on the DOE website, the following suggestion is included:

List a balance of high-demand and average-demand programs on your application.
* If you have, say, four programs of interest that are high-demand, you should definitely list them!
* But also consider adding other options that are average demand. This helps ensure that you get an offer to a program you want.

The issue is that the DOE does not give a clear indication of what constitutes a high-demand, or an average-demand, program. In various admission events, DOE staff have suggested looking at the previous year’s number of applicants per seat, which is available on the MySchools description of each available program. While the number of applicants per seat is a reasonable measure of the school demand, it can be misleading:

The way applicants rank the school is important. Two schools may have the exact same set of applicants, but if one of them is consistently ranked higher than the other on the applicants’ lists, it will have lower odds. Consider two pairs of D2 MS in the 2021 admission cycle:
- Lower Manhattan Community MS (02M896) and Manhattan Academy of Technology (02M126): both are downtown schools, and both had around 95 seats and 500 applicants last year (5 per seat). Yet based on the crowdsourced data, the cutoff for Manhattan Academy of Technology was somewhere between 43% and 79% (a student with a number starting with ‘c9’ did not gain admission), while the cutoff for Lower Manhattan Community MS was above 89% (a student with number ‘e4’ was admitted).
- East Side MS (02M114) and Yorkville East MS (02M117): two UES schools, both with 8 applicants per seat last year. An applicant with ‘7a’ was not accepted at East Side MS, while one with ‘7f’ was accepted at Yorkville East MS.
In each case, the group of students applying to both schools may not be exactly the same, although both pairs of schools are in the same neighborhood so some overlap is likely. But despite having a similar number of applicants per seat, the schools have different admission odds, likely because one of them is routinely ranked higher than the other. (Note that the crowdsourced data may contain errors, as it is self-reported by parents, so it is possible these numbers are incorrect if, for instance, parents did not report their priority groups correctly.)
It does not take applicants per priority group into account. For instance, Mamie Fay (30Q122) is a Queens school that has priority admissions for students from two elementary school programs. It is listed as having 7 applicants per seat, but no information is given as to how many seats are assigned to priority applicants, which can significantly impact the actual number of applicants per seat for non-priority students.
Behavioral changes due to the release of lottery numbers will affect the number of applicants per seat in the future. This is especially true for HS applications where the need to strategize is more important than for MS. As families with bad lottery numbers will adjust their lists to avoid being unmatched and will drop high-demand schools from their choices, the number of applicants to these schools will decrease yet the odds of admissions (as measured by lottery percentiles) will likely not be affected.
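
The priority-group effect described above is simple arithmetic, sketched below. Every number in the example is hypothetical (the DOE does not publish the breakdown, which is precisely the problem), and the function name and parameters are introduced only for illustration. The sketch also assumes that priority applicants are admitted ahead of everyone else.

```python
def non_priority_applicants_per_seat(seats, applicants,
                                     priority_applicants, priority_admits):
    """Applicants per seat as experienced by non-priority students.

    priority_admits is the number of seats that end up going to priority
    applicants (unknown in practice; supplied here as an assumption).
    """
    remaining_seats = seats - priority_admits
    if remaining_seats <= 0:
        return float("inf")  # no seat left for non-priority applicants
    return (applicants - priority_applicants) / remaining_seats


# Hypothetical school advertised at 7 applicants per seat (700 / 100).
ratio = non_priority_applicants_per_seat(
    seats=100, applicants=700, priority_applicants=150, priority_admits=60
)
print(ratio)
```

With these made-up figures, the advertised 7 applicants per seat becomes close to 14 for students without priority.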

What about waitlists? Is there a strategy for them?

This is where the DOE is throwing another curveball to families. From the DOE page on waitlists on their website:

How is my position on a waitlist determined?
* For waitlists you are automatically on, your child’s unique position on each is based on several admissions factors.
- These factors include admissions priorities, admissions methods, and randomly assigned numbers or ranked numbers (high school only) used during the matching process, as described at MySchools.nyc and either schools.nyc.gov/Middle or schools.nyc.gov/HSGuide.
- For high school programs where applicants are ranked based on screens or auditions, students will be ordered based on their rank within their priority group.
* For waitlists you add yourself to, your child’s position on these waitlists will be determined by the same factors, but your child’s waitlist position for this program will be after other students in the same priority group who listed that program on their high school application.

In addition, from another page on the DOE website:

For waitlists, each applicant receives a new random number for each waitlist they are on.

This information does impact ranking strategies for the main round, as families may consider waitlist chances when deciding to rank a school. A student with a bad main round lottery number may be lucky in the waitlist round and receive a good lottery number at a high-demand school that they wish to attend, but to have a realistic chance of being admitted from the waitlist, they have to list the school in the main round (otherwise their waitlist position will be after all students who have listed the school).
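
As a rough illustration of the ordering rules quoted above, the sketch below sorts a waitlist by priority group, then by whether the student listed the program in the main round, then by random number. This is my reading of the DOE text, not a description of their system: the class, field names, and sample students are hypothetical, and the encoding of priority groups (1 is best) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class WaitlistEntry:
    student: str
    priority_group: int         # 1 = highest priority (assumed encoding)
    listed_in_main_round: bool  # False if the student added the waitlist later
    random_number: str          # lowercase hex of equal length; lower is better


def order_waitlist(entries):
    """Order by priority group, then main-round applicants, then number."""
    return sorted(
        entries,
        key=lambda e: (e.priority_group, not e.listed_in_main_round, e.random_number),
    )


# Hypothetical entries for one program.
entries = [
    WaitlistEntry("ana", 1, True, "c9"),
    WaitlistEntry("ben", 1, False, "0a"),
    WaitlistEntry("cal", 2, True, "11"),
    WaitlistEntry("dee", 1, True, "3a"),
]
print([e.student for e in order_waitlist(entries)])
```

Note how "ben", despite the best random number, is placed after every student in the same priority group who listed the program in the main round.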

***Update July 2022: Although parents were told during DOE Zoom admission events that students’ random numbers would be regenerated for each waitlist a student was on, and that information was included on the DOE website (as copied above) throughout the admission process (as late as early July), it became clear to parents that the waitlist numbers were not changed for the screened schools: students with bad original random numbers had consistently bad waitlist numbers, while students with good original numbers were at the top of many waitlists. This led to unfair situations where students with good placements in the main round were given multiple waitlist offers to popular schools, while students with no match in the main round had no hope of getting an offer. In early July 2022, the DOE updated the waitlist information on their website and confirmed that for screened schools without assessment, the original lottery number would be used for waitlists, contrary to what had originally been announced:

Waitlist Positions for SCREENED Programs, NO ASSESSMENTS — High School Only
Students who are automatically on the waitlists for these programs remain positioned in the same order as during the admissions process, by admissions group 1–4 (determined by students’ grades) and within that group, by their application random number.
Students who add themselves to the waitlists of these programs will be positioned in their admissions group (1–4).

This was a blow for students with bad application lottery numbers who had hoped to have better luck in the waitlist process. It also meant many families had wasted some of their choices on false hope. It is not clear why the DOE decided to handle no assessment screened schools’ waitlists differently from Open or Ed. Opt. schools’ waitlists, for which random numbers were re-generated. End Update***

Optimizing for waitlists is risky as it requires allocating one (or more) of the 12 choices in the main round for a potential shot at a good waitlist outcome. This will mechanically increase the chances of being unmatched for students who do not have a guaranteed placement. It further adds to the inequitable treatment between students who have a guaranteed placement through continuing student priority or zoned priority and those who don’t, since those with guaranteed matches can “play the waitlist odds” without risk. It also requires families to apply complex game theory reasoning to their school application decisions, and evaluate the tradeoffs between the risk of being unmatched and the opportunity for an additional chance in the lottery for their most preferred school(s). Finally, there is very little information as to the odds of admission from waitlists; last year, anecdotal evidence suggests that many schools did not admit any students from their waitlists, while others made extensive use of them. However, as usual with the NYC DOE, there is no transparency on the number of students who were admitted from waitlists, so any ranking decision based on waitlist chances would be fraught.

The process to apply to MS and HS in NYC is complicated and stressful. The DOE could alleviate some of the stress by making it more transparent. The release of lottery numbers to families before applications are due this year is a step in the right direction. More information on the actual chances of admissions, and risks of not getting a match on your list, should be provided as well.

***Results from the 2021 MS and HS crowdsourcing surveys are reported below, as well as a simulation of various sample sizes for lottery numbers to illustrate the percentile variance.***

Results from the 2021 Crowdsourcing Surveys

Here are the final results of the survey for the 2021 admission cycle. A few things to note:

Data is self-reported by parents and may contain errors. Parents self-reported priority groups (FRL and SWD), but the survey did not explicitly ask for sibling, continuing student, or other priority, which may impact results. In addition, the survey did not differentiate among multiple programs at the same school.
The data is sparse, and therefore incomplete. Data is missing for many schools, and it is skewed towards some schools and districts (where the survey was shared). This does not make the information incorrect, but it does limit its usefulness.
In many cases, I could not narrow the cutoff precisely, so I am providing both the highest lottery number that got in and the lowest that did not, along with their percentiles. Actual cutoffs are within these bounds.
Results are for information purposes but are not predictive. Odds are likely to change this year, especially for HS.
Students qualifying for ICT designation are entered into a different lottery process, with separate seats and different odds. The survey received too few results to report.
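
The way cutoff bounds are derived from survey answers can be sketched as follows. The function names are hypothetical and the reports in the example are made up for illustration (they are not rows from the survey); the sketch assumes that lower numbers are better and that all reports belong to the same program and priority group.

```python
def to_percentile(prefix):
    """Theoretical percentile of a hex prefix, assuming uniform numbers."""
    return 100 * int(prefix, 16) / 16 ** len(prefix)


def cutoff_bounds(reports):
    """Bound a school's cutoff from (hex_prefix, admitted) reports.

    Returns (lower, upper): the percentile of the highest admitted number
    and of the lowest number that was not admitted. Either bound is None
    when no report of that kind exists. lower > upper signals conflicting
    reports (for example, a misreported priority group).
    """
    admitted = [to_percentile(p) for p, ok in reports if ok]
    rejected = [to_percentile(p) for p, ok in reports if not ok]
    lower = max(admitted) if admitted else None
    upper = min(rejected) if rejected else None
    return lower, upper


# Hypothetical reports for a single program.
reports = [("2b", True), ("6e", True), ("c9", False), ("f0", False)]
print(cutoff_bounds(reports))
```

With these made-up reports, the cutoff would fall somewhere between roughly the 43rd and 79th percentiles; more reports narrow the range.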

MS Admission Crowdsourced Data

There were 125 answers to the survey, 70 of which included lottery numbers; most answers were from D2, D3, D15, D20, D21, and D30. Results for the FRL priority group are also provided when available. The results are in the table below.

HS Admission Crowdsourced Data

There were 136 answers to the survey, 107 of which included lottery numbers. Results for the FRL priority group are also provided when available. Results are in the table below.

Last year’s process for screened schools was different: results from schools that fully ranked students are omitted from the table below, as that information would not be applicable this year; results from schools that use a lottery for admissions (Open and Ed. Opt) or screened schools that used a batch method similar to this year’s process are included. Students within the same batch were assigned the same “rank”, reported in the table (school rank was computed differently at each school).

Note that the grouping for such schools was done differently last year and the changes are likely to significantly impact the odds of admission this year. In some cases, results are conflicting because the survey did not differentiate for specific programs at one school, or did not account for geographical differences.

Explanatory notes are given in the table. Schools that received unmatched students (and therefore had available seats after the main round) are identified. Some notable points:

The survey did not differentiate Ed. Opt. groups.
The cutoff for one school (02M414 : N.Y.C. Museum School) was determined with precision as two students with close numbers (‘457’ and ‘45b’) had different outcomes.
Students eligible for FRL set-asides typically have better odds but are by no means guaranteed admission. For instance, general odds of admission for 02M412 : N.Y.C. Lab School for Collaborative Studies were somewhere between 4% and 16%, and odds for students qualifying for the set-aside were between 25% and 33%.
When a rank is provided, it means the school was grouping the students in “batches” based on their qualifications according to the school screen. Lottery numbers were used to differentiate students within the same batch. For many selective schools, only students with both a rank of 1 (first batch) and a good lottery number were admitted. For instance, to be admitted to 01M696 : Bard High School Early College students needed to be both ranked in the first batch by the school and have a lottery number better than ‘64’.

Lottery Sample Simulations

If you are interested in statistics…
As mentioned above, the lottery numbers are drawn from a uniform distribution and random samples may differ from the expected values. However, sample percentiles are known to be asymptotically normally distributed around theoretical percentiles, with the variance depending on the sample size. To illustrate this, I ran a simulation of 1,000 samples for 4 different sample sizes: 60,000, 2,500, 600, and 100, and plotted the theoretical percentiles (black line), median sample percentile (dark blue line), 10–90 percentile range of sample percentiles (where 80% of sample percentiles will fall, in medium blue), and a full range of observed sample percentiles (in light blue). I show the results in the four plots below. The x-axis represents the first two characters of the lottery numbers, and the y-axis is the observed percent of lottery numbers that are similar or better in the simulated sample.

First Two Characters of Lottery Numbers to Percentile Conversion — Empirical Data with Different Sample Sizes

The sample sizes were chosen to represent various comparison groups: 60,000 for a cohort of citywide students (there are typically between 60,000 and 80,000 students in each grade cohort); 2,500 for a large sample of applicants (e.g., all applicants from a district, or to a high-demand HS; some HS have more than 5,000 applicants); and 600 and 100 for medium and small samples of applicants (e.g., average- or low-demand HS, or MS).
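
A simulation along these lines can be sketched with the Python standard library. This is an illustrative reconstruction under stated assumptions, not the exact code behind the plots: it treats lottery numbers as independent uniform draws, looks at a single two-character prefix at a time, and the function names are hypothetical.

```python
import random
import statistics


def simulate_sample_percentiles(prefix, sample_size, trials=1000, seed=0):
    """For each trial, draw `sample_size` uniform numbers and return the
    percent of them that are similar to or better than `prefix`."""
    rng = random.Random(seed)
    # Upper edge of the prefix bucket; lower numbers are assumed better.
    threshold = (int(prefix, 16) + 1) / 16 ** len(prefix)
    shares = []
    for _ in range(trials):
        at_or_better = sum(rng.random() < threshold for _ in range(sample_size))
        shares.append(100 * at_or_better / sample_size)
    return shares


def summarize(shares):
    """Median, 10th-90th percentile range, and full range across trials."""
    deciles = statistics.quantiles(shares, n=10)
    return {
        "min": min(shares),
        "p10": deciles[0],
        "median": statistics.median(shares),
        "p90": deciles[-1],
        "max": max(shares),
    }


summary = summarize(simulate_sample_percentiles("3a", sample_size=600))
print(summary)
```

The pure-Python loop is slow for 60,000 numbers per sample; lowering `trials` or switching to a vectorized library would help for the largest sample size.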

60,000 Lottery Numbers: As expected, the sample percentiles track very closely with the theoretical percentiles, showing that the table provided above gives a close approximation of how a given lottery number compares to all other lottery numbers in the city. The DOE could safely provide the theoretical percentile information to parents, along with their random number on MySchools.
2,500 Lottery Numbers: With a sample size that is not as large, we start seeing some variations, but the actual percentile will be within 1 percentage point of the theoretical percentile 80% of the time (medium blue ribbon). For MS applications, where students apply within their district, or for applications to high-demand HS, the theoretical percentiles are a good approximation of the actual percentile.
600 Lottery Numbers: A smaller sample size will have more variance. In that case, the percentiles given above are more approximate, but can still give you a good idea of where a lottery number stands compared to others. For instance, for a school with 600 applicants, a lottery number starting with ‘3a’ is expected to be in the 23rd percentile. In the 1,000 simulations, that lottery number was between the 21st and 25th percentile 80% of the time, but in some extremely skewed cases could present in the 18th or 28th percentile.
100 Lottery Numbers: Finally, the distribution of small-sample percentiles is more spread out, as more variations are possible. Imagine rolling a die: if you roll it 6 times, you are more likely to have a result that differs significantly from the 16.67% probability of rolling a 6 than if you rolled it 6,000 times. This is due to the large variability in small samples and can be seen in the last plot. This means that a lottery number will have more variability in how it compares to that of other applicants in a school with a small number of applicants. However, this is unlikely to have much impact as schools with low numbers of applicants are typically not selective.
Another reason I included this plot is that it illustrates a common occurrence with another type of small sample: applicants from the same school, or applicants within a group of friends. Parents often believe that the system is rigged because they know a group of applicants with unusually bad (or good) lottery numbers. This is known as the Law of Small Numbers: people make inferences from small samples, but small samples are very likely to have outlier results.