A Data-Driven Approach to Shortlisting Universities for an MS in the US, Using https://admits.fyi

Pranav Ramarao
Nov 2

Feel free to skip “The Story” section and go directly to the “Product Demo” section :)

Some background: I applied to universities for an MS in the US back in 2015 for the Fall 2016 term. I eventually went to the University of Michigan, Ann Arbor for a Master's in Computer Science. Right now, I am working at Google in Mountain View as a Software Engineer. This is a free side project meant to help students aspiring to graduate studies, based on difficulties that a couple of my friends and I faced in the past. We hope the tool is useful to you and eases the process of graduate applications.

The Story

Applying for an MS in the US was quite a daunting task for me back in 2015. It wasn’t clear how to go about choosing universities. Seniors from college were the main source of insights, sometimes siblings or relatives. They would suggest applying to a mix of ambitious, moderate and safe universities, while also keeping tuition fees and RA/TA opportunities in mind. There was, however, limited information on past admits and rejects. I felt that was a critical piece of information, because just by looking at the data, you can answer several questions:

  • Will my GRE score be sufficient to get an admit from University X? Have they ever admitted someone with such scores?
  • Is there a university that has traditionally liked picking students from my college?
  • Which universities are best suited for a GPA of, say, between 7 and 8 (on a 10-point scale)?
  • I have a few years of work experience. Which universities give weight to that?

Answering these questions tells you not just which universities to apply to, but also which ones NOT to apply to. 100,000+ students apply for an MS in the US every year. That is a ton of data points that could help answer many of the questions above.

So a friend of mine and I set out to gather data points from across the internet. First, we compiled a big list of Facebook pages where a lot of Google Sheets and Excel files were floating around. Next, we created some Google Forms and, with the help of some friends, passed them around to folks currently doing their MS. After this, we had a decently sized database. But we didn’t want to stop there. A lot of websites contained student profiles with information on their admits and rejects, ranging from forums where the data was highly unformatted to some well-formatted sources. It was clear that manual effort wasn’t going to scale, so we had to write a lot of scripts to fetch this data.

Fast forward 3 weeks, and we were sitting on top of 350,000+ admits and rejects. When we started, we were super skeptical of getting anywhere close to this number. Now, we were excited to share this data with the public! :)

But there were some key problems:

  1. Unclean and noisy data!
  2. How do we share this data? In a Google Sheet? By passing around an Excel file?

Problem 1 — Unclean & Noisy Data

The data from all the sources was unclean. Names appeared in hundreds of different forms; for the same university, we saw ASU, Asu, arizona state university, Arizona State, etc. Some universities even had multiple abbreviations.

We even had data from the days when the GRE was scored on a 1600 scale rather than the current 340 scale. Grade points were on different scales too: 4, 10 and 100.
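To make scores comparable, one simple approach is to bring every GPA to a common 4-point scale and keep old-scale GRE totals apart from new ones. This is a hypothetical illustration, not the cleaning code behind admits.fyi: `normalize_gpa`, `infer_gpa_scale` and `gre_scale` are invented names, inferring the scale from the value alone is a naive assumption (a 3.5 reported on a 10-point scale would be misread as a 4-point GPA), and linear rescaling ignores how differently institutions grade. The GRE ranges are the published ones (400–1600 for old verbal + quant totals, 260–340 for the current test); converting between them properly needs ETS concordance tables, so the sketch only labels them.

```python
from typing import Optional

GPA_SCALES = (4, 10, 100)  # the grade-point scales seen in the data


def infer_gpa_scale(gpa: float) -> int:
    """Guess the scale as the smallest one the value fits in (naive assumption)."""
    for scale in GPA_SCALES:
        if 0 <= gpa <= scale:
            return scale
    raise ValueError(f"GPA {gpa} does not fit any known scale")


def normalize_gpa(gpa: float, scale: Optional[int] = None) -> float:
    """Linearly rescale a GPA to a 4-point scale, rounded to 2 decimals."""
    scale = scale or infer_gpa_scale(gpa)
    return round(gpa / scale * 4.0, 2)


def gre_scale(total: int) -> str:
    """Label a GRE total as 'new' (260-340), 'old' (400-1600) or 'invalid'."""
    if 260 <= total <= 340:
        return "new"
    if 400 <= total <= 1600:
        return "old"
    return "invalid"


print(normalize_gpa(8.0), normalize_gpa(80), normalize_gpa(3.5, scale=10))  # 3.2 3.2 1.4
print(gre_scale(320), gre_scale(1400))  # new old
```

Passing `scale` explicitly, when the source states it, avoids the ambiguity of guessing.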

Undergrad college names were a total mess: “BITS Pilani”, “B.I.T.S Pilani”, “Birla Institute of Technology and Science”. On top of that, there were sister campuses, and in India, affiliated colleges as well. There were just too many variants!
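One common way to tame spelling variants like these is to normalize each raw string and look it up in a hand-curated alias table. The sketch below only illustrates that idea; it is not the (unpublished) technique used for admits.fyi. `ALIASES`, `normalize` and `canonicalize` are hypothetical names, and the alias entries are just the variants listed above. It deliberately maps all of them to one name; keeping sister campuses apart would need more specific entries.

```python
import re
from typing import Optional

# Hypothetical alias table: normalized spelling -> canonical name.
ALIASES = {
    "asu": "Arizona State University",
    "arizona state": "Arizona State University",
    "arizona state university": "Arizona State University",
    "bits pilani": "BITS Pilani",
    "birla institute of technology and science": "BITS Pilani",
}


def normalize(name: str) -> str:
    """Lowercase, drop dots ('B.I.T.S' -> 'bits'), turn other punctuation into spaces, collapse whitespace."""
    name = name.lower().replace(".", "")
    name = re.sub(r"[^a-z0-9]+", " ", name)
    return " ".join(name.split())


def canonicalize(raw: str) -> Optional[str]:
    """Return the canonical name, or None so the row can be queued for manual review."""
    return ALIASES.get(normalize(raw))


print(canonicalize("B.I.T.S Pilani"), "|", canonicalize("Asu"))
```

Returning `None` for unknown spellings, rather than guessing, keeps unresolved rows visible so the alias table can grow over time.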

Solution:

Once again, we had to write code to deal with the scale. We used multiple techniques (out of scope for this article) that helped us clean the data massively at scale. It was an annoying process at first, but we managed to make it a fun task, with some interesting solutions.

Problem 2 — Data Presentation

Now that we had all the data we needed, we weren’t sure how to share it so others could use it! We tried putting it all in a Google Sheet, but the data was too big: the sheet often couldn’t handle the load and crashed. We decided to go the website route, as it gave us a lot more control over the user interface. If you think about it, users are not interested in all 350K+ data points; for any given question, usually only a few hundred data points are relevant. We wanted the experience to be smooth, so we took help from a UX freelancer. Since we had gone through this problem ourselves, we had a couple of ideas about how we wanted the tool to work.

Our Solution: https://admits.fyi

We decided on a few key principles while building the tool:

  1. An easy-to-remember URL
  2. Clean data with limited redundancy in naming
  3. Quick access to the data (no sign-up required)
  4. Free, no ads, unobtrusive interface
  5. Easy filtering of the data to answer questions

Our final solution is admits.fyi :)

Product Demo

Here, we will go over some typical questions students have:

What are my chances at ASU Mechanical Engineering with a GRE score below 305?

Sample results on admits.fyi with the above filters applied

To answer such a question, you can combine several filters, each reflecting one part of the question:

Filter 1: Interested only in Arizona State University

Filter 2: “get into ASU” => See only Admits

Filter 3: Interested only in Mechanical Engineering

Filter 4: GRE Total score < 305

To apply these filters, simply click on the column headings of the respective fields. For instance, click on the “Total” column header and adjust the slider, restricting it to 295–305 (the lower bound keeps the range of profiles tight).
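Under the hood, a question like this boils down to a conjunction of filters over the records. The sketch below expresses the four filters in Python over an in-memory list; the `Record` fields, the `admits_in_gre_band` function and the toy rows are hypothetical and say nothing about how admits.fyi actually stores or queries its data. Treating the slider's 295–305 band as inclusive is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Record:
    # Hypothetical fields; the post does not describe admits.fyi's schema.
    university: str
    major: str
    status: str  # "Admit" or "Reject"
    gre_total: int


def admits_in_gre_band(records: Iterable[Record], university: str, major: str,
                       lo: int, hi: int) -> List[Record]:
    """Keep admits to one university and major with lo <= GRE total <= hi."""
    return [
        r for r in records
        if r.university == university
        and r.status == "Admit"
        and r.major == major
        and lo <= r.gre_total <= hi
    ]


# Toy records invented for illustration only (not real admits.fyi data).
toy = [
    Record("Arizona State University", "Mechanical Engineering", "Admit", 300),
    Record("Arizona State University", "Mechanical Engineering", "Reject", 302),
    Record("Arizona State University", "Mechanical Engineering", "Admit", 316),
    Record("Arizona State University", "Computer Science", "Admit", 298),
]

hits = admits_in_gre_band(toy, "Arizona State University", "Mechanical Engineering", 295, 305)
print([r.gre_total for r in hits])  # [300]
```

Each condition corresponds to one column filter in the UI, so removing a line of the condition is the same as clearing that column's filter.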

For text columns such as University, a text box appears instead. Don’t worry about exact spelling! The autocomplete is quite powerful and will usually lead you straight to the university or college you are looking for.
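The post doesn’t describe how the autocomplete works internally. As a rough illustration of one simple approach, the sketch below ranks names that start with the typed text ahead of names where any word starts with it, case-insensitively. The `suggest` function and the candidate list are hypothetical; a production system would likely also tolerate typos.

```python
from typing import List, Sequence


def suggest(query: str, names: Sequence[str], limit: int = 5) -> List[str]:
    """Whole-name prefix matches first, then names with a word starting with the query."""
    q = query.strip().lower()
    if not q:
        return []
    full = [n for n in names if n.lower().startswith(q)]
    word = [n for n in names
            if n not in full and any(w.startswith(q) for w in n.lower().split())]
    return (full + word)[:limit]


names = [
    "Arizona State University",
    "University of Arizona",
    "Northern Arizona University",
    "BITS Pilani, Hyderabad Campus",
]
print(suggest("ariz", names))
print(suggest("hyd", names))  # ['BITS Pilani, Hyderabad Campus']
```

Matching on word prefixes is what lets a query like “hyd” find a campus name without typing the institution first.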

Where did past students from my college get admits?

Sample results on admits.fyi with the above filters applied

Here, you can apply three filters: one for admits, one for target major = Computer Science, and one for undergrad college = “BITS Pilani, Hyderabad Campus”. Once again, start typing the college name, and the autocomplete should kick in immediately :)
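To see what this query computes, the sketch below filters rows by undergrad college, target major and status, then counts admits per university with `collections.Counter`. The tuple layout, the `admits_by_university` helper and the rows themselves are invented for illustration; the counting step is an extra convenience, not something the post says admits.fyi does.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical row layout: (undergrad_college, target_major, status, university).
Row = Tuple[str, str, str, str]


def admits_by_university(rows: Iterable[Row], college: str, major: str) -> List[Tuple[str, int]]:
    """Count admits per university for one undergrad college and target major."""
    counts = Counter(
        university
        for ug, target, status, university in rows
        if ug == college and target == major and status == "Admit"
    )
    return counts.most_common()


# Toy rows invented for illustration only (not real admits.fyi data).
rows = [
    ("BITS Pilani, Hyderabad Campus", "Computer Science", "Admit", "University of Michigan"),
    ("BITS Pilani, Hyderabad Campus", "Computer Science", "Admit", "University of Michigan"),
    ("BITS Pilani, Hyderabad Campus", "Computer Science", "Admit", "Arizona State University"),
    ("BITS Pilani, Hyderabad Campus", "Computer Science", "Reject", "Stanford University"),
    ("PESIT Bangalore", "Computer Science", "Admit", "Arizona State University"),
]

print(admits_by_university(rows, "BITS Pilani, Hyderabad Campus", "Computer Science"))
# [('University of Michigan', 2), ('Arizona State University', 1)]
```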

Seeing profiles of Admits into U. Michigan Industrial Eng. over the last 3 years

Apply filters for University, Status, Term and Target Major. In the “Term” column header, you can also restrict results to just “Fall” or “Spring” terms.
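The “last 3 years” part of this query is a range condition on each record’s term. A minimal sketch, assuming terms are stored as strings like “Fall 2018” (the post doesn’t specify the format) and that you choose the latest year yourself; `parse_term` and `recent_terms` are hypothetical helpers.

```python
from typing import Iterable, List, Optional, Tuple


def parse_term(term: str) -> Tuple[str, int]:
    """Split a term such as 'Fall 2018' into ('Fall', 2018); the format is an assumption."""
    season, year = term.split()
    return season.capitalize(), int(year)


def recent_terms(terms: Iterable[str], latest_year: int, years: int = 3,
                 season: Optional[str] = None) -> List[str]:
    """Keep terms from the `years` years ending at latest_year, optionally one season only."""
    kept = []
    for t in terms:
        s, y = parse_term(t)
        if latest_year - years < y <= latest_year and (season is None or s == season):
            kept.append(t)
    return kept


terms = ["Fall 2015", "Spring 2017", "Fall 2017", "Fall 2019", "Spring 2019"]
print(recent_terms(terms, 2019))                 # ['Spring 2017', 'Fall 2017', 'Fall 2019', 'Spring 2019']
print(recent_terms(terms, 2019, season="Fall"))  # ['Fall 2017', 'Fall 2019']
```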

Complex Uses: See all Admits & Rejects for profiles similar to mine

For a more complex use case such as this, you can apply a filter to all columns that relate to your profile. For instance, if your profile is as follows:

GRE: 322, GPA: 9.2, work experience: 2 years, TOEFL: 108, college: PESIT Bangalore, you can apply the following filters:

GRE: 318–324; GPA: 9–9.3; TOEFL: 106–110; College = PESIT Bangalore

This way, you only see results within a few points of your own scores. Such a narrow query works only because the dataset is large enough to cover a wide range of student profiles!
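The range-building step can be sketched as follows: each of your scores becomes a (low, high) window from per-field tolerances, and a record counts as similar if every score falls inside its window. The `TOLERANCE` values are hypothetical, chosen only to reproduce the example ranges above; `profile_window` and `is_similar` are illustrative names. The college condition stays an exact-match filter and is left out here.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]

# Hypothetical (below, above) tolerances that reproduce the example ranges.
TOLERANCE: Dict[str, Tuple[float, float]] = {
    "gre": (4, 2),      # 322 -> 318-324
    "gpa": (0.2, 0.1),  # 9.2 -> 9.0-9.3
    "toefl": (2, 2),    # 108 -> 106-110
}


def profile_window(profile: Dict[str, float],
                   tolerance: Dict[str, Tuple[float, float]] = TOLERANCE) -> Dict[str, Range]:
    """Turn your own scores into (low, high) ranges; rounding hides float noise."""
    window = {}
    for field, value in profile.items():
        if field in tolerance:
            below, above = tolerance[field]
            window[field] = (round(value - below, 2), round(value + above, 2))
    return window


def is_similar(record: Dict[str, float], window: Dict[str, Range]) -> bool:
    """True if the record has every windowed field and each value lies inside its range."""
    return all(field in record and lo <= record[field] <= hi
               for field, (lo, hi) in window.items())


window = profile_window({"gre": 322, "gpa": 9.2, "toefl": 108})
print(window)  # {'gre': (318, 324), 'gpa': (9.0, 9.3), 'toefl': (106, 110)}
```

Widening the tolerances trades similarity for more matching rows; tight windows are practical only when the dataset is large.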

Example of a more complex use case

A quick “Clear Filters” button to start over the search!

Clear filters button

My favorite: a quick way to start over and answer a different question, as I have plenty :)

We hope this is useful to all of you. Do share it with your friends; we know how stressful applications can be without objective data points like these.

Plans for the Future

We were overwhelmed by the response we received. A lot of students pinged us to say how useful the tool was to them, and we got great feedback and suggestions too! We have a lot of ideas for the future, but time is our bottleneck, since we both have full-time jobs. Nevertheless, here are the ideas:

  • Achieving a higher cleaning rate (99%+). Right now, we are at ~90%.
  • Providing an option to flag absurd rows, to ensure data quality and encourage crowdsourcing.
  • A better mobile experience (tables are hard to read on mobile right now).
  • Smarter tools, such as recommendation engines or ML models, that go one step further to ease the burden on students.

Thanks, and we’re happy to take any questions you might have in the comments section. Also, feel free to share your profile and results in the comments to make the database even bigger! :)

-Pranav & Abdul