Open Research Initiative

Ayushi Sinha
Published in Retrospectare
11 min read · Jan 13, 2021

Open Research Initiative is a new resource to inspire research topics for STEM-focused students at Princeton. Eno and I reached out to non-profits, startups, and industry leaders to curate cutting-edge real-world research topics (and hopefully share relevant data) that students might use to find interesting research problems for their Junior and Senior research. In this partnership, industry provides opportunities, research questions, and funding, and Princeton provides professors and students to collaborate.

As part of the Computer Science curriculum, Juniors complete semester-long projects (“IW”) and Seniors undertake year-long independent theses.

Key Takeaways:

  • Princeton students are doing really interesting and novel research (this can be extended to undergrads at all schools; we just started with a small scope!). However, there’s a difference to note in the scale and expectation of undergraduate research compared to graduate research.
  • Big tech companies with large research arms and existing partnerships with academia are the ideal primary persona to partner with.
  • A lot of relationships between academia and industry are based on the people. Is it possible to scale relationships between academia and industry? (e.g. Facebook AI researcher working on NLP was in the same PhD program as a Professor who is leading NLP research at a given institution OR a certain company has been funding longer-term research at a given research lab).
  • We learned that it’s not easy to replicate/engineer the relationship aspect of research, which complicates any system attempting to organize research partnerships.
  • Is 6M-1Y a meaningful amount of time to work on industry-supplied research questions? As Eno says, “one i found to be especially pertinent, it was clear after our research that for the most part academic and commercial entities operate on completely different timetables. Any system facilitating collaboration between the two would have to deal with this weird timing imbalance and this imo reduces the obvious use cases of such a system.”
  • What’s the role of mentorship and hand-holding required from the company’s side?
  • Distilling personal interests and interesting problems into a research question is a core part of the research experience.

Motivation

Open data

I am personally SO bullish on the future of open data and believe introducing students, who are the future of tech after all, to the world of open data is really important.

At the same time, I understand why private companies may not want to share their data with private & for-profit entities. As our friend David puts it, “companies are more likely to collaborate with universities than with other companies since universities are not commercializing their research.”

On the other hand, there seems to be less scrutiny around sharing data with academia, and especially with students. Moreover, academia is often the biggest champion of reproducibility and will attempt to reconstruct experiments.

Real-World Inspiration

So much of the research we do seems theoretical or is rarely used. Finding real-world inspiration for a senior thesis is incredibly motivating.

For example, for my junior CS research seminar, two friends and I built EyeBeat Revolution. We built an experience enabling individuals with disabilities to play musical instruments, received a provisional patent, and presented at Princeton Research Day. The interest in pursuing a real-world application was there: the three of us care deeply about VR, accessibility, and music education. Yet, we didn’t pursue applying our work to the “real world”, though our research was inspired by the real world.

There are example models of industry-academic collaborations at other universities, such as:

  • The University of Michigan and Ford Motor Company have partnered to form the UM & Ford Center for Autonomous Vehicles (FCAV)
  • The Berkeley DeepDrive Industrial Consortium “partners with private industry sponsors and brings faculty and researchers together from multiple departments and centers to develop new and emerging technologies with real-world applications in the automotive industry. Research is proposed by UC Berkeley faculty and approved by a BDD advisory board composed of faculty and sponsor representatives.”
  • The BAIR Open Research Commons (“BAIR Commons”) is an industrial affiliate program designed to accelerate cutting-edge AI research.
  • CMU Argo AI Center for Autonomous Vehicle Research

Limited Resources

Advising Resources

Princeton CS is growing increasingly popular, putting a strain on advising resources. In response, the Independent Work seminars offer a scalable way to introduce CS students to research, without requiring a 1:1 professor-to-student ratio.

Compute

From David: “compute is an underlooked bottleneck for doing research (especially in AI). Princeton itself only makes GPU clusters like Ionic available to undergrads if they are affiliated with a research group/professor. Thus if Princeton wants to make research more accessible, it needs to think about how to provide students with more computing resources.” Partnerships with industry are one avenue.

Connection to industry

We found that both industry and students are interested in cultivating relationships that may be beneficial to both parties (in the context of recruiting).

The relationships between industry and certain professors (and by extension, their graduate students) are quite strong. Some professors receive research grants and access to proprietary data through these formal or informal partnerships. From our interviews with CS faculty, we found that these partnerships are often grounded in prior experience working together as well.

How it would work

V0: A running list of problems and/or research questions supplied by our industry partners.

Eno and I would be responsible for reaching out to startups and curating a list of problems/research questions. This was inspired by Professor Brian Kernighan’s list of inspirations for projects for the popular class COS 333.

As a proof of concept, we cold-emailed high-tech startups, asking for the following information.

  • Name
  • Contact Information
  • Please tell us a bit more about your problem space
  • What is the core problem you’re trying to solve?
  • What is your current technical approach to solving that problem?
  • What are the hardest technical challenges about that problem space?
  • What is the biggest technical obstacle your team faces?
  • What is the most promising technical solution to your problem that you are not exploring?
  • Do you have any advice for people interested in learning more about this technical problem?

We included this disclaimer in our follow-up:

We want to note that while your responses will only be shared with undergraduate researchers, please do not share any information that your business might consider sensitive. The goal of this project is to provide real-world context for interesting STEM problems, so feel free to be as specific or general as you’d like.

Unfortunately, only a few responded to our outreach. We stopped here (more info on why in the “Blockers” section), but for this exercise, we’ll share the next steps we had planned:

V1: Bumble for Research Collaboration

Eno and I would create a profile for each company and include:

  • Answers to the above questions
  • Contact information
  • Whether data is available
  • Links to related work

Like Bumble (lol), only one party (in this case the student) has the option of initiating conversation.
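To make the V1 idea concrete, here is a minimal sketch of what a company profile and the one-sided "student initiates" rule could look like. We never built this, so every name here (CompanyProfile, Message, Inbox, and their fields) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CompanyProfile:
    """Hypothetical V1 profile; the fields mirror the list above."""
    name: str
    contact: str
    answers: Dict[str, str]            # question text -> company's answer
    data_available: bool = False
    related_work: List[str] = field(default_factory=list)


@dataclass
class Message:
    student: str                       # student identifier (illustrative)
    company: str                       # company name (illustrative)
    sender_role: str                   # "student" or "company"
    body: str


class Inbox:
    """Stores threads keyed by (student, company).

    Assumption: a thread can only be opened by the student, which is
    the Bumble-style rule described above. Once the thread exists,
    either side may reply.
    """

    def __init__(self) -> None:
        self.threads: Dict[Tuple[str, str], List[Message]] = {}

    def send(self, msg: Message) -> None:
        key = (msg.student, msg.company)
        if key not in self.threads:
            if msg.sender_role != "student":
                raise PermissionError("only a student can initiate a conversation")
            self.threads[key] = []
        self.threads[key].append(msg)
```

In this sketch, a company that tries to message a student first gets a PermissionError, while a reply to an existing thread goes through. Keying threads by the (student, company) pair keeps one student's outreach from unlocking replies to anyone else.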

V2: Co-Advising for Junior and Senior Research

This is an option for the companies who have the resources to share their data and offer some mentorship. This would require establishing partnerships between students, industry mentors, and faculty advisors.

In our user interviews, we found that many companies didn’t have the manpower to dedicate to weekly meetings. To lower the barrier, we would only require the industry mentor to meet three times with the student & faculty advisor. First, at the beginning of the semester, to set expectations and timelines and discuss the sharing of data and problem scope. Second, at the midpoint, to ensure the student was on track and had all of the necessary information and relevant updates. Third, at the end of the research project, where the student would present the findings, research paper, and next steps to the industry mentor. From here, it’s up to the industry mentor to implement these recommendations and findings.

This cadence was inspired by the manager check-ins during our internships at Microsoft. All managers checked in at the beginning, midpoint, and end of the internship, with optional weekly meetings.
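The three required meetings (beginning, midpoint, end) are easy to derive from a project's start and end dates. The snippet below is just a sketch of that arithmetic; the function name check_in_dates and the example dates are made up for illustration and are not part of any real scheduling tool.

```python
from datetime import date, timedelta
from typing import List, Tuple


def check_in_dates(start: date, end: date) -> List[Tuple[str, date]]:
    """Return the three required mentor check-ins for a project.

    Assumption: the midpoint is the calendar midpoint, rounded down
    to a whole day. A real schedule would also need to account for
    breaks, exams, and everyone's availability.
    """
    if end <= start:
        raise ValueError("end must be after start")
    midpoint = start + timedelta(days=(end - start).days // 2)
    return [
        ("kickoff: expectations, timeline, data sharing, scope", start),
        ("midpoint: progress check and updates", midpoint),
        ("final: findings, paper, next steps", end),
    ]


# Example with made-up semester dates.
for label, day in check_in_dates(date(2021, 2, 1), date(2021, 5, 1)):
    print(day.isoformat(), label)
```

The same function works for a year-long senior thesis by passing a later end date. Optional weekly meetings would sit on top of these three fixed points.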

Blockers

Institutional support

  • Since junior and senior research is a requirement and incorporated into the student’s GPA, we needed the university’s full support of this program. Without the official stamp, we could have offered optional “match-making” between students and industry, but we wanted to start this program on solid footing, hoping to create a program that lasted past our time at Princeton. While we had a decent amount of success talking to administrators / the institution, we never received commitment or support from the administration.

Industry interest

  • We had minimal interest from our cold emails to startups, and warm introductions to liaisons of industry research at big tech companies fell through (aka they left us on read).
  • Eno’s learnings: “this was an immediate indicator that either we were not presenting the idea clearly enough to the startups, or it was just not interesting / desirable of a proposition for them to respond.”
  • Our friend David gave us some really insightful feedback on how to navigate understanding industry interest and where to loop in the professor: “It would be much better if a professor reached out. Because the only reason an industry lab would share information is if they see some benefit to it (like a collaboration). Coming from a pair of undergraduates it sounds like you are trying to steal their business ideas…But actually in practice if a company really needs help, it would be them reaching out to the professor. Not the other way around”

COVID-19

  • In the midst of this proposal and our conversations with the CS department, COVID-19 struck, and this was simply no longer a priority for the department (and understandably so! The whole school had to quickly shift to virtual and prepare for a fully virtual semester).

Strength of pre-existing relationships

  • Eno and I were doing something disruptive by trying to create brand new relationships between industry and no-name, inexperienced students, rather than established and accredited professors.
  • As Eno notes: “this fact might even create a self-reinforcing cycle. one would imagine that the people who enter a system and think “oh this could be better”, are usually not the people who are experienced / established (who over the course of becoming established become comfortable with the existing system)”

Timeline

  • A partner at Union Square Ventures brought up that getting MBA students to help with non-technical projects was scopeable to a 6M-1Y commitment and didn’t require any technical onboarding or continued mentorship. She was concerned that a part-time 6M or 1Y commitment would be too short a time and too much of an investment to ask of the company. (It’s important to note a potential miscommunication here: she indicated that she was thinking of it as an internship-lite type of experience, whereas I think we had imagined a slightly more hands-off role for the private company partner.)
  • From David: “Research is never done. What’s important is to have stakeholders in the company and university (professor) who are driving the research long term. As long as you have this then, scoping out a project for a student to join or do is straightforward. Ideally a student commits to working on something for at least a year.” He also had a great counter argument: “But if they don’t then their work just gets handed off. It’s not really different from how pure academic research works right?”

Resources

  • Companies were concerned that too many resources would be required to onboard students for projects that had no long-term timeline (e.g. lasting only 6M-1Y).
  • Companies were also concerned about depending on and integrating a non-employee’s code into their code-base. Not all projects are created in a vacuum, and sometimes the ones with the most “impact” have dependencies and potentially impact other projects the companies are working on.

A weird mentorship triangle

  • A student taking a first stab at CS research isn’t a “dehydrated engineer” and needs some level of mentorship. As Eno puts it, “you cant just add water to a student and have them output code like an actual companies engineer.”
  • How much hand-holding is required from the industry liaison/point of contact?

Follow Through

  • A student is incentivized to follow through with the project because a grade depends on it. However, what is incentivizing the industry liaison/point of contact to keep up the commitment (of access to data and mentorship)? What happens if they leave the student hanging, thus resulting in an incomplete project? While crafting a research project that mitigates dependencies on the industry partner may be one solution, isn’t that antithetical to the whole point of this initiative?

Inspiring trust between both parties

  • Since so many of the relationships between academia and industry are based on previous working experience or relationships, we would have to inspire trust between these two parties. This is especially hard because students don’t necessarily have a strong track record of research behind them.

The Role of the Graduate Student

  • There’s a reason why many research labs with industry experience have a whole team of graduate students: reliability and longer timelines.
  • From David: “Anything involving real research must be graduate student centric because graduate students are ultimately the ones who are trusted (i.e. they can’t easily drop out whereas an undergrad can easily quit and in practice do often quit). They are also the most qualified and have the most domain knowledge. The industry-academic affiliate programs I linked above are all focused on pairing graduate students and their advisors to industry labs. Plenty of undergraduates are involved but they are not the lead; they are hired and led by the PhD students. Establishing a foundation that starts with PhD students and professors is important before undergraduates get involved. This is both to encourage buy in from the company who expects publications and tangible outcomes, and also to ensure that projects don’t die because an undergraduate leaves.”

Next Steps

  • Current students should totally reach out to industry contacts (or cold email!) for research inspiration and/or access to data. Eno and I both took advantage of “coffee chats” just to learn more about different projects and problems across many different teams at Microsoft, and would highly recommend that same approach to students looking for inspiration. For example, Peggy Johnson (former EVP @ Microsoft) responded to my cold email about my senior thesis in Facial Recognition.
  • Can the CS department incorporate some (or all!) of our proposal into the existing IW research structure or offer this as an additional resource?
  • Are any current alumni working on projects that would benefit from “another set of eyes” or want to mentor a young researcher?
  • Actively maintaining a database or page of research ideas. As a twist to our proposal, David suggests “curating this list from professors and explicitly asking them to recommend projects that they would like to explore but have not had the time to do. Then the onus is on the student to reach out to the professor if they see something that they like. Princeton CS has a page like this but it’s clearly outdated.”
  • Our big vision was to extend this past Princeton and create an online portal + community for academia and industry to collaborate on open research projects. Can we still achieve some part of that larger goal by sharing open data and engaging in online challenges? From David: “Online challenges like those from computer science conferences are a huge opportunity for learning and professors are likely more willing to invest time on students who are motivated enough to pursue things like this. CVPR hosts challenges on various datasets each year like this or this or this, and Kaggle also is a great source of competitions.”

This proposal was co-authored by Ayushi Sinha and Eno Reyes, and incorporates comments from David Fan. Big Shoutout to Soham Daga, Evan Wood, and the Princeton CS Department (Professor JP Singh and Professor Jennifer Rexford) for feedback on this idea! And Professor Brian Kernighan’s COS 333 list for inspiration.

Ayushi Sinha

MBA @ Harvard, co-founder @ yustha.yoga | Princeton CS, investor @ Bain Capital Ventures, Microsoft