Eugene Wu
Published in thewulab · Jun 11, 2018

Year 1: They Don’t Teach This in Grad School

They really don’t give you a manual when you start a professorship. It’s like you get your driver’s license, and they hand you the keys to a Boeing 777. Like, sure, they both technically have wheels…

Year 1 was a trial by fire, where I learned a lot about the other 80% of research — recruiting, mentoring, collaborating, teaching, and management — mostly from what didn’t work. This post is about the many things that went wrong, and some that went right.

Recruiting and Mentoring

Recruiting is really really hard — you’re betting 6 years on a paper application and 3–4 hours of Skype calls.

Theoretically, a benefit of the postdoc is that you can recruit PhD students the year before you start, so that a group of first-year students is ready to work with you when you begin the professorship. I did a fair amount of recruiting and had phone calls with over a dozen candidates. PhD students cost a huge amount of money and time, so most advice was to be very conservative. I turned out to be very good at recognizing students who would get into top-5 schools. The problem was that they all went to those top-5 schools.

It’s hard to scale out research without students to help do the work. Since I didn’t have any PhD students and didn’t yet have a sense of how to assess applicants, I was liberal in accepting undergraduate and master’s research applications (almost 10) to work on a variety of promising research leads.

The hard part is that many students don’t yet have the technical background or research experience to jump onto an open-ended problem, and need more structure. That was something I did not anticipate or provide, which led to a huge amount of wasted effort on my part and, unfortunately, the students’. Most students disappeared by the end of the semester, and many within a few weeks.

Based on this experience, there were two areas I tried to focus on improving. These are similar to Andy Grove’s management advice: that managers serve primarily to motivate and to train.

  • Assessment: I found that ambitious students end up going much further, and doing much better work, than any other students. The ambition itself can be anything — curiosity, grad school, getting rich, etc. — but those students tend to think of new ideas and ultimately create new opportunities. The signals for identifying these students are a bit simpler, because they’ve driven projects in the past and think of research as a stepping stone, which it should be!
  • Mentoring: Students still need the tools to conduct research. I initially thought that students would “figure it out” when given an open problem, but they got stuck at different parts of the research process. This ranged from mechanical issues (e.g., needing more advanced coding experience), to conceptual issues (e.g., understanding the problem, learning how to google or read papers to answer questions), to human issues (e.g., forcing functions, overcommitment). I started to explicitly identify these issues and give guidance on them, and switched to a more “progressive” on-boarding model, where students get started by implementing the crudest, most trivial version of a solution, and read papers to think of better alternatives. That provides something tangible to start with, while giving the student ownership.

One thing to note is that significant technical background and research experience are not strictly necessary. Ambition is. Many students have joined the group with little or no experience in the topic area and learned it all on the fly.

There are three Columbia students from Year 1 that I want to mention as ones to look out for. Daniel Alabi and Zhengjie Miao were master’s students who are now PhD students at Harvard and Duke, respectively. Hamed Nilforoshan was a freshman who was excited about crowdsourcing and came up with a bunch of ideas from reading papers on his own. He is now the most senior (by membership) student in the WuLab and will be presenting his ICWSM paper later this month!

Research Progress

I take a lot of cues from Wu-Tang. One is to Diversify Yo Bonds. I’ve always done this by working on a variety of projects with different people. We still worked on and submitted a bunch of papers in Year 1 thanks to some amazing collaborations.

UMass Amherst: In late 2014, I had just finished job talks about explaining errors in data visualizations, and was thinking about where else explanations could be interesting. I’d been following Alexandra Meliou’s work for a while, since I knew her from my undergrad at Berkeley, and she had since moved to UMass Amherst. I emailed her with a wacky idea about finding and fixing queries that introduced errors in the database. Miraculously, she and her student Xiaolan Wang were open to the idea and we started working together! It’s a real testament to Xiaolan that we slogged through the many, many months of tough work and dead-ends before landing on a clean constraint formulation of the problem. I felt so fortunate to work with them, and learned a lot about how to be a great collaborator and how to be precise in my writing. In 2016, we presented a nice demo at SIGMOD, and our full paper was rejected.

OSU: Arnab had suggested that his student Lilong Jiang help out on my series of perceptual experiments on animated bar charts. The idea was to quantify how accurately humans can read simple animated bar charts as we varied the data, the frame rate, and different classes of approximation. This would be cool because we might learn that users are less sensitive to particular types of approximation, and leverage that to make visualizations faster. Lilong did a huge amount of work understanding the background material, designing the experiments, and figuring out which statistical tests to actually run.

This was an example of a sad collaboration where we put in a tremendous amount of work, yet were repeatedly rejected. After consulting with a visualization researcher, we concluded that the question was far harder than we were capable of answering at the time, and no amount of revising would fix that. The short story is that perception is deeply personal and subjective: it varies between people, and even for a given person it depends on, e.g., prior knowledge, fatigue, goals, and surroundings. Exploiting it for the sake of performance runs the risk of “quickly getting the wrong answer”, which is a big no-no. Rather than “flip” the work, we decided to back off and build more of the foundations.

Berkeley: The final, and longest-lasting, collaboration was with Sanjay. We have a habit of working on (what we believe are) very cool, innovative ideas and then getting our paper submissions hammered. I suspect it’s partly because we get so excited about the ideas that the writing and execution are slightly sloppy. The ActiveClean paper had been rejected, so we threw together a SIGMOD demo submission in a week (it won a best demo award!) and focused primarily on a full-paper resubmission.

Teaching

Great researchers are often excellent instructors. I wanted to do a good job teaching the Introduction to Databases course, so I arrived a month before school started to work on developing it. I recreated all of the lectures, revamped the semester-long project from PHP+Oracle to Python+PostgreSQL, and updated many of the assignments. I even bought nice leather shoes and a blazer!

Despite this, the teaching evaluations were abysmal and there was a revolt on Piazza (not as delicious as a pizza revolt).

To not bury the lede: I learned two things to focus on in the future.

  1. Students are busy people, so providing clear expectations up front about how the class will run is important so they can plan. Too much creativity and unexpected variation makes it difficult to assess how much work the class will require, and invites second-guessing of the grading policy.
  2. Classes already feel competitive, so it’s important that assessment is fair and, just as importantly, feels fair.

The midterm serves as an excellent example of both issues. I wanted to make the midterm colorful; I had just started watching Rick and Morty, and wanted to create a funny question based around the relationships between Rick, Morty, Meeseeks, and the Butter Machine. The issue is that students who had never watched the show felt disadvantaged by the unfamiliar terms, which have nothing to do with databases. Worse, one question turned out to be ambiguous, so some students were erroneously marked incorrect during grading simply for choosing a different interpretation.

A student brought this up after the exam and demanded a resolution. I tried to solve the issue on the spot and said that everyone would get extra credit equal to the question’s value. This was deeply unpopular because students who got the question wrong would receive undeserved credit, while students who got it right would effectively get double credit. In reality, extra credit is added after curving and deciding grade cut-offs, so it benefits everyone equally, but it was perceived as unfair.

Now, all of these mistakes could have been redeemed, if not for the final nail: I accidentally allowed Piazza posts that were anonymous to the staff. This is a killer. A student started a thread that body-slammed my character and qualifications to teach based on how I handled the midterm. The thread grew to 30–40 comments, with multiple factions for and against me. In retrospect, it’s pretty funny, but it wasn’t at the time!

The upside was that the students’ experiences were bimodal. I am always supportive of students who put in the work to learn, and spent a lot of time with many students who had less programming experience. Several started the class with very minimal programming background and, over the course of many office hours, progressed to the point that they could meaningfully contribute to their semester-long web application project. I recently learned that such students are termed “Conversational Programmers”: their goal is to be able to communicate with software engineers, not to write programs themselves. I was honored to write a recommendation to that effect for one student’s successful HBS application.

Summer

Having no PhD students meant that I was quite liberal in taking summer interns and visitors, and worked with them on a bunch of ambitious, early-stage ideas. None of this work was published during the summer itself, but it sowed many of the seeds that are now starting to blossom. Research projects can take years to fully mature.

I was surprised at how much the lab expanded that summer. Hamed, Zhengjie, an undergraduate named James, and a master’s student named Sharan wanted to stick around over the summer. They were joined by a larger cast:

  • My grant with Joe had gotten funded after adding the illustrious Jeff Heer to the line-up (like adding KD to Golden State), and Joe had admitted a student, Yifan Wu, to work on database+vis research. We thought it would be fun if Yifan spent a summer month at Columbia to work with us. These sorts of visits can help create lasting collaborations!
  • By an incredible stroke of luck, Fotis Psallidas was switching advisors and wanted to see if we could work together. I had tried a similar experiment in my first semester that went completely nowhere, but Fotis had a database background and a first-author SIGMOD paper, so I decided to take a chance. Holy crap, that was an excellent decision!
  • My buddy Philippe Cudré-Mauroux is a professor who lives it up in Fribourg, Switzerland. His student Laura Rettig wanted to hang out in NYC for the summer, so I agreed to host her as well, and to see if there was anything to do at the intersection of deep learning and data integration.
  • Finally, HaoCi Zhang was a junior at Tsinghua University who, for reasons I will probably never understand, emailed me asking to intern over the summer. I generally ignored these emails, but he was part of the Yao class at Tsinghua, which is basically the top class of students at the top university in China. It can feel risky to take on interns for just a summer, but my experience has been pretty good; it does require interviews similar to those for assessing PhD candidates.

The summer was a smashing success! The lab finally existed, and it was lively. I didn’t have a long-standing project for people to jump onto, so everyone was exploring new problems to pursue. Fotis was finishing up some previous work and trying to understand my crazy SQL+Vis language, Yifan was thinking about visualization design and system performance, Laura explored generating examples for deep learning, Hamed and James worked on the beginnings of a system to automatically generate writing feedback, Zhengjie continued to develop his mouse-prediction model and its use cases, Sharan explored clustering over streams, and HaoCi started a prototype to generate interactive interfaces from query logs.

Organization

One of the amazing parts of Sam Madden’s lab was how much great work emerged organically simply from students within the group sharing what they were up to. My existing approach of weekly meetings for each project wouldn’t allow for that, so I wanted to experiment with some organizational structure to foster a similar culture.

I switched to a “standup” model, where we met at 10AM every weekday during the summer. During each standup, we discussed some technology current events, what each person had achieved in the past day, and what their goals were until the next meeting. This kept everyone aware of the lab’s activities, helped younger members see how things work, and served as a forcing function to get something done every day. An informal poll at the end of the summer suggested that the model worked, and I’ve kept some version of standup since then.

We also started a paper reading group. My view on most papers is that the core technical idea is pretty simple, and going through every minute detail isn’t beneficial for everyone in the group. So we focused on picking apart the arguments in the abstract and introduction, sentence by sentence. Once we decided what the real problem in the paper was, we could assess whether or not the technical contributions and experiments made sense. The goal was to help students develop a taste for good problems, since classes typically emphasize technical details rather than problem selection.

Lessons

From one perspective, Year 1 was a super-fail in terms of research productivity, teaching, growing a research group, grants, etc. On the other hand, it was pretty fun, because I learned so much from the failures. Thankfully, research is a multi-year process and not a 3-month sprint, and my collaborators all have a good sense of humor about paper rejections.
