My Journey to Data Science

Today I turned 23. To reflect on a central thread of my recent years, and what I’ve done with my time, I wrote this blog post. Enjoy ☺

Initial Conditions

Ever since I was a kid, I have been possessed by a singular idea. I thought that human groups could be understood as a very large and complex meta-organism, in which each individual was a cell and together they constituted a multicellular organism. For a long time, I dreamed of being able to “zoom out” far enough that I could visualize the form of this meta-organism.

Singularity University GSP 2012 — 2nd row, 6th from the left.

After my sophomore year of college, I had a chance to spend a summer at Singularity University’s Graduate Studies Program. Singularity University was started by inventor Ray Kurzweil and entrepreneur Peter Diamandis to gather entrepreneurs and graduate students for a focused examination of cutting-edge “exponential” technologies: technologies whose development follows rapid rates of change. My exposure to these technologies led me to change fields. My previous strategy for understanding this meta-organism had been to study biology, and though I had a chance to work in the exciting field of bioengineering, I realized that it was not abstract enough to answer my question. During the summer at SU, I found myself repeatedly asking, “What should I be doing in light of these exponential technologies?”

After some reflection, I focused on a trend that the different technologies shared: they allowed for a new way to interact with, manipulate and process information. Data. It was the key to the exponential technologies, as well as the lens through which I could come to visualize and understand the meta-organism that is human society.

Academic Preparation

When I returned to Brown in the fall, I decided to switch tracks and, instead of focusing on biology, to learn the mathematical and computer science techniques for manipulating information. I had some basic statistics and programming experience from engineering, but I doubled down, starting the introductory CS sequence and more advanced classes in statistics. My most formative experience was a sequence of classes taught by Stuart Geman, a legendary professor in Brown’s Applied Math Department, who co-invented a statistical technique called Gibbs sampling that allows one to sample from high-dimensional distributions. I won’t get into the exact details here, but suffice it to say that it has been extraordinarily useful, and we owe many of our advances in image processing, computational biology, financial modeling and many other areas to its application. As great professors do, Geman not only taught us various methods of statistical analysis, he also taught us a framework for thinking and reasoning about the problems to which these techniques were clever solutions.

Partially as a result of Prof. Geman’s excellent structure, I came to see that one bottleneck in my learning in other classes was a problem of motivation. While I was excelling academically, I felt uncomfortably disconnected from the material I was learning. Not having personally faced the practical problems that motivated the theoretical constructions, I felt that my knowledge was a castle built of cards, with about the same level of stability.

Around this time, I came across an article in Harvard Business Review, by DJ Patil and Thomas Davenport, that described a new role: the data scientist. I was surprised to hear about such a thing, which seemed like the perfect application of the math and computer science I was learning. In addition, there seemed to be an abundance of “data” problems in the world, and I itched to work on something more tangible than homework problems.

At the end of my junior year, I decided to take a risk, and venture out to Silicon Valley, where I had spent the two previous summers, and try to get some real world experience as a data scientist.

It was not a blind journey with no preparation. I had tried to line up a job, and had actually gotten an offer from a startup during the summer, but needs at the company changed before I moved West and it didn’t work out. Disappointed, though undeterred, I packed my bags and boarded the plane anyway.

I arrived in California in September with nothing but two trunks of clothes and a few telephone numbers. For the first few weeks, I slept on a living room couch in San Francisco. Thankfully, one of my numbers belonged to a friend who was in the process of starting his own company, and he generously rented me a room in the neighborhood of Palo Alto.

It’s a long road to the top if you want to data science

I’ll be frank: it was hard getting a job as a data scientist. First, the Valley was just beginning to understand the term, so not many companies were explicitly looking for data scientists. Second, of the ones that were, many of the analytics companies dealt a lot more with data storage than with actual analysis. Finally, to top off the structural challenges, the typical applicant that companies had been trained to look for was someone who had graduated with a PhD, and here I was, not yet done with undergrad.

My coursework gave me the confidence that I was a diamond in the rough, though, so I set my standards high. I didn’t want to work for just another standard analytics-as-a-service company, and wanted to find someplace that was unique. In hindsight, I was looking too closely at the pure analytics companies, and could have also focused on companies that were solving a specific problem through data, but at the time, the only companies that seemed interesting to me were Quid, Recorded Future, and this mysterious place called Ayasdi.

A screenshot from Ayasdi’s software, analyzing genomes of cancer patients

I had heard about Ayasdi from a friend doing a PhD in category theory (a branch of pure mathematics, perhaps the most abstract and pure of the branches), as a company born out of novel research. The idea seemed fascinating, and to get started I scoured LinkedIn to see what common connections I had with the company. To my surprise and delight, one of my former bosses from a previous internship did in fact know some people at the company, and I was invited in for an interview. Unfortunately, a week later, I learned that someone else had been hired for the role, and I was left in the dark about next steps.

At this point, I had a choice. I could either give up on Ayasdi, and go look for a place elsewhere, or I could stick it out and try applying again. I was stubborn, and decided that I liked Ayasdi too much as a company (having read their papers and publications online, and been fascinated by their technology) to give up that easily.

Through this job search experience, I came to an important and affirming realization: grit and persistence matter.

There is a Valley legend that could have helped me tremendously when I was applying, though I only heard about it afterward. The story is that when Vinod Khosla was applying to Stanford, he was rejected during the initial admission decisions. Upon receiving the news, Vinod called up the Dean of Admissions and pleaded his case, but was not able to sway the dean. He called the dean every day until Stanford finally relented and let Khosla in.

I spent the first two months of my new life in Silicon Valley trying to get a job at Ayasdi. My parents were concerned that I was wasting my time and should consider going back to school, but I convinced them that this experience was the education that I needed. I went back to the office and talked to some other people about another role. That one didn’t work out either. I had a meeting with the CEO. He was unconvinced that someone of my age and credentials would be able to do something valuable. I presented my arguments for why I would. I showed up at the office so many times that by the end, the engineers by the door knew me by name. After a long effort, I got a chance to demonstrate my knowledge on a week-long data challenge. I hunkered down and really worked on the data challenge, stopping only for meals and occasionally some sleep. There were parts of it that were familiar and straightforward, and other parts that were more open-ended. Surprisingly (or not), I actually learned quite a bit during the data challenge itself.

That was followed by a full day of interviews, and finally I received an offer in November 2013. It’s been a crazy ride since then, and in another post, I’ll write about the things I’ve learned working as a data scientist. But to drive the point of this story home:

https://www.youtube.com/watch?v=vSOclSkMCNo

(To get a sense of what Ayasdi does, here’s a video of my colleague Michael Woods and me presenting at the NYC Finovate Conference this September.)

To Change Is to Risk Rejection

During the process of compiling the Data Science Handbook, we had a chance to have an incredible conversation with DJ Patil, the guy who originally wrote the article that set everything else into motion. This was after I joined Ayasdi, but every word described something that I had lived through, and it resonated strongly with me.

“The challenge is finding out what you need to do to get that first step. For most people who come from academia, the first step is someone has to take a risk on you. There’s probably a lot of times you have to extend yourself. Nobody just discovers you at a cafe and says ‘Hey, the way you’re writing on that piece of napkin, you must be smart’! That’s not how it works, you must put yourself in this position where somebody can actually take a risk on you, and then they give you that opportunity.

“To do that you must have failed many times, to the point where some people are not willing to take a risk on you. You don’t get your lucky break without seeing a lot of people slamming doors in your face. Also, it’s not like your message is staying the same, your message is changing every time you talk. You are doing data science in that way. You’re iterating the message and you’re trying to figure out what works.”

As you think about becoming a data scientist, keep in mind that it’s not for the faint of heart. It’s the beginning of a new era in software and enterprise, and while the opportunity is immense, you pay for the chance with your willingness to face rejection. As DJ so eloquently said, and as my experiences testify: in the beginning of things, you need people to give you a chance, and in order to get that chance, you need to be willing to jump across the chasm. Many times, there will be no one to catch you on the other side; you will fall and it will hurt. But inevitably you will find someone willing to take a chance on you, and then the real work begins.

It was a winding road that started me in data science, beginning several years before I even knew about the term. Along the way, though, I’ve had the chance to challenge, and learn from, those I met along its path. One trend among all the data scientists I talked to for advice and guidance was that each person had to figure out his or her own path, and navigate murky waters before reaching clarity. Each wished that there had been someone guiding them through the journey. That common sentiment, and the frustrations and struggles I encountered along the way, laid the kindling for the Data Science Handbook project.

Interviews and Wisdom for The Road Ahead

I originally began working on the Data Science Handbook with my co-authors Carl Shan and Henry Wang in October. At that time, Carl had just finished creating the beautiful and informative Product Managers Handbook, and Henry was visiting from his stint as Entrepreneur @ Startup Chile. Around the same time, we learned that some of our friends in highly advanced graduate programs were worried about finding a career after graduating. Data science seemed to be the logical step, but very few of them knew about the field. We talked about our interest in collaborating to help clarify and organize information about this emerging space, and soon after, the Data Science Handbook was born.

Along the way, we’ve been honored to add William Chen, Quora guru and Quora data scientist, as a co-author.

Funnily enough, the book has grown alongside my journey to becoming a more mature data scientist, and has been an integral part of my own learning cycle. As different challenges arose on the job, I channeled them into questions for our interviewees. Likewise, lessons learned from our interviews were gems of opportune advice that I could apply the next day.

The cover of our 120 Data Science Interview Questions, which you can find at datasciencequestions.com

At the same time, as we talked to more and more people, we began to see that our mission in compiling the Data Science Handbook was bigger than just helping people get in the door. As my own experiences above can testify, that part is a hard challenge and deserves a lot more resources than currently exist, which led us to create the 120 Data Science Interview Questions. But we also saw that the road to learning doesn’t stop with just getting the job; in fact, that is only the beginning. There are challenges along many different dimensions that confront the new data scientist, and we tried consciously to anticipate those needs as well, asking our interviewees questions on building teams, hiring, learning, designing products, and managing data. We drilled down on their own life stories and experiences, recording their paths from their days in academia to their first forays into industry to becoming the leaders and visionaries of the field.

Ultimately, data science is a set of new tools for making sense of the world around us. We hope that through the compilation of this Handbook, we can not only show people how to acquire the tools, but also point to fruitful directions in which to use them, creating a future that is more efficient, more intelligent and more beautiful for all of us.
