Data Science Undergrad Journey Part 3

Part 3: My Internship & Research Experience

Alison Yuhan Yao
CodeX
12 min read · Aug 7, 2021


Photo by Christin Hume on Unsplash

Intro

My previous two posts addressed the questions of why I chose Data Science as my undergraduate major in the first place and what my DS undergrad program requires.

But to be honest, my decision to major in DS was made over two years ago. I have seen the whole picture of DS, especially the cons, a little more clearly every single day over the past three years. I had every opportunity to switch majors and stop fulfilling those requirements (many of my friends did that). And as someone who reads blogs on Medium all the time, I have also seen the trend shift over the years, from “How to Become a Data Scientist in 3 Months” to “Why Data Scientists are Leaving Their Jobs”.

So, what makes me wake up every morning and still choose Data Science?

The answer lies mostly outside of schoolwork. It is my internships and research opportunities that fueled my passion for Data Science.

In this blog, I will talk about my internship and research experience in DS, AI, and tech in general. I will refer back to my first two posts from time to time and list things chronologically to show how one decision led to the next. Specifically, without naming any names, let’s talk about:

  • My motivation for getting into any project
  • What qualified me for these projects
  • What exactly I worked on (without violating NDAs)
  • Lessons I learned

My Internship & Research Experience

I am a big believer in creating my own opportunities. There are countless Kaggle competitions and open-source projects out there that an aspiring Data Scientist can contribute to and learn from. And as my DS knowledge grows, I can also easily come up with fascinating project ideas on my own. I have successfully turned some into reality through course projects and research proposals; the rest of the ideas are stored in my Notion idea bank. So, in a way, I have always felt relatively secure in terms of finding interesting projects to work on. There is no need to panic if you are a beginner.

Because of that, I have never intentionally applied for internships and research positions. These opportunities presented themselves through networking with friends and professors.

But it is also not advisable to overwhelm oneself by trying out everything or jumping on board too quickly. All of the projects I chose to do went through scrutiny to make sure they aligned with my goal at that time.

Research Assistant in Robotics

I met my research teammates at a hackathon. The two were the organizers of the hackathon and I was the leader of the winning team, so we interacted a lot while I worked on the project. They must have seen my work ethic and my willingness to learn, because despite my complete lack of experience in Robotics, they, together with another contestant I met at the hackathon, approached me afterward and convinced me to join their Robotics Club and the Open-Source Swarm Intelligence Research Project.

It was the second semester of my freshman year and I was on the lookout for any fun and exciting opportunity to work closely with professors. I was instinctively drawn to the topic of Robotics and AI. I had already decided to declare a major in tech, but I wasn’t quite sure which one to choose (the story of how I chose my major can be found here), so this research project was also a way to find out what niche I was interested in.

Then, I dedicated almost 9 months of my Friday nights and two-thirds of my summer vacation to building a cheaper, scalable, and decentralized Swarm Intelligence robot called Swarmesh. I learned C++ and Fusion 360 from scratch in an insanely short time and managed to start applying them in our research. There was a lot of 3D printing and welding. There were countless tests and reruns, but it was overall exciting. Every time a new experiment started, I felt all the hope and expectation. And sometimes, they came true!

We kept track of all of our research progress online (our logs here). We took turns writing logs, so in a way, that was when I first started writing blogs. We published a paper and went to Okinawa, Japan to present our research outcome. It was my first taste, as a young undergrad student, of what research is like in academia, and it was wonderful in many respects.

However, I later left the research project of my own volition at the end of the first semester of sophomore year, mostly because I was more into the AI algorithms and ML models than the hardware part. This revelation also helped me declare my major in Data Science with a concentration in AI. But the process of building something from the ground up is massively enjoyable. Having a professor to guide you along the way is also a rare learning opportunity. Those who stayed continued to do excellent work after I left.

Software Developer & All Titles @ Startup

In the second semester of my sophomore year, the same friend at the Robotics research group asked me if I wanted to help him at his data-driven startup. I was extremely intrigued by the idea of being a part of a startup. I’ve studied entrepreneurship before and met many entrepreneurs in my life. Plus, tech and innovation go hand-in-hand. For someone who loves building things related to DS and tech, how could I say no?

But I needed to start over. Again.

I knew next to nothing about frontend development. I knew absolutely nothing about the backend service we use. So I learned everything by watching YouTube tutorials and reading blogs and documentation. Starting from the basics, I learned HTML, CSS, and JavaScript. On top of that, I familiarized myself with frontend libraries/frameworks, backend NoSQL, and the Agile process. I am now comfortable with learning on the fly and implementing new tricks right away.

Being in a startup means so much more than just tech. I have also been a salesperson at a startup conference pitching our vision to everyone else; I have interviewed and hired new recruits; I have reached out to loads of people for networking purposes. Overall, I think I am just trying my best to be a young but innovative and hopefully disruptive entrepreneur. Like the other members of our startup, I wear many hats and pitch in whenever possible. I just realized that I have worked in this position for 17 months as of now. It has been a rollercoaster experience, but I know we are pivoting in the right direction.

Coming back to the technical side of things, my understanding of Data Science developed through a comparison with software development. Software development is fundamentally different from DS in that it is about executing a command. If you can replicate the exact Figma or Framer design you are given on the frontend, you are considered a good developer. DS, on the other hand, is more about figuring out what command to give yourself. When you are given a dataset, you do not necessarily know what you can find. Therefore, DS is more open-ended than software development because data can be interpreted in a myriad of ways. What to execute is far more important and meaningful than how to execute.

AI Intern @ IT Company

I got the job working as an AI Intern at an IT company through networking. The company happens to have a large customer base in Japan, so I figured what caught their eye must have been the fact that my Robotics research led to an academic presentation in Japan. Japan gave some validation. And I had just taken the Machine Learning course, so Deep Learning was fresh in my mind. After interviews, I started working on Computer Vision all through my summer vacation.

My work included Optical Character Recognition (OCR), face recognition, pose detection, and a lot of model testing. My role as an intern was kind of like a scout. I was always the first one on the team to test out feasibility when a new project was under negotiation. That is, if I failed to find a feasible way to carry out the project or if I tried every way and the accuracy didn’t reach the industrial standard, our clients would be unlikely to sign the deal. It was a lot to operate under that ‘make or break’ pressure, but it compelled me to climb a steep learning curve. Four projects went ahead in the three months I worked for them.

Being able to apply what I learned in class to real business scenarios yields an immense sense of accomplishment. Although people like to distinguish between AI and DS, I think the two fields overlap a lot. A Data Scientist seems to be destined to encounter AI sooner or later in his or her career (usually later). It was a little odd that my first DS job started in AI instead of something like classical Machine Learning, but it was an excellent starting point that confirmed my interest and capability in tech.

Research Assistant in History

History? Yes. That is right. A close friend of mine was working in the research team of a history professor who uses technology to digitize historical documents. For example, they used OCR to recognize English letters in dated fonts and used data cleaning techniques to check the integrity of these digitized documents. The very idea of integrating history and technology was fascinating enough for me to sign up. My friend kindly made an introduction and, next thing you know, I was back to computer vision and Data Science.

Unfortunately, my schedule did not allow me to continue once the new semester started. The amazing research is an ongoing process that is being enriched every semester. And it is still a great reminder for me that Data Science is such an all-encompassing field that it connects the dots between history and technology.

Data Science Intern @ E-commerce Company

I took a break from internships and research and shifted my attention to our startup during the entire first semester of my junior year. Because of COVID-19, I could not participate in our exchange program and go to our campus in another country to study and travel, so I “exchanged online” during the second semester of my junior year to make up for it.

I managed to land a Data Science Internship through the network of a professor. She was extremely helpful and managed to arrange a meeting for me with the CEO of an e-commerce company and their in-house Data Scientist who later became my supervisor and mentor.

As for what I was working on, I learned Power BI to do data visualization and worked with clustering and XGBoost models most of the time. We designed a visualization report to analyze webpage performance for the frontend team. Since I am a software developer as well, it was educational for me to see how product development and data analysis come full circle. We also built a Machine Learning model for Customer Relationship Management (CRM). We built a consumer segmentation & prediction model for the marketing team. These analytics skills are of course transferable to our startup in later stages.

It was my first time ever working with a DS expert in the industry and he happened to be the one and only Data Scientist who’s great at all the data jobs in the company. His photographic memory of the complicated database and insightful interpretations of statistical concepts and ML models blew me away. I learned so much just by observing how he approaches a problem. The way he communicated with me and presented his ideas showed years of work polishing his storytelling skills. Although my 4-month internship was remote (because I was “exchanging online”), I certainly found a perfect example of the kind of Data Scientist I aspire to become.

Undergraduate Student Researcher in Optimization

Recently, I went back to working in academia.

The story goes back a year. Our final project idea for the class Intro to Optimization and Mathematical Programming coincided with a problem that my professor had intended to investigate for a long time. After the class ended, the professor suggested that we take the problem a step further and research more. Now that we have secured some funding and three months of uninterrupted time, we can finally bring our premises closer to the real-life situation and solve the problem on a higher level.

I formulated a real-life vehicle scheduling problem as 4 variations of spatio-temporal networks and implemented a Genetic Algorithm in Python to reduce costs for the school by up to 20% while ensuring the students’ demands are met. I am documenting this project in the form of a blog series as you are reading this.
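For readers who have never seen a Genetic Algorithm, here is a minimal sketch of the general idea. It is not the formulation from our research: the demand numbers, capacity, costs, and all function names below are made up for illustration, and the "schedule" is simply how many vehicles to send in each time slot rather than a spatio-temporal network.

```python
import random

# Toy data (all values are assumptions made up for illustration).
DEMAND = [30, 80, 120, 60, 20, 90]   # riders waiting in each time slot
CAPACITY = 40          # seats per vehicle
MAX_VEHICLES = 5       # vehicles available per slot
TRIP_COST = 100        # cost of dispatching one vehicle once
UNMET_PENALTY = 1000   # penalty per rider left behind


def unmet_riders(schedule):
    """Riders left without a seat, summed over all time slots."""
    return sum(max(0, d - v * CAPACITY) for d, v in zip(DEMAND, schedule))


def cost(schedule):
    """Trip cost plus a heavy penalty for unmet demand (lower is better)."""
    return TRIP_COST * sum(schedule) + UNMET_PENALTY * unmet_riders(schedule)


def tournament(population, rng, k=3):
    """Selection: pick the cheapest of k randomly chosen schedules."""
    return min(rng.sample(population, k), key=cost)


def crossover(a, b, rng):
    """One-point crossover: head of one parent, tail of the other."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def mutate(schedule, rng, rate=0.2):
    """Re-draw each slot's vehicle count with probability `rate`."""
    return [rng.randint(0, MAX_VEHICLES) if rng.random() < rate else v
            for v in schedule]


def run_ga(generations=200, pop_size=30, seed=0):
    rng = random.Random(seed)
    # Start from a "send every vehicle every time" baseline plus random schedules.
    baseline = [MAX_VEHICLES] * len(DEMAND)
    population = [baseline] + [
        [rng.randint(0, MAX_VEHICLES) for _ in DEMAND]
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        elite = min(population, key=cost)  # elitism: the best schedule always survives
        children = [elite]
        while len(children) < pop_size:
            child = crossover(tournament(population, rng),
                              tournament(population, rng), rng)
            children.append(mutate(child, rng))
        population = children
    return min(population, key=cost)


best = run_ga()
baseline_cost = cost([MAX_VEHICLES] * len(DEMAND))
print("best schedule:", best)
print("cost:", cost(best), "vs. baseline:", baseline_cost)
```

The penalty is deliberately much larger than the cost of a trip, so a schedule that leaves riders behind can never beat one that serves everyone. Because the baseline is part of the first generation and the best schedule is always carried over, the result is never worse than sending every vehicle in every slot.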

It was really empowering to be able to solve a problem that matters a lot to me and my fellow students. This is exactly the kind of impact that I am looking for.

Lessons Learned

  • It is the questions you investigate that matter, not the tools.

I have never had a preference between industry and academia, because all the projects I have worked on are a perfect combination of research and application. As long as the problem is intellectually stimulating and worth solving, I will try every solution possible to solve it. Even if it takes learning an entirely new programming language or a new algorithm, I will happily devote my time. One thing I noticed is that some DS students choose problems based on the models they want to practice, instead of how interesting the problem is. I deem this approach unsustainable because, in the end, models are just tools. It is your passion for the questions and pain points that matters.

  • DS ∩ Entrepreneurship ∩ Research = Curiosity

It may seem that I did various jobs in different fields, but I see that they are the same at their core. It’s probably easier to understand that Data Science and research are investigative in nature. Curiosity drives them. Surprisingly, entrepreneurship is the same. Building a company is like a large-scale experiment, but with a lot of money involved. In the beginning, one needs to interview potential customers to test out hypotheses before there even is a company. As the company grows, one needs to experiment with business models. When the company matures, the experiment may shift to building a more creative company culture. The feedback loop of this experiment may be excruciatingly long (much longer than machine learning training for sure!), but the potential reward can be unimaginable. It all comes down to whether you have the curiosity to see it through.

  • You are more valuable than you think.

Everyone goes to work with more skills than required in the job description. Your identity, your culture, your experience, and even the different languages you speak give you an edge. It is because I am a Chinese native that I am capable of reading between the lines of what Chinese investors say. I can confidently say that the data indicating almost no one in China uses Apple Pay is 100% correct, despite Apple Pay’s dominance in other countries. Who you are is more important than the skills you have. You are more valuable than you think, even if you are just an intern.

  • Female Representation

As a woman in tech, I often find myself the only girl on the team. I was the only girl in the Robotics research team for 6 months until others came along; I was the only female software developer in our startup for an entire year before another excellent female developer joined us. The professors and managers I have worked with are 95% men. So, sometimes, it does feel frustrating when I cannot find a single female role model who works in tech in real life. But the great thing I noticed is that more girls than boys are reaching out to me and asking about Data Science. More girls want to code and are indeed talented at programming. So, if you are a fellow female aspiring Data Scientist, I am here with you.

Final Words

This is the end of this series, where 3 years of my undergrad DS journey are summed up in 3 blogs. Steve Jobs famously said:

“You can’t connect the dots looking forward; you can only connect them looking backwards. So you have to trust that the dots will somehow connect in your future.”

There you go, dots connected! There is still one year left and I am very much looking forward to it. The DS journey never stops.

Thank you for reading my blog! I hope you find it helpful.

My Github: https://github.com/AlisonYao

My Kaggle: https://www.kaggle.com/alisonyao
