A Q&A with a Data Scientist
Weighing different options and finding the right path. Featuring Cindy Lai.
The student who needed help said the following:
My career goal is to ultimately work in the tech sector. As of now, I would love to work as either a software developer or a data scientist, but I am open to other positions. I’m really interested in software development and building impactful web applications, but I also have a curiosity about big data and would love to get involved in machine learning and data science. I would love to be able to work at Facebook, Amazon, Google, Netflix, or even Apple following college graduation. Eventually, I would love to transition into a Product Management position.
I am an incoming UCLA transfer student majoring in Statistics and have been conflicted about whether Statistics would be a good major considering I want to work in tech, specifically software/data science. I keep hearing that it’s not your major but rather your experience that determines your career path. However, I feel as if most of the people who are working in tech have computer science degrees. At UCLA, there are two computer science-related majors I can potentially switch into, Ling/CS and Math/CS, but doing so may require me to stay a 5th year.
I’m currently attending a coding bootcamp called Horizons and have taken multiple CS classes. I’ve listed some relevant CS courses I’ve taken on my LinkedIn profile.
Here’s how Cindy Lai, the mentor I matched them with, answered the questions. She is going to attend grad school for Data Science at UCLA this Fall. She has worked at Lawrence Livermore National Laboratory and been a Researcher at UC Davis. In her time at UC Davis, she co-founded iidata, the first student-run Data Science convention at the collegiate level.
I wanted to ask if you could elaborate on why/how you choose to pursue grad school, specifically in Computer Science.
I chose to pursue grad school for a variety of reasons.
The first being that I felt I was not done learning. As I got more interested in data science, I realized which parts of it really intrigued me (like machine learning). With only a Statistics undergraduate degree, I felt I was not equipped with enough skills or knowledge to pursue the really interesting problems. I personally wanted to continue in doing research in this field, and I believed grad school could provide me with that opportunity.
Another reason I decided to go to grad school is that the types of jobs I wanted generally preferred or required a graduate degree. I did not want to do data analytics, which was generally the type of job I would be suited for after completing an undergraduate education. I think the main thing about deciding to go to graduate school is that you should do it for yourself: you personally really want to go, and for the right reasons.
I have talked to multiple professors about graduate school, and the key thing I have picked up on is to not go simply because you don’t have any idea what you want to do after graduating and graduate school is a convenient option. If that’s the case, I recommend doing an internship (if it’s before your last year) so you get a better idea of what you like (and what you don’t like). If it is your last year, then I would most likely still recommend finding a job first in a field that aligns with your current skill set. That way you can see how the things you learned in school operate in industry.
It’s hard to know what you want to do in life until you try a few things and learn what you don’t want to do. Dr. Duncan Temple Lang at Davis told me that a good sign you want to do grad school is if, while you’re working, you see the types of problems the people with a graduate degree are working on and you want to work on them as well. If you’re satisfied with what you’re doing now, then there’s no need to go that extra step. Also, one thing you generally write about in your application to graduate school is what research interests you have. If you don’t have any idea of what fields interest you and, more specifically, what topics within that field, then I would take advantage of your undergrad to explore that first, or read up on research in a variety of fields, do research in undergrad, or work first. If you do have a good idea of what excites you, then that may be a good sign.
I chose computer science because of the university, the job prospects, and because it personally appealed to me more. I applied to statistics, data science/ML, and computer science programs (in total, around 11 applications). UCLA was the only California school, and weather/environment was important to me, along with a few other reasons why I preferred California. My final decision was a juggle between Washington for Statistics and UCLA for CS.
I went to visit some campuses over spring break, and the things that dissuaded me from Washington were:
- I talked with a faculty member there, and they said research as a master’s student is highly unlikely since you’re not there for as long.
- The Statistics building seemed a bit cold and unwelcoming.
- The courses seemed very intense in Statistics.
If you love statistics, this would be perfect for you, but honestly, in senior year, I was having doubts about whether I wanted to pursue statistics even deeper. It is my own opinion that statistics did not interest me as much anymore, and I am not implying that it is not a wonderful program nor that statistics is inferior in any way. I just saw so much cooler research related to my interests being done in CS.
Even though there were quite a few things that I did not like about Washington, I did appreciate how much smaller their program was (about 30 master’s students enrolled per year as opposed to 100+ at UCLA), and Seattle was a great hub for tech jobs.
What was your thought process behind such a decision and what is your current career trajectory?
I talked a lot (maybe too much) about the first half, but in regards to the second, I have not settled on one specific career and am honestly open to different types of positions. Ideally, I would want to go into more research-oriented positions connected to industry (like at FAIR, Microsoft Research, etc.), but usually those require a PhD (as I’ve heard).
In the meantime, I can see myself going towards a Data Scientist type of position. However, since I am going into a CS master’s, I wouldn’t mind going more to the software side of things. I’m glad my field for graduate school is different, since that can broaden my career options. And honestly, a data scientist role at one company is probably quite different from what you would find at another company. The term “data science” is overused in job listings nowadays, and it is best to sift through the job description to see if what you’ll be doing mostly falls into the realm of what you hope to be doing. I would also talk to the interviewer for more clarification on the job, since sometimes the job listing doesn’t align with reality.
Are Data Science interviews similar to SWE interviews?
It depends on the company and on the stage of the interview process. The questions will also depend on the rigor of the position and what type of background they’re expecting from candidates. I would say the more finance-related ones tend to ask more probability/statistics questions. For quite a few, the first/second round is either a HackerRank coding question or a technical phone interview. However, the questions do diverge at some point.
You’ll get questions on SQL, probability distributions, machine learning concepts (ex: “define cross validation”, “what is word2vec”), dataset scenarios, etc. I’ve had companies give me dummy datasets and ask me to either answer a broad question or submit whatever analysis I thought was most insightful. However, in my experience, more of the questions pertain to CS than to statistics. For example, I’ve gotten quite a few questions on Big-O, data structures, UNIX commands, algorithms, and programming languages as opposed to statistics questions.
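As one illustration of what a question like “define cross validation” is getting at, here is a minimal sketch of k-fold splitting in plain Python. The data is divided into k folds, and each fold takes one turn as the validation set while the rest is used for training. The function name `k_fold_indices` and the sample counts are made up for this example; in practice you would usually shuffle the data first and rely on a library rather than hand-rolling this.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread samples as evenly as possible: the first (n_samples % k) folds get one extra.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        # Everything outside the current fold is used for training.
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Hypothetical usage: 10 samples split into 3 folds.
folds = list(k_fold_indices(n_samples=10, k=3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

A model would be trained and scored once per fold, and the k scores averaged to estimate how well it generalizes.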
What resources should I use to prep for such interviews?
I would recommend the good ol’ Cracking the Coding Interview book and LeetCode to get you grounded in CS fundamentals. For the more data science-related questions, I found that the more experience I gained, the better I was able to answer.
This has been mostly through projects in classes, internships, self-initiated projects, hackathons, etc. It’s sort of vague, but I would say to go and expose yourself to more opportunities to explore data. Another way is to take more related classes in school (but I would not rely on this, because you should never assume you can get a position solely with an undergrad degree). Some classes I took at Davis that helped me were the STA 141 series (computational data science series), STA 131 series (math stats and probability), ECS 171 (ML), ECS 170 (AI), ECS 174 (Computer Vision), ECS 165A (Databases), ECS 289 (Deep Learning), and ECS 122A (Algorithms).
Even if you can’t find these exact classes at UCLA, maybe try finding whatever similar courses there are. If you’re at a point where you haven’t taken many courses and feel like you don’t have enough skills to start your own project, I recommend taking some online bootcamps. I personally used dataquest.io to study before I started the STA 141 series. It equipped me with a lot of skills on how to do data analysis in Python.
I also read “An Introduction to Statistical Learning with Applications in R” (free pdf online) as my first dive into machine learning. If you’re familiar enough, there’s always Kaggle for datasets and competitions. Tangentially related, but once you do these projects, I highly recommend putting them on your resume!
Would majoring in Statistics at UCLA put me at a major disadvantage and if so, would taking a 5th year to major in Ling/CS or Math/CS be worthwhile in the long run?
If I choose to stay in Statistics, what should I do to better my chances of landing an SWE/Data Science internship?
Speaking honestly and from my own experience, sometimes companies will think more favorably of you if you have a CS (or related) major. Some positions say “CS or related majors”. And I do somewhat see where they’re coming from. A lot of the data science/tech work you would do while still in undergrad requires a noticeable amount of programming expertise and proficiency. And unless you’re in a higher position, you could use ML algorithms for data sets without knowing all the theory that goes behind them.
I’m not saying that’s necessarily a good thing or good practice, but I’ve seen this happen (e.g. throwing neural nets at every dataset they see). Another thing to note is that the exciting work you would be doing with a statistics background would be more apparent if you go to graduate school. The difference between the education you get in graduate-level statistics and in undergraduate statistics is quite substantial. Looking straight at the facts of a Statistics undergrad vs. CS undergrad at Davis, the stats undergrad would most likely be suited for positions like “data analyst” whereas a CS background would gravitate to “machine learning/software developer” positions.
I would also like to make a point about the difference in the types of machine learning tasks you would do based on your background. If it’s an elementary position, machine learning to you could look like:
from sklearn.neighbors import KNeighborsClassifier

# Fit a 5-nearest-neighbors classifier on the training data
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X_train, y_train)
The kNN classifier is a small piece that would be incorporated into a larger piece of software. A graduate-level machine learning position would consist of reading academic papers to be aware of the current state of the art and configuring a custom model or algorithm to solve a broad problem at the company. One is not superior to the other; it depends on what sounds interesting to you and how far into academia you want to go.
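To make the point about using an algorithm without knowing what goes on behind it more concrete, here is a minimal sketch of the idea behind a k-nearest-neighbors classifier: find the k training points closest to a query and take a majority vote on their labels. This is a plain-Python illustration, not scikit-learn’s actual implementation; the helper name `knn_predict` and the toy data are assumptions made up for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=5):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up): two small clusters in 2-D.
X_train = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y_train = ["a", "a", "a", "b", "b", "b"]

prediction_near_a = knn_predict(X_train, y_train, (1.1, 1.0), k=3)
prediction_near_b = knn_predict(X_train, y_train, (5.1, 5.0), k=3)
print(prediction_near_a, prediction_near_b)
```

A library version adds a lot on top of this (efficient neighbor search, distance weighting, tie handling), but the core logic is about this small.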
Sorry for the long sidetrack, but to answer the first part of your question, no. It wouldn’t put you at a major disadvantage, but you may experience a couple of setbacks (e.g. if a company uses a resume parser that filters for CS majors). You will also be able to tackle other types of positions. For example, actuary positions rely significantly on probability, and that’s something a statistics degree would prepare you well for and is not something easy to pick up for other majors.
In addition, it’s not necessarily that a CS major will grant you so much more. Recruiters are just trying to find out what quality of code you write and what programming experience you have. Just listing a CS degree on your resume doesn’t guarantee you anything to a recruiter. They’ll ask about internships and projects you’ve listed to gain further insight. If you do plan to aim for tech positions and data science, you will just have to take CS classes on the side or study on your own to be familiar with the fundamentals. And as you’ve cited, it does boil down more to the experiences you have. A major can only mean so much. Generally, companies are looking beyond what degree you’re leaving undergrad with.
If you’re debating whether to switch majors, I would honestly ask yourself what types of positions you want to go into and what subject excites you more. You’ll be much more engaged and will retain the information better when you leave school. An advantage of taking a 5th year is one more year for an internship or research position. I would consider a 5th year if you feel like you won’t be ready enough to leave undergrad in 4 years to go after the positions you strive for (also assuming no graduate school) and that the classes you would be taking in that additional year will actually benefit you more.
One reason I didn’t major in CS at Davis was that the Statistical Data Science track listed out a lot of interesting classes I was planning on taking anyway, and with a CS minor, I could take whichever CS upper-division classes I wanted to instead of arbitrary requirements. And even if you don’t have enough time for a minor, just taking courses helps a lot. I wouldn’t get so caught up in having it plastered on a transcript as opposed to just taking classes for the knowledge.
The last part of your question is mostly addressed in the long essays of the previous questions (lol), but TL;DR (in no particular order):
- Hackathons / personal projects
- Useful classes
- Any experiences to expose you to more programming
These are some workshops I’ve held, and I have attached my Google Slides.
We are so glad Cindy was able to provide our student with such helpful, comprehensive, and personalized information on deciding whether to go to grad school and how to prepare for technical interviews.
The great thing about GetCareerAnswers is that it provides personalized career guidance for your unique challenges and background. If you’re a student or someone who needs career guidance, sign up for GetCareerAnswers today. Similarly, if you’re a knowledgeable professional who can help out as a mentor, sign up too!