How to Make Your Summer Productive

COVID-19 has hit college students hard, upending summer internships and jeopardizing job offers. To help students adjust to such an erratic environment, DS3, in collaboration with Tech San Diego, hosted a panel of Data Scientists from Kiran Analytics, Tandem Diabetes, and Lytx to offer advice on projects and career prospects.

The event was divided into four sections: Career Advising/ Job Hunting, On the Job, Tools and Libraries, and General Advice. Below is a list of the speakers and their respective companies:

Kiran Analytics: specializes in improving financial institutions’ performance through advanced analytics

  1. Aruna Rajasekhar — Senior Data Analytics Manager
  2. Jeff Ernst — Analytics Consultant
  3. John Tsai — Analytics Consultant

Tandem Diabetes: develops insulin pumps to improve diabetes management

Alexandra Constantin — Data Science Director

Lytx: a video telematics company dedicated to helping fleet managers improve driver safety

Jesse Daniels — Machine Learning Scientist

UCSD:

Stephanie Labou — Data Science Librarian

Career Advising/ Job Hunting

Despite their different backgrounds, all the panelists encountered a wealth of data in their respective tracks. Most completed their undergraduate degrees in the pre-data science era and gradually came to understand data’s essential value. Whatever they were pursuing, from astrophysics to marine biology, the versatility and abundance of data kept coming up as big data began to boom.

Acknowledging this versatility and its tools, the panelists from Kiran Analytics insist on keeping an open mind and not focusing solely on building models, which has been grossly overhyped. Instead, make an effort to be a well-rounded applicant by trying out different tasks, such as automation, visualization, and data analysis. Jeff makes the point that data is its own field; once you have learned the needed tools and methods, it’ll be easier to start projects in data-oriented domains. According to Alexandra, finding a plan that overlaps with your ambitions requires some “soul searching” and “self-discovery.” She suggests several ways to expose yourself to Data Science and discover what motivates you, including getting an internship or job, offering to help on a project, or taking open-source courses.

Quarantine is a perfect time for such “soul searching.” Stephanie echoes Alexandra’s message: the first step of any project is self-reflection. The internet provides an extensive hub of data, so resources are readily available. Seeing a project through from start to finish impresses employers, so commit yourself to one well-defined plan instead of a laundry list.

Finally, employers value a good story: how will you benefit the company? It’s great to have a bunch of packages on your resume, but do you know how to use them? You are not expected to know everything, but you should be able to demonstrate resourcefulness. Be mindful of the prerequisites your employer is looking for, and diversify your resume with both soft and hard skills.

On the Job

What makes a good data scientist, and what aspects of the job are overlooked?

Jesse emphasizes that an end-to-end mindset is essential: understand what business problem you’re trying to solve, and know whether you are applying the appropriate metrics to solve it. To do so, you need to gather data, a difficult yet often overlooked part of the job. Remember, better data = more natural modeling. While newer candidates prefer to focus on the modeling pieces, it’s important to solve the problem systematically by whatever means are most efficient.

When it comes to working directly with clients, the panelists stress adaptability. Every client is unique, and each is comfortable with different presentation tools. John offers the following example to illustrate the flexibility this requires:

“How would you describe a linear equation to a grocery store clerk?”

This puts the emphasis on soft skills, which are learned: to produce a minimum viable product, you need to simplify technical details for a less technical audience. Practice presenting to your peers, as well as science writing, to develop these skills.

Tools and Libraries

As a Data Scientist, you are well aware of the tools you need to be proficient in. Python and R are indispensable, but it also helps to know the domain you want to work in, as priorities differ slightly. The medical field is exacting about the type of data collected and requires extensive preprocessing and exploration. Other areas prioritize computing time and efficiency in data handling and model development. Or maybe you want to work strictly on deploying models and managing the version control system.

The panelists mention two primary sources for building tool proficiency: DataCamp and the UCSD Library. Stephanie has worked closely with many Data Science students and understands both the professional and novice perspectives. She is an ideal audience if you want to pitch your Data Science project or research findings. The UCSD Library is also a datastore: you have access to numerous datasets formatted in a tabular, structured manner. Many students use its geospatial and cloud-computing platforms, which are available remotely during the COVID-19 pandemic. Lastly, machine learning books are free to access there.

Here is a quick rundown of the languages and platforms the panelists advise learning:

  • Python: The most widely used programming language in Data Science, suited to seamless quantitative analysis, visualization, and model building
  • R: Best for applying hardcore statistics. Alex notes that some regression techniques employed in R are not available anywhere else.
  • SQL: database management. Maintaining an efficient and speedy workflow and runtime is essential when using this language. John says that you should make your queries take “ten seconds, five seconds, then two seconds.”
  • Visualization frameworks: ggplot for R and Plotly are two of the many ways to visualize your data.
  • Pandas: Data analysis and manipulation in Python
  • Big data tools and data warehouse infrastructures: Spark, Hive, Hadoop
  • Git: version management and organization
  • MATLAB: a language for signal processing, and testing and prototyping algorithms and solutions
  • Data storage: Amazon AWS, GCP and Microsoft Azure “would all be a plus,” says Alex.
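
John's advice about shaving query time can be tried out on a small scale. The sketch below is only an illustration, not something the panel showed: it uses Python's built-in sqlite3 module as a stand-in database, and the orders table, its columns, and the idx_orders_customer index are all made up for the example. It times the same query before and after adding an index and prints the query plan each time.

```python
import sqlite3
import time

# Hypothetical query against a made-up "orders" table (illustration only).
QUERY = "SELECT SUM(amount) FROM orders WHERE customer_id = ?"


def build_demo_db(n_rows=50_000):
    """Create an in-memory SQLite table filled with synthetic orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    rows = ((i % 1000, float(i % 97)) for i in range(n_rows))
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def timed_total(conn, customer_id):
    """Run the query and return (result, elapsed seconds)."""
    start = time.perf_counter()
    total = conn.execute(QUERY, (customer_id,)).fetchone()[0]
    return total, time.perf_counter() - start


def query_plan(conn, customer_id):
    """Return SQLite's description of how it will run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (customer_id,)).fetchall()
    return " ".join(row[-1] for row in rows)


conn = build_demo_db()

# First pass: no index, so SQLite has to scan the whole table.
total_before, secs_before = timed_total(conn, 42)
plan_before = query_plan(conn, 42)

# Second pass: an index on the filtered column lets SQLite jump to the rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
total_after, secs_after = timed_total(conn, 42)
plan_after = query_plan(conn, 42)

print(f"before index: {secs_before:.6f}s  plan: {plan_before}")
print(f"after index:  {secs_after:.6f}s  plan: {plan_after}")
```

Exact timings will vary by machine, and on a table this small the difference may be tiny. The habit is the point: measure, read the plan, change one thing, and measure again.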

General Advice

The panel concludes with bits of universal advice for aspiring Data Scientists.

Networking

The panelists are emphatic about refining your networking skills. Stephanie remarks that UCSD students should practice networking both on campus and in industry. Although the resources offered on campus aren’t currently accessible due to the pandemic, you can get a sense of what you’re looking for by browsing job boards. Take notes on the roles, responsibilities, and qualifications a given job or internship entails. If your skillset doesn’t check all the boxes, see what you can do to fill the gaps.

When aiming for a job, your expertise should adhere to the following framework: be exceptional in a few skills and tools, and have some experience in plenty of others. Being a generalist with a select number of specializations shows excellent potential.

Lastly, when it comes to networking, you must take initiative. It is your responsibility to initiate the conversations and to develop the connections.

Always Learning About Everything, Including Yourself

According to Stephanie, the mindset “I have a chance to learn something new” is more effective than “I don’t know this.”

The Data Science field has many career options for you to explore. In deciding which career to pursue, think about what Alex states: “When you find something that you can happily spend all of your time on and be really engaged in, it really makes a huge difference” … “When you do find that one area where everything falls into place, you just know it.”

Co-written by Camille Dunning
