My First Time Using IBM’s Data Science Experience

As a math person, never having written a line of code

--

Editor’s note: This article is part of an occasional series by the 2017 summer interns on the Watson Data Platform developer advocacy team, depicting projects they developed using Bluemix data services, Watson APIs, the IBM Data Science Experience, and more.

For college students, the summer going into your senior year can be invaluable. Many of us try to get internships that will help us get a foot in the door with a company we may want to work for, build our resumes, and (hopefully) lead to a job after graduation. This is my version of the narrative.

How I started

I applied for a summer internship in data science at IBM. Given my background (I study Mathematics and Statistics), I thought this would be a good opportunity to learn whether this is a career that matches my interests. After starting the job, I quickly learned I was severely under-prepared, lacking the required computer science skills and experience. While there is no quick way to learn all the ins and outs of coding, there was one tool that made the whole process a little less painful: IBM’s data science environment, the Data Science Experience (DSX).

Without going into too much detail, DSX users can create many notebooks within a project, simplifying collaboration. Notebooks are applications that allow you to create and share live code, visualizations, and documentation.

An irrational fear of computers

My only prior experience in the world of computer science was a course I took in Java. The course left me with an irrational fear of coding and using IDEs. One benefit of using DSX was that I did not need to install and maintain such a development environment locally, which is a gargantuan task for those of us new to the tools of computer science. It was also simple to start new tasks or navigate to previous work. DSX made it easy for me to access my work, look at the work of other teammates on the same project, and have others peer-review my work. I was prepared to make plenty of mistakes, so knowing that the people working with me could access my work made overcoming obstacles much smoother. I also didn’t feel as bad about asking for help, since I knew it would take little effort for another developer to review the code in my notebook.

Another benefit of using DSX as a novice coder is working in a notebook. Starting small is a huge step! Being able to run individual cells without worrying about breaking other parts of the code is a great feature. This is something I struggled with in my Java course. Removing the complexity and clunkiness of the language and focusing on its capability makes programming more efficient, not to mention less stressful.

Perhaps the feature of DSX that I found most helpful was being able to choose from multiple languages (Python, R, Scala) to implement my data science projects. Again, having no previous experience with any of these languages, I didn’t know which one to use. Most of the full-time employees I worked with preferred Python, but my instinct was to use R (maybe that is why I am a Statistics major). Either way, I was easily able to try both before starting on my projects.

An example notebook I worked on: data engineering for the American Presidency project using R.

Moving forward

As the internship came to a close and the new school year began, I quickly began to see the benefits of my summer at IBM. I am currently enrolled in courses that require the use of SAS and Stata, two different statistical software packages. The old me would have taken one look at the coding and been scared stiff, but now I have the confidence to tackle coding challenges head-on.

I owe great thanks to everyone on the IBM Developer Advocacy Team for their patience working with me and for all they taught me this summer. I also should acknowledge Patrick Titzler, who took the time to meet with me almost daily, guided me through my work, and endured a similar learning curve with me.
