Data Analysis using Ruby/JS or: How I learned through Google that it’s really Python vs. R — Part 3,840,980,923…

Kenny Hearn
Jul 28, 2017 · 3 min read

I’m a sports nerd. And being a sports nerd, what’s nerdier than sports analytics? (answer: nothing) And every time I mention analytics, only two languages are ever talked about: Python and R.

Why is that? Why isn’t Ruby useful for data? Doesn’t JavaScript have D3, a math-heavy data visualization library? Throughout our time at Flatiron, I’ve seen and created features that did some light data analysis, so why hasn’t someone who actually knows what they are doing kept scaling those tools up for industry-level analysis?

Well, through “research”, it seems to me that there are some technical reasons given, though I didn’t see much (meta) data analytics on why data analytics is better on Python/R vs. Ruby/JS. Here are the reasons listed out on sites like GitHub, Quora, and Reddit:

  • Speed (specifically, CPU and RAM optimization)
  • Most data analysis tools available in Python are written in a way that provides good performance; i.e., they’re developed in lower-level languages like C++ or Fortran
  • Reduced “coding effort” (specifically for R; a single R expression may replace dozens of lines of JavaScript or Python, and R’s functions are mainly compiled native code rather than imported libraries)
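
To make the “coding effort” point a little more concrete, here is a minimal Python sketch (not a benchmark, and not taken from any of those threads). It computes a sample standard deviation twice: once written out by hand, and once with the `statistics` module from Python’s standard library. The points-per-game numbers are made up for illustration, and `manual_stdev` is just a name I picked.

```python
import math
import statistics

# Hypothetical points-per-game figures, invented for this example.
points = [22, 31, 18, 27, 35, 24, 29]


def manual_stdev(values):
    """Sample standard deviation, written out step by step."""
    n = len(values)
    mean = sum(values) / n
    squared_diffs = [(v - mean) ** 2 for v in values]
    # Divide by n - 1 because this is a sample, not a full population.
    return math.sqrt(sum(squared_diffs) / (n - 1))


# The long way: several lines of arithmetic we have to get right ourselves.
by_hand = manual_stdev(points)

# The short way: one call to a function that already exists.
built_in = statistics.stdev(points)

print(f"by hand:  {by_hand:.3f}")
print(f"built in: {built_in:.3f}")
```

Both lines should print the same value. The only point is that a language with the statistics already built in lets one expression stand in for a handful of lines, which is the argument people make for R at a much larger scale.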

However, the primary reason these languages are used seems to be a larger, industry zeitgeist-esque choice: more people choose Python/R because people originally chose Python/R. It’s the programming equivalent of the chicken and the egg, and it’s incredibly self-reinforcing. The more companies that use Python/R for analytics, the more useful Python/R will become.


Skiing vs. Snowboarding

So since it seems to be mainly a choice between Python and R (with plenty of honorable mentions like Java, Scala, Go, etc.), where do I go from here? Apparently, the answer is wherever the hell I want.

Across multiple websites, I saw the summary that R is “a language for statisticians built by statisticians.” If what I need is an esoteric statistical model for my calculations, R is the language I should be learning. However, it’s very specific to data. As a newcomer to the programming world (with limited MATLAB exposure in college), I should probably be learning something that has a wider range of use cases.

‘If R is a neurotic, loveable geek, Python is its easygoing, flexible cousin.’ — InfoWorld article

Enter Python. From what I can tell, performing data analysis tasks in either language is becoming more and more similar every day. While R has an edge in natively handling statistics and visualization, the Python community is constantly porting over the tools that make R so useful. Python library support, documentation, and community mindshare seem to be comparable with any of the best languages. Additionally, most organizations specializing in machine learning seem to be choosing Python for their primary codebase.

Add in the fact that Python is a common, versatile, object-oriented programming language that is used throughout multiple industries, and I think I’ve found my choice.

The best answer I found on Quora summarized the current environment around Python: “It’s a fast-moving train, and if you don’t board now, it’ll fly by while you’re fitting your hammer to a screw.”

Cool. I’ve got some learning to do.