Why you should limit your dependencies when sharing files: the example of plotly

Laurae
Data Science & Design
Nov 13, 2016

Laurae: This post is about why dependencies must be limited as much as possible when you share files. It does not mean you must not use them, but you must check whether the users you are sending the files to can open and read those files properly. It also highlights the line between “altruist” and “commercial” companies. The post was originally published on Kaggle.

BenjaminLott wrote:

Is there a reason why ggplotly isn’t used more often?

Personal opinion:

  • Slow when you have many of these plots (especially with millions of points)
  • Adds a “massive” amount of file size to outputs (what ggplot2 could do in only 0.1MB can sometimes take 10MB or more with ggplotly)
  • For the online version: you need to pay for “appropriate” plotting (not counting the Internet dependency you have from that point on). And if you share code, you have to strip your keys each time, which is a hassle (imagine having a script that generates scripts and having to strip every one of them one by one)
  • Not always usable with other gg-type package plugins (things can break)
  • Browser reliant (I’ve seen ggplotly broken in many different browsers, let alone on outdated business workstations still running Windows XP): it’s better to have a static plot that works anywhere than an interactive plot that silently breaks on specific workstation setups
  • Function call: if you use knitr with a custom width/height, you have to specify the width/height again in the call, which you didn’t have to do before (imagine having many different widths/heights to handle)
  • Custom CSS: when you have custom CSS for sizing (in the knitr options, not in plotly), offline plotly plots can appear distorted for some (unexplained) reason
  • Interactivity is good, but you usually only need it for a few very specific points, which you could handle yourself by adding labels (it can be good for exploratory data analysis on aggregated data, though)
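
To make the key-stripping hassle from the list above concrete, here is a minimal sketch of how it could be automated before sharing scripts. It is only an illustration, not part of plotly: the function name `strip_keys`, the placeholder text, and the list of credential names are my own assumptions (to my knowledge `plotly_username` and `plotly_api_key` are the names the R package reads from the environment, but check your own scripts).

```python
import re

# Assumed credential names (hypothetical list, extend it for your own scripts).
KEY_NAMES = ("plotly_api_key", "plotly_username", "api_key")

_NAMES = "|".join(re.escape(name) for name in KEY_NAMES)

# Matches  name = "value"  or  "name"="value"  (single or double quotes).
# Group 1: the name and the equals sign, group 2: the opening quote,
# group 3: the secret value, closed by the same quote as group 2.
_PATTERN = re.compile(
    r"""(["']?\b(?:""" + _NAMES + r""")["']?\s*=\s*)(["'])(.*?)\2"""
)


def strip_keys(script_text, placeholder="REDACTED"):
    """Return script_text with every matched credential value replaced."""
    def _redact(match):
        quote = match.group(2)
        return match.group(1) + quote + placeholder + quote

    return _PATTERN.sub(_redact, script_text)


r_line = 'Sys.setenv("plotly_username"="laurae", "plotly_api_key"="abc123")'
py_line = "api_key = 'xyz789'"

print(strip_keys(r_line))
print(strip_keys(py_line))
```

If a script generates other scripts, the same function can be applied to each generated file in a loop. It only catches literal assignments like the ones above, so a key built by string concatenation or read from another file would slip through.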

BenjaminLott wrote:

Do people not know about it or do they not like the interactivity?

Still personal opinion: I think the former, but there is another reason (which I confirmed this year with my students). In my case, although I had known the plotly website for a while, I only recently found out it was usable offline, through an R (and Python) package (the plotly documentation seems to push the user toward the online version, which is why I/we avoided it). Here is an example where someone gets lost because he/she did not know it was usable offline.

A third reason: when (self-contained) plotly HTML files are hosted on a remote server, the plots sometimes silently break (while they work perfectly locally). Simple reason: hidden dependencies.
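
One way to look for such hidden dependencies before sending or hosting a file is to list everything the HTML asks the browser to load from outside the file itself. The sketch below is a rough check using only the Python standard library. The function name `find_dependencies`, the set of tags it inspects, and the sample HTML (including the `cdn.example.com` URL) are assumptions made for illustration. It does not detect resources loaded at runtime by JavaScript.

```python
from html.parser import HTMLParser


class DependencyFinder(HTMLParser):
    """Collect resources that are referenced by an HTML file but not embedded in it."""

    # Tag -> attribute that makes the browser fetch something (assumed subset).
    LOADING_ATTRS = {"script": "src", "link": "href", "img": "src", "iframe": "src"}

    def __init__(self):
        super().__init__()
        self.dependencies = []

    def handle_starttag(self, tag, attrs):
        wanted = self.LOADING_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            # Skip empty attributes and data: URIs, which are embedded in the file.
            if name != wanted or not value:
                continue
            if value.strip().lower().startswith("data:"):
                continue
            self.dependencies.append((tag, value))


def find_dependencies(html_text):
    """Return a list of (tag, url) pairs for every non-embedded reference."""
    finder = DependencyFinder()
    finder.feed(html_text)
    return finder.dependencies


sample = """
<html><head>
<script src="https://cdn.example.com/plotly-latest.min.js"></script>
<link rel="stylesheet" href="report_files/style.css">
<script>var inline = 1;</script>
<img src="data:image/png;base64,AAAA">
</head><body></body></html>
"""

for tag, url in find_dependencies(sample):
    print(tag, url)
```

An empty result suggests the file really is self-contained. Anything listed is either a remote URL (needs Internet access) or a local side file (needs to be shipped along with the HTML), and both are candidates for a plot that works on your machine and breaks elsewhere.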
