Databricks and Node-RED

Quick First Impressions

Sam Bell
Analytics Vidhya
4 min read · Sep 28, 2019


This past week I took the opportunity to attend a few workshops featuring products that make data science work faster and more productive. I will briefly highlight Databricks and Node-RED (originally developed at IBM) to give readers a peek at the tools available to them, along with my first impressions.

Databricks

Databricks is essentially a pumped-up version of Jupyter Notebooks that lets you do a lot with less effort.

Databricks was built by the same team that created Apache Spark. “The Databricks Workspace is a notebook-based collaborative environment capable of running all analytic processes in one place, so you can build data pipelines, train and productionize machine learning models, and share insights to the business all from the same environment.” I was surprised by how easily detailed visualizations could be created with Databricks. I still like to build interactive visualizations, so I would keep using Bokeh or even D3 for those, but Databricks is good for simple, quick visualizations. The feature the Databricks team seemed most proud of was that every version of a table's history is saved. You can call back a previous version of your table by referencing the version number that is assigned automatically to each change.
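The saved table history described above is the Delta Lake feature known as time travel. In a Databricks notebook, an earlier version can be read with `spark.read.format("delta").option("versionAsOf", 1).load(path)`, or in SQL with `VERSION AS OF`. Because that needs a Spark cluster, the sketch below models the idea in plain Python instead: every write commits a new numbered snapshot that can be read back later. The `VersionedTable` class and its methods are hypothetical illustrations, not Delta Lake's API or its storage design (Delta records changes in a transaction log rather than copying whole tables).

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


class VersionedTable:
    """Hypothetical toy model of a table that keeps every committed version.

    NOT Delta Lake's implementation: it only illustrates that each write
    produces a new, numbered snapshot that can be read back later.
    """

    def __init__(self) -> None:
        # Version N lives at index N; the first write becomes version 0.
        self._versions: List[List[Row]] = []

    def write(self, rows: List[Row], mode: str = "append") -> int:
        """Commit a new version and return its version number."""
        if mode not in ("append", "overwrite"):
            raise ValueError(f"unsupported mode: {mode!r}")
        new_rows = [dict(r) for r in rows]  # copy so later edits to `rows` don't leak in
        if mode == "append" and self._versions:
            snapshot = self._versions[-1] + new_rows
        else:
            snapshot = new_rows
        self._versions.append(snapshot)
        return len(self._versions) - 1

    def read(self, version_as_of: Optional[int] = None) -> List[Row]:
        """Return the latest snapshot, or the snapshot at `version_as_of`."""
        if not self._versions:
            raise LookupError("table has no versions yet")
        if version_as_of is None:
            version_as_of = len(self._versions) - 1
        if not 0 <= version_as_of < len(self._versions):
            raise IndexError(f"no such version: {version_as_of}")
        return [dict(r) for r in self._versions[version_as_of]]

    def history(self) -> List[Tuple[int, int]]:
        """List (version, row_count) pairs, loosely analogous to DESCRIBE HISTORY."""
        return [(v, len(rows)) for v, rows in enumerate(self._versions)]


if __name__ == "__main__":
    table = VersionedTable()
    table.write([{"id": 1}])                    # version 0
    table.write([{"id": 2}])                    # version 1
    table.write([{"id": 9}], mode="overwrite")  # version 2
    print(table.read())                         # [{'id': 9}]
    print(table.read(version_as_of=1))          # [{'id': 1}, {'id': 2}]
    print(table.history())                      # [(0, 1), (1, 2), (2, 1)]
```

Copying every row on each write keeps the toy simple; a real system would store only the changes and reconstruct older versions from them.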

If you would like to give it a test run, check out their Getting Started Guide. You can try the Community Edition for free, and there is a primer workshop on Delta Lake as well as a workshop that implements machine learning.

User reviews for Databricks are very positive. On Gartner, it has 89 ratings and 81 verified reviews, for an average of 4.5/5 stars. Most of these reviews come from North American finance businesses worth less than $50 million, and most reviewers work in data and analytics. The reviews generally praise the functionality of the features but criticize the limited support resources and slow response times for specific queries. A user who works in education commented, “the only dealbreaker was the cost and contract negotiation. We felt a bit bait-and-switched.”

Gartner

A recent positive review on G2 describes Databricks this way: “it’s like a Jupyter notebook but a lot more powerful and flexible. You can easily switch from Python to SQL to Scala from one cell to the next. With the Spark framework, you can preview your data processing tasks without having to build large intermediate tables.” The average rating on G2 is 4/5 stars across 13 reviews.

https://www.g2.com/products/databricks/features

TrustRadius has 16 user reviews, with an average score of 4.5/5 stars and a “trScore” of 8.8 out of 10.

https://www.trustradius.com/products/databricks/reviews#3

Databricks seems like a really cool tool for quick machine learning diagnostics. I would happily use its free Community Edition for personal projects, but a paid investment calls for caution and careful thought: a business should understand exactly what it is paying for and what it will get.

Node-RED

There were a couple of Node-RED meetups this week. Monday, September 23, 2019, featured a fairly informal tutorial introducing users to the basics, and the following Wednesday's session presented an interesting application that analyzes Twitter.

Analyzing Twitter with Node-RED Pipeline

Built on dataflow programming, Node-RED lets users easily create a data pipeline from a suite of nodes that can do anything from speech-to-text and translation to sentiment analysis. It is pretty easy to get started with; check out Pooja Mistry’s GitHub for a great introduction and tutorial. I really believe Node-RED has a lot of potential because it takes much of the heavy lifting out of machine learning, and many of its components are easy to use and effective. You can even build basic dashboards, for example to display data from IoT devices.
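Node-RED flows are wired together visually, and its function nodes are written in JavaScript, so the following is only a conceptual Python sketch of the dataflow idea behind a Twitter-style pipeline: each node receives a message, transforms it or drops it, and passes it down the wire. Messages are modeled as dicts with a `payload` key to mirror Node-RED's `msg.payload` convention; `sentiment_node`, `negative_only`, `run_flow`, and the tiny word lists are hypothetical stand-ins, not Node-RED APIs, and a real flow would use dedicated input and sentiment-analysis nodes instead.

```python
import re
from typing import Any, Callable, Dict, Iterable, List, Optional

# A message is a dict whose main data sits under "payload" (mirroring msg.payload).
Msg = Dict[str, Any]
Node = Callable[[Msg], Optional[Msg]]  # returning None drops the message

# Hypothetical toy lexicon standing in for a real sentiment-analysis node.
POSITIVE = {"love", "great", "good", "awesome"}
NEGATIVE = {"hate", "bad", "awful", "slow"}


def sentiment_node(msg: Msg) -> Msg:
    """Attach a crude word-count sentiment score to the message."""
    words = re.findall(r"[a-z']+", str(msg["payload"]).lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return {**msg, "sentiment": {"score": score}}


def negative_only(msg: Msg) -> Optional[Msg]:
    """Act like a switch node: forward only messages with a negative score."""
    return msg if msg["sentiment"]["score"] < 0 else None


def run_flow(nodes: List[Node], messages: Iterable[Msg]) -> List[Msg]:
    """Push each message through the wired nodes in order; collect survivors."""
    results: List[Msg] = []
    for msg in messages:
        current: Optional[Msg] = msg
        for node in nodes:
            current = node(current)
            if current is None:  # a node dropped the message
                break
        if current is not None:
            results.append(current)
    return results


if __name__ == "__main__":
    tweets = [
        {"payload": "I love this new dashboard!"},
        {"payload": "Ugh, the wifi is so slow and bad today"},
    ]
    for out in run_flow([sentiment_node, negative_only], tweets):
        print(out["payload"], out["sentiment"]["score"])
```

The sketch wires nodes in a single straight line; real Node-RED flows can also fan one node's output out to several downstream nodes, such as a dashboard chart and a logger at the same time.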

Dashboard Template
Power Usage Dashboard

There are so many workshops and new tools being introduced all the time that it can be overwhelming to keep up. I am always on the lookout for more cool things to learn about, and thankfully, NYC has no shortage of these kinds of opportunities.


Data Scientist with a penchant for Interactive Visualizations