Jupyter Notebooks with Teradata Vantage

An introduction to Vantage Modules for Jupyter

Samuel Martinez
Teradata
5 min read · Jan 24, 2023

Jupyter with Teradata extensions

This article gives a brief introduction to Jupyter notebooks and then turns to their relationship with Teradata Vantage. It is not a detailed installation guide for Jupyter Notebook; however, the how-to pages are linked and referenced.

The focus is on creating awareness of Jupyter notebooks: their history, their capabilities, and, most importantly, the benefits they bring to both technical and non-technical users.

What is an IDE?

An IDE, or Integrated Development Environment, is an application that helps programmers write code. An IDE knows the syntax rules of the languages it supports, so it can flag mistakes and suggest completions, which makes the programmer more efficient. IDEs also help programmers build executables, debug, and collaborate with other coders. Popular examples include VS Code, IntelliJ IDEA, Eclipse, Xcode, and Jupyter.

What is a Jupyter Notebook?

A Jupyter notebook is a “document” generated by the open-source Jupyter Notebook App project (a very popular IDE among data scientists). The document can contain code and rich text elements side by side. Jupyter notebooks support many of the programming languages popular with data scientists, such as Python, R, and Julia. The idea behind these documents is to have a readable artifact that explains the rationale of an analysis and walks you through the data analyst’s thought process.

Notebooks can be run either from a desktop application or in a web browser. Their most important feature is rich content: they can contain figures, tables, and executable code to support the logic of the analysis. Jupyter also supports multiple kernels, which lets notebooks run code written in different languages.
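To make the idea of a “document” with code, rich text, and a kernel more concrete, here is a minimal sketch of what a notebook file looks like on disk. A notebook is stored as JSON, with a list of cells and a metadata entry naming the kernel. The cell contents, cell ids, and the `build_notebook` helper are made up for illustration; the field names follow the public notebook format (version 4).

```python
import json


def build_notebook(kernel_name, display_name, language):
    """Return a minimal notebook as a plain dict in the nbformat 4 layout."""
    return {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {
            # The kernelspec tells Jupyter which kernel should run the code cells.
            "kernelspec": {
                "name": kernel_name,
                "display_name": display_name,
                "language": language,
            }
        },
        "cells": [
            {
                # Rich text: explains the analysis to the reader.
                "cell_type": "markdown",
                "id": "intro",
                "metadata": {},
                "source": ["# Monthly sales\n", "Adding up three sample orders."],
            },
            {
                # Executable code: run by the kernel named above.
                "cell_type": "code",
                "id": "total",
                "metadata": {},
                "execution_count": None,
                "outputs": [],
                "source": ["sum([120, 80, 45])"],
            },
        ],
    }


notebook = build_notebook("python3", "Python 3", "python")
text = json.dumps(notebook, indent=2)  # what would be saved as a .ipynb file
cell_types = [cell["cell_type"] for cell in notebook["cells"]]
print(cell_types)
```

Swapping the kernel name in the metadata is what lets the same kind of document hold code in another language, provided that kernel is installed.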

Jupyter Notebook example

Jupyter Notebook trends

According to Google Trends, interest in Jupyter notebooks has been growing almost as fast as interest in data science itself, as the following exhibit shows.

Red: Data Science, Blue: Jupyter Notebook

The graph alone doesn’t tell you much, but if you compare it with the next graphic, which shows people’s IDE preferences, you can see that Jupyter is competing head-to-head with PyCharm (a very popular IDE for Python developers) and steadily becoming one of the tools preferred by data scientists.

Blue: Jupyter, Red: PyCharm, Yellow: Spyder

According to the 2022 Kaggle survey, the tool most used by data scientists is Jupyter Notebook. Again, Jupyter’s ability to create a story supported by data is very appealing to data scientists.

SQL Editors evolution

We should also consider the ongoing evolution of the SQL IDE. In the beginning there were the so-called first-generation editors, which only allowed querying a single database. These include very popular SQL developer IDEs such as DBeaver and Toad. After that, the second generation of editors added multi-platform querying, along with many user-experience enhancements that make querying much more efficient. Examples are DataGrip and datapine.

Finally, we come to the third generation of SQL editors: the ones that include notebooks. These allow more efficient collaboration and knowledge transfer, since a notebook can contain a self-explanatory sequence that tells a story. Third-generation editors are also breaking down barriers for users with limited technical knowledge, because a notebook doesn’t just look like a piece of code; it can contain rich text, tables, and graphics that make much more sense to that type of user. Another capability that helps less technical users is parameters: the notebook can expose a few values that the analyst modifies to check different dates or filters, without touching the rest of the code. A very interesting feature of third-generation SQL editors is query “drag & drop”, which lets any kind of user perform a search without having to write SQL.

In the end, this third generation of SQL editors is trying to democratize data analysis, allowing people to focus on their pursuit of answers instead of struggling with an SQL editor interface. As mentioned previously, the newest editors strive to present a more natural language that broader audiences can not only understand but also manipulate.
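The parameter idea can be sketched in a few lines of Python. This is only an illustration: it uses an in-memory SQLite database as a stand-in for a real warehouse, and the `sales` table, its rows, and the `total_sales` helper are all hypothetical. The point is that the analyst edits only the first cell, while the query below stays untouched.

```python
import sqlite3

# --- Parameter cell: the only values the analyst needs to edit ---
start_date = "2023-01-01"
end_date = "2023-01-31"
region = "EMEA"

# --- Setup cell: hypothetical sample data in an in-memory database ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2023-01-05", "EMEA", 120.0),
        ("2023-01-20", "EMEA", 80.0),
        ("2023-02-02", "EMEA", 45.0),
        ("2023-01-11", "APAC", 60.0),
    ],
)


def total_sales(conn, start_date, end_date, region):
    """Sum sales for one region between two ISO dates (inclusive)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales "
        "WHERE sale_date BETWEEN ? AND ? AND region = ?",
        (start_date, end_date, region),  # bound as values, never pasted into the SQL
    ).fetchone()
    return row[0]


# --- Analysis cell: re-run after changing the parameters above ---
january_emea = total_sales(conn, start_date, end_date, region)
print(january_emea)
```

Passing the parameters as bound values rather than building the SQL string by hand keeps the query safe to re-run with whatever dates or filters the user types in.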

Jupyter in Teradata

As with other databases, Jupyter notebooks can work with Teradata databases. Teradata has developed a set of Vantage Modules for Jupyter that developers can install and deploy in their local or managed Jupyter environments. These modules, also known as Teradata SQL Extensions for Jupyter, include the SQL Kernel, Navigator, and Connection Manager, which are used to explore, manage connections, and execute queries on Teradata Vantage™.

The Teradata SQL kernel for Jupyter includes helper commands called “magic commands” that make your life much easier. Their capabilities include creating, editing, or activating a connection; creating charts and tables; and even loading data into a table.

VANTAGE MODULES FOR JUPYTER OVERVIEW

If you want to know how to get started with Jupyter notebooks with the Teradata kernel, you can refer to the following article.

Conclusion

In the end, notebooks offer several features and benefits compared to their second-generation cousins, many of which bring non-technical people closer to data analysis by producing documents in more natural language. Notebooks are definitely making progress toward a utopian world in which people and data can have an actual conversation.

At Teradata, we are aware of these trends, and we have built extensions that integrate with Jupyter to support its adoption.

LINKS

Installation Guide

Package download
