OAuth2 from inside a Jupyter Notebook

Olivier Borderies
May 27, 2018


Why ?

Jupyter has grown very popular as an open source interactive computing platform. It originated in academia and is well known among data scientists as a prototyping tool. It also enables publishing reproducible science.

This blog post published in the wake of Jupyter receiving the ACM Software System Award — thereby putting it on the same level as milestones of computer history such as Unix, The WWW, or Java — is a good summary of its history, spirit and current state.

Jupyter is open at heart, which is very good! However, outside science, not all the data people work with is open, obviously. As data is increasingly recognized as a key competitive advantage for firms, it is important to protect it and manage access rights.

So there should be an easy way to get authenticated directly from the notebook so as to tap protected APIs without manipulating secrets (login/passwords or keys), which typically undermines security.

How ?

This article introduces an attempt to tackle the issue: the ipyauth library. It enables a notebook user to negotiate an access token with ID providers. Then the user is free to tap protected APIs by adding the access token to the request header, for example with the handy package requests (*).
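As a minimal sketch of what "adding the access token to the request header" means, the snippet below builds a bearer header and attaches it to a request without sending it. The URL and token value are made up for illustration, and the standard library is used only so that the example runs without any extra install.

```python
import urllib.request


def bearer_headers(access_token):
    """Return the HTTP headers carrying an OAuth2 bearer token."""
    return {"Authorization": f"Bearer {access_token}"}


# Hypothetical values, for illustration only.
API_URL = "https://api.example.com/items"
token = "my-access-token"

# Build (but do not send) a request with the token in its header.
request = urllib.request.Request(API_URL, headers=bearer_headers(token))
print(request.get_header("Authorization"))
```

With requests, the same dictionary goes in the `headers` argument, e.g. `requests.get(API_URL, headers=bearer_headers(token))`.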

ipyauth is a custom ipywidget in 3 parts:

  • The Javascript part performs the OAuth2 implicit flow. Currently 2 ID providers are supported (Auth0 and Google). In these 2 cases it is built on the Javascript SDKs published by these ID providers.
  • The Python part manages the user interface to input configuration parameters, customized by ID provider, and a common set of core ipywidgets displaying buttons and the credentials obtained after negotiation.
  • The notebook server extension captures the OAuth2 redirect url.

In short, the notebook frontend manages authentication like any SPA and the ipywidgets machinery is used to sync data between the Javascript in the browser and the Python kernel:

  • The info necessary to start the authentication flow (e.g. client ID, scopes) is written in Python and synced to the browser.
  • The credentials obtained by the browser upon completion of the authentication flow are sent back to the Python kernel.
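To make the two directions concrete, here is a rough sketch of the data involved in an OAuth2 implicit flow: the parameters that go out in the authorization url, and the credentials that come back in the url fragment. This is not ipyauth's code (its flow runs in Javascript in the browser). The endpoint, client ID, redirect uri and scope names are all hypothetical.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit


def build_authorize_url(authorize_endpoint, client_id, redirect_uri, scopes, state):
    """Build the url that starts an OAuth2 implicit flow."""
    params = {
        "response_type": "token",  # "token" selects the implicit flow
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),  # scopes are space-delimited
        "state": state,  # opaque value echoed back by the server
    }
    return f"{authorize_endpoint}?{urlencode(params)}"


def parse_redirect(redirect_url, expected_state):
    """Extract the credentials from the fragment of the redirect url."""
    fragment = urlsplit(redirect_url).fragment
    fields = {key: values[0] for key, values in parse_qs(fragment).items()}
    if fields.get("state") != expected_state:
        raise ValueError("state mismatch: discard this response")
    return fields


# Hypothetical configuration, for illustration only.
state = secrets.token_urlsafe(16)
authorize_url = build_authorize_url(
    "https://idp.example.com/authorize",
    "my-client-id",
    "https://localhost:8888/callback",
    ["read:usual-fruit", "write:usual-fruit"],
    state,
)

# Simulated redirect, as an authorization server might issue it.
redirect_url = (
    "https://localhost:8888/callback"
    f"#access_token=abc&token_type=Bearer&expires_in=7200&state={state}"
)
credentials = parse_redirect(redirect_url, state)
print(credentials["access_token"], credentials["expires_in"])
```

Checking the returned `state` against the one sent is what ties a response to the request that triggered it.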

The structure of ipyauth is modular. There are distinct Javascript and Python files for each ID provider, so adding a new one should be a reasonable effort.

For more info about how to develop a custom widget, see this blog post: Authoring custom widgets. For a tutorial on the tricky subject of OAuth2, the article Understanding OAuth2 is a good start, and Auth0 is one of the best resources available.

(*) Only Python is considered in this article

ID Providers

I picked Auth0 and Google as the first 2 ID providers because:

  • Auth0 is a powerful IDaaS, extremely well documented, versatile, developer friendly, easy to customize, with an interesting free plan.
  • In the context of ipyauth it is an example of the OAuth2 3-step dance: (1) Redirect away from the notebook to the authorization server, (2) From there redirect to the OAuth2 redirect uri, (3) Finally back to the origin notebook url. Note: Token refresh is handled inside an iframe so no redirects need be managed by ipyauth in this case.

EDIT: ipyauth does not rely on ID providers’ SDKs any longer. See bottom of page.

Here is what the ipyauth widget looks like with Auth0:

ipyauth with Auth0 before sign-in and after authentication

  • Google is also very well documented and has a wealth of discoverable and well organized APIs, e.g. Google Drive (including Sheets, Slides), which many people may find useful to store and share data with manageable access rights — for free.
  • The Google Sign-in Javascript client works entirely in popups and hidden iframes. So in the context of ipyauth the redirect away from the notebook is not necessary and the user experience is all the better.

EDIT: ipyauth does not rely on ID providers’ SDKs any longer. See bottom of page.

Here is what the ipyauth widget looks like with Google:

ipyauth with Google before sign-in and after authentication

Examples

Auth0

The video below demonstrates the ipyauth widget with an Auth0 server.

In summary the notebook user goes through the following steps:

  • Build Auth0 authentication parameters
  • Instantiate the ipyauth widget
  • Click on the Sign-In button
  • Get redirected away from the notebook to the ID provider website
  • Type credentials on the authorization server, or confirm your identity
  • Get redirected to the redirect uri (handled by the server extension)
  • Get instantly redirected to the origin notebook url

Then the widget displays the token data and the Python kernel has it too. The user may click the Inspect button to examine the contents of the token (which is a JWT).
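For readers curious about what inspecting a JWT involves, the sketch below decodes the payload segment of a token by hand. It is an illustration only, not how the Inspect button is implemented, and it does not verify the signature, so it must never be used to decide whether a token can be trusted. The demo token is fabricated locally.

```python
import base64
import json


def decode_jwt_payload(token):
    """Return the claims of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated segments")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def _encode_segment(data):
    """Encode a dict as an unpadded base64url JSON segment (demo helper)."""
    raw = json.dumps(data).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Made-up token with a dummy signature, for illustration only.
demo_token = ".".join(
    [
        _encode_segment({"alg": "none", "typ": "JWT"}),
        _encode_segment({"sub": "user-123", "scope": "read:usual-fruit"}),
        "dummy-signature",
    ]
)
print(decode_jwt_payload(demo_token))
```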

From there the user can make regular requests to a protected API with the access token obtained. In the demo the user taps an API that manages 2 lists of fruits, usual and exotic, with read/write rights depending on the scopes.
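Since the rights depend on the scopes granted, it can be handy to check them before calling the API. The helper below is a small sketch that assumes the granted scopes arrive as a space-delimited string, as is usual in OAuth2. The scope names are invented for the fruit example and may differ from those of the demo.

```python
def missing_scopes(granted, required):
    """Return the required scopes absent from a space-delimited scope string."""
    return sorted(set(required) - set(granted.split()))


# Hypothetical scopes, for illustration only.
granted = "read:usual-fruit write:usual-fruit read:exotic-fruit"

print(missing_scopes(granted, ["read:exotic-fruit"]))   # []
print(missing_scopes(granted, ["write:exotic-fruit"]))  # ['write:exotic-fruit']
```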

ipyauth with Auth0 example

See the Auth0 section of the ipyauth docs to run this demo notebook.

Google

The video below demonstrates the ipyauth widget with Google.

In summary the notebook user goes through the following steps:

  • Build Google authentication parameters
  • Instantiate the ipyauth widget
  • Click on the Sign-In button
  • A Google popup shows up (if your browser allows popups)
  • Type credentials in the popup, or confirm your identity
  • Grant access to requested scopes

Then the popup closes, the widget displays the token data and the Python kernel has it too. The user may click the Inspect button to examine the token contents.

From there the user can make requests to Google APIs with the access token obtained, as long as it has the required scopes. In the demo the user creates a new spreadsheet on their Drive, writes some data in it, updates it, removes some, and finally shares it with other people, all without leaving the comfort of their Jupyter notebook!
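As a rough idea of what such calls look like, the sketch below prepares two requests against the Google Sheets REST API (v4): one that creates a spreadsheet and one that writes values into a range. It is not the code of the demo notebook. The token, spreadsheet ID and cell contents are placeholders, and the requests are only built, not sent. The token would need a scope covering Sheets, such as `https://www.googleapis.com/auth/spreadsheets`.

```python
import json
import urllib.request
from urllib.parse import quote

SHEETS_API = "https://sheets.googleapis.com/v4/spreadsheets"


def _json_request(url, method, token, body):
    """Prepare an authenticated JSON request (not sent here)."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method=method,
    )


def create_spreadsheet_request(token, title):
    """Request that creates a new spreadsheet with the given title."""
    return _json_request(SHEETS_API, "POST", token, {"properties": {"title": title}})


def write_values_request(token, spreadsheet_id, cell_range, rows):
    """Request that writes rows of raw values into a range such as Sheet1!A1:B2."""
    encoded_range = quote(cell_range, safe="")
    url = f"{SHEETS_API}/{spreadsheet_id}/values/{encoded_range}?valueInputOption=RAW"
    return _json_request(url, "PUT", token, {"values": rows})


# Placeholder values, for illustration only.
token = "my-access-token"
create_req = create_spreadsheet_request(token, "Fruit inventory")
write_req = write_values_request(
    token, "my-spreadsheet-id", "Sheet1!A1:B2", [["fruit", "qty"], ["apple", 3]]
)
print(create_req.get_method(), create_req.full_url)
print(write_req.get_method(), write_req.full_url)
```

Passing either object to `urllib.request.urlopen` would send it. Sharing the file with other people goes through the Drive API rather than the Sheets API and is left out here.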

ipyauth with Google example

See the Google section of the ipyauth docs to run this demo notebook.

JupyterLab

ipyauth works in the classic notebook and JupyterLab too — thanks to the example provided by Maarten Breddels’ ipysheet.

However there is a small caveat.

When the flow involves a redirect away from and back to the notebook url (the Auth0 example, for initial authentication), the behavior upon landing back on the notebook url differs.

  • In the classic notebook, the ipyauth widget fires up automatically and the query string in the url is instantly consumed. Then the widget renders and shows the authentication data (if the notebook is trusted).
  • In JupyterLab, the widget does not fire up until the user runs the notebook cell manually. Before they do, the url query string contains the authorization credentials. When they do, the authentication workflow continues and finishes as in the classic notebook. This pause in the flow makes the user experience a bit broken.

To be clear, when there is no redirection (Google example) then there is no difference in user experience between the classic notebook and JupyterLab.

EDIT: ipyauth works in iframes and popups for all ID providers. No more redirect from the notebook page. See bottom of page.

JupyterHub

But doesn't JupyterHub already authenticate users? Then why ipyauth?

Indeed JupyterHub does authenticate users. However, the OAuth scopes negotiated are predetermined (as for any website) and independent of the contents of the notebooks it serves. By definition, notebooks contain arbitrary code, which may need to negotiate a token containing specific rights with a third party authentication server in order to access data protected independently of the JupyterHub server. As a consequence, it is convenient to be able to originate the authentication from inside the notebook so that its workflow is self-contained. Because the notebook is a kind of SPA, this is feasible in a standard way.

This remark is also valid for desktop Jupyter (i.e. local install).

Notebook as the ultimate “API for Humans” toolkit

But is it that useful to tap APIs from a notebook in the first place ?

Yes it is, very VERY much !

APIs are ubiquitous and modular IT system design is increasingly the norm.

Traditional clients are web GUIs. Unfortunately they are always a difficult compromise between flexibility and simplicity, take long to develop, and are slow to evolve.

There are structural reasons for that: UI is intrinsically complex and requires specialists to design and implement. These specialists typically hop from one project to the next and seldom know the structure of the data behind the API or the mindset of the human data consumers. Besides, however intuitive and pleasant the first-time visitor experience is, with repeated interaction and intensive use the GUI becomes cumbersome and a frustrating barrier, particularly for batch work.

In contrast, notebooks solve many of these problems:

  • They are intuitive to everybody because of the short feedback loop and the freedom of linear navigation
  • They are ideal for documented workflows as they contain rich text, code and rich outputs
  • They are fast to develop, leveraging hundreds of user friendly libraries, and fast to publish (live on JupyterHub, snapshot on nbviewer)
  • A lot more “business people” (i.e. non IT specialists) can write notebooks than web sites
  • They are very versatile: a notebook can contain mostly text, or a lot of code, some graphs, or only ipywidgets in quasi app mode, depending on the intended audience
  • They can be arbitrarily augmented by harnessing Javascript libraries via ipywidgets (See this article), though admittedly that step should be left to specialists

A typical API consumption notebook works as follows:

  • Present the objective, input, output, and links to the documentation, swagger, etc
  • Authenticate the user and get access token
  • Build potentially complex request (json or query string) from high level human user input
  • Make the request to protected API
  • Unpack API response and display it in a thousand ways — tables, graphs.
  • Script business logic in batch jobs if necessary
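The middle steps of this workflow can be sketched in a few lines: turn simple user input into a query string, then unpack the JSON response into a table. The endpoint, parameter names and response shape below are invented (loosely modeled on the fruit demo), and the response is a hard-coded stand-in rather than the result of a real call.

```python
from urllib.parse import urlencode

API_BASE = "https://api.example.com/fruits"  # hypothetical endpoint


def build_url(kind, name_contains=None, limit=10):
    """Turn high level user input into a request url."""
    params = {"kind": kind, "limit": limit}
    if name_contains:
        params["q"] = name_contains
    return f"{API_BASE}?{urlencode(params)}"


def format_table(items, columns):
    """Render a list of dicts as a fixed-width text table."""
    widths = [
        max([len(col)] + [len(str(item[col])) for item in items])
        for col in columns
    ]
    rows = [list(columns), ["-" * width for width in widths]]
    rows += [[str(item[col]) for col in columns] for item in items]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    )


# Stand-in for the JSON body a protected API might return.
response = {
    "items": [
        {"name": "apple", "kind": "usual"},
        {"name": "pear", "kind": "usual"},
    ]
}

print(build_url("usual", limit=2))
print(format_table(response["items"], ["name", "kind"]))
```

In a real notebook a pandas DataFrame or a plotting library would typically replace the text table, which is kept here only to avoid dependencies.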

Thus Jupyter notebooks offer a continuum of solutions to interact with APIs, from very technical live documentation to quasi web apps, likely to be developed or at least maintained and tweaked by “business people” who know the data and/or the users.

This makes the Jupyter notebook and ecosystem the ultimate “API for Humans” toolkit.

Conclusion

More generally because notebooks are intuitive and can be easily tailored to a specific audience or technical knowledge, they can be used to educate people about APIs and related subjects. This has a lot of value in organizations.

From this perspective I believe the key feature to increase the user base “x 10”, literally, is to help notebook producers package and publish them to notebook consumers.

The best current option I know for packaging to notebook consumers is the appmode notebook extension for the classic notebook. Hopefully the Jupyter team will agree and move in this direction to make it, or a similar capability, a core feature.

Anyway, I would like to conclude with a thumbs up and “chapeau bas” to the Jupyter core devs for their outstanding intuition, execution, dedication, achievement and social value generated ! They are part of History —really.

EDIT 12/06/18: ipyauth was refactored to remove the dependency on ID providers’ SDKs in favor of their endpoints, and to manage the authentication flow in hidden iframes as much as possible, or else in popups, so that there is no more redirect from the notebook page, which greatly improves the user experience. It also shortens the code and makes it considerably simpler to add a new ID provider. In addition SG Connect, the Société Générale CIB ID provider, was added.
