Unraveling the Mysteries of Databricks IP Access Lists with the Analyzer and Fix Tool

Alex Ott
Published in Databricks Labs
3 min read · Dec 13, 2023

IP Access Lists are a popular way to secure access to Databricks workspaces. However, managing these lists can become intricate, especially in complex network scenarios. In this blog post we present the IP Access List Analyzer and Fix tool, an open-source tool created to simplify the analysis and correction of Databricks IP Access Lists. The tool is part of the Databricks Labs Sandbox project.

Unpacking the Tool’s Capabilities

1. Identifying Problematic Entries

The primary mission of the tool is to check Databricks IP Access Lists and flag potential issues. These include:

  • Private and Local IP Addresses: Detection of entries in private or local ranges such as 10.x.x.x, 192.168.x.x, and 127.0.0.x.
  • Duplicate Entries: Recognition of redundant entries within the lists.
  • Overlapping Entries: Identification of overlaps when larger networks encompass smaller networks or individual IPs.
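The three checks above can be sketched with Python's standard ipaddress module. This is only an illustration of the idea, not the tool's actual code: the analyze function and the sample entries are hypothetical, and the real tool may classify entries differently.

```python
import ipaddress
from collections import Counter


def analyze(entries):
    """Return (private, duplicates, overlaps) for a list of IP/CIDR strings.

    Hypothetical helper for illustration; not taken from the tool's source.
    """
    # strict=False also accepts CIDR entries whose host bits are set.
    parsed = [(e, ipaddress.ip_network(e, strict=False)) for e in entries]

    # Private or loopback ranges (10.x.x.x, 192.168.x.x, 127.0.0.x, ...).
    private = [e for e, net in parsed if net.is_private or net.is_loopback]

    # Entries whose network appears more than once.
    counts = Counter(net for _, net in parsed)
    duplicates = sorted({e for e, net in parsed if counts[net] > 1})

    # (smaller, larger) pairs where one entry is fully covered by another.
    overlaps = sorted({
        (e, other)
        for e, net in parsed
        for other, other_net in parsed
        if net != other_net
        and net.version == other_net.version
        and net.subnet_of(other_net)
    })
    return private, duplicates, overlaps


sample = ["10.1.2.0/24", "52.55.144.63", "52.55.144.63", "52.55.144.0/24", "127.0.0.1"]
private, duplicates, overlaps = analyze(sample)
print("private:", private)
print("duplicates:", duplicates)
print("overlaps:", overlaps)
```

Parsing every entry as a network (a single IP becomes a /32) lets one comparison handle both individual addresses and CIDR ranges.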

2. Automated Correction through REST API

Beyond detection, the tool can also fix the issues it finds. It uses the Databricks REST API to change the lists in the workspace. Changes are made only when the --apply command-line flag is passed during execution.
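A common way to wire such a flag is a "report by default, change only on request" pattern. The sketch below shows that pattern with argparse; the build_parser and run names and the apply_change callback are assumptions made for illustration and do not come from the tool's source.

```python
import argparse


def build_parser():
    """Command-line parser with an opt-in --apply flag (illustrative only)."""
    parser = argparse.ArgumentParser(description="Analyze IP access lists")
    parser.add_argument(
        "--apply",
        action="store_true",
        help="apply the planned changes instead of only reporting them",
    )
    return parser


def run(planned_changes, apply, apply_change):
    """Report every planned change; call apply_change only when apply is True."""
    applied = []
    for change in planned_changes:
        prefix = "Applying" if apply else "Would apply"
        print(f"{prefix}: {change}")
        if apply:
            apply_change(change)  # hypothetical callback that talks to the API
            applied.append(change)
    return applied


# Without --apply nothing is changed; the plan is only printed.
args = build_parser().parse_args([])
run(["remove 10.0.1.0 from list2"], args.apply, apply_change=print)
```

Keeping the destructive step behind a single callback makes it easy to review the planned changes before running the tool again with --apply.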

3. Focused Analysis on Enabled Lists

The tool analyzes only enabled IP Access Lists. Disabled lists are skipped, which avoids unnecessary processing of lists that have no effect on access.

Installation Made Simple

Getting started with the IP Access List Analyzer and Fix tool takes two steps:

  1. Python 3.8 or higher: Ensure a suitable Python version is installed on your system.
  2. Sandbox project: Run the databricks labs install sandbox command, which installs the tool's code and dependencies.

Navigating the Two Modes of Operation

1. Analysis and Fixing via Databricks REST API

Run the tool with:

databricks labs sandbox ip-access-list-analyzer [options]

In this mode the tool analyzes the IP Access Lists directly in the Databricks workspace. To apply fixes, include the --apply flag. Authentication is configured using environment variables, as detailed in the documentation.

2. Analysis of Lists Stored in Files

For analysis without making changes, the tool supports the following command:

databricks labs sandbox ip-access-list-analyzer --json_file=test.json

Lists stored in files that follow the response format of the Get IP Access Lists REST API can be analyzed without modifying the workspace. Refer to the provided test.json for an example.
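To give a rough idea of what such a file looks like, the sketch below parses a small document shaped like a Get IP Access Lists response and keeps only the enabled lists. The field names follow that API as far as I know it (ip_access_lists, list_id, label, list_type, enabled, ip_addresses); the IDs, labels, and addresses are made up, and the provided test.json may contain additional fields.

```python
import json

# Made-up sample in the assumed response format; real list IDs are UUIDs.
SAMPLE = """
{
  "ip_access_lists": [
    {"list_id": "list-a", "label": "office", "list_type": "ALLOW",
     "enabled": true, "ip_addresses": ["192.0.2.0/24", "198.51.100.7"]},
    {"list_id": "list-b", "label": "old vpn", "list_type": "BLOCK",
     "enabled": false, "ip_addresses": ["203.0.113.5"]}
  ]
}
"""


def enabled_lists(document):
    """Parse the JSON text and return only the lists marked as enabled."""
    data = json.loads(document)
    return [lst for lst in data.get("ip_access_lists", []) if lst.get("enabled")]


for lst in enabled_lists(SAMPLE):
    print(lst["label"], lst["list_type"], lst["ip_addresses"])
```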

A Dive into the Tool in Action

Running the tool on the sample file with debug logging enabled:

databricks labs sandbox ip-access-list-analyzer --json_file=test.json --debug

produces output like the following:

INFO:root:There are duplicates in the IP Access lists! len(all_ips)=241, len(uniq_ips)=237
INFO:root:Going to remove list 'list1' (0f209622-ca20-455a-bdc4-4de3bed8a1ed) as it's empty
INFO:root:Going to modify list 'list1 dup' (1f209622-ca20-455a-bdc4-4de3bed8a1ed). Entries to remove: ['52.55.144.63']
INFO:root:Going to modify list 'list2' (1f209623-ca20-455a-bdc4-4de3bed8a1ed). Entries to remove: ['10.1.2.0/24', '192.168.10.11', '52.55.144.63', '10.0.1.0']
INFO:root:List 'github_actions' (d798c5f5-3b53-4dc7-85b7-75dd67056512) isn't modified or not enabled
INFO:root:List 'Disabled list' (fc594781-60cb-4b46-b0f7-ee9d951e3c3f) isn't modified or not enabled
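One way to read this output is to check the entries removed from 'list2' against the private ranges. The short sketch below does that with the standard ipaddress module; it is my own reading aid, not part of the tool, and the helper name is hypothetical.

```python
import ipaddress

# Entries the log reports as "to remove" from 'list2'.
removed_from_list2 = ["10.1.2.0/24", "192.168.10.11", "52.55.144.63", "10.0.1.0"]


def is_private_entry(entry):
    """True if the IP or CIDR entry lies entirely in a private range."""
    return ipaddress.ip_network(entry, strict=False).is_private


private = [e for e in removed_from_list2 if is_private_entry(e)]
public = [e for e in removed_from_list2 if not is_private_entry(e)]
print("private:", private)
print("not private:", public)
```

Three of the four entries are private addresses. The remaining one, 52.55.144.63, is public, so it was presumably removed for another reason, such as being a duplicate or being covered by a larger network; the log alone does not say which.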

Embrace the Power of Databricks Python SDK

The IP Access List Analyzer and Fix tool is built on top of the Databricks Python SDK, which hides the complexity of interacting with the REST APIs and lets you concentrate on implementing the actual logic. Check the tool’s source code to see an example of the Databricks Python SDK in action!
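As a taste of what working with the SDK looks like, here is a minimal sketch that lists the enabled IP Access Lists of a workspace. It assumes the databricks-sdk package is installed and that credentials are available (for example through the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables). The fetch_enabled_lists and describe helpers are hypothetical names, and this is not how the tool itself is structured.

```python
from types import SimpleNamespace


def fetch_enabled_lists():
    """Return enabled IP access lists from the workspace.

    Needs the databricks-sdk package and valid credentials, so it is
    defined here but not called.
    """
    # Imported lazily so describe() below works without the SDK installed.
    from databricks.sdk import WorkspaceClient

    client = WorkspaceClient()  # picks up authentication from the environment
    return [lst for lst in client.ip_access_lists.list() if lst.enabled]


def describe(lists):
    """Format one summary line per list (expects label, list_id, ip_addresses)."""
    return [
        f"{lst.label} ({lst.list_id}): {len(lst.ip_addresses or [])} entries"
        for lst in lists
    ]


# Stand-in object for a quick local check; real code would pass
# fetch_enabled_lists() instead.
demo = [SimpleNamespace(label="office", list_id="list-a", ip_addresses=["192.0.2.0/24"])]
print(describe(demo))
```

With a configured workspace, describe(fetch_enabled_lists()) would produce the same kind of summary from live data.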

Please share this post and subscribe to the Updates from Databricks Labs newsletter to stay up to date with the latest releases from the Databricks Labs GitHub namespace. Subscribers are the first to hear about enhancements, bug fixes, and new features for the Data Intelligence Platform. You’re encouraged to follow the GitHub org as well ;)

Alex Ott
Databricks Labs

Lead Specialist solutions architect at Databricks. Big & cybersecurity as a hobby, etc. https://alexott.net/en/