Secure and Streamlined Azure SQL Connectivity Using Service Principal Credentials in IBM SPSS Modeler
This article explores the various ways that IBM SPSS Modeler (Batch, Client, and Scripting) can connect to Azure SQL with azure service principal authentication.
IBM SPSS Modeler Client
IBM SPSS Modeler Client is a user interface (UI) application designed for data mining, statistical analysis, and predictive modeling. It provides a graphical environment where users can visually design analytical workflows without needing to write code.
IBM SPSS Modeler Batch
IBM SPSS Modeler Batch provides the complete analytical capabilities of the standard IBM SPSS Modeler Client but without access to the regular user interface. Batch mode allows you to perform long-running or repetitive tasks without your intervention and without the presence of the user interface on the screen. It must be run in distributed mode along with IBM SPSS Modeler Server (local mode is not supported).
IBM SPSS Modeler scripting
IBM SPSS Modeler empowers users by integrating Python scripting directly into their data analysis workflows. This functionality enables dynamic interaction within data streams, enhancing flexibility and automation capabilities.
To Know more about IBM SPSS Modeler: https://www.ibm.com/docs/en/spss-modeler/18.5.0
Note : For IBM SPSS Modeler to connect to Azure SQL with Azure service principal authentication ODBC DSN configuration is a prerequisite
Microsoft Azure Service Principal Authentication
A Service Principal is essentially an identity created for use with applications, hosted services, and automated tools to access Azure resources. Instead of a user login, the application uses the Service Principal’s credentials.Service Principal Authentication offers a secure and efficient way for automated services and applications to connect to Azure SQL Database.
Using Service Principal credentials for automation provides several advantages, particularly in scenarios where secure, unattended access to Azure resources is required.
Role-Based Access Control (RBAC): Service Principals can be assigned specific roles and permissions. This means you can grant them only the access they need to perform their tasks, adhering to the principle of least privilege. This minimizes the risk of unauthorized access and potential security breaches.
How Service Principal Authentication Works
Azure AD Tenant: An Azure AD tenant is a dedicated instance of Azure AD that an organization or entity uses to manage identities, access, and applications. It’s essentially a directory service that stores user accounts, groups, and application registrations.
Service Principal Creation:A Service Principal is an identity created in Azure AD, which represents an application, service, or automated process that needs to access Azure resources. This identity is associated with a client ID (also known as application ID) and optionally a client secret or certificate for authentication.
Authentication Process:Applications authenticate using the Service Principal’s credentials, which consist of Client ID (A unique identifier for the application or service) and Client Secret or Certificate (Used to authenticate the Service Principal and obtain access tokens)
Obtaining Access Tokens:When your application needs to access Azure SQL Database, it initiates an authentication flow with Azure AD to obtain an OAuth 2.0 access token.
Application sends a token request to Azure AD’s token endpoint (https://login.microsoftonline.com/{tenantId}/oauth2/v2.0/token
) authenticating itself using its client ID and client secret.
Azure AD verifies the credentials and permissions of the application (Service Principal).If authentication is successful and the application has the necessary permissions, Azure AD issues an OAuth 2.0 access token.
Connecting to Azure SQL Database:Application includes the obtained access token in the authorisation header (Bearer token) when making HTTP requests to Azure SQL Database APIs or in the connection string when connecting programmatically.
Configuring ODBC DSN for connecting to Azure SQL using service principal credentials
Connecting to Azure SQL Database via ODBC using Service Principal credentials involves configuring a Data Source Name (DSN) in the ODBC Data Source Administrator on your Windows machine. This allows applications that support ODBC to connect to Azure SQL Database using the Service Principal’s client ID, client secret (or certificate), and tenant ID for authentication.
Ensure that you have installed the appropriate ODBC Driver for Azure SQL Database. Microsoft provides the ODBC Driver for SQL Server, which supports Azure SQL Database.
Steps to configure DSN
1.Open the ODBC Data Source Administrator on your Windows machine. Navigate to user DSN or system DSN and click on add to create a new data source
2.Choose the appropriate ODBC Driver for your Azure SQL Database. For example, select ODBC Driver 17 for SQL Server and click on finish.
3.Enter necessary details to configure Data source
Name: Provide a name for your data source (e.g., AzureSQLServicePrincipalDSN
).
Description: Optional description for the data source.
Server: Specify the Azure SQL Database server name (e.g., yourserver.database.windows.net
).
Choose the Authentication method : Choose Azure Service Principal authentication and enter login ID and password.
login ID : <clientID>@<tennatID>
password : <client_secret>
Click on next and for Database specify the name of the Azure SQL Database you want to connect to.
Configure any additional settings specific to your application requirements, such as connection timeout, encryption settings, etc. Click on Test to verify the connection settings. Ensure that the connection is successful before proceeding.
Note : Configuration of DSN is possible without credentials as well. While connecting to DB, credentials must be passed (from Modeler client / Modeler batch / Modeler script).
Connecting to Azure SQL using service principal credentials via SPSS Modeler Client
Install IBM SPSS Modeler client on your machine. Configure DSN in the same machine as mentioned above
Note : DSN must be configured on the same machine along with SPSS Modeler server. If you are using SPSS Modeler client (local server), you can configure DSN in the client machine. If you are using a remote connection to the server, configure DSN in server machine
Steps to add Data source in Modeler client
- Launch SPSS Modeler client and navigate to tools -> Databases
- DSN configured will be listed in the Data source list. Select the DSN and enter username and password for authentication
- Make the DSN default if required and click ok.
Now use Datasource node or Database export node to access data in DB
Connecting to Azure SQL using service principal credentials via SPSS Modeler Batch
Before running the stream in batch mode, prepare your IBM SPSS Modeler Stream and ensure it is configured correctly to connect to Azure SQL Database. When the stream is saved, database connection parameters are saved in stream properties except the password.
IBM SPSS Modeler allows you to pass parameters from the command line when running streams in batch mode. This allows you to customize settings such as database connection details without modifying the stream file itself.
Command line syntax : Move to <Modeler Batch installation directory>/bin
Run clemb executable(clemb.exe) file with the following parameters to connect to Modeler server and run a stream
-server -hostname <hostname> -port <port_number>
-username <User>
-password <Password>
-stream "Stream file path" -execute
-log <log file path>hostname — IP address of the machine where SPSS Modeler server is running
port_number — port number to connect with SPSS Modeler server
User and Password — username and password to connect with Modeler server
clemb.exe -server -hostname <hostname> -port <port_number>
-username <User>
-password <Password>
-stream "Stream file path" -execute
-log <log file path>
How to pass Database connection parameters
To pass database parameters from command line, add one more parameter to the above command
- Custom name for Database (source / export )node — Select Custom option for node name and enter a name for Database (source/export)node
Command line parameters :
for Database source node : -P<Node name>dbs:databasenode.datasource={"<DSN name>","<clientID>@<tennatID>","<clientSecret>","false"}
for Database export node : -P<Node name>dbs:databaseexport.datasource={"<DSN name>","<clientID>@<tennatID>","<clientSecret>","false"}
clemb.exe -server -hostname <hostname> -port <port_number>
-username <User>
-password <Password>
-P<Node name>dbs:databasenode.datasource={"<DSN name>","<clientID>@<tennatID>","<clientSecret>","false"}
-stream "Stream file path" -execute
-log <log file path>
In this case command line parameter overwrite the stream properties saved.
2. Auto name for Database (source / export )node — Select Auto option for node name and enter a name for Database (source/export)node. Here table name selected would be considered as the node name
Command line parameters
for Database source node : -Pdbs:databasenode.datasource={"<DSN name>","<clientID>@<tennatID>","<clientSecret>","false"}
for Database export node : -Pdbs:databaseexport.datasource={"<DSN name>","<clientID>@<tennatID>","<clientSecret>","false"}
clemb.exe -server -hostname <hostname> -port <port_number>
-username <User>
-password <Password>
-Pdbs:databasenode.datasource={"<DSN name>","<clientID>@<tennatID>","<clientSecret>","false"}
-stream "Stream file path" -execute
-log <log file path>
In this case, since node name is not mentioned in the command line, only only database (source / export ) node will take command line parameters and the rest of the database node (if added in the stream) will take the stream properties to connect to database. Stream properties doesn’t store password for database connection and hence a password prompt pops up.
Connecting to Azure SQL using service principal credentials via Modeler Script
IBM SPSS Modeler enables users to add Python scripts within their data streams.This allows for dynamic interaction with the stream, including the ability to pass parameters and manipulate nodes programmatically.
The following script showcases how to retrieve database connection details from session parameters (useful in batch mode) or stream parameters (useful in the Modeler Client UI) and apply these settings to a Database node within a Modeler stream.
In Modeler thick client click on tools -> Stream Properties -> Parameters
import modeler.api
#Get the Global SPSS Modeler Stream, Diagram and Session properties
diagram=modeler.script.diagram()
stream=modeler.script.stream()
session=modeler.script.session()
#Fetch the values of the parameters from the session
#When running in Modeler Batch (clemb.exe) this is where the parameters are accessible
databaseODBCName=session.getParameterValue("DBODBCDSN")
databaseUsername=session.getParameterValue("DBUSERNAME")
databasePassword=session.getParameterValue("DBPASSWORD")
#If the session parameters are not available, it means the stream is running in the Modeler Client UI
#Then fetch the parameters from the stream
if databaseODBCName == None:
databaseODBCName=stream.getParameterValue("DBODBCDSN")
if databaseUsername == None:
databaseUsername=stream.getParameterValue("DBUSERNAME")
if databasePassword == None:
databasePassword=stream.getParameterValue("DBPASSWORD")
print "ODBC name set to: ", databaseODBCName
print "ODBC User name set to: ", databaseUsername
print "ODBC Password set to: ", databasePassword
#Get the Database source / export Node
#go to annotation tab and get the ID from bottom right corner
databaseNode=diagram.findByID("id6Y8XJJQNLJE")
#Set the Database node connection properties
databaseNode.setPropertyValue("datasource", databaseODBCName)
databaseNode.setPropertyValue("username", databaseUsername)
databaseNode.setPropertyValue("password", databasePassword)
databaseNode.setPropertyValue("tablename", "tm_spss_modeler.script_test")
#Get the output/final node and run
#go to annotation tab and get the ID from bottom right corner
diagram.findByID("id6Y8XJJQNLJE").run(None)
Run the saved stream from either Modeler client or Modeler batch.
Summary
This blog provides a detailed explanation mutliple methods provided by IBM SPSS Modeler to connect to Azure SQL using Azure service principal authentication. Setting up the connection involves configuring Azure AD, creating a Service Principal, and setting up ODBC DSN. This approach ensures secure and efficient access to Azure resources, adhering to role-based access control and minimising security risks