Connecting to Azure Data Lake and Azure SQL Server Database using Mulesoft
Microsoft hosts Apache Hadoop in the cloud, under the name “Azure Data Lake”. My challenge today is to connect to a Data Lake Store via its REST API using Mulesoft, and stick in some data pulled from an Azure-hosted SQL Server database.
I have Anypoint Studio and the basic set of connectors; I haven’t found or attempted to use any other connector.
I found that when connecting to Azure REST APIs protected with OAuth2, I couldn’t use the HTTP Connector and configure Authentication to be OAuth2 Client Credentials. A bit of sniffing with https://requestb.in showed me that what Mulesoft builds for this is an HTTP POST with grant_type=client_credentials as a request parameter, and the client ID and client secret Base64-encoded in the basic auth header. Unfortunately, Microsoft’s OAuth token dispenser doesn’t recognise this: it expects the client ID and client secret as parameters in the HTTP POST body. So I had to manage the OAuth “dance” myself, with two calls.
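To make the mismatch concrete, here’s a small sketch of the two request shapes (the client ID and secret are made-up placeholders, and a real body would URL-encode the values):

```java
import java.util.Base64;

public class OAuthShapes {
    // What Mule's OAuth2 Client Credentials config builds: the credentials
    // go into a Basic auth header as Base64("clientId:clientSecret").
    static String basicAuthHeader(String clientId, String clientSecret) {
        String pair = clientId + ":" + clientSecret;
        return "Basic " + Base64.getEncoder().encodeToString(pair.getBytes());
    }

    // What Microsoft's token endpoint expects instead: the credentials
    // as form parameters in the POST body.
    static String tokenRequestBody(String clientId, String clientSecret) {
        return "grant_type=client_credentials"
                + "&client_id=" + clientId
                + "&client_secret=" + clientSecret;
    }

    public static void main(String[] args) {
        System.out.println(basicAuthHeader("my-app-id", "my-secret"));
        System.out.println(tokenRequestBody("my-app-id", "my-secret"));
    }
}
```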
Step 1 — get a Microsoft Azure cloud account and set up a Data Lake Store and a SQL database.
Step 2 — create an Azure application account and make sure it has permissions to the Data Lake Store and the database. You do this from the Azure Portal Active Directory page: select App Registrations. Set up a Key for your application user too, using the Keys submenu.

For Data Lake, I used this tutorial https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-get-started-rest-api to get set up. For SQL Server I followed https://docs.microsoft.com/en-us/azure/sql-database/sql-database-get-started-portal
For both resources, use Access Control (IAM) to make sure that the application user ID is present as a Contributor for the resource.
Step 3 — create a Mulesoft flow and add connectors. Let’s talk about Azure Data Lake first. It is secured with OAuth, and you will need service-to-service authentication, so you first need to make a call to get a token.
The token call is an HTTP POST, with four parameters embedded in the POST body: grant_type, client_id, client_secret and resource. So you need to add a Set Properties component to your flow, and configure it to set the body params

Your client ID is your Azure Data Lake Application ID, and the client secret is your Azure Key — created via Azure Active Directory on the Azure portal as above. I found that the tutorial I followed was incorrect regarding the Resource, and I had to set it to https://management.core.windows.net/ — when I set it to the value specified by the tutorial (which was the App ID URI from the application user properties page) I got an error telling me to change the resource to the above.
I discovered how to use parameters so that I didn’t have to write the client credentials into my flow here: https://docs.mulesoft.com/mule-user-guide/v/3.5/using-parameters-in-your-configuration-files
Pass the output from Set Properties into a HTTP request to get the token. The HTTP connection settings are as follows:
Protocol: HTTPS
Host: login.microsoftonline.com
Port: 443
The path is /<your-tenant-id>/oauth2/token — you can get the tenant ID from the Azure Active Directory Manage -> Properties page on the Azure portal, where it’s called the Directory ID.
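The endpoint and body described in the steps above can be sketched like this (the tenant ID, client ID and secret are placeholders, and the resource value is the one that worked for me):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class TokenEndpoint {
    // The token endpoint lives under the tenant ID on login.microsoftonline.com.
    static String endpoint(String tenantId) {
        return "https://login.microsoftonline.com/" + tenantId + "/oauth2/token";
    }

    // The four body parameters, URL-encoded as a form POST body.
    static String body(String clientId, String clientSecret) throws UnsupportedEncodingException {
        return "grant_type=client_credentials"
                + "&client_id=" + URLEncoder.encode(clientId, "UTF-8")
                + "&client_secret=" + URLEncoder.encode(clientSecret, "UTF-8")
                + "&resource=" + URLEncoder.encode("https://management.core.windows.net/", "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(endpoint("<tenant-id>"));
        System.out.println(body("my-app-id", "my-secret"));
    }
}
```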
This should succeed and you should get a response in this format:
{"token_type":"Bearer","expires_in":"3600","ext_expires_in":"0","expires_on":"1499176066","not_before":"1499172166","resource":"https://<your Azure portal>.com/<your tenant ID>","access_token":"<very long token string here>"}
All you need to extract is the token — store it as a flow variable, perhaps. You should probably implement some kind of try-catch block which refreshes the token when it expires, but I haven’t done that yet — I’m creating a new token for each flow and assuming it’ll live longer than the flow.
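Pulling the token out of that response is simple enough; here’s a sketch using a regex (in a real Mule flow you’d more likely use a JSON transformer or expression, and the sample response below is made up):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenExtract {
    // Extract the access_token value from the token response JSON.
    static String extractToken(String json) {
        Matcher m = Pattern
                .compile("\"access_token\"\\s*:\\s*\"([^\"]+)\"")
                .matcher(json);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String sample = "{\"token_type\":\"Bearer\",\"expires_in\":\"3600\","
                + "\"access_token\":\"abc123\"}";
        System.out.println(extractToken(sample)); // prints abc123
    }
}
```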
Now to try and use the Data Lake REST API. To start with, try something simple, like reading the directories.
Create a new HTTP connector with a new HTTP Request Configuration. Your Azure URL will be something like mylake.azuredatalakestore.net — you can get this from the data lake page of the portal. The protocol is HTTPS and the port is 443.
The call for getting directories is an HTTP GET to /webhdfs/v1/, and then you need to add the parameters:

The Authorization header contains your token. The op query parameter must be set to LISTSTATUS to list the data lake folders. Not sure if Content-Type is actually needed. If you do not set the Host header yourself, it will be set to include the port, and Azure data lake will throw an error like:
{"RemoteException":{"exception":"IllegalArgumentException","message":"LISTSTATUS failed with error 0x83090a1a (Bad request. The operation is not supported on the provided URL type.). [48147b66-f3d6-4fec-80c2-3dc1dc9708e3][2017-07-10T02:57:40.2568465-07:00]","javaClassName":"java.lang.IllegalArgumentException"}}
Pretty tricky to figure that one out, I can tell you! So, set the Host to the correct value (i.e. mylake.azuredatalakestore.net) but without the port appended, and all should be fine.
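Putting the pieces together, here’s a sketch of the request shape that works (the account name and token are placeholders; note the Host header deliberately omits the port):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ListStatusRequest {
    // The LISTSTATUS URL for a given data lake account.
    static String url(String account) {
        return "https://" + account + ".azuredatalakestore.net/webhdfs/v1/?op=LISTSTATUS";
    }

    // The headers that made the call work: Host WITHOUT :443 appended,
    // plus the bearer token from the earlier OAuth call.
    static Map<String, String> headers(String account, String token) {
        Map<String, String> h = new LinkedHashMap<>();
        h.put("Host", account + ".azuredatalakestore.net"); // no port here!
        h.put("Authorization", "Bearer " + token);
        return h;
    }

    public static void main(String[] args) {
        System.out.println(url("mylake"));
        System.out.println(headers("mylake", "<token>"));
    }
}
```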
Now to connect to Azure SQL Server. This is quite straightforward using JDBC, following the tutorial here: https://blogs.mulesoft.com/dev/anypoint-platform-dev/building-apis-around-microsoft-azure-sql-databases/. The one catch is that you have to make sure that your computer’s IP is whitelisted in the firewall. To set the firewall values, use the Azure portal and go to SQL databases; there’s a tab for firewall settings: “Set server firewall”.

If you don’t know your computer’s IP, you can go to https://httpbin.org/ip and it will show it to you.
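For reference, the JDBC URL for an Azure SQL database follows this general shape (the server and database names below are made up, and the exact property set may vary with your driver version):

```java
public class AzureSqlUrl {
    // Build a JDBC URL for an Azure SQL database. Azure SQL listens on
    // port 1433 and requires encrypted connections.
    static String jdbcUrl(String server, String database) {
        return "jdbc:sqlserver://" + server + ".database.windows.net:1433;"
                + "database=" + database + ";"
                + "encrypt=true;loginTimeout=30;";
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl("myserver", "mydb"));
        // With the Microsoft JDBC driver on the classpath, a connection
        // would then be opened with something like:
        // Connection c = DriverManager.getConnection(jdbcUrl("myserver", "mydb"), user, password);
    }
}
```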
One final gotcha that didn’t hit me on day one but appeared later: when using macOS you need to ensure that the HostName property is set. Otherwise, when connecting to Azure SQL database, you get the error:
the connection has been closed ClientConnectionId
To check if HostName is set, from a terminal run scutil --get HostName.
If you get the reply: HostName: not set, then that is the issue. Run
sudo scutil --set HostName "somehostname"
Then you’ll find the Azure JDBC connection works just fine.
REST access to SQL Server
What if my project isn’t going to allow JDBC access to the SQL server? Can I create a REST API for it? There’s a nice query editor in Azure Cloud, but I couldn’t figure out how to generate the necessary auth token to access it programmatically (I got close). So I think I may have to use an API App hosted in Azure to access the database. This is fairly straightforward and you can use this tutorial to get you started on creating the API app, and this tutorial to get you started on connecting to an Azure SQL Server database.
Basically, you use Swagger to define your API, then you can use something like JAX-RS to turn your Swagger API into a Java service, then wade in and add database connection logic to your service implementation and pull/push data to the Azure database from your service implementation code. Interestingly, I have no idea whether the JDBC connection loops out to the internet in order to connect to the Azure database, but I didn’t have to add any firewall rules to my Azure database to allow access, which suggests that the Azure cloud is well-configured enough to route internal connections internally.
When you create an Azure API app, you can configure it to host Java applications on Tomcat. Then to push to Tomcat you can use Git, which is pretty cool — just tell the API app that you want to use a local Git repository and it’ll set one up for you and give you the remote URL. Everything you push to this repository goes into the Tomcat root directory, so you need to push webapps/<your-war>.war to get Tomcat to pick up and expand the webapp. If you call your webapp ROOT, you don’t then have an app context path in your URL.
The only difficulty I found was with the JDBC driver. I’d included the driver JAR file in my WAR, but Tomcat couldn’t find it. Experimenting with a local Tomcat showed that even if I put the driver JAR in the catalina lib/ folder, Tomcat couldn’t find it. I had to add the line:
DriverManager.registerDriver(new com.microsoft.sqlserver.jdbc.SQLServerDriver());
into my service implementation constructor and then Tomcat could find it and the application worked just fine.
