Consuming a SOAP service using Azure Data Factory Copy Data Activity
One day at work, I was presented with the challenge of consuming a SOAP service using Azure Data Factory.
After a lot of research online and reading through many forums, I found no tutorial or tip for this particular task. Even the Microsoft support available to our company could not guide us.
Some forums even said that it was not possible to perform this task in ADF (Azure Data Factory), and that it would be necessary to use Azure Functions to make the SOAP request through code.
According to Wikipedia:
SOAP (abbreviation for Simple Object Access Protocol) is a messaging protocol specification for exchanging structured information in the implementation of web services in computer networks. Its purpose is to provide extensibility, neutrality and independence. It uses XML Information Set for its message format, and relies on application layer protocols, most often Hypertext Transfer Protocol (HTTP)
Then, looking at ADF, I realized that the task could, in theory, be done with the HTTP connector.
So, long story short, this is what I did.
For this tutorial, I used the SOAP service available at: http://www.dneonline.com/calculator.asmx?wsdl
Analyze the SOAP service to be consumed
To find out the characteristics of the service to be consumed, I usually use the SoapUI tool.
If you are not familiar with the tool, there are several tutorials on how to import a WSDL and make a request. Since the focus of this post is ADF, I won’t describe this part in detail.
Make a POST or GET request to find out which headers and which body will be sent to the endpoint:
I will use the “Add” operation and pass 10 and 5 as the values, as shown in the request body.
After making the request, check the headers that have been passed to the endpoint:
This information will be used when we build the Activity in the ADF.
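For readers without SoapUI at hand, here is a minimal Python sketch of the same request. The `http://tempuri.org/` namespace, the `intA`/`intB` element names and the `SOAPAction` value are my assumptions about this calculator service and do not come from the screenshots, so check them against what SoapUI shows for your own endpoint.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumptions (verify in SoapUI): namespace, element names and SOAPAction value.
ENDPOINT = "http://www.dneonline.com/calculator.asmx"
SOAP_ACTION = "http://tempuri.org/Add"


def build_add_envelope(a: int, b: int) -> str:
    """Return a SOAP 1.1 envelope for the Add operation."""
    return (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        "<soap:Body>"
        '<Add xmlns="http://tempuri.org/">'
        f"<intA>{a}</intA><intB>{b}</intB>"
        "</Add>"
        "</soap:Body>"
        "</soap:Envelope>"
    )


def build_request(a: int, b: int) -> urllib.request.Request:
    """Build (but do not send) the POST request with the two SOAP headers."""
    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": SOAP_ACTION,
    }
    body = build_add_envelope(a, b).encode("utf-8")
    return urllib.request.Request(ENDPOINT, data=body, headers=headers, method="POST")


request = build_request(10, 5)
# To actually send it: urllib.request.urlopen(request).read()
```

The two headers and the body built here are the same pieces of information that go into the Copy Data activity later on.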
Configuring Copy Data Activity in the ADF
Create a new HTTP Dataset:
In ADF, select New dataset:
Select HTTP as the data store type.
Select the data format. Choose Binary here; it is the most basic format in ADF.
Name the dataset SoapDataSetBinary and create the linked service. You will need a new linked service to consume the SOAP endpoint.
Configure the HttpLinkedService
In the linked service selection click +New
In the configuration tab of the new HTTP linked service:
- Set the name to HttpLinkedService
- Set the Base URL to http://www.dneonline.com/calculator.asmx, as shown in the SoapUI request information.
- Since the endpoint does not require authentication, in Authentication type select Anonymous (If your service requires authentication, fill in the user-password fields).
- Click Create. Before creating, you can click the Test connection button to check whether the connection succeeds.
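If you prefer to look at the Code view instead of the form, the linked service configured above should correspond to a JSON definition roughly like the one this sketch prints. The property names follow my understanding of the HTTP connector's JSON format and are an assumption, so compare them with what ADF generates for you.

```python
import json

# Assumed JSON shape of the HTTP linked service; compare with the ADF Code view.
linked_service = {
    "name": "HttpLinkedService",
    "properties": {
        "type": "HttpServer",
        "typeProperties": {
            "url": "http://www.dneonline.com/calculator.asmx",
            "authenticationType": "Anonymous",
        },
    },
}

print(json.dumps(linked_service, indent=2))
```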
After creation you will be redirected back to the creation of the dataset. Confirm by clicking OK. Your DataSet will open.
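As a rough sketch, the source dataset created in these steps could be described by the JSON below. The `Binary` and `HttpServerLocation` type names are assumptions based on how ADF usually serializes a binary dataset over HTTP, so check them against the dataset's Code view.

```python
import json

# Assumed JSON shape of the binary HTTP dataset; not copied from the portal.
source_dataset = {
    "name": "SoapDataSetBinary",
    "properties": {
        "type": "Binary",
        "linkedServiceName": {
            "referenceName": "HttpLinkedService",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            # No relative URL: the linked service's Base URL is the endpoint itself.
            "location": {"type": "HttpServerLocation"},
        },
    },
}

print(json.dumps(source_dataset, indent=2))
```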
Create the Sink data set to be used
In this step you define where the data will be written. In this tutorial I used Azure Blob Storage. The creation is very similar to the dataset created previously:
- Click New dataset
- Select Binary as the data format
- Set the name to SoapSinkDataSet
- Select the linked service to be used. I will not show how to configure the linked service for Azure Blob Storage, since it makes no difference to the tutorial.
- Define the path where the file will be available and click OK
The sink data set will be displayed in the same way as when creating SoapDataSetBinary
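For comparison, a sink dataset like this one might look as follows in JSON. The linked service name `AzureBlobStorageLinkedService`, the container `soap-responses` and the file name `add-response.xml` are hypothetical placeholders that I made up for illustration, and the type names are my assumption of the blob dataset format.

```python
import json

# Hypothetical names below: replace them with your own linked service and path.
sink_dataset = {
    "name": "SoapSinkDataSet",
    "properties": {
        "type": "Binary",
        "linkedServiceName": {
            "referenceName": "AzureBlobStorageLinkedService",  # hypothetical
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "soap-responses",   # hypothetical
                "fileName": "add-response.xml",  # hypothetical
            },
        },
    },
}

print(json.dumps(sink_dataset, indent=2))
```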
Now that we have SoapDataSetBinary, SoapSinkDataSet and HttpLinkedService configured, it’s time to create a new pipeline with a Copy Data activity.
Click New Pipeline
The new pipeline panel will open. Browse the Activities list, find the Copy Data activity, and drag it onto the panel.
Set the name of the Copy Data activity to SoapRequisition
In the Source tab, select the newly created SoapDataSetBinary as the source dataset.
Set the Request method to POST
Add the Content-Type and SOAPAction headers shown in SoapUI
Add the Request body, the same one used in SoapUI
The source tab should look like this:
In the sink tab select SoapSinkDataSet
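To make the Source and Sink settings concrete, here is a sketch of how the whole activity might look in the pipeline's JSON. The `storeSettings` property names, the newline-separated header string, the `tempuri.org` namespace and the `SOAPAction` value are all assumptions on my part, so treat this as an illustration and compare it with the Code view of your own pipeline.

```python
import json

# Assumed request body for the Add operation (verify against SoapUI).
REQUEST_BODY = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    '<soap:Body><Add xmlns="http://tempuri.org/">'
    "<intA>10</intA><intB>5</intB>"
    "</Add></soap:Body></soap:Envelope>"
)

# Assumed header format: one "Name: value" pair per line.
HEADERS = "Content-Type: text/xml; charset=utf-8\nSOAPAction: http://tempuri.org/Add"

copy_activity = {
    "name": "SoapRequisition",
    "type": "Copy",
    "inputs": [{"referenceName": "SoapDataSetBinary", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "SoapSinkDataSet", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {
            "type": "BinarySource",
            "storeSettings": {
                "type": "HttpReadSettings",
                "requestMethod": "POST",
                "additionalHeaders": HEADERS,
                "requestBody": REQUEST_BODY,
            },
        },
        "sink": {
            "type": "BinarySink",
            "storeSettings": {"type": "AzureBlobStorageWriteSettings"},
        },
    },
}

print(json.dumps(copy_activity, indent=2))
```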
And that’s it! Click Debug to test the pipeline, and everything should work.
The response file of the request will be available in the folder defined in SoapSinkDataSet. Just open it with a text editor to see the content.
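If you want to read the result programmatically instead of opening the file by hand, a small parser is enough. The sample response below is hand-written to show the shape I would expect from the Add operation (an `AddResult` element in the `http://tempuri.org/` namespace); it is an assumption, not a captured output, so adjust the names to whatever your file contains.

```python
import xml.etree.ElementTree as ET

# Hand-written sample with the assumed response shape, not real service output.
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <AddResponse xmlns="http://tempuri.org/">
      <AddResult>15</AddResult>
    </AddResponse>
  </soap:Body>
</soap:Envelope>"""


def extract_add_result(xml_text: str) -> int:
    """Return the integer inside the AddResult element of a SOAP response."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    node = root.find(".//t:AddResult", {"t": "http://tempuri.org/"})
    if node is None or node.text is None:
        raise ValueError("AddResult element not found in the response")
    return int(node.text)


print(extract_add_result(SAMPLE_RESPONSE))
```

In practice you would pass the text of the file downloaded from the sink folder instead of `SAMPLE_RESPONSE`.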
If you have any questions or problems, feel free to contact me.