SSIS, XML and NULLs

Trident of a data extraction

Riccardo Perico
Rik in a Data Journey
6 min readDec 24, 2023

--

Prelude

In a world of Cloud, Fabric, Datalake, Data Mesh… we still have to deal with so called “legacy data architecture” based on SQL Server’s BI stack. You know what? I still like it.

Even after more than a decade in this industry some requests from customers could be challenging and enjoyable to solve.

One of our biggest customer recently came to us asking to produce XML interface files to feed one of its new technical systems with data elaborated into the DWH (the entire BI is on-prem and based on SQL Server).

What we do

In a SQL Server ecosystem the very first option to consider when you need to deal with Extraction, Transformation and Loading is obviously SQL Server Integration Services (SSIS).

I work with SSIS since 2012 but I couldn’t recall I’ve ever had to produce an XML file with them (tons of CSVs and XLS(X) but no XMLs), but I opened Visual Studio pretty confident to find an “XML Destination” task in my SSIS Data Flow Task toolbox.

Unfortunately I quickly realized there isn’t an out-of-the-box XML output in SSIS (at least in SQL Server 2017 that is the version I’m working on this customer).

Image showing destinations available in VS2017 Data Tools for SSIS

I therefore decide to search for a solution online and I found one really interesting and smart.

The idea is to prepare the XML document in T-SQL and to use a Data Flow Task in SSIS to export it using an Export Column task.

“The Export Column transformation reads data in a data flow and inserts the data into a file. For example, if the data flow contains product information, such as a picture of each product, you could use the Export Column transformation to save the images to files.”

Prepare data

First of all we create a stored procedure that incapsulates the logic (if any), and prepare the XML document using FOR XML statements as needed.

This stored procedure will output 1 row with 2 columns:

  • XML document
  • Output file path

It could be advisable to add a parameters to the stored procedure in order to programmatically choose the output file path and to pass any “logic switcher”.

Orchestrate the stored procedure

The Export Column Task basically asks you to identify which column you want to extract and in which file. All that I need is already in the output of the stored, therefore we only need to call it in a Data Flow Task with a simple EXEC and put it as input of the Export Column.

Image showing the Data Flow task

The Export Column Task is setup to extract xml_node column content into a file which is defined by dest_file column.

Image showing Export Column setup

So far, so good. A nice XML document produced according to logic and structure defined in the stored procedure code.

How to manage NULLs in XML

One of the aspects to consider is the way we want NULLs coming from our RDMS to be represented in the output XML document.

There’re many ways to represent NULLs in XML, but some of them would require us to do some extra work in the stored procedure.

Omitting the Element

<root>
<name>John</name>
<! - age element omitted to indicate null value -->
</root>

Probably the easiest one and the default behavior of FOR XML statement.

PROs

  • No extra work needed
  • Less verbose XML document

CONs

  • Every node hasn’t the same number of elements and clients may require it

Using a Special Attribute

<root>
<name>John</name>
<age isNull="true"/>
</root>

Using an Empty Element

<root>
<name>John</name>
<age/>
</root>

Using a Special Value

<root>
<name>John</name>
<age>null</age>
</root>

All the above methods clearly state the age column is null and one of them could be used according to the client needs. However this is not the standard behavior of FOR XML and we need to perform some data manipulation in order to achieve the desired output.

For example if we want to use a special value instead of NULLs we need to do something like this before applying FOR XML.

SELECT 
f1 = ISNULL(CONVERT(VARCHAR(10), t.f1),'null')
, f2 = ISNULL(CONVERT(VARCHAR(20), t.f2),'null')
, f3 = ISNULL(CONVERT(VARCHAR(30), t.f3),'null')
FROM dbo.MyTable as t

We can create very complex XML documents, nesting elements at different levels. In this case it could be a good idea to use a CTE to “clean” all the columns at the very beginning and then use them easily later.

WITH cte
AS
(
SELECT
--father node elements
f1 = ISNULL(CONVERT(VARCHAR(10), f.f1),'null')
, f2 = ISNULL(CONVERT(VARCHAR(20), f.f2),'null')
, f3 = ISNULL(CONVERT(VARCHAR(30), f.f3),'null')
--child node elements
, s1 = ISNULL(CONVERT(VARCHAR(10), f.f10),'null')
, s2 = ISNULL(CONVERT(VARCHAR(20), f.f11),'null')
, s3 = ISNULL(CONVERT(VARCHAR(30), f.f12),'null')

, key_column
FROM dbo.MyTable
)
SELECT
f1
,f2
,f3
,
(
SELECT
s1
,s2
,s3
FROM cte AS s
WHERE f.key_column = s.key_column
GROUP BY
s1
,s2
,s3
FOR XML PATH('Child'), ROOT('Children'), TYPE
)
FROM cte AS f
GROUP BY
f1
,f2
,f3
FOR XML PATH('Father'), ROOT('Fathers'), TYPE

Regardless the patter you follow, each of the above solutions have the same pros and cons.

PROs

  • NULLs clearly exposed
  • Every node has the same number of elements

CONs

  • Extra code to prepare the data
  • More Verbose XML document
  • Heavier XML document

Looking carefully to docs you’ll discover there’s another way to handle NULLs that’s that use of ELEMENTS directive with the optional XSINIL.

The ELEMENTS directive constructs XML in which each column value maps to an element in the XML. By default, if the column value is NULL, no element is added. But by specifying the optional XSINIL parameter on the ELEMENTS directive, you can request that an element is created for the NULL value. In this case, an element that has the xsi:nil attribute set to TRUE is returned for each NULL column value.

In my humble opinion that’s the most elegant solution since it gives you the best of both worlds. Minimal change to the code adding “ELMENTS XSINIL” to the FOR XML statement and no need of any “isnullification”.

SELECT ProductID, Name, Color
FROM Production.Product
FOR XML RAW, ELEMENTS XSINIL;

The output you’ll get will be super clear and standard.

<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ProductID>1</ProductID>
<Name>Adjustable Race</Name>
<Color xsi:nil="true" />
</row>
...
<row>
<ProductID>317</ProductID>
<Name>LL Crankarm</Name>
<Color>Black</Color>
</row>

PROs

  • Minimal code change (less effort, improved readability and manageability)
  • NULLs clearly exposed and human readable
  • Every node has the same number of elements
  • Adheres to an XML standard

CONs

  • More Verbose XML document
  • Heavier XML document
  • It is closely tied to the use of XML Schema
  • Introducing XML Schema and XSI:NIL can make your XML documents more complex
  • Even if it is a standard mechanism, not all XML processors or consumers may fully support it or interpret it correctly

Conclusions

SQL Server Integrations Services are the facto standard for on-premises ETL projects based on SQL Server ecosystem. They still not provide a standard connector for XML files but Extract Column transformation can make you workaround of it.

You can represent NULLs in many ways according to the client/consumer needs. Any approach has its pros and cons.

XSI:NIL could be very elegant and easy to implement.

--

--

Riccardo Perico
Rik in a Data Journey

BI & Power BI Engineer since 2010 — Data and me in a complicated relationship — Hard Rock and Heavy Metal addicted