SSIS — Loading #$@&%*! Excel files

Sammy Deprez
Published in Data Fish
Apr 22, 2016

As BI consultants, we regularly get asked to load one or more Excel files, each containing one or more worksheets.
All those files have different layouts, and their columns have different data types.

In other words, loading an Excel file is not always a piece of cake.

After lots of #$@&%*! I got tired of it and wanted to build a solution I could reuse.
I came up with the following:

First of all, I always write my data to a reusable table, with a layout like this:

[code lang="sql"]
CREATE TABLE ExcelDataLoad(
[Id] INT IDENTITY NOT NULL,
[FileName] NVARCHAR(250) NOT NULL,
[SheetName] NVARCHAR(250) NOT NULL,
[Column1] NVARCHAR(250),
[Column2] NVARCHAR(250),
[Column3] NVARCHAR(250),
[Column4] NVARCHAR(250),
[Column5] NVARCHAR(250),
[Column6] NVARCHAR(250),
[Column7] NVARCHAR(250),
[Column8] NVARCHAR(250),
[Column9] NVARCHAR(250),
[Column10] NVARCHAR(250),
[Column11] NVARCHAR(250),
[Column12] NVARCHAR(250),
[Column13] NVARCHAR(250),
[Column14] NVARCHAR(250),
[Column15] NVARCHAR(250),
[Column16] NVARCHAR(250),
[Column17] NVARCHAR(250),
[Column18] NVARCHAR(250),
[Column19] NVARCHAR(250),
[Column20] NVARCHAR(250),

CONSTRAINT PK_ExcelDataLoad PRIMARY KEY (Id)
)
[/code]

Once the data is loaded into this table, I can do whatever I want with it:
cleaning, converting, mapping, transforming, and so on.

That's the easy part.
These are the problems you need to overcome:

  • The structure of each sheet is different
  • Excel returns incorrect data types
  • SSIS locks your file after reading it
  • …

The solution I came up with is one package that reads all your Excel files at once; afterwards the data can be transformed in other packages or in dedicated stored procedures.

Step by step:

Step 1: We start by adding a first Foreach Loop container, parameterized with a data directory, so that it loops over all the Excel files in that directory.

Step 2: Logging (not strictly necessary), but I like to write to the SSIS output which file I am processing.
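In my case that logging is nothing more than a Script Task that fires an information event. A minimal sketch (the variable name User::FileName and the read-only variable mapping are assumptions, match them to your own package):

[code lang="c"]
// Script Task - Main() method.
// Assumes the package variable User::FileName is mapped as a ReadOnlyVariable.
public void Main()
{
    string fileName = Dts.Variables["User::FileName"].Value.ToString();

    // Writes a line to the SSIS progress / execution results output.
    bool fireAgain = true;
    Dts.Events.FireInformation(0, "Excel Loader",
        "Processing file: " + fileName, string.Empty, 0, ref fireAgain);

    Dts.TaskResult = (int)ScriptResults.Success;
}
[/code]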

[Screenshot: worksheet-flow]

Step 3: Read the available worksheets in the Excel file.

The following script opens the Excel file and adds the name of each worksheet to our data flow. We then write all of that to a recordset destination that we will use later.

[code lang="c"]
// Note: this script needs "using System.Data.OleDb;" at the top of the script component.
string excelFile = Variables.DataDirectory + Variables.FileName;
string connString = "Data Source = " + excelFile + "; Provider = Microsoft.ACE.OLEDB.12.0; Extended Properties = \"Excel 8.0;HDR=NO;Imex=1;ImportMixedTypes=Text;\"";
OleDbConnection objConn = null;
DataTable dt = null;

try
{
    objConn = new OleDbConnection(connString);
    objConn.Open();

    // Ask the provider for the list of tables (worksheets, named ranges, ...) in the workbook.
    dt = objConn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);

    if (dt != null)
    {
        foreach (DataRow row in dt.Rows)
        {
            // Push each sheet name into the data flow.
            Output0Buffer.AddRow();
            Output0Buffer.SheetName = row["TABLE_NAME"].ToString();
        }
    }
}
catch (Exception ex)
{
    // do nothing
}
finally
{
    if (objConn != null)
    {
        objConn.Close();
        objConn.Dispose();
    }
    if (dt != null)
    {
        dt.Dispose();
    }
}
[/code]
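One thing to be aware of: GetOleDbSchemaTable does not only return worksheets, it also lists named ranges and print areas. If you only want real sheets, you can filter on the trailing "$" inside the loop above; a rough sketch (the exact rules may need tuning for your files):

[code lang="c"]
foreach (DataRow row in dt.Rows)
{
    string tableName = row["TABLE_NAME"].ToString();

    // Real worksheets end in "$" (or "$'" when the name contains spaces and
    // the provider wraps it in quotes); named ranges and print areas do not.
    if (tableName.EndsWith("$") || tableName.EndsWith("$'"))
    {
        Output0Buffer.AddRow();
        Output0Buffer.SheetName = tableName;
    }
}
[/code]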

Step 4: We add another Foreach Loop container that iterates over all the available worksheets, using the recordset from step 3.

Step 5: Logging (again, not strictly necessary); here I keep track of which sheet I am processing.

Step 6: The most important one!

[Screenshot: data-flow]

Instead of an Excel Source, we use a Script Component configured as a source, with an output buffer that has the maximum number of columns you could ever get. Of course, you need to make sure that your destination table has the same number of columns.

The following script is the most interesting part of this whole post.

[code lang="c"]
string sheetName = Variables.SheetName;
string excelFile = Variables.DataDirectory + Variables.FileName;
string connString = "Data Source = " + excelFile + "; Provider = Microsoft.ACE.OLEDB.12.0; Extended Properties = \"Excel 8.0;HDR=NO;Imex=1;ImportMixedTypes=Text;\"";
OleDbConnection objConn = null;
OleDbCommand objComm = null;
OleDbDataReader rdr = null;

try
{
    objConn = new OleDbConnection(connString);
    objConn.Open();

    // Select everything from the worksheet we are currently processing.
    objComm = objConn.CreateCommand();
    objComm.CommandText = "SELECT * FROM [" + sheetName + "]";
    rdr = objComm.ExecuteReader();

    while (rdr.Read())
    {
        if (rdr.FieldCount > 0)
        {
            Output0Buffer.AddRow();
            int i = 0;
            while (i < rdr.FieldCount)
            {
                // Only fill columns that actually exist in the output buffer (Column1..Column20).
                if (Output0Buffer.GetType().GetProperty("Column" + (i + 1).ToString()) != null)
                {
                    Output0Buffer.GetType().GetProperty("Column" + (i + 1).ToString()).SetValue(Output0Buffer, rdr[i].ToString());
                }
                i++;
            }
        }
    }
}
catch (Exception ex)
{
    // do nothing
}
finally
{
    if (rdr != null)
    {
        rdr.Close();
    }
    if (objComm != null)
    {
        objComm.Dispose();
    }
    if (objConn != null)
    {
        objConn.Close();
        objConn.Dispose();
    }
}
[/code]

So what does the code do?
First of all, we open a connection to the Excel file.
Next, we do a 'SELECT *' from the worksheet we are currently processing.
Then, for each row in the OleDbDataReader (rdr.Read()), we check how many columns are available. If there is more than zero, we create a new row in our data flow.
Now we want to iterate over each column of that row.
First we check whether the column we want to fill exists in our output buffer; if so, we fill it with the cell value we selected (the column/row combination).
Instead of writing Output0Buffer.Column1 = some_value, this script does it dynamically through reflection. This way it does not matter how many columns a sheet has, and the columns that are not used stay NULL.
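One possible refinement: looking up the property with reflection for every single cell is relatively slow. For larger files you could resolve the PropertyInfo objects once in PreExecute and reuse them per cell; a sketch under that assumption (the field name columnProperties is mine, and 20 matches the number of columns in the staging table):

[code lang="c"]
// Field on the script component, resolved once instead of per cell.
private System.Reflection.PropertyInfo[] columnProperties;

public override void PreExecute()
{
    base.PreExecute();

    // Look up Column1..Column20 on the output buffer type once.
    columnProperties = new System.Reflection.PropertyInfo[20];
    for (int i = 0; i < 20; i++)
    {
        columnProperties[i] = typeof(Output0Buffer).GetProperty("Column" + (i + 1));
    }
}

// Then, inside the row loop, instead of calling GetProperty for every cell:
// if (i < columnProperties.Length && columnProperties[i] != null)
// {
//     columnProperties[i].SetValue(Output0Buffer, rdr[i].ToString());
// }
[/code]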

Final notes:

I made use of OLE DB directly so that I can close the connection myself and the file does not stay locked. Thanks to the Extended Properties in the connection string, I rarely run into issues with wrong data types. And with the script above I can read any Excel file with any structure.
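For reference, this is roughly what the pieces of that connection string do (same string as in the scripts above, just annotated):

[code lang="c"]
// Provider=Microsoft.ACE.OLEDB.12.0 -> the Access Database Engine OLE DB provider
// Excel 8.0                         -> workbook format ("Excel 12.0 Xml" for .xlsx-only scenarios)
// HDR=NO                            -> treat the first row as data, not as column headers
// IMEX=1                            -> import mode: columns with mixed types are read as text
// ImportMixedTypes=Text             -> intended to force mixed-type columns to text
//                                      (normally a registry setting; the provider may ignore it here)
string connString = "Data Source = " + excelFile + "; Provider = Microsoft.ACE.OLEDB.12.0; " +
    "Extended Properties = \"Excel 8.0;HDR=NO;Imex=1;ImportMixedTypes=Text;\"";
[/code]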

Of course, this SSIS package might need some changes depending on extra requirements, but personally I think it is a good, extensible basis to start from.

I hope this blog post helped you and that you can use it now or in the future.
You can download the solution (VS2015) here.
