Data Integration with ETL and SSIS

Shreeharan Hareendran
6 min readNov 12, 2019

--

Welcome to yet another post. This will provide you basic insights on ETL and SSIS.

Introduction

The availability of data on time is an important factor for a business to thrive. Organizations generate huge amounts of data daily about various activities based on their daily business operations. However data in raw format does not provide any meaningful insights, it should be processed by correct techniques to convert it to useful information.

Converting data to information is just part of the problem. In reality, an organization will consist of several operational units and interacts with external entities. Different sources will generate data in different formats and expose in different ways. The integration of those is a challenging task.

Data Integration/migration is a process in which heterogeneous data is retrieved and combined as an incorporated form and structure.

Data orchestration can be achieved through several processes/concepts,

  1. ETL
  2. Data Cleansing
  3. Data modeling
  4. Data warehousing
  5. Data merging migration and replication

What is ETL?

Data extraction involves extracting data from homogeneous or heterogeneous sources; data transformation processes data by data cleansing and transforming them into a proper storage format/structure for the purposes of querying and analysis; finally, data loading describes the insertion of data into the final target database such as an operational data store, a data mart, data lake or a data warehouse.

Source :- https://en.wikipedia.org/wiki/Extract,_transform,_load

Data integration mainly focuses on transferring data from a source or multiple sources to a destination in an expected format.

ETL solves this by,

  • Extraction of data from source or multiple sources
  • Transformations of heterogeneous source data to destination expected format
  • Loading or storing transformed data in the destination

The major benefit using ETL is that it is easier, convenient and faster to use to move data within two entities with transformations than traditional methods which requires many lines of codes. ETL tools contain graphical user interfaces which speeds up the process of mapping tables and columns between the source and destination tables.

To achieve data integration via ETL there are several tools which are being used,

  1. SSIS
  2. Informatica
  3. Oracle Data Integrator
  4. AWS Glue

So let’s get going with SSIS,

Microsoft provides a suite of BI tools for data manipulation and reporting. Following are the main tools in the Microsoft BI Suite (MSBI),

  • SSIS -SQL Server Integration Services
  • SSAS-SQL Server Analysis Services
  • SSRS-SQL Server Reporting Services
Application of MSBI Suite in a nutshell

It is a new tool provided by Microsoft called Sequel Software Integration Services (SSIS), which is more commonly known as SSIS. It is a component of the Microsoft SQL Server database software that can be used to perform a broad range of data integration and data transformation tasks, which is simply referred to as an ETL process.

So simply put, SSIS provides you with tools to perform ETL.

Don’t get overwhelmed with these jargons.

Let me take you through some visual of these concepts.

On the left-hand side is what we call as extracting data. Your data may be spread across various systems such as CRM, billing, supply chain. So using SSIS, you can easily connect to these data sources or systems like Ekso, Hadoop, Oracle, excel files, etc and start pulling across systems.

Maybe different.

For example, consider that employee payroll information is stored in Oracle and employee information is stored in Sequel Server and you need to build cubes for reporting and analysis. Another scenario would be like employee state represented with the word Washington in one database and the letters WA in another. So in both these scenarios, different formats of data will be maintained in both the databases and they should be transferred and maintained in a common format before analysis or report generation.

This is where the trick of transformation comes into play. You can plan and clean or massage the data across systems so that your warehouse becomes a clean single source of truth. Once you transform the data you can load the data into a predefined and pre-designed table in a database.

Most common use cases of SSIS packages

  • Archival of data (export)/loading of new data (import)
  • Transferring data from one data source to another
  • Data cleansing or transformation of dirty data
  • Merging Data from Heterogeneous Data Stores Populating Data
  • Warehouses and Data Marts Cleaning and Standardizing Data Building
  • Business Intelligence into a Data Transformation Process Automation

Some more technicalities of SSIS

To get a comprehensive idea about SSIS I strongly recommend you to take a look at the below concepts and follow the attached tutorials.

SSISDB CatLog

The SSISDB catalog is the central point for working with Integration Services (SSIS) projects that you’ve deployed to the Integration Services server. For example, you set project and package parameters, configure environments to specify runtime values for packages, execute and troubleshoot packages, and manage Integration Services server operations.

The objects that are stored in the SSISDB catalog include projects, packages, parameters, environments, and operational history.

Source :- https://docs.microsoft.com/en-us/sql/integration-services/catalog/ssis-catalog?view=sql-server-ver15

Parameters vs Variables

Deploy and execute an SSIS package

Makes sense?

I would like to emphasize that, I have used many SSIS oriented terminologies (ex-SSISDB, parameters, package deployment, etc). My aim wasn’t to make you uncomfortable with the topic by using so many technical jargons, but rather give you an idea of all the basic components and terminologies to kick start with SSIS. Definitely, some terms will be hard to digest as newbies, so I recommend you to refer the following materials to get the meaning/understanding of those since the main goal of this post is to get your legs wet on SSIS and not to elaborate on every artifact.

To get a clear understanding of the above, I recommend you to build a basic SSIS package to transfer data from a source to a destination with some basic transformations and deploy the created package on a SQL Server Integration Services and execute. This task will make you comfortable with the above.

Refer the below-mentioned links for a simple package creation and terminology explanations respectively,

Conclusion

Organizations started to use ETL to integrate data and gather various insights. ETL is designed to operate majorly with databases and uses batch processing. Every business in the world deals with a vast amount of data. But these are nothing without a powerful tool to clean them up and make them accessible for analysis. So data management is becoming complex and data integration tools are needed to change to keep up the pace. So SSIS helps to successfully run an ETL task in eased manner.

I hope the above has given you a comprehensive idea on ETL and SSIS with all necessary basic components to start with.

Catch you in another post!!!

For further details refer the following,

--

--