Introducing SSIS Architecture & DW Concepts Overview

Saranya Mohan
JanBask Training
Published in
8 min readMay 1, 2020

Introduction

Nowadays most of the organizations across the world invest heavily in Information Technology. They have software to maintain the attendance of the employees. They have software for document management. They now have an ERP. They also have document management software, Software to capture CCTV footages and data saved in Excel, Word, and notepad as well. And above all, they have a Business Intelligence Software which provides attractive analytical reports to the management. Now the question arises-how would you manage to get all those data from different applications together in one place and create meaningful reports using them?

The main problem that will arise when you try to bring data from all those databases and other storage media’s together is that all those data are stored on different platforms. For example, your Enterprise Resource Planning Software might be using the Oracle database as back end whereas your HR software could be using MS SQL Server. Although both of this software is databases at the end of the day, each of them has a certain unique way of managing data. It is here that SSIS plays its part.

The most important feature of an SSIS package is the ETL service. The full form of ETL is- extract, transform and load. The process of extraction includes collecting data from different data sources. The transformation includes cleaning up those collected data, removing redundancy and bringing those collected data into a common platform. The loading includes storing that collected data into a separate relational database called data warehouse to be used later on for the analytical and reporting purpose.

SSIS is one such popular ETL tool that is available in the market nowadays. Over the next few paragraphs, we will try to understand the basics of SSIS, different components of SSIS and introduce you to the basics of the data warehouse database.

Data warehouse Design and Components

What is the warehouse? It is a place where goods are safely stored properly so that they can be easily accessed when required. There are different parts of the warehouse building dedicated for storing different types of items. Similarly, a data warehouse is a database where data extracted and cleaned up from different sources can be safely stored and retrieved later on when needed.

Components of Data warehouse are as below

  • Data Warehouse Database: A separate relational database to store the final extracted data that will be used for analytical purposes.
  • Sourcing, Acquisition, Cleanup and Transformation Tools: Different standard software like InformaticaPowercenter, Microsoft SSIS, Talend, etc. that helps in the ETL process.
  • Metadata: This comprises the data related to the configuration of an ETL package developed in SSIS. When we talk about configuration, it means data related to the source of data, the destination database, what fields need to be exported and how many times, etc.
  • Access Tools: Access tools are tools that are used to access the data in the data warehouse. This includes query writing tools, reporting tools, etc.
  • Data Marts: A data mart is a repository of data that is designed to serve a particular community of knowledge workers.
  • Data Warehouse Administration and Management: Includes features on how to provide access rights to different users of a specific set of data.
  • Information Delivery System: This includes the process of delivering specific data to one or more locations using specific algorithms.

Catalog DB

The SQL Server Integration Service Catalog database is the fundamental center when we work with Integration Services (SSIS) projects that are transferred to the Integration Services server. For instance, project and parameters can be set, design situations to indicate runtime values for package execution and investigate packages, and can oversee Integration Services server activities.

Control Flow

A control stream portrays a work process of tasks that can be executed, here and there in a specific order (We have utilized the precedence control operator). The looping example is a decent case of a control-flow requirement, likewise, standalone SQL Scripts can be executed, call into COM interfaces can be made, .NET components can be executed, or email can be sent.

Inside control flow, the dataflow tasks are found.

DataFlow

The Data Flow task hides the data flow engine that relocates data from sources to destinations, and allows the user to transform, clean, and modify data as it is moved. Adding a Data Flow task to a package control flow makes it possible for the package to extract, transform, and load data.

Components of DataFlow

Source

Transform

Destination

This control permits us to determine the data source from where data will be extracted.

These controls help to delineate source data types with the target data types.

These controls help to indicate the target database where data will be located.

Below is a small example of the SSIS Package.

The following package connects to a SQL Database, transforms the data, and multicast it to multiple flat files.

Read: SQL Server Recovery Models-Simple, Full and Bulk Log

SSIS Datatypes

Below is a screenshot of the types of data types in SSIS.

Variables

Variables store esteems thatanSSISs package and its containers, tasks, and event handlers can use at run time.

SSIS Local and Global Variables

SSIS Local Variables

SSIS Global Variables

Defined by package developers

Defined by Integration Services

Can create as many variables as possible

Cannot add new variables with existing variables

Dynamic Precedence control

Precedence constraints interface executable, containers, and tasks in packages in a control flow, and indicate conditions that decide if executable run. An executable can be a For Loop, for each Loop, or Sequence compartment; a task; or an event handler. Event handlers likewise use precedence constraints to connect their executable into a control flow.

A precedence constraint connects two executable: the precedence executable and the constrained executable. The precedence executable runs before the constrained executable and the execution consequence of the precedence executable may decide if the constrained executable runs. The accompanying outline shows two executable connected by a precedence constraint.

The beneath example shows three tasks: — OLE DB Source, Data Conversion, Multicast. They are associated with an arrow-like priority control.

SSIS Environment Variables

SSIS Environment Variables give the instrument to set values at the time a bundle is executed. This usefulness is helpful for any number of things, much of the time for indicating various qualities between Development, quality, and production environments.

How to generate a dynamic connection string

Below we will learn how to generate a dynamic connection string for an SSIS project. We will create an SSIS package which will try to import a data set from two excel file having the same set of data.

Read: How To Become SQL Certified Professional To Grow Your Career?

Step 1

Let us first create two excel files and name them Testfile1 and Testfile2. The excel files look as below.

Testfile1 Testfile2

Step 2

Let us now create an SSIS package. The package will look like below.

Step 3

The exported data will look like below.

Step 4

Let us now create a variable to hold the file names to be exported

Step 5

Let us now make the excel connection string flexible. That means we are going to parameterize the connection string.

We will right-click on the Excel source connection string and we will choose properties

Step 6

Copy the text in the connection property.

Step 7

Then let us go to the expression property

Read: The Evolution of SQL Server Versions and Editions

Step 8

Let us now create a custom expression for the connection string property.

Step 9

Next, we choose the connection string under the property and paste the copied connection string in the expression section.

Step 10

We need to edit the connection string as below.

The initial connection string was like below

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=E:\SSIS File\Testfile1 ;Extended Properties=”EXCEL 12.0 XML;HDR=YES”;

And we edited it to

“Provider=Microsoft.ACE.OLEDB.12.0;Data Source=E:\\SSIS File\\” + @[User::Excel] + “ ;Extended Properties=\”Excel 12.0 XML;HDR=YES\”;”

Step 11

Now let us change the file name in the variable value like below and run the package

Step 11

As you can see as soon as we change the file name and run the package, the output changes like below.

So in this way, we can use dynamic connection string in the SSIS package to connect to different source files.

SQL Server Training & Certification

  • No cost for a Demo Class
  • Industry Expert as your Trainer
  • Available as per your schedule
  • Customer Support Available

Summary

In this blog, we have tried to give you an introduction to basic SSIS mechanisms. We have discussed data warehouse, different SSIS tools, and variables and also parameterized connection string. This would give a head start to someone interested in knowing about SSIS and can go ahead learning more on the topics discussed above in detail.

--

--

Saranya Mohan
JanBask Training

Creative and performance-driven Social Media Analyst with 3+ years of exp. in conducting marketing research, analyzing data, & producing social media content.