Many businesses generate enormous volumes of data. Big data is frequently raw, unstructured, and stored in a variety of places, including relational, non-relational, and other storage systems. Organizing all of this data and turning it into useful business insights is a substantial challenge for these firms.
Microsoft Azure Data Factory, a managed cloud service, can help us draw valuable business insights from this disorganized data. It is built to manage complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data-integration projects.
In this project, we will use Azure Data Factory to move and transform data from a production database into an archive database. The transformation stage entails converting our datetime columns to plain dates, because the archive database doesn't need the time component. Our SQL server, Data Factory, and SQL databases are already provisioned, as we can see.
To make sure we can actually connect to the database, we need to add our client IP address to the server firewall. Click on the SQL server, select Networking, add the client IP address to the firewall rules, and save the settings.
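For reference, the same server-level firewall rule can also be created in T-SQL by running sp_set_firewall_rule against the server's master database; the rule name and IP address below are placeholders for your own values.

-- Run against the master database of the logical SQL server.
-- The rule name and IP range here are placeholders.
EXECUTE sp_set_firewall_rule
    @name = N'AllowMyClientIP',
    @start_ip_address = '203.0.113.42',
    @end_ip_address = '203.0.113.42';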
Next, we need to select our SQL database.
When we select the acweb archive database, we see an option for a Query editor (preview).
After logging into the Query editor with our credentials, we are presented with a basic query interface.
Next, we create the destination, or sink, table in our archive database by running the following code in the Query editor.
CREATE TABLE [dbo].[ProductArchive](
    [ProductID] [int],
    [Name] [nvarchar](40),
    [ProductNumber] [nvarchar](25),
    [Color] [nvarchar](15),
    [StandardCost] [money],
    [ListPrice] [money],
    [Size] [nvarchar](5),
    [Weight] [decimal](8, 2),
    [ProductCategoryID] [int],
    [ProductModelID] [int],
    [SellStartDate] [date],
    [SellEndDate] [date],
    [DiscontinuedDate] [date],
    [ThumbNailPhoto] [varbinary](max),
    [ThumbnailPhotoFileName] [nvarchar](50),
    [rowguid] [uniqueidentifier],
    [ModifiedDate] [date]
) ON [PRIMARY]
GO
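Before moving on, a quick sanity check confirms that the new sink table exists and is still empty:

-- Sanity check: the sink table should exist and contain no rows yet.
SELECT COUNT(*) AS RowsInArchive FROM [dbo].[ProductArchive];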
The next step is to move and transform our data using Azure Data Factory. We will now select our data factory from our resources.
Here we see different activities within the data factory.
Now, we will select Open Azure Data Factory Studio, where we can build pipelines.
From here, we can create a new pipeline and give it a name.
Next, we select Data flow from the Move and transform category, drag it onto the pipeline canvas, and give the data flow a name.
Then, under the activity's Settings tab, we select New to create a new data flow. Within the data flow, we add a source and create a new dataset for it.
Since we know our data is coming from an Azure SQL database, we select Azure SQL Database as the dataset type.
Next, we need to create a new linked service.
Now we name our dataset and add the new linked service, specifying the subscription, server name, and database name, and using our username and password for SQL authentication. After confirming the connection test succeeds, we create the linked service and select the source table, SalesLT.Product.
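Conceptually, the source dataset reads the full Product table from production; in T-SQL terms it is equivalent to something like the following (the column list mirrors the ProductArchive sink we created earlier):

-- What the source dataset pulls from the production database.
SELECT ProductID, Name, ProductNumber, Color, StandardCost, ListPrice,
       Size, Weight, ProductCategoryID, ProductModelID,
       SellStartDate, SellEndDate, DiscontinuedDate,
       ThumbNailPhoto, ThumbnailPhotoFileName, rowguid, ModifiedDate
FROM SalesLT.Product;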
The next step is the data transformation, where we add a Derived Column transformation to convert the date columns.
Each column is converted to a date by casting it to a string, taking the first ten characters (the yyyy-MM-dd portion), and parsing that as a date:
SellStartDate: toDate(substring(toString(SellStartDate),1,10))
SellEndDate: toDate(substring(toString(SellEndDate),1,10))
DiscontinuedDate: toDate(substring(toString(DiscontinuedDate),1,10))
ModifiedDate: toDate(substring(toString(ModifiedDate),1,10))
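For intuition, these data flow expressions simply truncate each datetime value to its date portion; a rough T-SQL equivalent (not part of the pipeline itself) would be:

-- Rough T-SQL equivalent of the derived-column expressions above:
-- each datetime value is truncated to its date portion.
SELECT CAST(SellStartDate AS date) AS SellStartDate,
       CAST(SellEndDate AS date) AS SellEndDate,
       CAST(DiscontinuedDate AS date) AS DiscontinuedDate,
       CAST(ModifiedDate AS date) AS ModifiedDate
FROM SalesLT.Product;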
We then repeat the dataset process for the sink, pointing it at the ProductArchive table we created earlier.
Next, we can go back to our pipeline tab and select Publish all.
After publishing, the pipeline can be run immediately with the Trigger now option, or it can be put on a schedule.
The run takes a few minutes to complete and reach the Succeeded status.
After we verify that our pipeline run succeeded, we can go back to the Query editor and run a SQL query to confirm that the data was migrated.
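For example, a quick row count against the sink table (created earlier) should match the source table:

-- Confirm the migration: the row count should match the source table.
SELECT COUNT(*) AS MigratedRows FROM [dbo].[ProductArchive];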
We can also check the date columns where we performed the transformation to date format. Success!
To sum up, we were able to move and transform data from our production acweb database into the acweb archive database using Azure Data Factory. Additionally, we completed all of these tasks quickly by using a fully cloud-hosted data-integration service.
Source: Microsoft Azure Data Factory, acloudguru