Migrating Google Sheets to Azure SQL Database with Azure Data Factory

Suren Edilyan
9 min read · Dec 9, 2023


Prerequisites

  • An active Azure subscription. If you don’t have one, create a free account.
  • A Google Sheet in your Google account.

Google Sheet

  1. Open your Google Sheet.
  2. Click Share (top right of the sheet) >> “Anyone with the link” >> “Done”.

3. Open the Google Cloud Console at https://console.cloud.google.com/. If you are signing in for the first time, choose your country, accept the Terms of Service, and click “AGREE AND CONTINUE”.

4. Click “Select a project” (top left of the screen).

5. Click “New Project”.

6. Name a project, then click “CREATE”.

7. Wait until the project is created and click “SELECT PROJECT”.

8. On the new page, select “Products & solutions” (top left).

9. Click on: “APIs & Services”.

10. Click “ENABLE APIS AND SERVICES”.

11. Select “Google Sheets API”.

12. Click “ENABLE”.

13. Click: “Credentials” >> “+ CREATE CREDENTIALS” >> “API key”.

14. Copy the API key.
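The key from step 14 is enough to read a link-shared sheet over the Sheets v4 REST API, which can be handy for sanity-checking access before wiring up ADF. A minimal Python sketch, assuming placeholder values for the key, spreadsheet ID, and range (replace them with your own):

```python
import json
import urllib.request

API_KEY = "<your-api-key>"           # the key copied in step 14
SPREADSHEET_ID = "<spreadsheet-id>"  # taken from the sheet URL (see step 31)
SHEET_RANGE = "Sheet1!A2:D8"         # the same cells ADF will read later

# The documented spreadsheets.values.get endpoint of the Sheets v4 API.
url = (
    "https://sheets.googleapis.com/v4/spreadsheets/"
    f"{SPREADSHEET_ID}/values/{SHEET_RANGE}?key={API_KEY}"
)

def fetch_values(request_url: str) -> list:
    """Return the sheet rows as a list of lists of cell strings."""
    with urllib.request.urlopen(request_url) as resp:
        return json.load(resp).get("values", [])

# fetch_values(url)  # requires a real key and a sheet shared via link
```

If the call returns a 403, double-check that the sheet is shared with “Anyone with the link” (step 2) and that the Google Sheets API is enabled for the project (step 12).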

Now back to the Azure Portal

15. Create a resource group: Choose the appropriate subscription and name the resource group following naming conventions.

16. Create a SQL server: Choose the appropriate subscription and resource group. Name the server, select the location, choose the authentication method, and then click “Next: Networking”. Allow Azure services and resources to access this server by selecting “Yes”. Skip the other settings and click “Review + create”, then “Create”.

17. Create a SQL database: Begin by selecting the appropriate subscription and resource group. Provide a name for the database and choose the server created in the previous step. Skip the remaining settings and click “Next: Networking”. On the Networking page, select “Yes” for the “Add current IP address” field. Skip any additional settings and click “Review + create”. Finally, click “Create” to complete the process.

18. Open the “Query editor” on the Azure SQL Database page (or another tool you prefer, for example SQL Server Management Studio). Then create a simple table with a “CREATE TABLE” query, giving it columns similar to those in your Google Sheet. In our case it looks like:

CREATE TABLE [FinancialTransactions]
(
    [Date] DATE,
    [MoneySpent] DECIMAL(18, 2),
    [MoneyEarned] DECIMAL(18, 2),
    [Total] DECIMAL(18, 2)
);

18*. If you get an exception related to database access, you can add your current IP address by clicking the link offered at the bottom of the error message.

19. Create ADF (Azure Data Factory): Choose the appropriate subscription and resource group. Name the ADF and select the location. Skip the other settings and click “Review + create”, then “Create”.

20. On the ADF main page, click “Launch studio”.

21. Go to “Manage” (left-hand menu) >> “Linked services” >> “+ New”.

22. Search “Azure SQL Database”, select it and click on “Continue”.

23. A new pop-up window opens where we need to provide the linked service name. Select the appropriate subscription, server name, database name, and authentication type. Then enter the username and password. After that, click “Test connection”, and if it shows “Connection successful”, click “Create”.

24. Once again, click “+ New”, search for “Google Sheets” in the pop-up, select it, and click “Continue”.

25. A new pop-up window opens where we need to provide the Google Sheets API key obtained in step 14.

26. In ADF, go to “Author” (top-left menu) >> click “+” (Add new resource) >> “Pipeline” >> “Pipeline”.

27. Now we have a new pipeline under the “Pipelines” section. Next, we need to select the “Data flow” activity under “Activities” >> “Move and transform” >> “Data flow”, and drag and drop it onto the canvas.

28. Select the “Data flow” activity >> “Settings” >> “+ New”. This opens a new canvas for the data flow.

29. Click “Add Source” and enable debug mode by toggling “Data flow debug”; in the pop-up that opens, click “OK”.

30. Enter the appropriate “Output stream name” and “Description”. Choose “Inline” as the “Source type”. Under “Inline dataset type”, search for “Google Sheets”, then select the linked service for our Google Sheets, as we named it in the linked-service step.

31. From your Google Sheet URL, select and copy the spreadsheet ID.
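The spreadsheet ID is the long token between /d/ and the next slash in the sheet URL. If you prefer not to eyeball it, a small helper can pull it out; the helper name and the sample URL below are made up for illustration:

```python
import re

def spreadsheet_id(sheet_url: str) -> str:
    """Extract the spreadsheet ID from a Google Sheets URL."""
    match = re.search(r"/spreadsheets/d/([a-zA-Z0-9_-]+)", sheet_url)
    if not match:
        raise ValueError("No spreadsheet ID found in URL")
    return match.group(1)

# Example with an invented URL:
url = "https://docs.google.com/spreadsheets/d/1AbC_dEf-123/edit#gid=0"
print(spreadsheet_id(url))  # 1AbC_dEf-123
```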

32. Now, go to “Source options”. In the “SpreadSheet ID” textbox, paste the value copied in the previous step (step 31). Under “Sheet name”, click “Refresh”, wait until a green icon with “Success” appears, and select the appropriate sheet name. For “Start cell”, choose the top-left data cell of the sheet (for example A2). For “End cell”, select the bottom-right data cell of your sheet (for example D8).

*If the “Refresh” button is not active, you may have missed enabling debug mode in step 29.

33. Now we can go back to “Source settings”, click “Test connection”, and wait until a green icon with “Connection successful” appears.

34. Now, navigate to the “Projection” section and click on “Import schema”.

We could assign the appropriate data type to each column here. However, since Google Sheets’ types may not perfectly align with those in ADF, we must designate them all as strings. The casting will be performed in a subsequent step.

35. We can skip the “Optimize” and “Inspect” sections. In the “Data preview” section, after clicking “Refresh”, we can view the data from the Google Sheet.

36. To facilitate the casting process, we need to include the “Cast” modifier, which can be found in the “Schema modifier” section.

37. Within the “Cast” modifier, we need to select the appropriate type and format for our columns. In our case:
* column_1: Type: date, Format: MM/dd/yyyy
* column_2: Type: decimal, Format: $#########.###
* column_3: Type: decimal, Format: $#########.###
* column_4: Type: decimal, Format: $#########.###

38. To display two digits after the decimal point, click “Edit” under the decimal type settings.

It will open a new pop-up where we need to set “Scale” to 2.
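To see what these conversions amount to, here is a rough Python equivalent of what the Cast modifier does to each cell, for the date format MM/dd/yyyy and a decimal with scale 2 (the sample values are invented):

```python
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP

def cast_date(cell: str):
    """'MM/dd/yyyy' string -> date, like the Cast modifier's date format."""
    return datetime.strptime(cell, "%m/%d/%Y").date()

def cast_money(cell: str) -> Decimal:
    """Currency string -> Decimal with scale 2, like DECIMAL(18, 2)."""
    number = cell.replace("$", "").replace(",", "")
    return Decimal(number).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(cast_date("12/09/2023"))  # 2023-12-09
print(cast_money("$1,234.5"))   # 1234.50
```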

39. Then we can leave “Optimize” and “Inspect” unchanged, navigate to the “Data preview” section, and click “Refresh”.

40. Now, we need to create a “Sink”. Click on the “+” button, then select “Sink” under the “Destination” section.

41. Now, under the “Sink” section at the bottom, let’s fill in the appropriate data. Enter values in the “Output stream name” and “Description” textboxes. Skip the “Incoming stream” select box as it automatically fills in the correct data. Set “Sink type” to “Dataset”.

Under the “Dataset” section, click “+ New”, then in the pop-up, search for “Azure SQL Database”, select it, and click “Continue”. Fill in the “Name”, select the Azure SQL Database linked service that we created previously (steps 22-23), choose the appropriate table, and click “OK”.

In the “Options” section, we can select “Allow schema drift” and “Validate schema”.

42. Now we can skip the “Settings” and “Errors” sections. Under the “Mapping” section, uncheck “Auto mapping”. Here we may encounter preview problems, so we need to manually select the appropriate columns.

43. Now go to “Data preview” and click “Refresh” to see a preview of the data.

44. Now, for checking purposes, we can go back to our pipeline and click “Debug”.

If you want to save this pipeline, you need to click “Publish all” at the top of the dashboard.

45. Check the status, and if it succeeded, we can click “Publish all”.

46. Finally, we can check the Azure SQL Database.
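One way to check from outside the portal is a plain COUNT query against the table. A hedged pyodbc sketch, with placeholder server and credential values (pip install pyodbc; the import is deferred so the snippet loads even without the driver installed):

```python
CONN_STR = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<your-server>.database.windows.net;"  # server from step 16
    "DATABASE=<your-database>;"                   # database from step 17
    "UID=<your-username>;PWD=<your-password>;Encrypt=yes"
)

def count_rows(conn_str: str) -> int:
    """Return the number of rows ADF copied into FinancialTransactions."""
    import pyodbc  # deferred: only needed when actually connecting
    with pyodbc.connect(conn_str) as conn:
        return conn.cursor().execute(
            "SELECT COUNT(*) FROM [FinancialTransactions];"
        ).fetchone()[0]

# print(count_rows(CONN_STR))  # needs the firewall rule from steps 17 / 18*
```

If the count matches the number of data rows in your sheet (seven in our A2:D8 example), the migration worked end to end.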
