Migrating different database tables into Databricks

Arun SK · BI3 Technologies · Feb 1, 2023

This blog demonstrates how to migrate data from different databases to Databricks.

Migrating a Snowflake database table into Databricks

The steps below describe how to migrate a Snowflake table to a Databricks table.

STEP 1:

Create a new Snowflake account or use an existing one.

STEP 2:

Create a table in Snowflake.

CREATE OR REPLACE TABLE MAIN.CITY(CITY_ID BIGINT,CITYNAME STRING,PINCODE INT,COUNTRY STRING);

The query below inserts values into the table:


INSERT INTO MAIN.CITY VALUES
(10020,'NEW DELHI',110001,'INDIA'),
(10021,'MUMBAI',400001,'INDIA'),
(10022,'CHENNAI',600028,'INDIA'),
(10023,'LONDON',400706,'UNITED KINGDOM'),
(10024,'PARIS',403101,'FRANCE'),
(10025,'DUBAI',600017,'UNITED ARAB EMIRATES'),
(10026,'SINGAPORE',395004,'SOUTHEAST ASIA'),
(10027,'NEW YORK',360005,'UNITED STATES OF AMERICA'),
(10028,'TOKYO',533101,'JAPAN');

To view the table, use the select query below in Snowflake:

SELECT * FROM MAIN.CITY;
(Screenshot: source table in Snowflake)

STEP 3:

Provide the username, URL (hostname), password, warehouse name, and database name to connect Snowflake to Databricks.

# Snowflake connection details
url_1 = "hostname"
user = "snowflake_username"
password = "snowflake_password"
database = "database_name"
sfWarehouse = "warehouse_name"

# Options passed to the Snowflake Spark connector
sfOptions = {
    "sfUrl": url_1,
    "sfUser": user,
    "sfPassword": password,
    "sfDatabase": database,
    "sfWarehouse": sfWarehouse
}

Note: sfUrl, sfUser, sfPassword, and sfDatabase must be provided; sfWarehouse is optional.
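
Optionally, a default schema can also be supplied so that queries can reference tables without the schema prefix. A minimal sketch, assuming the Snowflake schema created earlier is named MAIN:

# Optional: set a default schema (here assumed to be MAIN) so queries
# can reference the table as CITY instead of MAIN.CITY.
sfOptions["sfSchema"] = "MAIN"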

STEP 4:

As shown below, save the select query in a string variable, then run it through the Snowflake Spark connector and store the result in a DataFrame.

query = "SELECT * FROM MAIN.CITY"
df = spark.read.format("net.snowflake.spark.snowflake").options(**sfOptions).option("query", query).load()

show() is used to display the contents of the DataFrame.

df.show()
(Screenshot: data stored in the DataFrame)

saveAsTable() below saves the contents of the DataFrame to a Databricks table:

df.write.saveAsTable("main.city")

If no schema is provided, the table is stored in the default schema; if a schema is provided, the table is stored in that schema.
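
For example, the two cases look like this (a sketch; main is assumed to be a schema that already exists in Databricks, and mode("overwrite") is only needed when the table already exists):

# Saved as "city" in the default schema
df.write.saveAsTable("city")

# Saved as "city" in the "main" schema; overwrites an existing table if present
df.write.mode("overwrite").saveAsTable("main.city")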

To view the schema of the table, the command below can be used in Databricks SQL.

DESC TABLE MAIN.CITY;

The schema of the table is shown below,

(Screenshot: schema of the Databricks table)

To query the table, use the command below in Databricks SQL.

SELECT * FROM MAIN.CITY;
(Screenshot: Databricks table)

Migrating an Oracle database table into Databricks

The steps below describe how to migrate data from an Oracle table to a Databricks table.

STEP 1:

Create a table in the Oracle database.

CREATE TABLE MAIN.LAPTOPS(LAPTOP_ID INT,LAPTOP_NAME VARCHAR(40),SPECIFICATION VARCHAR(70),OS VARCHAR(30),PRICE_IN_RUPEES INT);

The query below inserts values into that table:

INSERT ALL
INTO MAIN.LAPTOPS VALUES(1,'Apple 16-Inch MacBook Pro M1 Max','10-core | 3.1 GHz Processor','Mac Monterey OS',309490)
INTO MAIN.LAPTOPS VALUES(2,'Lenovo Yoga 9i','11th Gen Intel Core i7-1185G7 | 3.00 GHz Processor','Windows 10 Home OS',164686)
INTO MAIN.LAPTOPS VALUES(3,'Dell New XPS 13 Plus','12th Gen Intel EVO Core i7-1260P | 3.40 GHz Processor','Windows 11 OS',215000)
INTO MAIN.LAPTOPS VALUES(4,'Apple MacBook Air M2','Apple M2 | NA Processor','iOS OS',139390)
INTO MAIN.LAPTOPS VALUES(5,'Acer Nitro 5','12th Gen Intel Core i7-12700H | 4.70 GHz Processor','Windows 11 Home OS',101000)
INTO MAIN.LAPTOPS VALUES(6,'Lenovo IdeaPad Gaming 3i','12th Gen Intel Core i5-12450H | 2.0 GHz Processor','Windows 11 Home OS',79900)
INTO MAIN.LAPTOPS VALUES(7,'Acer Swift X','AMD Ryzen 7-5800U | 1.9 GHz Processor','Windows 11 Home OS',99999)
SELECT * FROM DUAL;

To view the records in the table, use the select query below in the Oracle database:

SELECT * FROM MAIN.LAPTOPS;
(Screenshot: source table in Oracle)

STEP 2:

To install the oracledb module, run the pip command below.

pip install oracledb
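
In a Databricks notebook, the same library can be installed at notebook scope with the %pip magic command:

%pip install oracledb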

STEP 3:

Import the module and provide the Oracle database credentials as shown below:

import oracledb

# Oracle connection details
user = 'oracle_username'
password = 'oracle_password'
port = 'port_no'  # default: 1521
service_name = 'ORCL'
host_name = 'host_name of oracle database'

# Easy Connect string: host:port/service_name
conn_string = "{0}:{1}/{2}".format(host_name, port, service_name)

STEP 4:

Establish a connection to the database using connect(), create a cursor using cursor(), and execute a SQL query using execute().

connection = oracledb.connect(user=user, password=password, dsn=conn_string)
cursor = connection.cursor()
query = cursor.execute("SELECT * FROM MAIN.LAPTOPS")
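
For a SELECT statement, execute() returns the cursor itself, so the query variable above can be iterated to get row tuples. A quick way to preview a few rows first (a sketch; fetching consumes rows, so the query is executed again afterwards):

# Preview the first three rows; each row is a tuple of column values.
for row in cursor.fetchmany(3):
    print(row)

# Run the query again so the full result set is available for the next steps.
query = cursor.execute("SELECT * FROM MAIN.LAPTOPS")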

STEP 5:

To extract the column names, use cursor.description as shown below,

column_names = [i[0] for i in cursor.description]
print(column_names)

The column_names variable now holds the table's column names as a list.
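
Based on the CREATE TABLE statement above, the printed list should look like this:

['LAPTOP_ID', 'LAPTOP_NAME', 'SPECIFICATION', 'OS', 'PRICE_IN_RUPEES']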

(Screenshot: the printed list of column names)

STEP 6:

Create a DataFrame from the query result and the column_names list.

table_df = spark.createDataFrame(query, column_names)
table_df.show()
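
createDataFrame() infers the column types when only column names are given. Optionally, an explicit schema can be passed instead, which avoids type inference. A minimal sketch, assuming the column types from the CREATE TABLE statement above and that integer columns are fetched as Python int (the python-oracledb default for whole numbers):

from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Explicit schema matching MAIN.LAPTOPS, so Spark does not need to infer types.
laptops_schema = StructType([
    StructField("LAPTOP_ID", IntegerType()),
    StructField("LAPTOP_NAME", StringType()),
    StructField("SPECIFICATION", StringType()),
    StructField("OS", StringType()),
    StructField("PRICE_IN_RUPEES", IntegerType()),
])

# Re-run the query (the cursor was consumed above) and build a typed DataFrame.
rows = cursor.execute("SELECT * FROM MAIN.LAPTOPS").fetchall()
typed_df = spark.createDataFrame(rows, schema=laptops_schema)
typed_df.show()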

STEP 7:

Create a table with saveAsTable(). If no schema is provided, the table is stored in the default schema.

table_df.write.saveAsTable("laptops")

If a schema is provided, the table is stored in that schema.

table_df.write.saveAsTable("main.laptops")

The schema of the table and the resulting Databricks table are shown below,

(Screenshots: schema of the Databricks table and the Databricks table)
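
To verify the migrated table from a notebook, a quick check such as the following can also be run:

# Read back the migrated table and display it
spark.sql("SELECT * FROM MAIN.LAPTOPS").show()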

CONCLUSION:

By following these steps, the tables from the Snowflake and Oracle databases are migrated into Databricks tables.

ABOUT US:

Bi3 has been recognized for being one of the fastest-growing companies in Australia. Our team has delivered substantial and complex projects for some of the largest organizations around the globe and we’re quickly building a brand that is well-known for superior delivery.

Website: https://bi3technologies.com/

Follow us on,
LinkedIn: https://www.linkedin.com/company/bi3technologies
Instagram:
https://www.instagram.com/bi3technologies/
Twitter:
https://twitter.com/Bi3Technologies
