Bulk Insert with SQL Server on Amazon RDS
Import and Bulk Insert Amazon S3 files directly to your Amazon RDS for SQL Server database.
With the support of S3 integration, you can now download .csv, .xml, .txt, and other file types from Amazon S3 directly to your Amazon RDS for SQL Server instance and import that data with BULK INSERT.
Recently, an oil and gas client asked if we could consume data from a well lifecycle reporting system so we could integrate it with other systems for data analytics. Seeing as the reporting system is cloud-based Software as a Service (SaaS), I assumed (you know…that whole chestnut) they had a RESTful API we could tie into and told the client that shouldn’t be a problem.
After a quick call with the software provider, it turned out that they were in the process of developing a RESTful API, but it wouldn’t be in production anytime soon.
Instead, they were able to provide an FTP folder from which we could retrieve an initial cumulative XML file to replicate the database, daily XML files containing updates, and a handful of SQL Server helper scripts to create and update the database from those XML files. Since we run the client’s SQL Server database on Amazon RDS, that seemed like a reasonable solution.
As promised, the SaaS company set up an FTP folder with the XML files and provided the SQL scripts with documentation. The first two scripts for creating the database and tables went off without a hitch, and I was feeling very good about life.
The purpose of the 3rd SQL script was to update the database from the provided XML file(s). To do this, there was a spot in the script to reference the file path where the XML file was located. I was connected to the SQL Server database from my laptop using SQL Server Management Studio (SSMS). Not even thinking about why it wouldn’t work, I referenced the path to the file on my local drive, ran the script, and got this response:
Cannot bulk load because the file "C:\my_local_file_path\XXXX_20010101-20200810\XXXX_20010101-20200810.xml" could not be opened. Operating system error code 3 (The system cannot find the path specified.)
I’d experienced this before when trying to BULK INSERT .csv files and remembered that the referenced file needs to live on a drive accessible to the database server, not on my local machine.
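For context, the offending part of the helper script looked something like the sketch below. The table variable and file path are placeholders, not the vendor’s actual script, but the key point holds: the path is resolved on the SQL Server host’s file system, not on the machine running SSMS.

```sql
-- Hypothetical sketch: the BULK path is opened by the SQL Server *host*,
-- not by the client running SSMS -- hence operating system error code 3.
DECLARE @xml XML;

SELECT @xml = CAST(BulkColumn AS XML)
FROM OPENROWSET(
    BULK 'C:\my_local_file_path\updates.xml',  -- placeholder local path
    SINGLE_BLOB                                -- read the whole file as one blob
) AS src;
```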
The issue with that is that Amazon RDS is a managed database service. Typically, I view this as a positive because I’m not a DBA or infrastructure expert, and RDS gets me up and running with a fully functioning, durable relational database in minutes. In this instance, though, it was a negative because RDS didn’t allow access to the underlying operating and file systems…or did it?!
New Amazon RDS for SQL Server Feature
Luckily for me, AWS released a new feature on May 15, 2020, that could help with just this!
You can transfer files between a DB instance running Amazon RDS for SQL Server and an Amazon S3 bucket. By doing this, you can use Amazon S3 with SQL Server features such as BULK INSERT. For example, you can download .csv, .xml, .txt, and other files from Amazon S3 to the DB instance host and import the data from D:\S3\ into the database. All files are stored in D:\S3\ on the DB instance.
Thanks, AWS! How did you know I needed this?!
Enabling this Feature
The process is very well documented by AWS, but I will include high-level steps below.
1. Create a New S3 Bucket
It doesn’t have to be new, but mine was. For this example, we’ll call it my-rds-s3-bucket (a placeholder; substitute your own bucket name).
2. Create an IAM policy for access to Amazon S3
Per the AWS documentation, the IAM policy will need list and location permissions on the bucket itself, plus object-level read/write permissions for the transfers.
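Based on the AWS documentation for RDS S3 integration, the policy looks roughly like this. The bucket name is a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketACL",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::my-rds-s3-bucket"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload"
      ],
      "Resource": "arn:aws:s3:::my-rds-s3-bucket/*"
    }
  ]
}
```

If you only ever download files from S3 to RDS (and never upload), you could trim this down, but the documented policy covers both directions.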
3. Create a New IAM Role and Attach the IAM Policy from above
We’ll attach the policy created above to a new IAM role, which will then be associated with the RDS for SQL Server instance.
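For RDS to assume the role, its trust policy needs to name the RDS service as the principal. A minimal sketch:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "rds.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```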
4. Associate IAM Role with RDS Database
Go to Services>RDS>DB Instances and select your Database.
Make sure you are on the Connectivity & security tab and scroll all the way down until you hit Manage IAM roles.
Find and select the IAM role you just created in the first drop-down and choose S3_INTEGRATION from the second drop-down. Click “Add role.”
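If you prefer the CLI to the console, the same association can be made with `aws rds add-role-to-db-instance`. The instance identifier, account ID, and role name below are placeholders, and the command requires AWS credentials with permission to modify the instance:

```shell
# Placeholder identifiers -- substitute your own instance and role ARN.
aws rds add-role-to-db-instance \
    --db-instance-identifier my-sql-server-instance \
    --feature-name S3_INTEGRATION \
    --role-arn arn:aws:iam::123456789012:role/rds-s3-integration-role
```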
5. Transfer/Download Files between RDS and S3
From the AWS Docs:
You can use Amazon RDS stored procedures to download and upload files between S3 and your RDS DB instance. You can also use Amazon RDS stored procedures to list and delete files on the RDS instance.
Well that’s nice!
Stored procedures, like the one below, can be run right from SSMS:
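Going by the AWS documentation, the download call is `msdb.dbo.rds_download_from_s3`. The bucket name below is a placeholder, and the file name mirrors the redacted one from earlier:

```sql
-- Copy the XML file from S3 to D:\S3\ on the RDS instance.
-- Bucket name is a placeholder; @overwrite_file = 1 replaces any existing copy.
EXEC msdb.dbo.rds_download_from_s3
    @s3_arn_of_file = 'arn:aws:s3:::my-rds-s3-bucket/XXXX_20010101-20200810.xml',
    @rds_file_path  = 'D:\S3\XXXX_20010101-20200810.xml',
    @overwrite_file = 1;
```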
After running that stored procedure, our XML file has been copied from S3 to the D drive on the RDS instance.
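The download runs as an asynchronous task, so it’s worth confirming it finished and that the file is actually there. Per the AWS docs:

```sql
-- Check the status of the transfer task (runs asynchronously).
EXEC msdb.dbo.rds_task_status;

-- List the files now present under D:\S3\ on the instance.
SELECT * FROM msdb.dbo.rds_fn_list_file_details('D:\S3\');
```

When you’re done with a file, `msdb.dbo.rds_delete_from_filesystem` removes it from the instance to free up space.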
6. Re-Run the 3rd Helper Script
Now that the file is accessible on the RDS instance, we can update the 3rd helper script to reference the D drive. Eureka!
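With the file on the instance, the same hypothetical load sketch from earlier works once the path points at D:\S3\:

```sql
-- Same placeholder load as before, now reading the copy on the RDS host.
DECLARE @xml XML;

SELECT @xml = CAST(BulkColumn AS XML)
FROM OPENROWSET(
    BULK 'D:\S3\XXXX_20010101-20200810.xml',
    SINGLE_BLOB
) AS src;
```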
With that hurdle cleared, the next step will be to automate the process using Lambda. Or, because this implementation will likely require PyODBC, maybe the Event-Driven Containers with Lambda and Fargate solution proposed by my buddy Miles Hill could be a good path forward? I’ll follow up with a future post once I get that implemented. Hope this helps!