CONNECTING AMAZON RELATIONAL DATABASE WITH R

Afaf Athar
Analytics Vidhya
Published in
4 min readJan 9, 2020
Photo by Tobias Fischer on Unsplash

It’s pretty clear that in the past decade Python and R are being used commonly everywhere both as a tool and as open-source programming. R is considered as the best for statistical implementation while Python, on the other hand, is preferred for data analysis and building production-ready models. The easiest way to store data in native formats but in real-world implementations data is seldom read from CSV files. It’s is either sourced from data lakes or captured in the database from data streams.

The cloud comes into picture when handling and storing large chunks of data is required. With the added benefit of scalability and fault tolerance, using cloud infrastructure to both process and store data is very economical and secure. AWS is a broadly adopted cloud platform globally. It provides many services (sagemaker, AWS machine learning, Athena, S3)which can be used by a data analyst/scientist.

Though the services are paid ones, AWS offers a free tier usage that is available to everyone- students, entrepreneurs, small businesses, etc. One can explore and experiment with the AWS services free of charge.

Here, In this simple tutorial, we’ll try to store and retrieve the data required for analysis onto an Amazon Database instance (Amazon RDS) running MySql server.

To create a connection between AWS-RDS with R or Python we first need to set up the relational database instance to connect too.

Amazon RDS has various databases as Amazon Aurora, MySQL, MariaDB, PostgreSQL, Oracle, Microsoft SQL Server. In this blog, we will connect R and Python with MySQL, transfer data to the database, query it and do some further analysis.

STEPS TO LAUNCH MySQL DATABASE IN AMAZON RDS:

1. https://aws.amazon.com/console/ Visit to create an AWS account.

2. Once logged in, click on the “Services” and Select “Database->RDS”

3. Click and select “Create database”, Under “Select Engine -> Select MySQL”.

4. At the bottom, just above the “next”, click the checkbox that states “Only enable options eligible for RDS Free Usage Tier”, then click “next”(Though not required as MySql does come under free tier usage)

5. Under “Specify DB details”, specify the details as shown below

6. Under instance specification, select DB engine version, DB instance, DB storage as 20GB class as given(One can increase the database storage but must be careful as you might cross the free limit)

7. Next go to Setting then define your DB instance identifier, specify your master username and fill your DB password (required to connect with the database).

8. Continue to configure advanced settings, here you need to be careful while specifying your Network and Security configurations, database name and Port.

9. Give your Database name and specify Port as 3306

10. Check the default settings in the advanced settings and move forward

11. Click to Create Database.

12. When you have created the database instance, you’ll get endpoint (HOST) i.e. while making R connection with database instance we will require it.

13. CONNECTION WITH R:

For this, you need to install certain libraries.

library(dplyr)
install.packages(“RMySQL”)
library(RMySQL)

library(dbConnect)
library(RMySQL)
library(odbc)

host=”END-POINT”
port=3306
dbname=”Your DBNAME”
user=”USER"
password=”PASSWORD”

14. To connect with AWS:

my_db=dbConnect(RMySQL::MySQL(),dbname=dbname,host=host,port=port,user=user,password=password)

dbListTables(my_db)[1] “no table will be shown”Murder Assault UrbanPop RapeAlabama          13.2     236       58 21.2Alaska           10.0     263       48 44.5Arizona           8.1     294       80 31.0Arkansas          8.8     190       50 19.5California        9.0     276       91 40.6Colorado          7.9     204       78 38.7Connecticut       3.3     110       77 11.1Delaware          5.9     238       72 15.8Florida          15.4     335       80 31.9Georgia          17.4     211       60 25.8Hawaii            5.3      46       83 20.2Idaho             2.6     120       54 14.2Illinois         10.4     249       83 24.0Indiana           7.2     113       65 21.0Iowa              2.2      56       57 11.3Kansas            6.0     115       66 18.0Kentucky          9.7     109       52 16.3Louisiana        15.4     249       66 22.2[1] "Murder"   "Assault"  "UrbanPop" "Rape"dbListTables(my_db)[1] "US"  "tab"> US_DB = tbl(my_db,"US")> class(US_DB)[1] "tbl_MySQLConnection" "tbl_dbi"             "tbl_sql"             "tbl_lazy"            "tbl"

--

--

Afaf Athar
Analytics Vidhya

I Do Data. I write what I wish I could have read when I was younger