CONNECTING AMAZON RELATIONAL DATABASE WITH R
It’s pretty clear that in the past decade Python and R are being used commonly everywhere both as a tool and as open-source programming. R is considered as the best for statistical implementation while Python, on the other hand, is preferred for data analysis and building production-ready models. The easiest way to store data in native formats but in real-world implementations data is seldom read from CSV files. It’s is either sourced from data lakes or captured in the database from data streams.
The cloud comes into picture when handling and storing large chunks of data is required. With the added benefit of scalability and fault tolerance, using cloud infrastructure to both process and store data is very economical and secure. AWS is a broadly adopted cloud platform globally. It provides many services (sagemaker, AWS machine learning, Athena, S3)which can be used by a data analyst/scientist.
Though the services are paid ones, AWS offers a free tier usage that is available to everyone- students, entrepreneurs, small businesses, etc. One can explore and experiment with the AWS services free of charge.
Here, In this simple tutorial, we’ll try to store and retrieve the data required for analysis onto an Amazon Database instance (Amazon RDS) running MySql server.
To create a connection between AWS-RDS with R or Python we first need to set up the relational database instance to connect too.
Amazon RDS has various databases as Amazon Aurora, MySQL, MariaDB, PostgreSQL, Oracle, Microsoft SQL Server. In this blog, we will connect R and Python with MySQL, transfer data to the database, query it and do some further analysis.
STEPS TO LAUNCH MySQL DATABASE IN AMAZON RDS:
1. https://aws.amazon.com/console/ Visit to create an AWS account.
2. Once logged in, click on the “Services” and Select “Database->RDS”
3. Click and select “Create database”, Under “Select Engine -> Select MySQL”.
4. At the bottom, just above the “next”, click the checkbox that states “Only enable options eligible for RDS Free Usage Tier”, then click “next”(Though not required as MySql does come under free tier usage)
5. Under “Specify DB details”, specify the details as shown below
6. Under instance specification, select DB engine version, DB instance, DB storage as 20GB class as given(One can increase the database storage but must be careful as you might cross the free limit)
7. Next go to Setting then define your DB instance identifier, specify your master username and fill your DB password (required to connect with the database).
8. Continue to configure advanced settings, here you need to be careful while specifying your Network and Security configurations, database name and Port.
9. Give your Database name and specify Port as 3306
10. Check the default settings in the advanced settings and move forward
11. Click to Create Database.
12. When you have created the database instance, you’ll get endpoint (HOST) i.e. while making R connection with database instance we will require it.
13. CONNECTION WITH R:
For this, you need to install certain libraries.
library(dplyr)
install.packages(“RMySQL”)
library(RMySQL)
library(dbConnect)
library(RMySQL)
library(odbc)
host=”END-POINT”
port=3306
dbname=”Your DBNAME”
user=”USER"
password=”PASSWORD”
14. To connect with AWS:
my_db=dbConnect(RMySQL::MySQL(),dbname=dbname,host=host,port=port,user=user,password=password)
dbListTables(my_db)[1] “no table will be shown”Murder Assault UrbanPop RapeAlabama 13.2 236 58 21.2Alaska 10.0 263 48 44.5Arizona 8.1 294 80 31.0Arkansas 8.8 190 50 19.5California 9.0 276 91 40.6Colorado 7.9 204 78 38.7Connecticut 3.3 110 77 11.1Delaware 5.9 238 72 15.8Florida 15.4 335 80 31.9Georgia 17.4 211 60 25.8Hawaii 5.3 46 83 20.2Idaho 2.6 120 54 14.2Illinois 10.4 249 83 24.0Indiana 7.2 113 65 21.0Iowa 2.2 56 57 11.3Kansas 6.0 115 66 18.0Kentucky 9.7 109 52 16.3Louisiana 15.4 249 66 22.2[1] "Murder" "Assault" "UrbanPop" "Rape"dbListTables(my_db)[1] "US" "tab"> US_DB = tbl(my_db,"US")> class(US_DB)[1] "tbl_MySQLConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" "tbl"