How to Convert SQL into R Data Frame? Part 1

Informula
IMU Framework Design
2 min readJun 2, 2023

In this series of articles, we will study how to convert SQL statements into R data frame syntax. We are using Kaggle Notebook to access the SQLite database file stored in Cloud Drive and to run the R scripts.

Underlying data/ database

We upload the SQLite DB file named with “sample.db” under into Kaggle notebook (/kaggle/input/sqlite/sample.db). The data we are using is SQLite Sample Database Chinook.

library(RSQLite)
conn <- dbConnect(RSQLite::SQLite(), "/kaggle/input/sqlite/sample.db")
df_albums <- dbGetQuery(conn,'select * from albums')
df_tracks <- dbGetQuery(conn,'select * from tracks')
df_invoices <- dbGetQuery(conn,'select * from invoices')
df_invoicesitem <- dbGetQuery(conn,'select * from invoice_items')

Head / Offset

-- SQL Top 5 Rows
select * from albums LIMIT 5
# R Top 5 Rows
head(df_albums,5)
-- SQL Top 5 Rows starting from the 10th row
select * from albums LIMIT 5 OFFSET 10;
# R Top 5 Rows starting from the 10th row
df_albums[11:15,]

# R
head(df_albums[11:nrow(df_albums) ,], 5)

Rename / Alias

-- SQL Rename AlbumID as AlbumNum
select AlbumId as AlbumNum, Title from albums
# R Rename AlbumID as AlbumNum
colnames(df_albums)[1] = 'AlbumNum'
df_albums[,1:2]

Concatenate Two Columns

-- SQL Concatenate Two Columns with '-'
select
*,
ifnull(BillingState, 'None') || '-' || BillingCountry as BillingCityCountry
from invoices
# R Concatenate Two Columns with '-'
df_invoices$BillingCityCountry1 <- paste(paste(df_invoices$BillingState,'-'), df_invoices$BillingCountry)

Remove Duplicate

-- SQL distinct values of customerID
select distinct CustomerId from "main"."invoices"
# R distinct values of customerID (unsorted)
unique( df_invoices[ , c('CustomerId') ] )

Enjoy it! If you want to support Informula, you can buy us a coffee here :)

𝗕𝘂𝘆 𝗺𝗲 𝗮 𝗰𝗼𝗳𝗳𝗲𝗲

Thank you and more to come :)

--

--