The R sqldf package allows us to execute SQL statements. The clauses, the expressions, the predicates,… are the salvation for many mediocre types such as the now-writer. sqldf is an indispensable module, even Falete’s boyfriend is able to program in SQL.
Obviously I’m not going to teach you how to do queries, but I do want to show you some of the possibilities that this R package offers. “e will work with examples.
# Detach package
# detach("package:some_package", unload = TRUE)# Clean the working memory
rm(list = ls())
# Gargabe colector
# gc()# Set working directory
# Install packages
packages <- c("sqldf")
newpack = packages[!(packages %in% installed.packages()[,"Package"])]
a=lapply(packages, library, character.only=TRUE)
The first step is to create a data.frame:
summary(saldos)saldo1 saldo2 id_cliente alto
Min. : 32.75 Min. : 2.52 Min. : 1.00 1:28
1st Qu.:270.74 1st Qu.:1123.23 1st Qu.: 25.75 2:72
Median :519.33 Median :2644.90 Median : 50.50
Mean :504.42 Mean :2513.83 Mean : 50.50
3rd Qu.:727.64 3rd Qu.:3790.58 3rd Qu.: 75.25
Max. :998.53 Max. :4947.22 Max. :100.00
We create a structure with two numerical variables saldo1 and saldo2, a variable client_id (autonumeric) and a factor that indicates whether the balances are alto (1) or bajo (2). The first step is to summarize the balances by the factor high. Programming in R a mediocre type as I would use the aggregate function:
It doesn’t look exactly like it should, but it’s a quick and simple code on which we can create a function and which is perfectly parameterisable. However Raul Vaquerizo, someone who believes John Locke influenced Richard Alpert more than Hume, prefers to use SQL to add data:
max(saldo1) as max_saldo1,
min(saldo1) as min_saldo1,
max(saldo2) as max_saldo2,
min(saldo2) as min_saldo2
group by alto;'
Simple code. Perfectly understandable by those less skilled in R. Migrating from another application to R may be less complicated than we think, we may lose the fear of a hypothetical migration. Obviously we can also make joins between tables. We compare the code in R with the analogous code in sqldf and so we learn to use the merge function:
#We created a random sample of 50 records
#In this union we keep the sample observations
If this union is attempted by someone who believes that the Philadelphia experiment was real, the first thing he does is a mythical left join (the maximum expression of the ineffective information manager):
from muestra a left join saldos b
The essential inner join in this monographic:
from saldos a , muestra2 b
summary(saldos.muestra.22)id_cliente saldo1 saldo2 alto
Min. : 3.00 Min. :141.6 Min. : 523.4 1:1
1st Qu.: 8.00 1st Qu.:353.2 1st Qu.: 781.9 2:7
Median :52.50 Median :613.8 Median :1613.1
Mean :47.12 Mean :552.1 Mean :1951.3
3rd Qu.:82.25 3rd Qu.:739.6 3rd Qu.:2735.5
Max. :90.00 Max. :855.6 Max. :4942.2
saldo1 saldo2 id_cliente alto
Min. :141.6 Min. : 523.4 Min. : 3.00 1:1
1st Qu.:353.2 1st Qu.: 781.9 1st Qu.: 8.00 2:7
Median :613.8 Median :1613.1 Median :52.50
Mean :552.1 Mean :1951.3 Mean :47.12
3rd Qu.:739.6 3rd Qu.:2735.5 3rd Qu.:82.25
Max. :855.6 Max. :4942.2 Max. :90.00
And finally the full join:
No, I didn’t forget to put the analog code in for sqldf. It’s not possible to do full join or right join in sqldf, but you won’t deny me that the code in R is very simple. Please correct me if I’m wrong and we can do full join with sqldf and I’ll modify the monographic immediately.
I hope you like it.
No matter what books or blogs or courses or videos one learns from, when it comes to implementation everything can look like “Outside the Curriculum”.
The best way to learn is by doing! The best way to learn is by teaching what you have learned!
Never give up!
See you on Linkedin!