How to create your own database with R — part 12 of “R for Applied Economics” guide

Dima Diachkov
5 min readFeb 27, 2023

--

Hello again, my friend. This is me again advocating R for Applied Economics. And today we gonna talk about databases (again, yeah). That is extremely important because whatever data you get — you have to properly store it. And there are cases when the cost of storing everything in file format is higher than in a database. Many people say that you can not simply create a database…

One Does Not Simply Walk Into Mordor. — Boromir

Boromir by MidJourney AI

Fortunately, we have already prepared a review of different databases and ways to access them (check out part 11) for you. Now we can create our own databases! Let’s start.

What is RSQlite package?

SQLite is a lightweight database management system that stores data in a file on disk, rather than requiring a separate server. You do not need to bother with hosting your database, you host it yourself locally, hence — share and modify too. As you already know, RSQlite is an R package that provides an interface to the SQLite database engine, allowing you to create and manage databases using R code.

How to install it

To use the RSQlite package, you need to install it first. You can do this by running the following command in R (of course, if you have not executed this command during the part 11 class):

install.packages("RSQlite")

To proceed, we need some material to work with (so we can insert it in the database and request it back). Hence, we will recover all data we made earlier from our GitHub repo.

# firstly we recover data we made in part 8
library(devtools)
source(“https://raw.githubusercontent.com/TheLordOfTheR/R_for_Applied_Economics/main/Part8.R")

It will create many objects in your environment, and some of them we could store.

Creation of your first local database with R

Once you have installed the package, you can create a new SQLite database using the dbConnect() function. This function takes a connection string as an argument, which specifies the name and location of the database file. In this example, we'll create a new database called economic_data.db in the current working directory:

library(RSQLite)
con <- dbConnect(RSQLite::SQLite(), "economic_data.db")

Voila! Our database is created. When we tried to access database that does not exist — we create one. What can be easier than this? And yeah, there are still people out there that say that databases are difficult because you need to manage them properly and that is why they store EVERYTHING in XLSX…

Don’t get me wrong. Managing large-scale databases is challenging. And sometimes XLSX files are the only plausible solution.

However, there are many practical opportunities for a database like this for internal purposes in the company or research. Since an approach like this is based on a file format database, it means that you easily exchange data, collaborate and manage it over time with a few commands.

How to connect to the newly made database

Well, you already know it. Now that we have created our database, we can connect to it using the dbConnect() function again:

con <- dbConnect(RSQLite::SQLite(), "economic_data.db")

Make sure that you store the database in the working directory. Otherwise, just move it there.

How to insert (write) data into it

Once we have connected to the database, we can create tables and write data to them using SQL queries. For example, we can create a new table called inflation with two columns, Period and Inflation, and insert some data into it like this:

# Write the inflation dataset named df into a table names fct_inflation
dbWriteTable(con, “fct_inflation”, df)
# List all the tables available in the database
dbListTables(con)
The output for the code above

With dbListTables() we listed ALL tables available in the database. And there we see our table. Nice.

Now let’s kill it. You may say “Stop, we just checked our first table in the database”, but I am not afraid to kill it…Only because I know that it will take me one second to restore it.

Please use dbRemoveTable command with connection con.

dbRemoveTable(con, “fct_inflation”)
dbListTables(con)
The output for the code above

Done. It is killed. So please run again the command the re-create it. Let’s look at it.

Extracting (seceting) data

Before we used only the R commands to do something. Now let’s master the most fundamental function of SQL language. Please, welcome theSELECT operator. This is the most famous guest of SQL-commands. You will use only it throughout your life.

For example, we can retrieve all the data from the fct_inflation table like this:

results <- dbGetQuery(con, "SELECT * FROM fct_inflation")
glimpse(results)
The output for the code above

We used SELECT * FROM table_name command in SQL to retrieve ALL rows from the requested table. Plain and simple.

I hope you said “wow” when you executed the code above. It works. With R, you created, inserted, and selected data from your database today. Sounds like a big deal.

Now, please do not forget to close the connection to your database with dbDisconnect(con) and call it a day. Further, we will explore SQL functions and tricks, how to perfectly manage your database from R and SQLite GUIs, upload more and more data to it… It will be fun.

And as usual, here is the underlying R script from our GitHub repo.

Conclusion

It may be boring, but again: databases are essential tools for managing and storing large amounts of data. And fortunately for us, R offers many opportunities for working with databases, including the RSQlite package, which allows you to create and manage an SQLite database using R.
Now we know how to create our own databases, write data to them, and extract it with sophisticated SQL queries. Please follow me for more posts like this. Next, we will explore the functionalities of charting and databases, supplemented with hidden functions.

Let’s keep in touch. We still have a lot of things to learn. From R and SQL perspectives.

Please clap 👏 and subscribe if you want to support me. Thanks!❤️‍🔥

--

--

Dima Diachkov

Balancing passion with reason. In pursuit of better decision making in economic analysis and finance with data science via R+Python