How to write user-defined functions in R — part 4 of “R for Applied Economics” guide

Dima Diachkov
7 min read · Jan 29, 2023

What is a function and a user-defined function?

A function is a block of code that performs a specific task and can be reused multiple times. A user-defined function is a function that you write yourself to perform a specific task that is not covered by the functions already available in base R or in the packages you use. Functions are a key building block of the R programming language: the packages we used in previous parts (rvest and ggplot2) are themselves collections of functions written by their authors.

For example, in the previous parts of the guide (Part 1, Part 2), the html_nodes() function from the rvest package was used to extract specific elements from an HTML document, and the ggplot() function from the ggplot2 package was used to create a graph.

Usually, people learn how to write custom functions only at an advanced stage of learning R. I personally think that such an approach is cruel because it makes people re-do similar tasks multiple times. Some people say that it is part of the educational process. Maybe they are right, but I would like to show you how to write your own custom function to save you (and me) some time for other things.

“All we have to decide is what to do with the time that is given us.” — Gandalf

Gandalf by MidJourney AI

Why do you need user-defined functions right now?

Basically, the key message of this article is the following:

User-defined functions are an essential tool for automating repetitive tasks and making your code more efficient and organized.

And my task here today is to help you start using this advantage. Instead of rewriting the same code multiple times, a user-defined function allows you to encapsulate that code into a single, reusable function. This not only saves time and effort but also makes your code easier to read and understand. The approach is pretty straightforward and helps you to maximize your efficiency.

Glenn Carstens-Peters | Unsplash

Moreover, while we are learning to create our own functions, I will also show you how to set up basic handling of exceptions (errors, etc.). Let's jump right in.

How to define your (first) own function?

Creating a user-defined function in R is mega-simple. The basic syntax is as follows:

function_name <- function(argument1, argument2, ...) {
  # code to be executed
}

For example, let’s say you want to create a function that calculates the average of two numbers. You could write the function like this:

average <- function(x, y) {
  (x + y) / 2
}

You can then call the function by passing in the two numbers:

average(5, 10)

This would return the value of 7.5, which is the average of 5 and 10.
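
Arguments can also have default values and can be passed by name, which is what the parsing function later in this part does with its link and table_number arguments. Here is a minimal sketch, assuming a hypothetical weighted_average() function that I made up for illustration:

```r
# weighted_average() is a hypothetical helper made up for illustration.
# `weight` is the share given to x; it defaults to 0.5, i.e. a plain average.
weighted_average <- function(x, y, weight = 0.5) {
  weight * x + (1 - weight) * y
}

# default weight: behaves like average(5, 10)
weighted_average(5, 10)                        # 7.5

# arguments passed by name, in any order
weighted_average(y = 10, x = 5, weight = 0.25) # 8.75
```

Naming the arguments makes a call easier to read and protects you from mixing up their order.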

Another (more practical) example uses the mpg dataset from the ggplot2 package. Say you want to create a function that plots the relationship between engine displacement and highway miles per gallon for all vehicles.

library(ggplot2)
data(mpg)

disp_vs_hwy_mpg <- function() {
  ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point() +
    ggtitle("Engine Displacement vs. Highway Miles per Gallon")
}

You can call the function as

disp_vs_hwy_mpg()

The output of the code above
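
The function above always draws the same two columns of the same dataset. As a sketch of how the idea generalizes, the hypothetical plot_relationship() below (my naming, not part of the original code) takes the data frame and the column names as arguments. It assumes ggplot2 is installed and uses its .data pronoun to look up columns whose names are given as strings.

```r
library(ggplot2)

# plot_relationship() is a hypothetical generalization of disp_vs_hwy_mpg():
# the data and the two column names are passed in instead of being hard-coded.
plot_relationship <- function(data, x_var, y_var, title = NULL) {
  ggplot(data, aes(x = .data[[x_var]], y = .data[[y_var]])) +
    geom_point() +
    labs(x = x_var, y = y_var, title = title)
}

# the same chart as before
p <- plot_relationship(mpg, "displ", "hwy",
                       title = "Engine Displacement vs. Highway Miles per Gallon")

# a different pair of columns from the same dataset, with no new plotting code
q <- plot_relationship(mpg, "displ", "cty")
```

Printing p or q draws the corresponding chart.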

Quick tips to create better functions

  • Keep your functions small and focused on a single task
  • Use clear and descriptive names for your functions and arguments
  • Use comments to explain what your function does and how to use it
  • Use an explicit return() statement when it makes the output of the function clearer (without it, R returns the value of the last evaluated expression)
  • Test your function thoroughly to ensure it works as expected
  • Avoid using global variables inside the function
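
To make these tips more tangible, here is a small sketch that tries to follow them. The function inflation_spread() and the numbers are made up for illustration and are not part of the original code.

```r
# inflation_spread() is a hypothetical example, not part of the original code.
# It returns the gap between the highest and the lowest rate in `rates`,
# a numeric vector of inflation rates in percent.
inflation_spread <- function(rates, na.rm = TRUE) {
  # everything the function needs comes in through its arguments,
  # so it does not depend on any global variable
  spread <- max(rates, na.rm = na.rm) - min(rates, na.rm = na.rm)
  return(spread)
}

# a quick test with made-up numbers: stopifnot() stays silent when all checks pass
example_rates <- c(2, 10.5, NA, 8)
stopifnot(inflation_spread(example_rates) == 8.5)
stopifnot(is.na(inflation_spread(example_rates, na.rm = FALSE)))
```

A couple of stopifnot() lines like these are a cheap way to check a function every time you change it.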

Practice: writing user-defined functions to parse data from the website “Trading Economics” with inflation by country

To proceed with our task for Gandalf (business case from part 1), we drew a beautiful chart on inflation (part 2), but we also decided to parse data on inflation in various EU countries to provide insight into how heterogeneous these countries are. For this purpose, the website “Trading Economics” is good to work with, because it has exactly what we need.

The target for parsing — table on inflation in Europe by country on Trading Economics

We already know that the rvest package makes it easy to scrape data from websites, a task that can otherwise be tedious. In this section, we will walk through the process of creating a user-defined function that can be used to scrape data from the "Trading Economics" website. We will create a function called parse_web_table that takes a URL (and, optionally, the number of the table on the page) as input and returns a data frame with the desired table as output. The function loads the rvest package itself and is supplemented with basic exception handling. Please check my comments below.

parse_web_table <- function(link = "", table_number = 1) {
  # we load the package inside of the function so it is loaded automatically every time you use the function
  library(rvest)

  # here we check that the provided link is a non-empty string
  if (link == "") {
    stop("No URL provided")
  }

  # then we try to parse the URL, but if it fails, tryCatch() lets us stop the function with our own error message
  parsed_data <- tryCatch(
    read_html(link),
    error = function(e) stop("Something went wrong...Please, check the link you provided.")
  )
  # next we extract all tables found on the page
  parsed_table <- tryCatch(
    html_table(parsed_data),
    error = function(e) stop("Something went wrong...Seems like there are no tables available.")
  )
  # finally we pick the requested table and convert it to a data frame
  raw_df <- tryCatch(
    as.data.frame(parsed_table[[table_number]]),
    error = function(e) stop(paste0("Something went wrong...Seems like the link does not have table number ", table_number, " or any tables at all"))
  )

  return(raw_df)
}

This code is self-explanatory, but if you need a few comments, I will briefly explain the course of action:

  • This script creates a function called “parse_web_table” that takes two input arguments: “link” and “table_number” (we need the table number to select the table we want to extract).
  • The function uses the rvest package to scrape data from a web page specified by the “link” argument.
  • The function first checks if the link is an empty string and, if so, stops and returns an error message saying “No URL provided.”
  • It then tries to parse the URL using the read_html function. If that fails, it returns an error message saying “Something went wrong…Please, check the link you provided.”
  • It then tries to extract table data from parsed_data using the html_table function. If that fails, it returns an error message saying “Something went wrong…Seems like there are no tables available.”
  • It then tries to convert the selected table into a data frame. If that fails, it returns an error message indicating that the link does not have the specified table number or any tables at all.
  • If all the steps above are successful, it returns the data frame containing the table data.
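
The "try a step, and stop with a friendlier message if it fails" pattern described in these bullets can be sketched in isolation with base R's tryCatch(). The helper pick_table() and its input are hypothetical, made up to mimic the last step of the function without touching the web:

```r
# pick_table() is a hypothetical stand-in for the last step of the function:
# it selects one element of a list and fails with a readable message if it is missing.
pick_table <- function(tables, table_number = 1) {
  tryCatch(
    as.data.frame(tables[[table_number]]),
    error = function(e) {
      stop(paste0("There is no table number ", table_number, "."), call. = FALSE)
    }
  )
}

# made-up input: a list with a single small table
tables <- list(data.frame(country = c("A", "B"), rate = c(1.5, 2.5)))

# works and returns a data frame
first_table <- pick_table(tables, 1)

# asking for a second table fails; here we capture the message instead of stopping the script
message_text <- tryCatch(pick_table(tables, 2), error = function(e) conditionMessage(e))
message_text
```

The original error ("subscript out of bounds") is replaced by a message that tells the user what to fix.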

Now let’s try to trigger our exceptions. First, let’s run the function without providing a URL.

The output when URL is not provided
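
The check behind this message compares link with an empty string, which covers a call with no arguments. Inputs such as NULL or NA would still fail, but with a less readable message. As a sketch of a slightly stricter check, the hypothetical helper check_link() below (not part of the original function) rejects those cases with the same message; https://example.com is only a placeholder address.

```r
# check_link() is a hypothetical helper, not part of the original function.
check_link <- function(link = "") {
  # is.character() rejects NULL and numbers, length() rejects vectors of links,
  # is.na() rejects NA_character_, nzchar() rejects the empty string ""
  if (!is.character(link) || length(link) != 1 || is.na(link) || !nzchar(link)) {
    stop("No URL provided", call. = FALSE)
  }
  invisible(link)
}

# capture the error messages instead of stopping the script
no_url   <- tryCatch(check_link(),     error = function(e) conditionMessage(e))
null_url <- tryCatch(check_link(NULL), error = function(e) conditionMessage(e))

# a non-empty string passes the check and is returned unchanged
ok_url <- check_link("https://example.com")
```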

It works! Can we try to submit a link to an HTML page without any tables at all? Sure, I took a link to the Twitter account of “Trading Economics”, which has no tables in it.

The output when URL without tables provided

We caught it. Let’s do one more thing. The next step is to provide a wrong (non-existent) URL.

The output when wrong URL was provided

Now it is fireproof from a trouble-shooting perspective. We can now use this function to scrape data by passing in a URL as input. For example, we can use the following code to scrape inflation by country from “Trading Economics” (or any other website):

Parsing data on inflation rates
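
The call itself is a single command: parse_web_table() with the page address as the link argument. As an offline sketch of the same three steps (read the page, extract the tables, pick one), the block below feeds read_html() a small made-up HTML string instead of a URL. The table contents are invented for illustration, and rvest is assumed to be installed.

```r
library(rvest)

# made-up HTML page with a single table (the numbers are not real data)
html_page <- "<html><body>
  <table>
    <tr><th>Country</th><th>Inflation</th></tr>
    <tr><td>A</td><td>1.5</td></tr>
    <tr><td>B</td><td>2.5</td></tr>
  </table>
</body></html>"

parsed_data  <- read_html(html_page)            # read_html() also accepts literal HTML
parsed_table <- html_table(parsed_data)         # a list with one element per table on the page
raw_df       <- as.data.frame(parsed_table[[1]])

raw_df
```

With a real page, only the first step changes: the HTML string is replaced by the URL.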

Nice! It works! Can we try another dataset from some other webpage? Sure! Let’s replace the link to inflation with a link to data on unemployment.

Parsing data on unemployment rates

One more test: let’s run our function with the link, which is different from “Trading Economics”. Let’s take data on GDP per capita from Wikipedia.

Parsing data on GDP from Wikipedia

Hooray. It works, and it is tested. Now we can use our own functions like this one to scrape economic (and other) data directly from web pages, and to refresh it too. Also, user-defined functions help us spend less time on repetitive tasks, as we just saw when parsing different websites with a single command.

As usual, the FULL code is available at the designated Github repo for your convenience: https://raw.githubusercontent.com/TheLordOfTheR/R_for_Applied_Economics/main/Part4.R

Conclusion

Today we have learned about user-defined functions and handling exceptions in R and why they are useful for recurring tasks. We have covered the basics of creating a user-defined function, including the syntax and input/output parameters.

We have also provided an example of using a user-defined function to scrape data from “Trading Economics” (or any other website) using the rvest package.

By creating user-defined functions, we can automate repetitive tasks and make our work more efficient and streamlined. Overall, understanding how to create and use user-defined functions is an important skill for any R user. Next time we will use our user-defined function to collect a lot of data and join it for further visualization.

Please clap 👏 and subscribe if you want to support me. Thanks!❤️‍🔥
