Merging multiple dataframes in R
R has lots of handy functionality for merging and appending multiple dataframes. In particular, I’d like to cover the use case of when you have multiple dataframes with the same columns that you wish to merge together.
As an example, let’s break the UFOs dataset from Kaggle into multiple pieces and see how we can easily fit it back together.
To split the data, we’ll use the following function devised by kcha:
ufos <- read.csv("scrubbed.csv")
splitDf <- split_k(ufos,4)
We split the data into four dataframes, ufo1, ufo2, ufo3 and ufo4.
ufo1 <- splitDf$`1`
ufo2 <- splitDf$`2`
ufo3 <- splitDf$`3`
ufo4 <- splitDf$`4`
Now, if we wanted to merge just the first two dataframes back together, we could use rbind, as follows:
ufo1And2 <- rbind(ufo1,ufo2)
As we can confirm by using NROW that the data has been successfully merged:
> NROW(ufo1)
[1] 20083
> NROW(ufo2)
[1] 20083
> NROW(ufo1And2)
[1] 40166
However, as we have split our data into four parts, it’s not inconceivable to think that we’ll want to merge four dataframes. Rbind as is only accepts two dataframes, so we have to adjust our code slightly to accommodate for more dataframes.
Merging more than two dataframes
To merge more than two dataframes within R, we can combine our rbind call with do.call.
do.call("rbind", list(dataframes to merge))
How this works:
The do.call function allows us to use any existing R functions, such as the aforementioned rbind, but instead of passing just two dataframes in this instance, we are able to pass multiple.
Therefore, when used alongside rbind, we can use do.call to merge more than two dataframes at once.
Putting this all together, we can merge our UFO dataframes back together as follows:
ufoMerged <- do.call("rbind", list(ufo1, ufo2, ufo3, ufo4))
We can confirm our data has been successfully merged by checking the number of rows:
> NROW(ufoMerged)
[1] 80332
> NROW(ufos)
[1] 80332
And by performing an anti_join using dplyr, which returns null:
library(dplyr)
ufoChecker <- anti_join(ufoMerged, ufos)
ufoChecker
[1] datetime city state country
[5] shape duration..seconds. duration..hours.min. comments
[9] date.posted latitude longitude
<0 rows> (or 0-length row.names)
This is just one method for merging multiple dataframes, and over at R-Bloggers you can find speed comparisons for alternative methods. However, I find do.call in combination with rbind one of the most intuitive methods for joining multiple dataframes.
Useful Resources
Join Coinmonks Telegram Channel and Youtube Channel get daily Crypto News
Also, Read
- Crypto Telegram Signals | Crypto Trading Bot
- Copy Trading | Crypto Tax Software
- Grid Trading | Crypto Hardware Wallet
- Crypto Exchange | Crypto Apps in India
- Best Crypto APIs for Developers
- Best Crypto Lending Platforms
- An ultimate guide to Leveraged Token
- Coinbase Review| Deribit Review |FTX Review
- Coinmetro Review | VirgoCX Review
- 4 Best Crypto Copy Trading Platforms in France
- 5 Reasons to Switch from WazirX to CoinDCX
- Unocoin Review | Best Crypto Staking Coins
- How to get a KCC Address using MetaMask Wallet?