Sample datasets to work with in R: Exploring Data Made Easy

Shweta Dixit
Jun 30, 2023
*https://www.simplilearn.com/what-is-r-article

When it comes to learning and practicing data analysis in R, having access to sample datasets is crucial. These datasets provide real-world examples for exploring various data manipulation and analysis techniques. In this blog post, we will look at some popular sample datasets that ship with R in its datasets package, so you can use them without installing anything. For each dataset, we give a short description and a small R code example to help you get started.

Iris Dataset: The Iris dataset is a classic dataset widely used in data analysis and machine learning. It contains sepal and petal length and width measurements for 150 flowers from three iris species (setosa, versicolor, and virginica). You can load the Iris dataset using the following code:

data(iris)
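
Once the data is loaded, a first look might go as follows. This is a minimal sketch using only base R; the variable name species_means is chosen here purely for illustration.

```r
# Load the built-in iris data and inspect it.
data(iris)

str(iris)       # column names and types
head(iris, 3)   # first three rows

# Mean of every numeric measurement, computed separately for each species.
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)
```

The formula `. ~ Species` tells aggregate() to summarize all remaining columns, grouped by the Species factor.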

mtcars Dataset: The mtcars dataset describes 32 car models from the 1974 Motor Trend magazine, with features such as miles per gallon (mpg), number of cylinders, horsepower, and weight. It is a popular dataset for demonstrating data analysis techniques. Load the mtcars dataset using:

data(mtcars)
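
As one possible starting point, the sketch below compares fuel economy across cylinder counts and fits a simple linear model of mpg on weight. The names mpg_by_cyl and fit are illustrative, and the choice of model is an assumption made for the example, not a recommendation.

```r
data(mtcars)

# Average miles per gallon for each cylinder count.
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(round(mpg_by_cyl, 1))

# Simple linear regression: how does mpg change with weight (wt, in 1000 lbs)?
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))
```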

ChickWeight Dataset: The ChickWeight dataset records the body weight of 50 chicks, measured repeatedly from birth to day 21, along with which of four diets each chick received. Because every chick is measured several times, this dataset is often used to demonstrate longitudinal data analysis. Use the following code to load the ChickWeight dataset:

data(ChickWeight)
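
To get a feel for the diet effect, one simple option is to compare average weights on the last measurement day. The sketch below uses base R only; chicks, final_day, and final_weight_by_diet are illustrative names.

```r
data(ChickWeight)

# Work with a plain data frame copy (ChickWeight carries extra grouping classes).
chicks <- as.data.frame(ChickWeight)

# Keep only the measurements taken on the last day of the experiment.
final_day <- chicks[chicks$Time == max(chicks$Time), ]

# Average final weight for each diet.
final_weight_by_diet <- aggregate(weight ~ Diet, data = final_day, FUN = mean)
print(final_weight_by_diet)
```

This ignores the repeated-measures structure, so treat it as a quick summary rather than a longitudinal analysis.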

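Titanic Dataset: The built-in Titanic dataset summarizes the fate of the people aboard the Titanic as a four-dimensional contingency table, cross-classified by ticket class (including crew), sex, age group (child or adult), and survival status. Because it holds counts rather than one row per passenger, it is best suited to exploring categorical data analysis, and it can be converted to a data frame as a starting point for predictive modeling. Load the Titanic dataset using: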

data(Titanic)
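
Since Titanic is a table of counts, the usual data frame tools do not apply directly. The sketch below shows one way to inspect it and to compute survival rates by class; the names titanic_df, class_by_survival, and survival_rate are illustrative.

```r
data(Titanic)

dimnames(Titanic)   # the four classifying factors and their levels

# Long format: one row per combination of factors, with a Freq count column.
titanic_df <- as.data.frame(Titanic)
head(titanic_df)

# Collapse the table to Class x Survived, then convert counts to row proportions.
class_by_survival <- margin.table(Titanic, margin = c(1, 4))
survival_rate <- prop.table(class_by_survival, margin = 1)[, "Yes"]
print(round(survival_rate, 3))
```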

AirPassengers Dataset: The AirPassengers dataset records monthly totals of international airline passengers (in thousands) from 1949 to 1960. It is stored as a time series object and is commonly used for time series analysis and forecasting. Load the AirPassengers dataset with the following code:

data(AirPassengers)
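
Because AirPassengers is a ts object rather than a data frame, time series functions work on it directly. The following sketch, with the illustrative names yearly_totals and parts, sums the series by year and splits it into trend, seasonal, and remainder components. The multiplicative decomposition is an assumption made for this example.

```r
data(AirPassengers)

class(AirPassengers)       # "ts"
frequency(AirPassengers)   # observations per year
start(AirPassengers)       # first year and month

# Sum the monthly values into one total per year.
yearly_totals <- aggregate(AirPassengers, nfrequency = 1, FUN = sum)
print(yearly_totals)

# Classical decomposition into trend, seasonal, and random components.
parts <- decompose(AirPassengers, type = "multiplicative")
print(round(parts$figure, 2))   # estimated seasonal factor for each month
```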

So far, we have explored several sample datasets that you can easily access and work with in R. These datasets provide valuable opportunities for practicing data manipulation, visualization, and analysis techniques. By pairing them with small R code examples, you can deepen your understanding of data analysis in R and develop your skills in handling real-world data.

Beyond the built-in sample datasets, data from the web opens up a world of possibilities. In the rest of this post, we will look at how to obtain and analyze web data using R. We will cover several kinds of web-based sources and provide code examples to help you bring web data into your data analysis projects.

GitHub Repositories: GitHub is a treasure trove of publicly available data. You can read a CSV file hosted in a GitHub repository by passing its raw file URL to the read.csv() function or to the readr package. In the following example, the username, repository, and file name are placeholders to replace with your own:

url <- "https://raw.githubusercontent.com/username/repository/main/data.csv"
data <- read.csv(url)
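
Reading over the network can fail, for example when a URL has moved. The sketch below wraps read.csv() in a small helper that returns NULL with a warning instead of stopping the script. The helper name read_csv_url is hypothetical, and a local temporary file addressed through a file:// URL stands in for a remote file so the example runs offline.

```r
# Hypothetical helper: read a CSV from a URL, returning NULL (with a warning)
# instead of stopping the script when the download or parse fails.
read_csv_url <- function(url, ...) {
  tryCatch(
    read.csv(url, stringsAsFactors = FALSE, ...),
    error = function(e) {
      warning("Could not read ", url, ": ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}

# Offline stand-in for a remote file: write a small CSV to a temp file and
# address it with a file:// URL (path form shown is for Unix-like systems).
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, value = c(2.5, 3.1, 4.8)), tmp, row.names = FALSE)
local_url <- paste0("file://", tmp)

df <- read_csv_url(local_url)
str(df)
```

With a real dataset, you would pass the raw GitHub URL in place of local_url and check for NULL before continuing.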

Open Government Data: Many government agencies provide open datasets for public use. These datasets cover a wide range of topics such as demographics, public health, transportation, and more. One such resource is data.gov, where you can search and download datasets in various formats. Once you have the download link of a CSV dataset, you can import it as shown below, where the URL is only a placeholder:

url <- "https://www.data.gov/dataset/example-dataset.csv"
data <- read.csv(url)

Web APIs: Web APIs (Application Programming Interfaces) allow you to fetch data directly from online services. Many websites and platforms provide APIs that grant programmatic access to their data. You can use the httr package to interact with web APIs and retrieve data in JSON or XML formats. Here's an example of fetching data from a hypothetical API:

library(httr)

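url <- "https://api.example.com/data"  # placeholder endpoint
response <- GET(url)
stop_for_status(response)              # stop early if the server returned an HTTP error
data <- content(response, "parsed")    # parse the body according to its content type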
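
Real APIs usually take query parameters and return JSON. The sketch below shows one way to wrap that pattern in a function. The helper name fetch_json, the limit parameter, and the endpoint are all hypothetical, and the example assumes the httr and jsonlite packages are installed.

```r
# Hypothetical helper: GET a JSON endpoint and return the parsed body.
# The URL and query parameters passed in are placeholders, not a real API.
fetch_json <- function(url, query = list()) {
  response <- httr::GET(url, query = query, httr::timeout(10))
  httr::stop_for_status(response)   # turn HTTP 4xx/5xx responses into R errors
  body <- httr::content(response, as = "text", encoding = "UTF-8")
  jsonlite::fromJSON(body)          # JSON arrays of objects become data frames
}

# Example call (not run here, since api.example.com is a placeholder):
# result <- fetch_json("https://api.example.com/data", query = list(limit = 10))
```

Reading the body as text and parsing it explicitly makes it clear which parser is used, rather than relying on automatic content-type detection.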

Web Data with readr: The readr package provides functions to read delimited data such as CSV files, and the httr package can download a file from a URL first. Here's an example that downloads a CSV file to a temporary location and then reads it:

library(readr)
library(httr)

url <- "https://example.com/data.csv"
temp_file <- tempfile()
GET(url, write_disk(temp_file))
data <- read_csv(temp_file)
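
When no custom request handling is needed, read_csv() can also take a URL directly, and declaring column types up front guards against surprises in guessed types. The following is a minimal sketch: the function name read_measurements and the column names id and value are assumptions made for illustration.

```r
# Hypothetical helper: read a CSV (local path or URL) with explicit column types.
# The column names "id" and "value" are assumptions for this example.
read_measurements <- function(source) {
  readr::read_csv(
    source,
    col_types = readr::cols(
      id = readr::col_integer(),
      value = readr::col_double()
    )
  )
}

# Example call (not run here, since example.com is a placeholder):
# measurements <- read_measurements("https://example.com/data.csv")
```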

Web Scraping/HTML Tables with rvest: Websites often contain valuable data embedded in HTML tables. The rvest package enables web scraping, allowing you to extract data from these tables. Here's an example of extracting an HTML table from a webpage:

library(rvest)

url <- "https://www.example.com/data-page"
page <- read_html(url)
# Use [[1]] to pick the first table node, so html_table() returns the table itself
table <- html_table(html_nodes(page, "table")[[1]])

The rvest package is perfect for extracting data from websites. It provides a set of functions to scrape data by parsing HTML and XML documents.
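
To try table extraction without depending on a live website, you can parse an HTML string instead of a URL. In the sketch below, the helper name extract_first_table and the sample markup are made up for illustration, and the usage part runs only if rvest is installed.

```r
# Hypothetical helper: return the first <table> in an HTML document.
# `html` may be a URL, a file path, or a literal HTML string.
extract_first_table <- function(html) {
  page <- rvest::read_html(html)
  tables <- rvest::html_nodes(page, "table")
  if (length(tables) == 0) {
    stop("No <table> elements found", call. = FALSE)
  }
  rvest::html_table(tables[[1]])
}

# Made-up sample markup standing in for a real web page.
sample_html <- paste0(
  "<table>",
  "<tr><th>name</th><th>score</th></tr>",
  "<tr><td>a</td><td>1</td></tr>",
  "<tr><td>b</td><td>2</td></tr>",
  "</table>"
)

if (requireNamespace("rvest", quietly = TRUE)) {
  scores <- extract_first_table(sample_html)
  print(scores)
}
```

Checking that at least one table exists before indexing gives a clearer error than a failed subscript.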

In this post, we explored several sample datasets that you can readily use in R, from the datasets built into R to data retrieved from the web. These datasets offer valuable opportunities for practicing data manipulation, visualization, and analysis techniques. Additionally, we covered importing data from the web using packages like readr and httr, as well as web scraping with the rvest package.

By leveraging these sample datasets and working with web data, you can expand your understanding of data analysis in R and gain hands-on experience with diverse data sources. So, dive in, explore, and let these datasets inspire your data analysis journey in R!

Remember, these datasets are just the tip of the iceberg. R provides a vast collection of sample datasets for different domains and purposes, and running data() with no arguments lists the ones available in your session.

Keep Learning R!

Happy Exploring with the Samples!

Shweta Dixit

||LearneR||Academician|| Researcher|| Biostatistician||