Navigating Data Types in R

Author

Lindsay N. Hayes

Published

July 3, 2025

About the activity

  1. Access the Quarto document here.

  2. Download the raw file.

  3. Open it in RStudio.

We will work through this Quarto document together during class. The activity covers using R as a calculator, creating R objects, and exploring the features of a data set.

First Load the Tidyverse Package

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Ways to Use R

1. Arithmetic

# example: addition/subtraction/multiplication/division

193 + 45
[1] 238
2050 - 2025
[1] 25
50/250 * 100
[1] 20
# activity: assign the variable my_age as the current year minus your year of birth. 
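The activity above can be sketched as follows. The birth year is a made-up placeholder, and reading the current year from the system clock (rather than typing it in) is just one possible choice.

```r
# Hypothetical birth year, for illustration only; replace with your own.
birth_year <- 1990

# Current year taken from the system date, converted from text to a number.
current_year <- as.numeric(format(Sys.Date(), "%Y"))

# Assign the result to an object with <-, then print it by typing its name.
my_age <- current_year - birth_year
my_age
```

Typing `2025 - 1990` directly would also work; storing the pieces as named objects simply makes the calculation easier to reuse.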

2. Create R objects

# vector
# c(..., recursive = FALSE, use.names = TRUE)
x <- c(1:225)
class(x)
[1] "integer"
# matrix
# matrix(data, nrow, ncol, byrow, dimnames)
y <- matrix(1:225, nrow=15, ncol=15, byrow = FALSE)
class(y)
[1] "matrix" "array" 
# logical
# test each element of the vector, returning TRUE or FALSE
over100 <- x>100
table(over100)
over100
FALSE  TRUE 
  100   125 
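Because R treats TRUE as 1 and FALSE as 0, a logical vector can be summed or used to pick out elements. The short sketch below redefines `x` and `y` so it runs on its own; the name `big` is introduced here only for illustration.

```r
# Same objects as in the examples above, redefined so this block is self-contained.
x <- 1:225
y <- matrix(1:225, nrow = 15, ncol = 15, byrow = FALSE)

# Logical vector: one TRUE/FALSE per element of x.
over100 <- x > 100

# Summing a logical vector counts the TRUE values.
sum(over100)      # 125

# A logical vector can also subset: keep only the elements where it is TRUE.
big <- x[over100]
length(big)       # 125

# A matrix has two dimensions and is indexed as [row, column].
dim(y)            # 15 15
y[2, 3]           # 32, because byrow = FALSE fills the matrix column by column
```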

Your Turn

A. Create a vector of all the homeworlds in starwars using the starwars data.

# create a vector called "homeworlds" and assign it the "homeworld" column from the starwars data set


# how many worlds are there? hint: use the unique function


# is there a world called "Ohio"? how would you test this with code?


# How many characters live on Naboo?



# Who lives on Naboo? (hint: use the "name" column in the starwars data and the "which" function)
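Before working on starwars, it may help to try the hinted functions on a small made-up example. The `pets` data frame and its columns below are hypothetical stand-ins, not part of the starwars data; the same calls should carry over to a column pulled out with `$`.

```r
# Hypothetical toy data, used only to demonstrate the functions.
pets <- data.frame(
  name = c("Rex", "Tom", "Ace", "Kit", "Moe"),
  home = c("Farm", "City", "Farm", NA, "City"),
  stringsAsFactors = FALSE
)

# Pull one column out of the data frame as a vector.
homes <- pets$home

# unique() returns the distinct values; note that NA counts as one of them.
unique(homes)
length(unique(homes))               # 3

# %in% tests whether a value appears anywhere in the vector.
"Ohio" %in% homes                   # FALSE

# Count matches; na.rm = TRUE ignores the missing value.
sum(homes == "Farm", na.rm = TRUE)  # 2

# which() gives the positions of the TRUE values, which then index another column.
pets$name[which(homes == "Farm")]   # "Rex" "Ace"
```

Using `which()` rather than the bare logical vector matters when the column contains NA: `which()` drops the missing positions instead of returning NA entries.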

B. Load and explore the data frame called “taylor”, taken here from taylor_all_songs in the taylor package

library(taylor)
taylor <- taylor_all_songs

# what is the "class" of the object taylor?


# what types of data are in the object taylor?


# change the "album_name" from class "character" to class "factor"


# How many albums are in the data set & how many songs on each album?
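The steps in part B can be rehearsed on a tiny made-up data frame. The `songs` object and its columns are hypothetical; the real `taylor` object is a tibble, so its class output will look somewhat different from the plain data frame shown here.

```r
# Hypothetical toy data, standing in for the taylor data frame.
songs <- data.frame(
  album = c("A", "A", "B", "B", "B"),
  tempo = c(120, 98, 140, 101, 77),
  stringsAsFactors = FALSE
)

# Class of the whole object.
class(songs)            # "data.frame"

# Class of each column, one result per column.
sapply(songs, class)

# Convert a character column to a factor by overwriting it.
songs$album <- factor(songs$album)
class(songs$album)      # "factor"

# Number of distinct groups, and the number of rows in each group.
nlevels(songs$album)    # 2
table(songs$album)
```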

C. Which numeric song features are correlated with one another? Hint: create a correlation matrix.
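One possible way to build a correlation matrix is sketched below on `mtcars`, a data set that ships with base R and is used here only as a stand-in. The chosen columns and the names `features` and `cor_mat` are assumptions for illustration; the same pattern should apply to numeric columns of `taylor`.

```r
# mtcars is built into R; it stands in for the song data in this sketch.
features <- c("mpg", "wt", "hp")

# Keep only the chosen numeric columns and convert them to a matrix.
mat <- as.matrix(mtcars[, features])

# Pairwise Pearson correlations between the columns.
# use = "complete.obs" drops rows with missing values (mtcars has none,
# but real data often does).
cor_mat <- cor(mat, use = "complete.obs")

# Round for readability: values near 1 or -1 indicate a strong positive
# or negative linear relationship, values near 0 a weak one.
round(cor_mat, 2)
```

In this stand-in data, weight and horsepower are positively correlated, while fuel economy is negatively correlated with both.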

# pick what features of the data you want to explore


# create a matrix of those features


# evaluate the correlation matrix (this line errors until `mat` has been created in the step above)
mat |> GGally::ggpairs()
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2
Error: object 'mat' not found
# which values show a positive correlation? Which values show a negative correlation?

#var1 <- 
#var2 <- 
#var3 <- 
#var4 <- 

#ggplot(taylor, aes(x=var1, y=var2)) + geom_point(size=3)
#ggplot(taylor, aes(x=var3, y=var4)) + geom_point(size=3)

#library(taylor)
#ggplot(taylor, aes(x=loudness, y=energy, color = album_name)) + geom_point(size=3) + scale_color_albums() + facet_wrap(~album_name)

#ggplot(taylor, aes(x=acousticness, y=energy, color = album_name)) + geom_point(size=3) + scale_color_albums() + facet_wrap(~album_name)