Navigating Data Types in R

Author

Lindsay N. Hayes

Published

July 3, 2025

About the activity

  1. Access the Quarto document here.

  2. Download the raw file.

  3. Open it in RStudio.

We will work through this Quarto document together during class. The activity covers using R as a calculator, creating R objects, and exploring the features of a data set.

First, Load the Tidyverse Package

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Ways to Use R

1. Arithmetic

# example: addition/subtraction/multiplication/division

193 + 45
[1] 238
2050 - 2025
[1] 25
50/250 * 100
[1] 20
# activity: create a variable called my_age and assign it the current year minus your year of birth.
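
As a minimal sketch of the assignment step, the `<-` operator stores the result of a calculation under a name. The value of `birth_year` below is made up for illustration, and `current_year` is a helper name that is not part of the activity.

```r
# Hypothetical birth year, used only for illustration
birth_year <- 1990

# Pull the current year from the system date instead of typing it in
current_year <- as.integer(format(Sys.Date(), "%Y"))

# Store the result of the subtraction in a new object
my_age <- current_year - birth_year
my_age
```

Typing the object name on its own line prints its value, just like the arithmetic examples above.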

2. Create R objects

# vector
# c(..., recursive = FALSE, use.names = TRUE)
x <- c(1:225)
class(x)
[1] "integer"
# matrix
# matrix(data, nrow, ncol, byrow, dimnames)
y <- matrix(1:225, nrow=15, ncol=15, byrow = FALSE)
class(y)
[1] "matrix" "array" 
# logical
# test each element of the vector; the result is TRUE or FALSE for each one
over100 <- x > 100
table(over100)
over100
FALSE  TRUE 
  100   125 
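
A few follow-up checks on these objects may help show how they behave. This is a sketch that rebuilds `x`, `y`, and `over100` so it runs on its own; the calls are all base R.

```r
x <- 1:225
y <- matrix(x, nrow = 15, ncol = 15, byrow = FALSE)
over100 <- x > 100

dim(y)         # number of rows and columns: 15 15
y[2, 3]        # byrow = FALSE fills column by column, so row 2 of column 3 is 32
class(over100) # "logical"
sum(over100)   # TRUE counts as 1, so this is how many values are above 100: 125
mean(over100)  # proportion of values above 100
```

Because logical values behave like 0 and 1 in arithmetic, `sum()` and `mean()` are a quick alternative to `table()` for counting how often a test is TRUE.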

Your Turn

A. Create a vector of all the homeworlds in starwars using the starwars data.

# create a vector called "homeworlds" and assign it the "homeworld" column from the starwars data set

glimpse(starwars)
Rows: 87
Columns: 14
$ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Or…
$ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180, 2…
$ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, 77.…
$ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown", N…
$ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light", "…
$ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blue",…
$ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57.0, …
$ sex        <chr> "male", "none", "none", "male", "female", "male", "female",…
$ gender     <chr> "masculine", "masculine", "masculine", "masculine", "femini…
$ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", "T…
$ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "Huma…
$ films      <list> <"A New Hope", "The Empire Strikes Back", "Return of the J…
$ vehicles   <list> <"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <>, "Imp…
$ starships  <list> <"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanced x1",…
homeworlds <- starwars$homeworld

homeworlds
 [1] "Tatooine"       "Tatooine"       "Naboo"          "Tatooine"      
 [5] "Alderaan"       "Tatooine"       "Tatooine"       "Tatooine"      
 [9] "Tatooine"       "Stewjon"        "Tatooine"       "Eriadu"        
[13] "Kashyyyk"       "Corellia"       "Rodia"          "Nal Hutta"     
[17] "Corellia"       "Bestine IV"     NA               "Naboo"         
[21] "Kamino"         NA               "Trandosha"      "Socorro"       
[25] "Bespin"         "Mon Cala"       "Chandrila"      NA              
[29] "Endor"          "Sullust"        NA               "Cato Neimoidia"
[33] "Coruscant"      "Naboo"          "Naboo"          "Naboo"         
[37] "Naboo"          "Naboo"          "Toydaria"       "Malastare"     
[41] "Naboo"          "Tatooine"       "Dathomir"       "Ryloth"        
[45] "Ryloth"         "Aleen Minor"    "Vulpter"        "Troiken"       
[49] "Tund"           "Haruun Kal"     "Cerea"          "Glee Anselm"   
[53] "Iridonia"       "Coruscant"      "Iktotch"        "Quermia"       
[57] "Dorin"          "Champala"       "Naboo"          "Naboo"         
[61] "Tatooine"       "Geonosis"       "Mirial"         "Mirial"        
[65] "Naboo"          "Serenno"        "Alderaan"       "Concord Dawn"  
[69] "Zolan"          "Ojom"           "Kamino"         "Kamino"        
[73] "Coruscant"      NA               "Skako"          "Muunilinst"    
[77] "Shili"          "Kalee"          "Kashyyyk"       "Alderaan"      
[81] "Umbara"         "Utapau"         NA               NA              
[85] NA               NA               NA              
# how many worlds are there? hint: use the unique function

unique(homeworlds)
 [1] "Tatooine"       "Naboo"          "Alderaan"       "Stewjon"       
 [5] "Eriadu"         "Kashyyyk"       "Corellia"       "Rodia"         
 [9] "Nal Hutta"      "Bestine IV"     NA               "Kamino"        
[13] "Trandosha"      "Socorro"        "Bespin"         "Mon Cala"      
[17] "Chandrila"      "Endor"          "Sullust"        "Cato Neimoidia"
[21] "Coruscant"      "Toydaria"       "Malastare"      "Dathomir"      
[25] "Ryloth"         "Aleen Minor"    "Vulpter"        "Troiken"       
[29] "Tund"           "Haruun Kal"     "Cerea"          "Glee Anselm"   
[33] "Iridonia"       "Iktotch"        "Quermia"        "Dorin"         
[37] "Champala"       "Geonosis"       "Mirial"         "Serenno"       
[41] "Concord Dawn"   "Zolan"          "Ojom"           "Skako"         
[45] "Muunilinst"     "Shili"          "Kalee"          "Umbara"        
[49] "Utapau"        
starwars |> count(homeworld) # note: the result includes a row for missing values (NA)
# A tibble: 49 × 2
   homeworld          n
   <chr>          <int>
 1 Alderaan           3
 2 Aleen Minor        1
 3 Bespin             1
 4 Bestine IV         1
 5 Cato Neimoidia     1
 6 Cerea              1
 7 Champala           1
 8 Chandrila          1
 9 Concord Dawn       1
10 Corellia           2
# ℹ 39 more rows
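
Both `unique()` and `count()` treat NA as its own value, so the totals above are one higher than the number of named worlds. One way to count only the named worlds is sketched below; `n_with_na` and `n_worlds` are illustrative names, not part of the activity.

```r
library(dplyr)

homeworlds <- starwars$homeworld

# Count of distinct values, with NA counted as one of them
n_with_na <- length(unique(homeworlds))

# Drop the missing values first, then count the distinct worlds
n_worlds <- length(unique(homeworlds[!is.na(homeworlds)]))

# dplyr offers the same count in one call
n_distinct(homeworlds, na.rm = TRUE)

n_with_na
n_worlds
```

With the output shown above, the first count is 49 and the second is 48.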
# is there a world called "Ohio"? how would you test this with code?

starwars |> count(homeworld == "Ohio")
# A tibble: 2 × 2
  `homeworld == "Ohio"`     n
  <lgl>                 <int>
1 FALSE                    77
2 NA                       10
table(homeworlds == "Ohio")

FALSE 
   77 
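
Another way to ask the same question is with `%in%` or `any()`, which return a single TRUE or FALSE instead of a table. This is a sketch that rebuilds `homeworlds` so it runs on its own.

```r
library(dplyr)

homeworlds <- starwars$homeworld

# %in% checks membership and never returns NA
"Ohio" %in% homeworlds    # FALSE
"Naboo" %in% homeworlds   # TRUE

# any() needs na.rm = TRUE here; without it the missing homeworlds make the answer NA
any(homeworlds == "Ohio", na.rm = TRUE)   # FALSE
```

The `%in%` form is often the safer choice when the vector has missing values, because a comparison with `==` returns NA wherever the homeworld is unknown.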
# How many characters live on Naboo?

table(homeworlds == "Naboo")

FALSE  TRUE 
   66    11 
starwars |> filter(homeworld == "Naboo") |> count(homeworld)
# A tibble: 1 × 2
  homeworld     n
  <chr>     <int>
1 Naboo        11
# Who lives on Naboo? (hint: use the "name" variable in the starwars data and the "which" function)

starwars |> filter(homeworld == "Naboo") |> select(name)
# A tibble: 11 × 1
   name         
   <chr>        
 1 R2-D2        
 2 Palpatine    
 3 Padmé Amidala
 4 Jar Jar Binks
 5 Roos Tarpals 
 6 Rugor Nass   
 7 Ric Olié     
 8 Quarsh Panaka
 9 Gregar Typho 
10 Cordé        
11 Dormé        
homeworlds == "Naboo"
 [1] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE    NA  TRUE FALSE    NA FALSE FALSE
[25] FALSE FALSE FALSE    NA FALSE FALSE    NA FALSE FALSE  TRUE  TRUE  TRUE
[37]  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
[61] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[73] FALSE    NA FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE    NA    NA
[85]    NA    NA    NA
which(homeworlds == "Naboo")
 [1]  3 20 34 35 36 37 38 41 59 60 65
starwars$name[which(homeworlds == "Naboo")]
 [1] "R2-D2"         "Palpatine"     "Padmé Amidala" "Jar Jar Binks"
 [5] "Roos Tarpals"  "Rugor Nass"    "Ric Olié"      "Quarsh Panaka"
 [9] "Gregar Typho"  "Cordé"         "Dormé"        
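
The reason `which()` is useful here is that it keeps only the positions that are TRUE and drops the NA positions. The sketch below compares it with indexing by the logical vector directly; `is_naboo`, `with_na`, and `with_which` are illustrative names.

```r
library(dplyr)

homeworlds <- starwars$homeworld
is_naboo <- homeworlds == "Naboo"

# Indexing with the logical vector returns an NA entry for every unknown homeworld
with_na <- starwars$name[is_naboo]

# which() converts the logical vector to the positions that are TRUE, so no NA entries
with_which <- starwars$name[which(is_naboo)]

length(with_na)
length(with_which)
```

`filter()` behaves like the `which()` version: rows where the condition is NA are dropped.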

B. Import the taylor package and explore the taylor_album_songs dataframe

# Try loading in the taylor package and viewing the taylor_album_songs dataframe

# install.packages("taylor")

library(taylor)
taylor <- taylor_album_songs

# what is the "class" of the object taylor?



# what is the "class" of the object taylor_album_songs?


# what types of data are in the object taylor_album_songs?


# change the "key_mode" from class "character" to class "factor"


# How many albums are in the data set & how many songs on each album?
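
Before trying these on taylor_album_songs, it may help to see the same steps on a tiny data frame. The `songs` data frame below is made up for illustration; its values are not taken from the taylor package.

```r
# Made-up values for illustration; not taken from taylor_album_songs
songs <- data.frame(
  album = c("Album A", "Album A", "Album B"),
  key_mode = c("C major", "A minor", "C major"),
  stringsAsFactors = FALSE
)

class(songs)            # class of the whole object
sapply(songs, class)    # class of each column

# Convert a character column to a factor and look at its levels
songs$key_mode <- factor(songs$key_mode)
class(songs$key_mode)
levels(songs$key_mode)

# Count how many rows fall under each album
table(songs$album)
```

The same calls should carry over to the real data once the column names are swapped in.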

C. Which numeric song features are correlated with one another? Hint: create a correlation matrix.

# look at the columns in taylor_album_songs and pick which features of the data you want to explore (hint: choose all the numeric variables)

# create a matrix of those features

# pipe the matrix into GGally::ggpairs() and evaluate the resulting correlations

# Which values show a positive correlation? Which values show a negative correlation?

#var1 <-
#var2 <-
#var3 <- 
#var4 <- 


# Plot the correlations

#ggplot(taylor_album_songs, aes(x=var1, y=var2)) + geom_point(size=3)

#ggplot(taylor_album_songs, aes(x=var3, y=var4)) + geom_point(size=3)
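
As a sketch of the correlation-matrix step, the example below uses the built-in mtcars data instead of taylor_album_songs, so the column names (`mpg`, `wt`, `hp`) are stand-ins and not song features. `cor()` is base R; the `GGally::ggpairs()` call is left commented out and assumes the GGally package is installed.

```r
# mtcars ships with R; it stands in here for any table of numeric features
features <- as.matrix(mtcars[, c("mpg", "wt", "hp")])

# Pairwise Pearson correlations between the columns
cor_mat <- cor(features)
round(cor_mat, 2)

# Sign of each correlation: negative for mpg vs wt, positive for wt vs hp
sign(cor_mat)

# If GGally is installed, the same features can be plotted pairwise with:
# GGally::ggpairs(as.data.frame(features))
```

If the chosen columns contain missing values, `cor()` returns NA for those pairs unless its `use` argument is set, for example `use = "pairwise.complete.obs"`.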