Data Wrangling with tidyverse
About the activity
Access the Quarto document here.
Download the raw file.
Open it in RStudio.
We will work our way through this Quarto document together during class. The activity covers reshaping, filtering, and summarizing data using tidyverse principles.
Load the Tidyverse Package
library(tidyverse)
library(cowplot)
Reshaping and Summarizing Data
A common type of data that requires reshaping is time course data.
Using tidyverse principles, answer the questions below:
1. Which month had the most and least passengers in the AirPassengers data?
The AirPassengers data is a time series of monthly international airline passenger numbers from January 1949 to December 1960. Search for AirPassengers in the Help pane to learn more about the dataset.
# Load and inspect the data. A little reshaping here gets it into an easy-to-read format for you.
# One row per year, one column per month
AP_matrix <- matrix(AirPassengers, nrow = length(unique(floor(time(AirPassengers)))), byrow = TRUE)
colnames(AP_matrix) <- month.abb
rownames(AP_matrix) <- unique(floor(time(AirPassengers)))
AP_df <- as.data.frame(AP_matrix)
# Keep the year as a regular column so it survives reshaping
AP_df$Year <- rownames(AP_matrix)
A. Is the data long or wide? What form does it need to be in? How can you convert to the form you need?
AP_df
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Year
1949 112 118 132 129 121 135 148 148 136 119 104 118 1949
1950 115 126 141 135 125 149 170 170 158 133 114 140 1950
1951 145 150 178 163 172 178 199 199 184 162 146 166 1951
1952 171 180 193 181 183 218 230 242 209 191 172 194 1952
1953 196 196 236 235 229 243 264 272 237 211 180 201 1953
1954 204 188 235 227 234 264 302 293 259 229 203 229 1954
1955 242 233 267 269 270 315 364 347 312 274 237 278 1955
1956 284 277 317 313 318 374 413 405 355 306 271 306 1956
1957 315 301 356 348 355 422 465 467 404 347 305 336 1957
1958 340 318 362 348 363 435 491 505 404 359 310 337 1958
1959 360 342 406 396 420 472 548 559 463 407 362 405 1959
1960 417 391 419 461 472 535 622 606 508 461 390 432 1960
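The table above has one column per month, so it is in wide format. One possible way to convert it to long format is sketched below with `pivot_longer()`. The column names `Month` and `Passengers` and the object name `AP_long` are illustrative choices, not part of the original activity, and the wide table is rebuilt in a compact way so the sketch runs on its own.

```r
library(tidyverse)

# Rebuild the wide table (one row per year, one column per month)
AP_df <- as.data.frame(
  matrix(AirPassengers, nrow = 12, byrow = TRUE,
         dimnames = list(NULL, month.abb))
)
AP_df$Year <- as.character(1949:1960)

# Wide -> long: every month column becomes a (Month, Passengers) pair
# "Month" and "Passengers" are assumed names chosen for this sketch
AP_long <- AP_df |>
  pivot_longer(cols = -Year, names_to = "Month", values_to = "Passengers")

head(AP_long)
```

In the long format each row is a single year-month observation, which is the shape that `group_by()` and `summarize()` expect.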
B. How can we extract the most and least traveled months each year?
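As a minimal sketch of one approach, the long-format data can be grouped by year and the extreme rows picked out with `slice_max()` and `slice_min()`. The names `AP_long`, `busiest`, and `quietest` are assumptions made for this example. Note that both functions keep ties by default, so a year in which two months share the maximum returns two rows.

```r
library(tidyverse)

# Rebuild the data in long format so this sketch runs on its own
AP_df <- as.data.frame(
  matrix(AirPassengers, nrow = 12, byrow = TRUE,
         dimnames = list(NULL, month.abb))
)
AP_df$Year <- as.character(1949:1960)
AP_long <- AP_df |>
  pivot_longer(cols = -Year, names_to = "Month", values_to = "Passengers")

# Month(s) with the most passengers in each year (ties are kept)
busiest <- AP_long |>
  group_by(Year) |>
  slice_max(Passengers, n = 1) |>
  ungroup()

# Month(s) with the fewest passengers in each year (ties are kept)
quietest <- AP_long |>
  group_by(Year) |>
  slice_min(Passengers, n = 1) |>
  ungroup()

# How often each month is the busiest or quietest across years
count(busiest, Month, sort = TRUE)
count(quietest, Month, sort = TRUE)
```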
2. What was the percent increase in passengers each year between Aug and Nov?
# To answer this question we need to find the ratio of Aug and Nov travelers. We need the data in the wide format.
# how can we add the ratio to get the percent increase?
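A hedged sketch of the calculation follows. It assumes the question asks how much higher August is than November within each year, so the percent increase is computed as `(Aug / Nov - 1) * 100`. If the intended direction is the change from August to November, use `(Nov / Aug - 1) * 100` instead. The column names `Aug_Nov_ratio` and `pct_increase` are illustrative.

```r
library(tidyverse)

# Rebuild the wide table so this sketch runs on its own
AP_df <- as.data.frame(
  matrix(AirPassengers, nrow = 12, byrow = TRUE,
         dimnames = list(NULL, month.abb))
)
AP_df$Year <- as.character(1949:1960)

# In wide format Aug and Nov are separate columns, so the ratio is a
# simple column-wise operation inside mutate()
AP_change <- AP_df |>
  mutate(
    Aug_Nov_ratio = Aug / Nov,                 # assumed direction: Aug relative to Nov
    pct_increase  = (Aug_Nov_ratio - 1) * 100  # ratio converted to a percentage
  ) |>
  select(Year, Aug, Nov, Aug_Nov_ratio, pct_increase)

AP_change
```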
3. Which diet led to heavier chicks?
We will use the ChickWeight data. Use the Help pane to read more about the data.
# First look at the data.
glimpse(ChickWeight)
Rows: 578
Columns: 4
$ weight <dbl> 42, 51, 59, 64, 76, 93, 106, 125, 149, 171, 199, 205, 40, 49, 5…
$ Time <dbl> 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 21, 0, 2, 4, 6, 8, 10, 1…
$ Chick <ord> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
$ Diet <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
A. Count how many timepoints were measured and how many chicks were on each Diet.
# How can you count the timepoints, chicks, and diets, and chicks nested in diets?
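One possible way to get these counts is sketched below with `n_distinct()`, `distinct()`, and `count()`. The object names are illustrative, and `as_tibble()` is used here only as a precaution, to drop the extra grouped-data classes that ChickWeight carries before applying dplyr verbs.

```r
library(tidyverse)

cw <- as_tibble(ChickWeight)

# Number of distinct measurement timepoints
n_timepoints <- n_distinct(cw$Time)

# Number of distinct chicks overall
n_chicks <- n_distinct(cw$Chick)

# Chicks nested in diets: reduce to one row per chick, then count per diet
chicks_per_diet <- cw |>
  distinct(Chick, Diet) |>
  count(Diet, name = "n_chicks")

n_timepoints
n_chicks
chicks_per_diet
```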
B. Now figure out which diet leads to the heaviest chicks.
# We can plot it to get a first view: one line per chick, colored by diet
ChickWeight |>
  ggplot(aes(x = Time, y = weight, group = Chick, color = Diet)) +
  geom_line() +
  theme_cowplot()
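To back up the plot with numbers, one option is to summarize weight by diet at the final timepoint. This is a sketch under the assumption that "heaviest" means the highest mean weight on the last measured day. Other definitions, such as growth rate or mean weight across all days, would need a different summary. The object and column names are illustrative.

```r
library(tidyverse)

final_weights <- as_tibble(ChickWeight) |>
  filter(Time == max(Time)) |>        # keep only the last measured day
  group_by(Diet) |>
  summarize(
    n_chicks    = n(),                # chicks still measured at that time
    mean_weight = mean(weight),
    sd_weight   = sd(weight)
  ) |>
  arrange(desc(mean_weight))          # heaviest diet first

final_weights
```

Some chicks were not measured through the final day, so `n_chicks` here can be smaller than the number of chicks that started on each diet.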