PCA with tidymodels
At the end of this lesson you will be able to:
- Define dimension reduction
- Explain why dimension reduction is used
- Build a model to perform PCA on a dataset
Before class, watch this video from data scientist Julia Silge of RStudio. She does a real-time analysis, using Principal Component Analysis (PCA), of the best hip hop songs of all time according to critics' ratings. The video is not a detailed explanation of what PCA is or how it works (there is plenty of that maths online if you want it). Instead, it is a live analysis that asks which song features make a hip hop song the best. I want you to see that PCA is often a first line of inquiry when exploring a dataset.
For background, TidyTuesday is a weekly podcast and community activity that shares an interesting dataset with the data science community each week for plotting or analysis practice. Its datasets are useful for teaching and for testing code.
The video moves quickly, but all the code she uses is provided below it.