Welcome

Part 0: Why Use R - Commonly Used Statistical Packages

  • Not suited for anything beyond the absolute basics:

    • Excel
  • Good for processing surveys, limited functionality, not free:

    • SPSS
  • Good functionality, not always state-of-the-art, not free:

    • SAS
    • Stata
  • Almost always state-of-the-art, free, great support community:

    • Python with Jupyter Notebooks (requires basic programming skills)
    • R with RStudio (very strong for data engineering)

What to Expect

  • Part 1: Installation, Setup, Tricks
  • Part 2: Pre-Processing Data with the Tidyverse
  • Part 3: Data Engineering and Modeling with Tidymodels

Part 1: Installation, Setup, Tricks

Go to RStudio

  • Setup R
  • Import Data
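
The two steps above might look roughly like the following sketch. The slides do not say which packages or which data file are used, so the package list and the tiny CSV written here are assumptions for illustration only.

```r
# One-time setup, assumed package list (commented out so it is not rerun each time):
# install.packages(c("tidyverse", "tidymodels"))

library(readr)

# Write a tiny hypothetical CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Sex,Wage", "Female,5.2", "Male,6.1"), csv_path)

# read_csv() returns a tibble and guesses the column types.
imported <- read_csv(csv_path, show_col_types = FALSE)
print(imported)
```

In a real session, csv_path would point to the actual data file instead of a temporary one.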

Part 2: Pre-Processing Data with the Tidyverse

  • Select variables (columns) from data frame (aka tibble):
    • select(data, …, …, …)
  • Filter observations for specific criteria:
    • filter(data, criteria)
  • Mutate (calculate) new variables (columns) in an Excel-like way:
    • mutate(data, newvariablename=formula, anothervariablename=formula, …)
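
A minimal sketch of the three verbs, applied one at a time. The toy tibble wages and the new column WageDoubled are made up for illustration and are not part of the workshop data.

```r
library(dplyr)

# Hypothetical toy data (not the workshop dataset).
wages <- tibble(
  Sex  = c("Female", "Male", "Female"),
  Wage = c(5.2, 6.1, 4.8),
  Age  = c(31, 45, 28)
)

# select(): keep only the Sex and Wage columns.
selected <- select(wages, Sex, Wage)

# filter(): keep only the rows where Sex equals "Female".
females <- filter(selected, Sex == "Female")

# mutate(): add a new column calculated from an existing one.
result <- mutate(females, WageDoubled = Wage * 2)
print(result)
```

Each verb takes the data frame as its first argument and returns a new data frame, which is what makes chaining them possible.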

  • Connect commands with piping:
    • data %>% select(Sex, Wage76 = Wage) %>% filter(Sex == "Female") %>% mutate(Wage21 = Wage76 * 3.8)
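
The piped command can be tried on a small example. In this sketch the tibble wage_data is a made-up stand-in for data, and the nested version is included only to show what the pipe replaces.

```r
library(dplyr)

# Hypothetical stand-in for the `data` object on the slide.
wage_data <- tibble(
  Sex  = c("Female", "Male", "Female"),
  Wage = c(5.2, 6.1, 4.8)
)

# Piped version: each step receives the result of the previous step
# as its first argument.
piped <- wage_data %>%
  select(Sex, Wage76 = Wage) %>%   # select() can rename while selecting
  filter(Sex == "Female") %>%
  mutate(Wage21 = Wage76 * 3.8)

# Equivalent nested version, read from the inside out.
nested <- mutate(
  filter(select(wage_data, Sex, Wage76 = Wage), Sex == "Female"),
  Wage21 = Wage76 * 3.8
)

print(piped)
```

Both versions give the same result; the pipe simply lets the steps be read top to bottom in the order they run.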

Go to RStudio

  • select(),
  • filter(),
  • mutate(), and
  • piping %>%

Part 3a: Data Engineering and Modeling with Tidymodels - Simple Recipes


Go to RStudio

  • use tidymodels to design a recipe,
  • use bake() to apply the recipe
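
A simple recipe could be sketched as follows. The data, the formula, and the two steps (step_normalize() and step_dummy()) are assumptions chosen for illustration; the recipe built in the session may use different steps. Note that a recipe is estimated with prep() before bake() applies it.

```r
library(tidymodels)

# Hypothetical training data (not the workshop dataset).
train <- tibble(
  Wage = c(5.2, 6.1, 4.8, 7.0),
  Age  = c(31, 45, 28, 52),
  Sex  = factor(c("Female", "Male", "Female", "Male"))
)

# Design the recipe: state outcome and predictors, then add processing steps.
rec <- recipe(Wage ~ Age + Sex, data = train) %>%
  step_normalize(all_numeric_predictors()) %>%  # center and scale Age
  step_dummy(all_nominal_predictors())          # turn Sex into a 0/1 column

# prep() estimates what the steps need (means, standard deviations, factor levels).
prepped <- prep(rec, training = train)

# bake() applies the prepared recipe to a data set.
baked <- bake(prepped, new_data = train)
print(baked)
```

The same prepared recipe can later be baked on new data, so the new data is processed with the statistics learned from the training data.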

Part 3b: Data Engineering and Modeling with Tidymodels - Workflow (advanced)


Go to RStudio

  • create a workflow,
  • add a model and a recipe to the workflow
  • fit the workflow with data
  • predict with the workflow
  • get metrics related to the prediction
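
The workflow steps listed above can be sketched end to end. The simulated data, the linear regression model, and the single normalization step are assumptions for illustration, not the model used in the session. For brevity the sketch predicts on the training data; in practice the metrics would be computed on held-out data.

```r
library(tidymodels)

# Simulated, hypothetical data: Wage depends linearly on Age plus noise.
set.seed(1)
wage_data <- tibble(Age = seq(20, 59))
wage_data <- mutate(wage_data, Wage = 2 + 0.1 * Age + rnorm(40, sd = 0.5))

# Recipe: pre-processing instructions.
rec <- recipe(Wage ~ Age, data = wage_data) %>%
  step_normalize(all_numeric_predictors())

# Model specification: ordinary linear regression (an assumed choice).
lm_spec <- linear_reg() %>%
  set_engine("lm")

# Create a workflow and add the model and the recipe.
wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(lm_spec)

# Fit the workflow with data.
wf_fit <- fit(wf, data = wage_data)

# Predict with the fitted workflow and attach the observed values.
preds <- predict(wf_fit, new_data = wage_data) %>%
  bind_cols(wage_data)

# Get metrics related to the prediction (rmse, rsq, mae for numeric outcomes).
wage_metrics <- metrics(preds, truth = Wage, estimate = .pred)
print(wage_metrics)
```

Because the recipe lives inside the workflow, fit() and predict() apply the pre-processing automatically, so no separate prep() or bake() call is needed here.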