Day 1: Joining data tables

Learning Objectives

This week, students will learn to:

  • Explain the importance of joining multiple data tables.
  • Use the dplyr functions that join data tables.
  • Understand why data is dropped when joining tables.
  • Use pipes to join more than two data tables.
  • Use the %in% operator to find matching column names in two data tables.

Practice Objectives

This week, students will practice:

  • Use of relational and logical statements to filter data tables
  • Handling missing values with is.na() and na.rm =
  • Pipeline placeholders

Non Objectives

  • which()
  • match()

Homework and class review (15 min)

Set up your RStudio project (15 min)

Why do we need to join data tables? (5 min)

Joining two data tables (5 min)

Exercise 1 (10 min)

Do the following calculations using a single pipe of code (no nested code or intermediate variables):

Check dropped data
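
One possible way to check what a join drops, sketched with two small made-up tables (the data frames surveys_toy and species_toy and all of their values are hypothetical, not the Portal data). It assumes dplyr is installed: inner_join() keeps only rows with a match in both tables, and anti_join() returns the rows of the first table that found no match in the second.

```r
library(dplyr)

# Hypothetical toy tables, invented for illustration only
surveys_toy <- data.frame(
  record_id  = 1:4,
  species_id = c("DM", "DO", "XX", NA),
  stringsAsFactors = FALSE
)
species_toy <- data.frame(
  species_id = c("DM", "DO", "PP"),
  genus      = c("Dipodomys", "Dipodomys", "Chaetodipus"),
  stringsAsFactors = FALSE
)

# inner_join() keeps only rows whose species_id appears in both tables
joined <- inner_join(surveys_toy, species_toy, by = "species_id")
nrow(surveys_toy)  # number of rows before the join
nrow(joined)       # number of rows after the join

# anti_join() returns the rows of surveys_toy that have no match in species_toy
dropped <- anti_join(surveys_toy, species_toy, by = "species_id")
dropped
```

Comparing nrow() before and after the join shows how many rows were lost; the anti_join() result shows which ones and hints at why (here, an unknown code and a missing value).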

Finding shared column names (colnames()) between tables (5 min)
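
A minimal sketch of finding shared column names, using two made-up tables (table_a, table_b, and their columns are hypothetical and chosen only for illustration). It uses base R only: colnames() returns the names as a character vector, and %in% tests each name of one table against the names of the other.

```r
# Hypothetical toy tables; only the column names matter here
table_a <- data.frame(id = 1:2, site_id = c(2, 3), mass = c(40, 48))
table_b <- data.frame(site_id = c(2, 3), habitat = c("grass", "shrub"))

# For each column name of table_a, is it also a column name of table_b?
is_shared <- colnames(table_a) %in% colnames(table_b)
is_shared

# Use the logical vector to keep only the shared names
shared_names <- colnames(table_a)[is_shared]
shared_names

# intersect() gives the same names in one step
intersect(colnames(table_a), colnames(table_b))
```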

Exercise 1 (10 min)

  1. Find the column name that is shared between the plots table and the surveys table. Use that column name for the next question.
  2. Do the following using a single pipe of code (no nested code or intermediate variables):
    • Use the functions inner_join() and filter() to get a data frame with the information from the surveys and plots tables where plot_type is "Control".

Joining two or more data tables (5 min)
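
A sketch of chaining joins with pipes, assuming dplyr is installed. The three tables (obs, sites, taxa) and their values are hypothetical stand-ins, and the magrittr pipe %>% is used here as an assumption; the native pipe |> would work the same way in R 4.1 or later.

```r
library(dplyr)

# Hypothetical toy tables, invented for illustration only
obs <- data.frame(
  obs_id   = 1:3,
  site_id  = c(1, 2, 2),
  taxon_id = c("a", "b", "a"),
  stringsAsFactors = FALSE
)
sites <- data.frame(
  site_id = c(1, 2),
  habitat = c("grass", "shrub"),
  stringsAsFactors = FALSE
)
taxa <- data.frame(
  taxon_id   = c("a", "b"),
  taxon_name = c("alpha", "beta"),
  stringsAsFactors = FALSE
)

# Each inner_join() receives the result of the previous step as its first table
combined <- obs %>%
  inner_join(sites, by = "site_id") %>%
  inner_join(taxa, by = "taxon_id")

combined
```

Because each join returns a data frame, any number of tables can be added by extending the pipe with one more join per table.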

Exercise 2 (15 min)

We want to do an analysis comparing the size of individuals on the "Control" plots to the size of individuals on the "Long-term Krat Exclosure" plots.

Start with the Homework

Exercises 3 and 4 of Joining data tables practice.



Day 2: Joining data vectors

Learning Objectives

This week, students will learn to:

-

- - -

Practice Objectives

This week, students will practice:

-

Non Objectives

-

Set Up Your RStudio Project

A Relationship Between Data Frames and Vectors

Creating vectors

  1. Examples of logical vectors
      abc
    

Creating data frames from vectors

density_data_year <- data.frame(year = 2000, sites = sites, density = density)
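
The line above only runs if the vectors sites and density already exist. A self-contained sketch follows, in which the vector names match the line above but their values are made up for illustration:

```r
# Hypothetical vectors; the values are invented for illustration
sites <- c("a", "b", "c", "d")
density <- c(2.8, 3.2, 1.5, 3.8)

# Each vector becomes a column; the single value 2000 is recycled to fill every row
density_data_year <- data.frame(year = 2000, sites = sites, density = density)
density_data_year
```

Note that the vectors must have the same length (or a length of one) for data.frame() to combine them.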

Joint in-class exercise

You have data on the length, width, and height of 10 individuals of the yew Taxus baccata stored in the following vectors:

length <- c(2.2, 2.1, 2.7, 3.0, 3.1, 2.5, 1.9, 1.1, 3.5, 2.9)
width <- c(1.3, 2.2, 1.5, 4.5, 3.1, NA, 1.8, 0.5, 2.0, 2.7)
height <- c(9.6, 7.6, 2.2, 1.5, 4.0, 3.0, 4.5, 2.3, 7.5, 3.2)


Extracting values from vectors and data frames

Extracting vectors from data frames

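surveys <- read.csv("surveys.csv")
surveys["species_id"] # single brackets return a one-column data frame
surveys[["species_id"]] # double brackets return the column as a vector
surveys$species_id # $ also returns the column as a vector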
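
To check what each extraction style returns without needing surveys.csv, here is a small sketch on a made-up data frame (toy and its values are hypothetical). It also shows why a vector with missing values needs na.rm = TRUE before a numeric summary.

```r
# Hypothetical toy data frame standing in for the surveys table
toy <- data.frame(
  species_id = c("DM", "DO", "DM"),
  weight     = c(40, NA, 44),
  stringsAsFactors = FALSE
)

# class() reveals whether the result is still a table or a plain vector
class(toy["species_id"])
class(toy[["species_id"]])
class(toy$species_id)

# Any NA in the vector makes the mean NA unless missing values are removed
weight_mean_raw <- mean(toy$weight)
weight_mean <- mean(toy$weight, na.rm = TRUE)
weight_mean_raw
weight_mean
```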

Exercise

Using the Portal data surveys table (download a copy if it’s not in your working directory):

  1. Use $ to extract the weight column into a vector called surveys_weight
  2. Use [[]] to extract the month column into a vector called surveys_month
  3. Extract the hindfoot_length column into a vector and calculate the mean hindfoot length ignoring missing values.

Extracting Values from Vectors

letters[10] # indexing the 10th letter of the alphabet
letters[1:3] # getting the first three letters
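abc <- letters[c(1, 2, 3)] # creating a vector of the first three letters of the alphabet
letters[3:1] # getting the first three letters in reverse order
letters[-1] # dropping the first letter
letters[-(1:5)] # dropping the first five letters; letters[-1:5] fails because -1:5 mixes negative and positive indices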

Overwriting values in vectors and data frames
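
One way overwriting can look, sketched with made-up data (the vector counts and the data frame toy are hypothetical). The same indexing used to extract values goes on the left-hand side of the assignment arrow to replace them.

```r
# Hypothetical vector, invented for illustration
counts <- c(5, NA, 12, 7)

# Overwrite one element by position
counts[1] <- 6

# Overwrite every missing value, using is.na() as a logical index
counts[is.na(counts)] <- 0
counts

# The same idea applies to a column of a data frame
toy <- data.frame(plot = c("a", "b"), weight = c(10, NA), stringsAsFactors = FALSE)
toy$weight[is.na(toy$weight)] <- 0
toy
```

Overwriting changes the object in memory only; the file the data were read from is untouched unless it is written out again.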

Summary