Joining data

Homework Day 1: Joining data tables practice

Exercise 4: From join to plot

Create a data frame with only the data for "species_id" "DO", with the columns "year", "month", "day", "species_id", and "weight".
Create a data frame with only data for species IDs "PP" and "PB" and for years starting in 1995, with the columns "year", "species_id", and "hindfoot_length", with no missing values for "hindfoot_length".
Create a data frame with the average "hindfoot_length" for each "species_id" in each "year" with no null values.
Create a data frame with the "year", "genus", "species", "weight" and "plot_type" for all cases where the "genus" is "Dipodomys".
Make a scatter plot with "weight" on the x-axis and "hindfoot_length" on the y-axis. Use a log10() scale on the x-axis. Color the points by "species_id". Include good axis labels.
Make a histogram of weights with a separate subplot for each "species_id". Do not include species with no weights. Set the "scales" argument to "free_y" so that the y-axes can vary. Include good axis labels.
Optional challenge: Make a plot with histograms of the weights of three species, "PP", "PB", and "DM", colored by "species_id", with a different facet (i.e., subplot) for each of three "plot_type" values: "Control", "Long-term Krat Exclosure", and "Short-term Krat Exclosure". Include good axis labels and a title for the plot. Export the plot to a PNG file.
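
The plotting steps above can be sketched on a small, made-up data frame. This is only an illustration, assuming the ggplot2 package is installed: the data values, the species codes "AA" and "BB", the units in the axis labels, and the output file name are all invented, and the real exercise uses the joined Portal data instead.

```r
library(ggplot2)

# Toy data, invented for illustration; this is not the Portal data.
toy <- data.frame(
  species_id = c("AA", "AA", "AA", "BB", "BB", "BB"),
  weight = c(10, 12, 15, 40, 45, 52),
  hindfoot_length = c(20, 21, 23, 35, 36, 38)
)

# Scatter plot with a log10 x-axis, points colored by species.
# The units in the labels are assumptions for this toy example.
scatter <- ggplot(toy, aes(x = weight, y = hindfoot_length, color = species_id)) +
  geom_point() +
  scale_x_log10() +
  labs(x = "Weight (g)", y = "Hindfoot length (mm)")

# Histogram with one panel per species and independent y-axes.
hist_plot <- ggplot(toy, aes(x = weight)) +
  geom_histogram(bins = 5) +
  facet_wrap(~species_id, scales = "free_y") +
  labs(x = "Weight (g)", y = "Number of individuals")

# Save the histogram to a PNG file (the file name is arbitrary).
ggsave("toy_weights.png", plot = hist_plot, width = 6, height = 4)
```

Printing `scatter` or `hist_plot` in an Rmd chunk displays the plot; `ggsave()` is only needed when the figure should be written to disk.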

Exercise 5: Challenge

Develop a data manipulation pipeline for the Portal surveys table that produces a table of data for only the three Dipodomys species ("DM", "DO", "DS").

The species IDs should be presented as lower case, not upper case. Search which function can help you do this.
The table should contain information on the date, the species ID, the weight and hindfoot length.
The data should not include null values for either weight or hindfoot length.
The table should be sorted first by the species (so that each species is grouped together) and then by weight, with the largest weights at the top.
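
The general shape of such a pipeline can be sketched on a toy table. This is a hedged illustration, assuming the dplyr package is installed; the table, its values, and the species codes are invented, and the lower-casing step is deliberately left out because finding that function is part of the exercise.

```r
library(dplyr)

# Toy table, invented for illustration; not the Portal surveys data.
toy <- data.frame(
  species_id = c("AA", "BB", "AA", "CC", "BB"),
  weight = c(30, NA, 45, 12, 20)
)

result <- toy %>%
  filter(species_id %in% c("AA", "BB")) %>%  # keep only selected species
  filter(!is.na(weight)) %>%                  # drop rows with missing weight
  select(species_id, weight) %>%              # keep only the needed columns
  arrange(species_id, desc(weight))           # sort by species, heaviest first

print(result)
```

Passing two columns to `arrange()` sorts by the first and breaks ties with the second, and `desc()` reverses the order of the column it wraps.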

You finished homework day 1!

Homework Day 2: Shrub volume data set - part 2

Exercise 8: Joining data tables

In addition to the main data table on shrub volume, Dr. Granger has two other data tables. The first describes the manipulation for each experiment and is called shrub-volume-experiments.csv. The second provides information about the different sites and is called shrub-volume-sites.csv.

Import the experiments data into your R environment, and then use inner_join() to combine it with the shrub volume data to add a "manipulation" column to the shrub data.
Import the shrub volume sites data and then combine it with both the shrub volume data and the experiments data to produce a single data frame that contains all of the data.
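
Chaining two joins can be sketched with small, made-up tables. This is only an illustration, assuming the dplyr package is installed; the data frames, their values, and the key column names "experiment" and "site" are hypothetical, so check the column names in the real CSV files before joining.

```r
library(dplyr)

# Toy tables, invented for illustration; column names are assumptions.
measurements <- data.frame(
  site = c(1, 1, 2, 3),
  experiment = c(1, 2, 1, 2),
  value = c(5.1, 4.8, 6.0, 3.9)
)
experiments <- data.frame(
  experiment = c(1, 2),
  manipulation = c("control", "treatment")
)
sites <- data.frame(
  site = c(1, 2, 3),
  latitude = c(37.1, 37.4, 37.9)
)

# One join adds the "manipulation" column to the measurements.
with_manipulation <- inner_join(measurements, experiments, by = "experiment")

# Two joins in a row combine all three tables into one data frame.
combined <- measurements %>%
  inner_join(experiments, by = "experiment") %>%
  inner_join(sites, by = "site")

print(combined)
```

Because `inner_join()` keeps only rows with a match in both tables, it is worth comparing the number of rows before and after each join.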

Exercise 9: Vectors

You have data on the length, width, and height of 10 individuals of the yew Taxus baccata stored in the following vectors:

length <- c(2.2, 2.1, 2.7, 3.0, 3.1, 2.5, 1.9, 1.1, 3.5, 2.9)
width <- c(1.3, 2.2, 1.5, 4.5, 3.1, NA, 1.8, 0.5, 2.0, 2.7)
height <- c(9.6, 7.6, 2.2, 1.5, 4.0, 3.0, 4.5, 2.3, 7.5, 3.2)

Copy these vectors into your Rmd file and use them to calculate the values listed below.

Hint: Remember the effect of missing values for R evaluations. You’ll need to use na.rm = TRUE or remove missing values using is.na() to get the correct result.
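
The hint can be illustrated with a short sketch on a made-up vector (the values and variable names below are invented and are not part of the exercise data):

```r
# Made-up values with one missing entry.
x <- c(4.2, NA, 1.5, 3.3)

# Without na.rm, a single NA makes the result NA.
mean_with_na <- mean(x)

# Option 1: ask the function to ignore missing values.
mean_no_na <- mean(x, na.rm = TRUE)

# Option 2: remove the missing values first using is.na().
x_complete <- x[!is.na(x)]
max_complete <- max(x_complete)

print(mean_with_na)
print(mean_no_na)
print(max_complete)
```

The same `na.rm = TRUE` argument is accepted by `min()`, `max()`, and `sum()`.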

The smallest value of length, width and height.
The largest value of length, width and height.
The sum of the values of length, width, and height, each calculated separately.
The average of the length, width and height.
The volume of each shrub (length × width × height). Storing this as an object or variable will make some of the next problems easier.
The sum of the volume of all of the shrubs.
A vector of the heights of shrubs with lengths > 2.5.
A vector of the heights of shrubs with heights > 5.
A vector of the heights of the first 5 shrubs (using []).
A vector of the volumes of the first 3 shrubs (using []).
A vector of the volumes of the last 5 shrubs with the code written so that it will return the last 5 values regardless of the length of the vector (i.e., it will give the last 5 values if there are 10, 20, or 50 individuals).
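
The subsetting techniques used in this list can be sketched on two made-up vectors. The names `v` and `w` and their values are invented for illustration, and the example takes the last 3 values rather than the last 5.

```r
# Made-up vectors of equal length.
v <- c(11, 12, 13, 14, 15, 16, 17)
w <- c(1, 5, 2, 8, 3, 9, 4)

# Element-wise arithmetic: each value of v times the matching value of w.
product <- v * w

# Logical subsetting: values of v where the matching value of w exceeds 4.
v_where_w_large <- v[w > 4]

# Positional subsetting with []: the first three values.
first_three <- v[1:3]

# The last three values, written so it works for any vector length.
n <- length(v)
last_three <- v[(n - 2):n]

# tail() is a built-in alternative that returns the same values.
last_three_tail <- tail(v, 3)

print(v_where_w_large)
print(last_three)
```

Computing the index from `length(v)` instead of typing the positions is what keeps the last-values code correct when the vector grows.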

Exercise 10: Data Frames Challenge

One of your collaborators has posted a comma-delimited text file online for you to analyze. The file contains dimensions ("length", "width", "height") of a series of shrubs ("shrubID"), and your collaborator needs you to determine their volumes (length * width * height). You could do this using a spreadsheet, but the project that you are working on is going to be generating lots of these files, so you decide to write a program to automate the process.

Download the data and save it to the appropriate folder, use read.csv() to import it into R, and use it to produce the following information:

A vector of shrub lengths
A vector of the volume of each of the shrubs
A data frame with just the shrubID and height columns
A data frame with the second row of the full data frame
A data frame with the first 5 rows of the full data frame
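
The difference between extracting a vector and extracting a smaller data frame can be sketched on a toy table. Everything here is invented for illustration: the column names "id", "x", and "y", the values, and the inline CSV text, which stands in for a downloaded file so that the example runs without one.

```r
# Inline CSV text standing in for a file on disk (made-up data).
csv_text <- "id,x,y
a,1,2
b,2,0.5
c,3,1
d,4,3"

# read.csv() normally takes a file path; the text argument reads from a string.
df <- read.csv(text = csv_text)

# A single column extracted with $ is a vector.
x_vec <- df$x

# Vectors from two columns can be multiplied element by element.
products <- df$x * df$y

# Selecting columns by name returns a data frame.
two_cols <- df[, c("id", "y")]

# Selecting rows with [row, ] also returns a data frame.
row3 <- df[3, ]
first2 <- df[1:2, ]

print(products)
print(row3)
```

Leaving the position after the comma empty, as in `df[3, ]`, keeps every column for the selected rows.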

You finished homework day 2!

UC Merced - Spring 2023
