Home exercises: Acacia Vs Trees
Exercise 3: Removing outliers.
- Download the file TREE_SURVEYS.txt and save it to your “data-raw” folder.
- Read the file with the function read_tsv from the package readr and assign it to an object called trees:
  trees <- read_tsv("TREE_SURVEYS.txt", col_types = list(HEIGHT = col_double(), AXIS_2 = col_double()))
- Use the $ operator to add a new column to the trees data frame that is named canopy_area and contains the estimated canopy area, calculated as the value in the AXIS_1 column times the value in the AXIS_2 column.
- Create a subset of the trees data frame with just the SURVEY, YEAR, SITE, and canopy_area columns.
- Make a scatter plot with canopy_area on the x axis and HEIGHT on the y axis. Color the points by TREATMENT and create a subplot per species using the function facet_wrap(). This will plot the points for each species in the SPECIES column in a separate subplot. Label the x axis “Canopy Area (m)” and the y axis “Height (m)”. Make the point size 2.
- That’s a big outlier in the plot from the previous step. 50 by 50 meters is a little too big for a real acacia tree, so filter the data to remove any values for AXIS_1 and AXIS_2 that are over 20 and update the data frame. Then, remake the graph.
- DON’T DO: For this question you will use the package dplyr and the pipe operator %>%. To learn more about the pipe operator and how to use it, watch this introductory video. Using the data without the outlier (i.e., the data generated in the previous step), create a data frame called abundance that shows how the abundance of each species has been changing through time. To do this, use the functions group_by(), summarize(), and n() to make a data frame with YEAR, SPECIES, and a species_abundance column that has the number of individuals per species per year. For guidance, look at the examples of the functions group_by() (using help(group_by)) and summarize() (using help(summarize)). Print out the abundance data frame.
- DON’T DO: Using the abundance data frame generated in the previous step, make a line plot with points (by using geom_line() in addition to geom_point()) with YEAR on the x axis and species_abundance on the y axis, with one subplot per species. To let you see each trend clearly, let the scale for the y axis vary among plots by adding scales = "free_y" as an optional argument to facet_wrap().
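The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the expected solution: to stay runnable without the course file, it writes a few made-up rows to a temporary file in place of TREE_SURVEYS.txt, and the names demo_file, trees_subset, make_plot, p_all, and p_clean are hypothetical.

```r
library(readr)
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for TREE_SURVEYS.txt: a few made-up rows,
# including one 50 x 50 outlier, written to a temporary tab-separated file.
demo_file <- tempfile(fileext = ".txt")
write_tsv(
  data.frame(
    SURVEY    = c(1, 1, 2, 2, 2),
    YEAR      = c(2009, 2009, 2010, 2010, 2010),
    SITE      = c("A", "A", "B", "B", "B"),
    TREATMENT = c("OPEN", "TOTAL", "OPEN", "TOTAL", "OPEN"),
    SPECIES   = c("sp_a", "sp_b", "sp_a", "sp_a", "sp_a"),
    HEIGHT    = c(1.5, 2.1, 1.8, 3.0, 2.4),
    AXIS_1    = c(2.0, 3.5, 50, 1.2, 4.0),
    AXIS_2    = c(1.5, 3.0, 50, 1.0, 2.5)
  ),
  demo_file
)

# Read the file, forcing HEIGHT and AXIS_2 to be numeric as in the exercise.
trees <- read_tsv(
  demo_file,
  col_types = list(HEIGHT = col_double(), AXIS_2 = col_double())
)

# Add the canopy_area column with the $ operator.
trees$canopy_area <- trees$AXIS_1 * trees$AXIS_2

# Keep only the four requested columns.
trees_subset <- trees[, c("SURVEY", "YEAR", "SITE", "canopy_area")]

# Small helper so the same plot can be drawn before and after filtering.
make_plot <- function(data) {
  ggplot(data, aes(x = canopy_area, y = HEIGHT, color = TREATMENT)) +
    geom_point(size = 2) +
    facet_wrap(~SPECIES) +
    labs(x = "Canopy Area (m)", y = "Height (m)")
}
p_all <- make_plot(trees)

# Drop rows where either axis is over 20.
# Note: filter() also drops rows where the condition evaluates to NA.
trees <- filter(trees, AXIS_1 <= 20, AXIS_2 <= 20)
p_clean <- make_plot(trees)

# Number of individuals per species per year.
abundance <- trees %>%
  group_by(YEAR, SPECIES) %>%
  summarize(species_abundance = n(), .groups = "drop")
```

Printing p_all and p_clean in turn shows the plot with and without the outlier. With the real file, only the read_tsv() path changes; everything from the canopy_area line onward should work the same way, assuming the column names listed in the exercise.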
Exercise 4: Fitting linear models.
We want to compare the circumference to height relationship in acacia to the same relationship for all trees in the region. These data are stored in two different tables. Make a graph with the relationship between CIRC and HEIGHT for all trees as gray circles in the background and the same relationship for acacia as red circles plotted on top of the gray circles. Scale both axes logarithmically. Include a linear model fit for both sets of data, trying different models specified using the argument method =. Provide clear labels for the axes.
Your plot should look something like this.
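One way to layer two tables in a single ggplot2 figure is to give each geom its own data argument. The sketch below is a rough illustration only: all_trees and acacia are hypothetical data frames filled with simulated values, the axis labels omit units because the exercise does not state them for CIRC, and geom_smooth() is assumed here as the function that takes the method argument.

```r
library(ggplot2)

# Hypothetical stand-ins for the two tables, filled with simulated values.
set.seed(1)
all_trees <- data.frame(CIRC = runif(50, min = 5, max = 200))
all_trees$HEIGHT <- 0.5 * all_trees$CIRC^0.6 * exp(rnorm(50, sd = 0.2))
acacia <- data.frame(CIRC = runif(20, min = 5, max = 60))
acacia$HEIGHT <- 0.4 * acacia$CIRC^0.5 * exp(rnorm(20, sd = 0.2))

# Layers are drawn in the order they are added, so the gray points for
# all trees go first and the red acacia points end up on top.
p <- ggplot() +
  geom_point(data = all_trees, aes(x = CIRC, y = HEIGHT), color = "gray") +
  geom_smooth(data = all_trees, aes(x = CIRC, y = HEIGHT),
              method = "lm", formula = y ~ x, color = "gray40") +
  geom_point(data = acacia, aes(x = CIRC, y = HEIGHT), color = "red") +
  geom_smooth(data = acacia, aes(x = CIRC, y = HEIGHT),
              method = "lm", formula = y ~ x, color = "red") +
  scale_x_log10() +
  scale_y_log10() +
  labs(x = "Circumference", y = "Height")
```

Because scale transformations are applied before the smoothing statistic is computed, the straight lines here are fitted to the log-transformed values. Swapping method = "lm" for another value changes the kind of fit that is drawn.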
Once you are done with the exercises:
- Save your .Rmd file and knit to PDF.
- Add the two files, commit, and push to GitHub.
- Let your instructor know that changes have been published on GitHub.