Learning Objectives
This week, students will be able to:
- list the elements of the data life cycle
- articulate the relevance of good data management for scientific research
- identify the differences between good and bad data entry and management
- recognize bad data organization and why it is problematic for research
- implement quality assurance and control measures for data entry in spreadsheets using Excel
- list current measures used by the scientific community in ecology and evolution to preserve data long term
Welcome!
- Introductions
- Why are you taking this class?
- Introduce yourself to a neighbor you don’t know yet
Syllabus
- Go through the syllabus
- Do you have any questions about it?
- Choose grading scheme
- Choose office hours
Schedule
The schedule shows a list of topics that will be covered each week of the course.
The schedule reflects the flipped course structure and organizes homework and in-class activities by topic:
- Homework
  - Prepare: Readings or activities to do before the relevant class
  - Strengthen: Exercises to strengthen concepts discussed during class
- Lectures/live-coding
  - Lecture notes used in class
  - Not expected to be read in advance; may be useful for review
  - May not match the lecture exactly
- In-class activities
  - Completed individually or jointly
  - A challenge that supports building a mental model of your own
  - May require additional work time after class
Lecture: Best practices for data entry and quality assurance using Excel
In-class discussion of reading for Day 1
- Showcase data repository websites (GenBank, GBIF, Dryad, Zenodo)
In-class discussion of readings for Day 2
Live coding: Quality Assurance and Control lesson
- What is the difference between quality assurance and quality control?
- Quality assurance
  - Demonstrate data validation in Excel using the data file at figshare.com/files/2252083. Students download it to their own computers and follow along.
- Quality control
  - Save a copy of the original raw data before making any changes.
  - Start a README file describing the documents you create in your project.
    - Use a template that follows the README best practices at https://data.research.cornell.edu/content/readme
  - Sorting to check for invalid data: Exercise 1
    - Make sure the sort is expanded to the whole data table; sorting a single column on its own separates values from their records and corrupts the data.
  - Conditional formatting to scan data for outliers: Exercise 2
    - Use this cautiously; it might corrupt the data.
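The checks in this lesson are done by hand in Excel, but the same ideas can be written as code so they are repeatable. The following is a minimal Python sketch, not part of the lesson materials, assuming a small hypothetical table with `record_id`, `plot`, `sex`, and `weight` columns (not the actual layout of the figshare file) and an assumed plausible weight range of 1 to 300. `check_row` plays the role of data validation rules (an allowed list of values and a numeric range), and `sort_whole_rows` sorts complete records together so no value is separated from its row, pushing blank entries to the end where they are easy to spot. The raw text is parsed into new objects and never edited, mirroring the advice to keep the original raw data untouched.

```python
import csv
import io

# Hypothetical sample data standing in for a downloaded file; the column
# names, values, and thresholds below are illustrative assumptions.
RAW_CSV = """record_id,plot,sex,weight
1,2,M,42
2,3,F,
3,2,X,38
4,1,F,410
5,3,M,35
"""

ALLOWED_SEX = {"M", "F"}         # like a "List" validation rule in Excel
WEIGHT_MIN, WEIGHT_MAX = 1, 300  # like a numeric "between" rule; assumed range


def to_number(value):
    """Return value as a float, or None if it is blank or not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def check_row(row):
    """Return a list of problems in one row; an empty list means it passed."""
    problems = []
    if row["sex"] not in ALLOWED_SEX:
        problems.append("invalid sex: " + repr(row["sex"]))
    weight = to_number(row["weight"])
    if weight is None:
        problems.append("missing or non-numeric weight")
    elif not WEIGHT_MIN <= weight <= WEIGHT_MAX:
        problems.append("weight out of range: " + str(weight))
    return problems


def sort_whole_rows(rows, column):
    """Sort complete records by one column so each value stays with its row.

    Blank or non-numeric entries go to the end, where they are easy to spot.
    """
    def key(row):
        value = to_number(row[column])
        return (value is None, value if value is not None else 0.0)
    return sorted(rows, key=key)


# Parse into new objects; RAW_CSV itself (the raw data) is never modified.
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

for row in rows:
    for problem in check_row(row):
        print(f"record {row['record_id']}: {problem}")

for row in sort_whole_rows(rows, "weight"):
    print(row["record_id"], row["sex"], row["weight"] or "(blank)")
```

Running it reports the blank weight in record 2, the invalid sex code in record 3, and the out-of-range weight in record 4, then lists all records in order of weight with the blank entry last.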
In-class exercises
- Instructions and group activities.
One-minute feedback
- Provide some quick feedback for this session here.