Please email me a copy of your R script for this assignment by the start of class on Friday, February 23rd. Name the file with your first and last name followed by assignment3 (e.g. ‘marissa_dyck_assignment3.R’).
You should always follow best coding practices (see Intro to R module 1), but especially for assignment submissions. Please make sure each problem has its own header so that I can easily navigate to your answers, and that your code is well organized, with spaces as described in the best coding practices section and comments as needed.
# code to read in turtles data from earlier
turtles_no_na <- read_csv('data/processed/turtles_tidy.csv') %>%
  # change sex to a factor
  mutate(sex = as.factor(sex),
         sex = recode(sex,
                      fem = 'female')) %>%
  # remove rows with NAs
  na.omit()
turtles_no_na
## # A tibble: 15 × 5
## tag sex c_length h_width weight
## <dbl> <fct> <dbl> <dbl> <dbl>
## 1 10 male 41 7.15 7.6
## 2 11 female 46.4 8.18 11
## 3 3 female 42.8 7.32 8.6
## 4 4 male 40 6.6 6.5
## 5 5 female 45 8.05 10.9
## 6 12 female 44 7.55 8.9
## 7 6 female 40 6.53 6.2
## 8 9 male 35 5.74 3.9
## 9 17 female 35.1 6.04 4.5
## 10 19 male 42.3 6.77 7.8
## 11 22 female 48.1 8.55 12.8
## 12 105 male 44 7.1 9
## 13 14 male 43 6.6 7.2
## 14 7 female 48 8.67 13.5
## 15 104 male 44 7.35 9
Using the turtles_no_na data, make a new variable called “size_class” based on the “weight” variable using case_when(), whereby:
weights less than 4 are juvenile
weights greater than 7 are adult
weights between 4 and 7 (inclusive) are subadult
(There are multiple ways to do this, which is why there are multiple printouts, but they yield the same answer for this data.)
turtles_no_na <- turtles_no_na %>%
  mutate(size_class = case_when(
    weight < 4 ~ 'juvenile',
    weight > 7 ~ 'adult',
    TRUE ~ 'subadult'
  ))
turtles_no_na$size_class
## [1] "adult" "adult" "adult" "subadult" "adult" "adult"
## [7] "subadult" "juvenile" "subadult" "adult" "adult" "adult"
## [13] "adult" "adult" "adult"
# alternatively
turtles_no_na <- turtles_no_na %>%
  mutate(size_class = case_when(
    weight < 4 ~ 'juvenile',
    weight > 7 ~ 'adult',
    weight >= 4 & weight <= 7 ~ 'subadult'
  ))
turtles_no_na$size_class
## [1] "adult" "adult" "adult" "subadult" "adult" "adult"
## [7] "subadult" "juvenile" "subadult" "adult" "adult" "adult"
## [13] "adult" "adult" "adult"
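The two versions agree here because turtles_no_na has no missing weights. A minimal sketch on made-up weights (the `toy` tibble and its values are illustrative, not taken from the turtle data) shows where they would differ: boundary values of exactly 4 and 7 are subadult in both, but a missing weight is labelled 'subadult' by the TRUE catch-all and left as NA by the explicit condition.

```r
library(dplyr)

# hypothetical weights chosen to hit both boundaries and include an NA
toy <- tibble(weight = c(3.9, 4, 7, 7.2, NA))

toy <- toy %>%
  mutate(
    # version 1: catch-all with TRUE
    size_catch_all = case_when(
      weight < 4 ~ 'juvenile',
      weight > 7 ~ 'adult',
      TRUE ~ 'subadult'
    ),
    # version 2: explicit condition for the middle class
    size_explicit = case_when(
      weight < 4 ~ 'juvenile',
      weight > 7 ~ 'adult',
      weight >= 4 & weight <= 7 ~ 'subadult'
    )
  )

toy
```

If your data may contain missing weights, the explicit version is the safer choice, since it does not silently assign a class to an NA.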
In the turtles_tidy data (not the turtles_no_na data), replace ALL variable values (except the tag column) for tags 104 and 105 with NAs.
Hint: you will need to create a vector of the tag numbers you want to replace and use mutate().
# list of tags we do not trust the data for
bad_tags <- c(104, 105)
turtles_tidy <- turtles_tidy %>%
  mutate(
    sex = replace(sex,
                  tag %in% bad_tags,
                  NA),
    c_length = replace(c_length,
                       tag %in% bad_tags,
                       NA),
    h_width = replace(h_width,
                      tag %in% bad_tags,
                      NA),
    weight = replace(weight,
                     tag %in% bad_tags,
                     NA))
tail(turtles_tidy)
## # A tibble: 6 × 5
## tag sex c_length h_width weight
## <dbl> <fct> <dbl> <dbl> <dbl>
## 1 22 female 48.1 8.55 12.8
## 2 105 <NA> NA NA NA
## 3 14 male 43 6.6 7.2
## 4 7 female 48 8.67 13.5
## 5 1 <NA> 29.2 5.1 2.38
## 6 104 <NA> NA NA NA
# or... use some more tidyverse helper functions and tricks!
turtles_tidy <- turtles_tidy %>%
  mutate(across(
    c("sex", "c_length", "h_width", "weight"),
    ~ replace(.x,
              tag %in% bad_tags,
              NA)))
tail(turtles_tidy)
## # A tibble: 6 × 5
## tag sex c_length h_width weight
## <dbl> <fct> <dbl> <dbl> <dbl>
## 1 22 female 48.1 8.55 12.8
## 2 105 <NA> NA NA NA
## 3 14 male 43 6.6 7.2
## 4 7 female 48 8.67 13.5
## 5 1 <NA> 29.2 5.1 2.38
## 6 104 <NA> NA NA NA
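Since every column except tag is being blanked, the column list can also be written as a negative selection. A small sketch, assuming a toy tibble built from a few of the turtle rows printed earlier (`toy_turtles` is a hypothetical name used only for this illustration):

```r
library(dplyr)

# toy data copied from rows of the turtles_no_na printout
toy_turtles <- tibble(
  tag      = c(22, 105, 14, 104),
  sex      = factor(c('female', 'male', 'male', 'male')),
  c_length = c(48.1, 44, 43, 44),
  h_width  = c(8.55, 7.1, 6.6, 7.35),
  weight   = c(12.8, 9, 7.2, 9)
)

# tags we do not trust the data for
bad_tags <- c(104, 105)

toy_turtles <- toy_turtles %>%
  # -tag selects every column except tag
  mutate(across(-tag,
                ~ replace(.x,
                          tag %in% bad_tags,
                          NA)))

toy_turtles
```

The tag column is left out of the selection but can still be referenced inside the function, which is what lets the condition `tag %in% bad_tags` work.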
Use the code below to read in the Soils data from the carData package
# Load the example data
soil <- carData::Soils
Print the first few lines of data in “soil”
Pivot the data so that columns Ca - Na are contained in one column called nutrients (again, there are more than two possible solutions, but two that I expect people to use)
# See what variables it contains...
head(soil)
## Group Contour Depth Gp Block pH N Dens P Ca Mg K Na Conduc
## 1 1 Top 0-10 T0 1 5.40 0.188 0.92 215 16.35 7.65 0.72 1.14 1.09
## 2 1 Top 0-10 T0 2 5.65 0.165 1.04 208 12.25 5.15 0.71 0.94 1.35
## 3 1 Top 0-10 T0 3 5.14 0.260 0.95 300 13.02 5.68 0.68 0.60 1.41
## 4 1 Top 0-10 T0 4 5.14 0.169 1.10 248 11.92 7.88 1.09 1.01 1.64
## 5 2 Top 10-30 T1 1 5.14 0.164 1.12 174 14.17 8.12 0.70 2.17 1.85
## 6 2 Top 10-30 T1 2 5.10 0.094 1.22 129 8.55 6.92 0.81 2.67 3.18
# Use 'tidyverse' to reshape the data
soil_nutrient <- pivot_longer(soil,
                              cols = c(Ca, Mg, K, Na),
                              names_to = 'nutrient',
                              values_to = 'value')
soil_nutrient
## # A tibble: 192 × 12
## Group Contour Depth Gp Block pH N Dens P Conduc nutrient value
## <fct> <fct> <fct> <fct> <fct> <dbl> <dbl> <dbl> <int> <dbl> <chr> <dbl>
## 1 1 Top 0-10 T0 1 5.4 0.188 0.92 215 1.09 Ca 16.4
## 2 1 Top 0-10 T0 1 5.4 0.188 0.92 215 1.09 Mg 7.65
## 3 1 Top 0-10 T0 1 5.4 0.188 0.92 215 1.09 K 0.72
## 4 1 Top 0-10 T0 1 5.4 0.188 0.92 215 1.09 Na 1.14
## 5 1 Top 0-10 T0 2 5.65 0.165 1.04 208 1.35 Ca 12.2
## 6 1 Top 0-10 T0 2 5.65 0.165 1.04 208 1.35 Mg 5.15
## 7 1 Top 0-10 T0 2 5.65 0.165 1.04 208 1.35 K 0.71
## 8 1 Top 0-10 T0 2 5.65 0.165 1.04 208 1.35 Na 0.94
## 9 1 Top 0-10 T0 3 5.14 0.26 0.95 300 1.41 Ca 13.0
## 10 1 Top 0-10 T0 3 5.14 0.26 0.95 300 1.41 Mg 5.68
## # ℹ 182 more rows
# alternatively
soil_nutrient <- pivot_longer(soil,
                              cols = Ca:Na,
                              names_to = 'nutrient',
                              values_to = 'value')
soil_nutrient
## # A tibble: 192 × 12
## Group Contour Depth Gp Block pH N Dens P Conduc nutrient value
## <fct> <fct> <fct> <fct> <fct> <dbl> <dbl> <dbl> <int> <dbl> <chr> <dbl>
## 1 1 Top 0-10 T0 1 5.4 0.188 0.92 215 1.09 Ca 16.4
## 2 1 Top 0-10 T0 1 5.4 0.188 0.92 215 1.09 Mg 7.65
## 3 1 Top 0-10 T0 1 5.4 0.188 0.92 215 1.09 K 0.72
## 4 1 Top 0-10 T0 1 5.4 0.188 0.92 215 1.09 Na 1.14
## 5 1 Top 0-10 T0 2 5.65 0.165 1.04 208 1.35 Ca 12.2
## 6 1 Top 0-10 T0 2 5.65 0.165 1.04 208 1.35 Mg 5.15
## 7 1 Top 0-10 T0 2 5.65 0.165 1.04 208 1.35 K 0.71
## 8 1 Top 0-10 T0 2 5.65 0.165 1.04 208 1.35 Na 0.94
## 9 1 Top 0-10 T0 3 5.14 0.26 0.95 300 1.41 Ca 13.0
## 10 1 Top 0-10 T0 3 5.14 0.26 0.95 300 1.41 Mg 5.68
## # ℹ 182 more rows
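A quick way to check a pivot is to count rows: each original row should appear once per pivoted column, so four nutrient columns turn every row into four. A minimal sketch on a two-row toy table copied from the head(soil) printout (only a few columns are kept, and `toy_soil` and `toy_long` are hypothetical names):

```r
library(tidyr)

# first two rows of the soil printout, reduced to a few columns
toy_soil <- data.frame(
  Block = c(1, 2),
  pH    = c(5.40, 5.65),
  Ca    = c(16.35, 12.25),
  Mg    = c(7.65, 5.15),
  K     = c(0.72, 0.71),
  Na    = c(1.14, 0.94)
)

toy_long <- pivot_longer(toy_soil,
                         cols = Ca:Na,
                         names_to = 'nutrient',
                         values_to = 'value')

# 2 rows x 4 nutrient columns = 8 rows
nrow(toy_long)
```

The same check applies to the full data: the 192 rows in soil_nutrient are 4 times the number of rows in soil.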
If you haven’t already, download the 3 bobcat data files added to the course after 12 January 2024 and save them to the data/raw folder
Bobcat collection data for Purrr (bobcat_collection_data.csv)
Bobcat necropsy data for Purrr (bobcat_necropsy_only_data.csv)
Bobcat age data for Purrr (bobcat_age_data.csv)
Read in the data files using the tidyverse function read_csv()
In the same code chunk, set the column names to lowercase for all 3 data sets AND rename the ‘Bobcat_ID#’ column to bobcat_id (NOTE: this requires a lot of code repetition, which is annoying and does not follow best coding practices; we will learn a much better way to do this when we cover Purrr)
Use the csv file names as the object names when you assign them to the environment
Make a list with the three data sets and check their internal structure (there are multiple ways to do this)
Join the bobcat_necropsy_only_data to the bobcat_collection_data AND then, in the same code chunk, join the bobcat_age_data as well. Make sure to retain all observations from the bobcat_collection_data. You will need to use the bobcat_id column as the key when joining.
Print the summary of your data to check that it worked
# read in data files
bobcat_collection_data <- read_csv('data/raw/bobcat_collection_data.csv') %>%
  # set names to lowercase
  set_names(
    names(.) %>%
      tolower()) %>%
  # change bobcats id# to better name
  rename(.,
         'bobcat_id' = 'bobcat_id#')
bobcat_necropsy_only_data <- read_csv('data/raw/bobcat_necropsy_only_data.csv') %>%
  # set names to lowercase
  set_names(
    names(.) %>%
      tolower()) %>%
  # change bobcats id# to better name
  rename(.,
         'bobcat_id' = 'bobcat_id#')
bobcat_age_data <- read_csv('data/raw/bobcat_age_data.csv') %>%
  # set names to lowercase
  set_names(
    names(.) %>%
      tolower()) %>%
  # change bobcats id# to better name
  rename(.,
         'bobcat_id' = 'bobcat_id#')
# or simpler code
# read in data files
bobcat_collection_data <- read_csv('data/raw/bobcat_collection_data.csv') %>%
  # set names to lowercase
  rename_all(tolower) %>%
  # change bobcats id# to better name
  rename(.,
         'bobcat_id' = 'bobcat_id#')
bobcat_necropsy_only_data <- read_csv('data/raw/bobcat_necropsy_only_data.csv') %>%
  # set names to lowercase
  rename_all(tolower) %>%
  # change bobcats id# to better name
  rename(.,
         'bobcat_id' = 'bobcat_id#')
bobcat_age_data <- read_csv('data/raw/bobcat_age_data.csv') %>%
  # set names to lowercase
  rename_all(tolower) %>%
  # change bobcats id# to better name
  rename(.,
         'bobcat_id' = 'bobcat_id#')
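The renaming steps can be tried without the csv files. A sketch, assuming a made-up two-row tibble with the same header style as the age data (`toy_age`, `toy_age_clean`, and `clean_bobcat_names()` are hypothetical names, not part of the assignment). It uses rename_with(), which current dplyr recommends in place of the superseded rename_all(), and wraps both steps in a small function so they only need to be written once.

```r
library(dplyr)

# toy stand-in for one of the bobcat files (original column names)
toy_age <- tibble(
  `Bobcat_ID#` = c('5/18/01', '5/18/02'),
  Age          = c('3', '1')
)

# hypothetical helper: lowercase all names, then fix the id column
clean_bobcat_names <- function(data) {
  data %>%
    rename_with(tolower) %>%
    rename(bobcat_id = `bobcat_id#`)
}

toy_age_clean <- clean_bobcat_names(toy_age)

names(toy_age_clean)
```

With a helper like this, each file would need only one extra line after read_csv(), which removes most of the repetition in the solution above.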
# make a list and check internal structure
# option 1 - nested code
str(list(bobcat_collection_data,
         bobcat_necropsy_only_data,
         bobcat_age_data))
## List of 3
## $ : spc_tbl_ [121 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ bobcat_id : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ county : chr [1:121] "Athens" "Athens" "Morgan" "Perry" ...
## ..$ township : chr [1:121] "Dover" "Canaan" "Hower" "Reading" ...
## ..$ collectiondate: chr [1:121] "12/30/18" "12/14/18" "12/7/18" "2/10/19" ...
## ..$ month : num [1:121] 12 12 12 2 1 3 2 2 2 3 ...
## ..$ coordinates_n : num [1:121] 39.4 39.3 39.5 39.8 40.4 ...
## ..$ coordinates_w : num [1:121] -82.1 -82 -82 -82.3 -81.2 ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. County = col_character(),
## .. .. Township = col_character(),
## .. .. CollectionDate = col_character(),
## .. .. Month = col_double(),
## .. .. Coordinates_N = col_double(),
## .. .. Coordinates_W = col_double()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ : spc_tbl_ [121 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ bobcat_id : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ necropsy : num [1:121] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ necropsydate : chr [1:121] "3/6/19" "3/9/19" "3/10/19" "3/13/19" ...
## ..$ dissector : chr [1:121] "SP,_MD" "SP_MD_CH" "MD_CH" "MD_CH_HK" ...
## ..$ approxage : chr [1:121] "Ad" "Ad" "Ad" "Juv" ...
## ..$ sex : chr [1:121] "M" "F" "M" "F" ...
## ..$ fecundity_females: chr [1:121] "na" "na" "na" "0" ...
## ..$ rearfoot_cm : chr [1:121] "17.9" "16" "17.1" "14.7" ...
## ..$ tail_cm : chr [1:121] "11.5" "11.5" "11.4" "11.3" ...
## ..$ ear_cm : chr [1:121] "6.5" "6.5" "6.4" "6.5" ...
## ..$ body_w/tail_cm : chr [1:121] "89.5" "82" "92" "71.5" ...
## ..$ body : chr [1:121] "78" "70.5" "80.6" "60.2" ...
## ..$ weight_kg : chr [1:121] "13.6" "6.33" "9.98" "4.62" ...
## ..$ condition : chr [1:121] "17.43589744" "8.978723404" "12.382134" "7.674418605" ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. Necropsy = col_double(),
## .. .. NecropsyDate = col_character(),
## .. .. Dissector = col_character(),
## .. .. ApproxAge = col_character(),
## .. .. Sex = col_character(),
## .. .. Fecundity_Females = col_character(),
## .. .. RearFoot_cm = col_character(),
## .. .. Tail_cm = col_character(),
## .. .. Ear_cm = col_character(),
## .. .. `Body_w/Tail_cm` = col_character(),
## .. .. Body = col_character(),
## .. .. Weight_kg = col_character(),
## .. .. Condition = col_character()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ : spc_tbl_ [121 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ bobcat_id: chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ age : chr [1:121] "3" "1" "2" "0" ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. Age = col_character()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
# option 2 - with the pipe
list(bobcat_collection_data,
     bobcat_necropsy_only_data,
     bobcat_age_data) %>%
  str(.)
## List of 3
## $ : spc_tbl_ [121 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ bobcat_id : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ county : chr [1:121] "Athens" "Athens" "Morgan" "Perry" ...
## ..$ township : chr [1:121] "Dover" "Canaan" "Hower" "Reading" ...
## ..$ collectiondate: chr [1:121] "12/30/18" "12/14/18" "12/7/18" "2/10/19" ...
## ..$ month : num [1:121] 12 12 12 2 1 3 2 2 2 3 ...
## ..$ coordinates_n : num [1:121] 39.4 39.3 39.5 39.8 40.4 ...
## ..$ coordinates_w : num [1:121] -82.1 -82 -82 -82.3 -81.2 ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. County = col_character(),
## .. .. Township = col_character(),
## .. .. CollectionDate = col_character(),
## .. .. Month = col_double(),
## .. .. Coordinates_N = col_double(),
## .. .. Coordinates_W = col_double()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ : spc_tbl_ [121 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ bobcat_id : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ necropsy : num [1:121] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ necropsydate : chr [1:121] "3/6/19" "3/9/19" "3/10/19" "3/13/19" ...
## ..$ dissector : chr [1:121] "SP,_MD" "SP_MD_CH" "MD_CH" "MD_CH_HK" ...
## ..$ approxage : chr [1:121] "Ad" "Ad" "Ad" "Juv" ...
## ..$ sex : chr [1:121] "M" "F" "M" "F" ...
## ..$ fecundity_females: chr [1:121] "na" "na" "na" "0" ...
## ..$ rearfoot_cm : chr [1:121] "17.9" "16" "17.1" "14.7" ...
## ..$ tail_cm : chr [1:121] "11.5" "11.5" "11.4" "11.3" ...
## ..$ ear_cm : chr [1:121] "6.5" "6.5" "6.4" "6.5" ...
## ..$ body_w/tail_cm : chr [1:121] "89.5" "82" "92" "71.5" ...
## ..$ body : chr [1:121] "78" "70.5" "80.6" "60.2" ...
## ..$ weight_kg : chr [1:121] "13.6" "6.33" "9.98" "4.62" ...
## ..$ condition : chr [1:121] "17.43589744" "8.978723404" "12.382134" "7.674418605" ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. Necropsy = col_double(),
## .. .. NecropsyDate = col_character(),
## .. .. Dissector = col_character(),
## .. .. ApproxAge = col_character(),
## .. .. Sex = col_character(),
## .. .. Fecundity_Females = col_character(),
## .. .. RearFoot_cm = col_character(),
## .. .. Tail_cm = col_character(),
## .. .. Ear_cm = col_character(),
## .. .. `Body_w/Tail_cm` = col_character(),
## .. .. Body = col_character(),
## .. .. Weight_kg = col_character(),
## .. .. Condition = col_character()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ : spc_tbl_ [121 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ bobcat_id: chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ age : chr [1:121] "3" "1" "2" "0" ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. Age = col_character()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
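In the printouts above each list element shows up as an unnamed `$ :` entry. Naming the elements when building the list makes the str() output easier to navigate. A small sketch with made-up tibbles (`toy_a`, `toy_b`, and `toy_list` are hypothetical stand-ins for the bobcat data sets):

```r
library(dplyr)

# toy stand-ins for two of the data sets
toy_a <- tibble(bobcat_id = c('A', 'B'), age = c('3', '1'))
toy_b <- tibble(bobcat_id = c('A', 'B'), sex = c('M', 'F'))

# name each element so str() labels it
toy_list <- list(age = toy_a,
                 necropsy = toy_b)

str(toy_list)
```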
# join data
bobcat_data_joined <- bobcat_collection_data %>%
  # join necropsy data
  left_join(bobcat_necropsy_only_data,
            by = 'bobcat_id') %>%
  # join age data
  left_join(bobcat_age_data,
            by = 'bobcat_id')
# print summary
summary(bobcat_data_joined)
## bobcat_id county township collectiondate
## Length:121 Length:121 Length:121 Length:121
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## month coordinates_n coordinates_w necropsy
## Min. : 1.000 Min. :38.25 Min. :-83.25 Min. : 1
## 1st Qu.: 2.000 1st Qu.:39.25 1st Qu.:-82.38 1st Qu.: 31
## Median : 3.000 Median :39.59 Median :-81.95 Median : 61
## Mean : 5.826 Mean :39.58 Mean :-81.95 Mean : 61
## 3rd Qu.:11.000 3rd Qu.:39.97 3rd Qu.:-81.50 3rd Qu.: 91
## Max. :12.000 Max. :40.68 Max. :-80.89 Max. :121
## necropsydate dissector approxage sex
## Length:121 Length:121 Length:121 Length:121
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## fecundity_females rearfoot_cm tail_cm ear_cm
## Length:121 Length:121 Length:121 Length:121
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## body_w/tail_cm body weight_kg condition
## Length:121 Length:121 Length:121 Length:121
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## age
## Length:121
## Class :character
## Mode :character
##
##
##
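The three bobcat files each have 121 rows, so the summary alone does not show what left_join() does when a key has no match. A sketch with made-up tables (`toy_collection`, `toy_necropsy`, and `toy_age` are hypothetical, and id 'C' deliberately has no necropsy or age record) shows that every row of the left table is kept and the unmatched columns are filled with NA:

```r
library(dplyr)

# left table: the one whose rows must all be retained
toy_collection <- tibble(
  bobcat_id = c('A', 'B', 'C'),
  county    = c('Athens', 'Athens', 'Morgan')
)

# right tables: no record for bobcat 'C'
toy_necropsy <- tibble(
  bobcat_id = c('A', 'B'),
  sex       = c('M', 'F')
)
toy_age <- tibble(
  bobcat_id = c('A', 'B'),
  age       = c('3', '1')
)

toy_joined <- toy_collection %>%
  # join necropsy data
  left_join(toy_necropsy,
            by = 'bobcat_id') %>%
  # join age data
  left_join(toy_age,
            by = 'bobcat_id')

toy_joined
```

Comparing nrow() of the joined data with nrow() of the collection data is a quick check that no observations were dropped or duplicated.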