Submission

Please email me a copy of your R script for this assignment by start of class on Friday February 2nd with your first and last name followed by assignment 2 as the file name (e.g. ‘marissa_dyck_assignment2.R’)

You should always be following best coding practices (see Intro to R module 1) but especially for assingment submissions. Please make sure each problem has its own header so that I can easily navigate to your answers and that your code is well organized with spaces as described in the best coding practices section and comments as needed.

1 Export data

Save the altered turtles data as a comma separated file to the data/processed folder in your working directory using the ‘readr’ package and name it ‘turtles_tidy’

write_csv(turtles.df,
          'data/processed/turtles_tidy.csv')

Download the brown bear damage data (bear_2008_2016.csv) and save it in the data folder.

2 Import and rename data

  • Import the brown bear dataset using the appropriate function from the ‘readr’ package and save it as bear_data.
  • In the same code chunk set the column names to lowercase and
  • change dist_to_town to m_to_town and dist_to_forest to m_to_forest (m for meters)
  • Finally, use one of the functions we’ve covered to view/print your data to check that it worked.
bear_data <- read_csv('data/raw/bear_2008_2016.csv') %>% 
  
  # set column names to lowercase
  set_names(
    names(.) %>%  
      tolower()) %>% 
  
  # rename columns to include units
  rename(m_to_town = dist_to_town,
         m_to_forest = dist_to_forest)
## Rows: 3024 Columns: 25
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): Targetspp
## dbl (24): Damage, Year, Month, POINT_X, POINT_Y, Bear_abund, Landcover_code,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(bear_data) 
## # A tibble: 6 × 25
##   damage  year month targetspp point_x point_y bear_abund landcover_code
##    <dbl> <dbl> <dbl> <chr>       <dbl>   <dbl>      <dbl>          <dbl>
## 1      0  2008     0 <NA>      542583. 532675.         38            311
## 2      0  2008     0 <NA>      540963. 528113.         40            311
## 3      0  2008     0 <NA>      542435. 510385.         38            211
## 4      0  2008     0 <NA>      541244. 507297.         38            231
## 5      0  2008     0 <NA>      542791. 497755.         32            211
## 6      0  2008     0 <NA>      544639. 610454.         32            313
## # ℹ 17 more variables: altitude <dbl>, human_population <dbl>,
## #   m_to_forest <dbl>, m_to_town <dbl>, livestock_killed <dbl>,
## #   numberdamageperplot <dbl>, shannondivindex <dbl>, prop_arable <dbl>,
## #   prop_orchards <dbl>, prop_pasture <dbl>, prop_ag_mosaic <dbl>,
## #   prop_seminatural <dbl>, prop_deciduous <dbl>, prop_coniferous <dbl>,
## #   prop_mixedforest <dbl>, prop_grassland <dbl>, prop_for_regen <dbl>

3 Change variable type

  • Check the internal structure of the bear dataset
  • Change ‘targetspp’ from character to factor using dplyr make sure to overwrite the previous bear data so the changes are saved
  • Use the ‘levels()’ function to check if this worked. HINT: Use the help file if you aren’t familiar with ‘levels()’
str(bear_data)
## spc_tbl_ [3,024 × 25] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ damage             : num [1:3024] 0 0 0 0 0 0 0 0 0 0 ...
##  $ year               : num [1:3024] 2008 2008 2008 2008 2008 ...
##  $ month              : num [1:3024] 0 0 0 0 0 0 0 0 0 0 ...
##  $ targetspp          : chr [1:3024] NA NA NA NA ...
##  $ point_x            : num [1:3024] 542583 540963 542435 541244 542791 ...
##  $ point_y            : num [1:3024] 532675 528113 510385 507297 497755 ...
##  $ bear_abund         : num [1:3024] 38 40 38 38 32 32 37 45 45 36 ...
##  $ landcover_code     : num [1:3024] 311 311 211 231 211 313 313 312 324 312 ...
##  $ altitude           : num [1:3024] 800 666 504 553 470 ...
##  $ human_population   : num [1:3024] 2 0 2 0 0 0 0 0 0 0 ...
##  $ m_to_forest        : num [1:3024] 0 0 1829 398 942 ...
##  $ m_to_town          : num [1:3024] 2398 2441 571 1598 1068 ...
##  $ livestock_killed   : num [1:3024] 0 0 0 0 0 0 0 0 0 0 ...
##  $ numberdamageperplot: num [1:3024] 0 0 0 0 0 0 0 0 0 0 ...
##  $ shannondivindex    : num [1:3024] 0.963 1.119 1.056 1.506 1.067 ...
##  $ prop_arable        : num [1:3024] 0 0 65.6 27.3 53.2 ...
##  $ prop_orchards      : num [1:3024] 0 0 0 0 0 0 0 0 0 0 ...
##  $ prop_pasture       : num [1:3024] 0 25.3 11.13 20 4.27 ...
##  $ prop_ag_mosaic     : num [1:3024] 0 0 8.55 0 0 ...
##  $ prop_seminatural   : num [1:3024] 16.3 30.6 0 19.8 0 ...
##  $ prop_deciduous     : num [1:3024] 57.796 43.095 0 0.769 32.454 ...
##  $ prop_coniferous    : num [1:3024] 0 0 0 0 0 ...
##  $ prop_mixedforest   : num [1:3024] 0 0 0 0 0 ...
##  $ prop_grassland     : num [1:3024] 0 1 0 0 0 ...
##  $ prop_for_regen     : num [1:3024] 25.9 0 13.5 0 0 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Damage = col_double(),
##   ..   Year = col_double(),
##   ..   Month = col_double(),
##   ..   Targetspp = col_character(),
##   ..   POINT_X = col_double(),
##   ..   POINT_Y = col_double(),
##   ..   Bear_abund = col_double(),
##   ..   Landcover_code = col_double(),
##   ..   Altitude = col_double(),
##   ..   Human_population = col_double(),
##   ..   Dist_to_forest = col_double(),
##   ..   Dist_to_town = col_double(),
##   ..   Livestock_killed = col_double(),
##   ..   Numberdamageperplot = col_double(),
##   ..   ShannonDivIndex = col_double(),
##   ..   prop_arable = col_double(),
##   ..   prop_orchards = col_double(),
##   ..   prop_pasture = col_double(),
##   ..   prop_ag_mosaic = col_double(),
##   ..   prop_seminatural = col_double(),
##   ..   prop_deciduous = col_double(),
##   ..   prop_coniferous = col_double(),
##   ..   prop_mixedforest = col_double(),
##   ..   prop_grassland = col_double(),
##   ..   prop_for_regen = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
# change targetspp to factor dplyr
bear_data <- bear_data %>% 
  mutate(targetspp = as.factor(targetspp))

levels(bear_data$targetspp)
## [1] "alte"   "bovine" "ovine"

4 Subsetting

  • Now that we know there are 3 livestock types (groups), subset the data to include only the rows for ‘ovine’
  • In the same code chunk select the columns damage, year, month, bear_abund, landcover_code, and altitude
  • Save this to your environment as ‘bear_sheep_data’
bear_sheep_data <- bear_data %>% 
  
  # return only rows for sheep
  filter(targetspp == 'ovine') %>% 
  
  # select specified columns
  select(damage:month, bear_abund:altitude) # most parsimonious way to do it but you could have also done


# select(damage, year, month, bear_abund, landcover_code, altitude)

# or 

# select(1:3, 7:9)

head(bear_sheep_data)
## # A tibble: 6 × 6
##   damage  year month bear_abund landcover_code altitude
##    <dbl> <dbl> <dbl>      <dbl>          <dbl>    <dbl>
## 1      1  2008     9         40            324      570
## 2      1  2008     9         40            324      570
## 3      1  2008     8         36            321     1410
## 4      1  2008     8         22            321     1068
## 5      1  2008     9         40            112      516
## 6      1  2008     8         56            112      533

5 Summarise

  • Using the bear_sheep_data and the summarise() function, calculate the mean, sd and SE (the formula for SE is sd / sqrt(n)) for altitude. This last part might be tricky at first but give it a try and remember you can always google things and check the ‘help’ files; and if that fails ‘phone a friend or me’
bear_sheep_data %>% 
  summarise(mean_alt = mean(altitude),
            sd_alt = sd(altitude),
            se_alt = sd_alt/sqrt(length(altitude))) # you could also look up the length for altitude in the console and type the number here directly but this is more flexible and readable
## # A tibble: 1 × 3
##   mean_alt sd_alt se_alt
##      <dbl>  <dbl>  <dbl>
## 1     699.   183.   12.6
# or another possible answer

bear_sheep_data %>% 
  summarise(mean_alt = mean(altitude),
            sd_alt = sd(altitude),
            se_alt = sd_alt/sqrt(n()))
## # A tibble: 1 × 3
##   mean_alt sd_alt se_alt
##      <dbl>  <dbl>  <dbl>
## 1     699.   183.   12.6