Intro to R

Materials

Scripts

Click here to download the script! Save the script to the project directory you set up in the previous module.
Load your script in RStudio: open RStudio, click the folder icon in the toolbar at the top, and select your script.

Cheat sheets

Save this to your cheat sheet folder

Iteration

Often you will want to apply the same function (or several functions) to multiple objects or inputs; this is called iteration. In base R, iteration is usually done with for-loops, which can be cumbersome and not very intuitive. If you’ve taken an R class before and covered for-loops, you are probably already nervous. But do not fear: the tidyverse includes a package for iteration that is much more user-friendly… and it has a cute, inviting name, Purrr.
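To make the contrast concrete, here is a minimal sketch that computes the mean of each vector in a small made-up list, first with a base R for-loop and then with purrr::map(). The list and its names are illustrative only, not course data.

```r
library(purrr)

# made-up input: three numeric vectors in a named list
inputs <- list(a = c(1, 2, 3), b = c(10, 20), c = c(5, 5, 5, 5))

# base R: a for-loop needs an empty container and explicit indexing
loop_means <- vector("list", length(inputs))
names(loop_means) <- names(inputs)
for (i in seq_along(inputs)) {
  loop_means[[i]] <- mean(inputs[[i]])
}

# purrr: map() creates the container and handles the indexing for us
map_means <- map(inputs, mean)

# both approaches produce the same named list
identical(loop_means, map_means)
```

The for-loop needs three extra pieces (the empty list, the names, and the index `i`) that map() takes care of automatically.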

Let’s go through a few examples of how the Purrr package can be used.

Purrr map

As you can see from the Purrr cheat sheet, the Purrr package has many uses. For the purposes of this course we will only scratch the surface and mostly focus on applying functions with Purrr (upper left of the cheat sheet).

To apply the same function(s) to each element of a list or vector, we use the map() function.
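As a quick illustration of what map() returns, the sketch below applies max() to each element of a small made-up list. map() always returns a list; typed variants such as map_dbl() return an atomic vector instead, which is handy when every result is a single number.

```r
library(purrr)

# made-up measurements, one vector per body part
measurements <- list(tail = c(11.5, 11.5, 11.4), ear = c(6.5, 6.5, 6.4))

# map() always returns a list, one element per input element
max_list <- map(measurements, max)

# map_dbl() returns a named numeric (double) vector instead
max_vec <- map_dbl(measurements, max)
max_vec
```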

Let’s go through an example WITHOUT using Purrr first and then apply the tidyverse alternative.

Import data with Purrr

One of the most common ways I use Purrr is to read in multiple data sets for the same analysis. Landscape data, weather data, species data, etc. are often entered in separate files, but you need all of them for one analysis.

Start by reading in the 3 bobcat data files for this module, keeping their original names:

bobcat_collection_data.csv

bobcat_necropsy_only_data.csv

bobcat_age_data.csv

library(tidyverse) # loads readr (read_csv) and purrr

bobcat_collection_data <- read_csv('data/raw/bobcat_collection_data.csv')

## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

bobcat_necropsy_only_data <- read_csv('data/raw/bobcat_necropsy_only_data.csv')

## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl  (1): Necropsy
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

bobcat_age_data <- read_csv('data/raw/bobcat_age_data.csv')

## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Now let’s do this using Purrr!

First we need to assign our result to a named object in the environment. When we read in multiple data frames at once, they are stored together in a list object. Lists can be confusing to work with at first, but a data frame is itself a kind of list (a list of equal-length columns), so the two behave similarly. Let’s name this list bobcat_data.

And just like when we read in an individual data frame, we need to provide the path to each file, including its name and file extension (e.g., .csv, .txt).

Then we pipe that list into the purrr::map() function AND tell map() which function to apply to every element. This is where things can get a bit confusing at first.

Inside the parentheses of the purrr::map() function we reference the read_csv() function, prefaced with a ‘~’. The ‘~’ is Purrr’s shorthand for writing a small anonymous function.
Then, where we would normally put the data file name, we write ‘.x’, which stands for the current element of the list (here, one file path). This is similar to the ‘.’ placeholder when piping with %>% in dplyr code.

Let’s look at an example:

# create a list holding the file path (folder + file name) of each data set

bobcat_data <- list('data/raw/bobcat_collection_data.csv',
                    'data/raw/bobcat_necropsy_only_data.csv',
                    'data/raw/bobcat_age_data.csv') %>% 
  
  # use purrr::map to read in all data at once
  purrr::map(
    ~read_csv(.x)
  )

## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl  (1): Necropsy
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# look at the list structure
str(bobcat_data)

## List of 3
##  $ : spc_tbl_ [121 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##   ..$ Bobcat_ID#    : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
##   ..$ County        : chr [1:121] "Athens" "Athens" "Morgan" "Perry" ...
##   ..$ Township      : chr [1:121] "Dover" "Canaan" "Hower" "Reading" ...
##   ..$ CollectionDate: chr [1:121] "12/30/18" "12/14/18" "12/7/18" "2/10/19" ...
##   ..$ Month         : num [1:121] 12 12 12 2 1 3 2 2 2 3 ...
##   ..$ Coordinates_N : num [1:121] 39.4 39.3 39.5 39.8 40.4 ...
##   ..$ Coordinates_W : num [1:121] -82.1 -82 -82 -82.3 -81.2 ...
##   ..- attr(*, "spec")=
##   .. .. cols(
##   .. ..   `Bobcat_ID#` = col_character(),
##   .. ..   County = col_character(),
##   .. ..   Township = col_character(),
##   .. ..   CollectionDate = col_character(),
##   .. ..   Month = col_double(),
##   .. ..   Coordinates_N = col_double(),
##   .. ..   Coordinates_W = col_double()
##   .. .. )
##   ..- attr(*, "problems")=<externalptr> 
##  $ : spc_tbl_ [121 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##   ..$ Bobcat_ID#       : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
##   ..$ Necropsy         : num [1:121] 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ NecropsyDate     : chr [1:121] "3/6/19" "3/9/19" "3/10/19" "3/13/19" ...
##   ..$ Dissector        : chr [1:121] "SP,_MD" "SP_MD_CH" "MD_CH" "MD_CH_HK" ...
##   ..$ ApproxAge        : chr [1:121] "Ad" "Ad" "Ad" "Juv" ...
##   ..$ Sex              : chr [1:121] "M" "F" "M" "F" ...
##   ..$ Fecundity_Females: chr [1:121] "na" "na" "na" "0" ...
##   ..$ RearFoot_cm      : chr [1:121] "17.9" "16" "17.1" "14.7" ...
##   ..$ Tail_cm          : chr [1:121] "11.5" "11.5" "11.4" "11.3" ...
##   ..$ Ear_cm           : chr [1:121] "6.5" "6.5" "6.4" "6.5" ...
##   ..$ Body_w/Tail_cm   : chr [1:121] "89.5" "82" "92" "71.5" ...
##   ..$ Body             : chr [1:121] "78" "70.5" "80.6" "60.2" ...
##   ..$ Weight_kg        : chr [1:121] "13.6" "6.33" "9.98" "4.62" ...
##   ..$ Condition        : chr [1:121] "17.43589744" "8.978723404" "12.382134" "7.674418605" ...
##   ..- attr(*, "spec")=
##   .. .. cols(
##   .. ..   `Bobcat_ID#` = col_character(),
##   .. ..   Necropsy = col_double(),
##   .. ..   NecropsyDate = col_character(),
##   .. ..   Dissector = col_character(),
##   .. ..   ApproxAge = col_character(),
##   .. ..   Sex = col_character(),
##   .. ..   Fecundity_Females = col_character(),
##   .. ..   RearFoot_cm = col_character(),
##   .. ..   Tail_cm = col_character(),
##   .. ..   Ear_cm = col_character(),
##   .. ..   `Body_w/Tail_cm` = col_character(),
##   .. ..   Body = col_character(),
##   .. ..   Weight_kg = col_character(),
##   .. ..   Condition = col_character()
##   .. .. )
##   ..- attr(*, "problems")=<externalptr> 
##  $ : spc_tbl_ [121 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##   ..$ Bobcat_ID#: chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
##   ..$ Age       : chr [1:121] "3" "1" "2" "0" ...
##   ..- attr(*, "spec")=
##   .. .. cols(
##   .. ..   `Bobcat_ID#` = col_character(),
##   .. ..   Age = col_character()
##   .. .. )
##   ..- attr(*, "problems")=<externalptr>

Because all of the data are stored in the same folder, we can reduce our typing even further by building the paths with file.path(), which joins the folder name and each file name with a ‘/’.
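file.path() is vectorized, so one folder combined with several file names produces one path per file. It only builds the text strings; it does not check that the files exist. A quick check in base R:

```r
# one folder + two file names -> two full paths
paths <- file.path("data/raw",
                   c("bobcat_collection_data.csv", "bobcat_age_data.csv"))

# gives "data/raw/bobcat_collection_data.csv" and "data/raw/bobcat_age_data.csv"
paths
```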

# assign object name to environment and provide file path for the data
bobcat_data <- file.path('data/raw', 
  
  # provide the names of each data frame
  c('bobcat_collection_data.csv',
    'bobcat_necropsy_only_data.csv',
    'bobcat_age_data.csv')) %>% 
  
  # use purrr::map to read in all data at once
  purrr::map(
    ~read_csv(.x))

## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl  (1): Necropsy
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

str(bobcat_data)

## List of 3
##  $ : spc_tbl_ [121 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##   ..$ Bobcat_ID#    : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
##   ..$ County        : chr [1:121] "Athens" "Athens" "Morgan" "Perry" ...
##   ..$ Township      : chr [1:121] "Dover" "Canaan" "Hower" "Reading" ...
##   ..$ CollectionDate: chr [1:121] "12/30/18" "12/14/18" "12/7/18" "2/10/19" ...
##   ..$ Month         : num [1:121] 12 12 12 2 1 3 2 2 2 3 ...
##   ..$ Coordinates_N : num [1:121] 39.4 39.3 39.5 39.8 40.4 ...
##   ..$ Coordinates_W : num [1:121] -82.1 -82 -82 -82.3 -81.2 ...
##   ..- attr(*, "spec")=
##   .. .. cols(
##   .. ..   `Bobcat_ID#` = col_character(),
##   .. ..   County = col_character(),
##   .. ..   Township = col_character(),
##   .. ..   CollectionDate = col_character(),
##   .. ..   Month = col_double(),
##   .. ..   Coordinates_N = col_double(),
##   .. ..   Coordinates_W = col_double()
##   .. .. )
##   ..- attr(*, "problems")=<externalptr> 
##  $ : spc_tbl_ [121 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##   ..$ Bobcat_ID#       : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
##   ..$ Necropsy         : num [1:121] 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ NecropsyDate     : chr [1:121] "3/6/19" "3/9/19" "3/10/19" "3/13/19" ...
##   ..$ Dissector        : chr [1:121] "SP,_MD" "SP_MD_CH" "MD_CH" "MD_CH_HK" ...
##   ..$ ApproxAge        : chr [1:121] "Ad" "Ad" "Ad" "Juv" ...
##   ..$ Sex              : chr [1:121] "M" "F" "M" "F" ...
##   ..$ Fecundity_Females: chr [1:121] "na" "na" "na" "0" ...
##   ..$ RearFoot_cm      : chr [1:121] "17.9" "16" "17.1" "14.7" ...
##   ..$ Tail_cm          : chr [1:121] "11.5" "11.5" "11.4" "11.3" ...
##   ..$ Ear_cm           : chr [1:121] "6.5" "6.5" "6.4" "6.5" ...
##   ..$ Body_w/Tail_cm   : chr [1:121] "89.5" "82" "92" "71.5" ...
##   ..$ Body             : chr [1:121] "78" "70.5" "80.6" "60.2" ...
##   ..$ Weight_kg        : chr [1:121] "13.6" "6.33" "9.98" "4.62" ...
##   ..$ Condition        : chr [1:121] "17.43589744" "8.978723404" "12.382134" "7.674418605" ...
##   ..- attr(*, "spec")=
##   .. .. cols(
##   .. ..   `Bobcat_ID#` = col_character(),
##   .. ..   Necropsy = col_double(),
##   .. ..   NecropsyDate = col_character(),
##   .. ..   Dissector = col_character(),
##   .. ..   ApproxAge = col_character(),
##   .. ..   Sex = col_character(),
##   .. ..   Fecundity_Females = col_character(),
##   .. ..   RearFoot_cm = col_character(),
##   .. ..   Tail_cm = col_character(),
##   .. ..   Ear_cm = col_character(),
##   .. ..   `Body_w/Tail_cm` = col_character(),
##   .. ..   Body = col_character(),
##   .. ..   Weight_kg = col_character(),
##   .. ..   Condition = col_character()
##   .. .. )
##   ..- attr(*, "problems")=<externalptr> 
##  $ : spc_tbl_ [121 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##   ..$ Bobcat_ID#: chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
##   ..$ Age       : chr [1:121] "3" "1" "2" "0" ...
##   ..- attr(*, "spec")=
##   .. .. cols(
##   .. ..   `Bobcat_ID#` = col_character(),
##   .. ..   Age = col_character()
##   .. .. )
##   ..- attr(*, "problems")=<externalptr>

We could also format it like this, if that makes more sense to you, using a period (.) as the placeholder for the piped-in data file names:

# assign object name to environment and provide list of data file names

bobcat_data <- list('bobcat_collection_data.csv',
                    'bobcat_necropsy_only_data.csv',
                    'bobcat_age_data.csv') %>% 
  
  # provide file path for the data
  file.path('data/raw', .) %>% 
  
  # use purrr::map to read in all data at once
  purrr::map(
    ~read_csv(.x))

## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl  (1): Necropsy
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

str(bobcat_data)

## List of 3
##  $ : spc_tbl_ [121 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##   ..$ Bobcat_ID#    : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
##   ..$ County        : chr [1:121] "Athens" "Athens" "Morgan" "Perry" ...
##   ..$ Township      : chr [1:121] "Dover" "Canaan" "Hower" "Reading" ...
##   ..$ CollectionDate: chr [1:121] "12/30/18" "12/14/18" "12/7/18" "2/10/19" ...
##   ..$ Month         : num [1:121] 12 12 12 2 1 3 2 2 2 3 ...
##   ..$ Coordinates_N : num [1:121] 39.4 39.3 39.5 39.8 40.4 ...
##   ..$ Coordinates_W : num [1:121] -82.1 -82 -82 -82.3 -81.2 ...
##   ..- attr(*, "spec")=
##   .. .. cols(
##   .. ..   `Bobcat_ID#` = col_character(),
##   .. ..   County = col_character(),
##   .. ..   Township = col_character(),
##   .. ..   CollectionDate = col_character(),
##   .. ..   Month = col_double(),
##   .. ..   Coordinates_N = col_double(),
##   .. ..   Coordinates_W = col_double()
##   .. .. )
##   ..- attr(*, "problems")=<externalptr> 
##  $ : spc_tbl_ [121 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##   ..$ Bobcat_ID#       : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
##   ..$ Necropsy         : num [1:121] 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ NecropsyDate     : chr [1:121] "3/6/19" "3/9/19" "3/10/19" "3/13/19" ...
##   ..$ Dissector        : chr [1:121] "SP,_MD" "SP_MD_CH" "MD_CH" "MD_CH_HK" ...
##   ..$ ApproxAge        : chr [1:121] "Ad" "Ad" "Ad" "Juv" ...
##   ..$ Sex              : chr [1:121] "M" "F" "M" "F" ...
##   ..$ Fecundity_Females: chr [1:121] "na" "na" "na" "0" ...
##   ..$ RearFoot_cm      : chr [1:121] "17.9" "16" "17.1" "14.7" ...
##   ..$ Tail_cm          : chr [1:121] "11.5" "11.5" "11.4" "11.3" ...
##   ..$ Ear_cm           : chr [1:121] "6.5" "6.5" "6.4" "6.5" ...
##   ..$ Body_w/Tail_cm   : chr [1:121] "89.5" "82" "92" "71.5" ...
##   ..$ Body             : chr [1:121] "78" "70.5" "80.6" "60.2" ...
##   ..$ Weight_kg        : chr [1:121] "13.6" "6.33" "9.98" "4.62" ...
##   ..$ Condition        : chr [1:121] "17.43589744" "8.978723404" "12.382134" "7.674418605" ...
##   ..- attr(*, "spec")=
##   .. .. cols(
##   .. ..   `Bobcat_ID#` = col_character(),
##   .. ..   Necropsy = col_double(),
##   .. ..   NecropsyDate = col_character(),
##   .. ..   Dissector = col_character(),
##   .. ..   ApproxAge = col_character(),
##   .. ..   Sex = col_character(),
##   .. ..   Fecundity_Females = col_character(),
##   .. ..   RearFoot_cm = col_character(),
##   .. ..   Tail_cm = col_character(),
##   .. ..   Ear_cm = col_character(),
##   .. ..   `Body_w/Tail_cm` = col_character(),
##   .. ..   Body = col_character(),
##   .. ..   Weight_kg = col_character(),
##   .. ..   Condition = col_character()
##   .. .. )
##   ..- attr(*, "problems")=<externalptr> 
##  $ : spc_tbl_ [121 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##   ..$ Bobcat_ID#: chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
##   ..$ Age       : chr [1:121] "3" "1" "2" "0" ...
##   ..- attr(*, "spec")=
##   .. .. cols(
##   .. ..   `Bobcat_ID#` = col_character(),
##   .. ..   Age = col_character()
##   .. .. )
##   ..- attr(*, "problems")=<externalptr>

We can look at each list element using $, just as we would look at a column of a data frame, as long as the list elements have names.
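For reference, here is how list indexing works on a small made-up list of data frames: [[ ]] extracts an element by position or name, single [ ] keeps the list wrapper, and $ needs a name. Note that $ silently returns NULL when the name doesn't exist, which is worth remembering when a result unexpectedly comes back as NULL.

```r
# made-up list of two small data frames, with no names yet
toy_list <- list(data.frame(id = 1:2),
                 data.frame(id = 1:3, age = c(3, 1, 2)))

names(toy_list)   # NULL: there is nothing for $ to match
toy_list[[2]]     # the second data frame itself, by position
toy_list[2]       # a list of length one that contains the data frame

# once the elements have names, $ and [["name"]] both work
names(toy_list) <- c("first", "second")
toy_list$second
toy_list[["second"]]

# a name that doesn't exist returns NULL rather than an error
toy_list$third
```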

Let’s see what our list elements are named using the names() function.

# first see what the list elements are named
names(bobcat_data)

## NULL

Hmm, that’s not ideal: our list elements don’t have names. We can fix this by piping the output of purrr::map() into purrr::set_names() to name each element of the list.
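Another option is to name the input before mapping, because map() carries the names of its input over to its output. The sketch below uses nchar() as a toy stand-in for read_csv() so it runs without the data files; with the real data, you would name the vector of file paths instead.

```r
library(purrr)

# name the input vector; map() copies these names onto its output list
files <- c(collection = "bobcat_collection_data.csv",
           age        = "bobcat_age_data.csv")

# nchar() is only a toy stand-in for read_csv(), so no files are needed
result <- map(files, ~ nchar(.x))
names(result)

# naming the output afterwards, as in the text, gives the same names
result2 <- set_names(map(unname(files), ~ nchar(.x)), c("collection", "age"))
names(result2)
```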

# assign object name to environment and provide file path for the data
bobcat_data <- file.path('data/raw', 
  
  # provide the names of each data frame
  c('bobcat_collection_data.csv',
    'bobcat_necropsy_only_data.csv',
    'bobcat_age_data.csv')) %>% 
  
  # use purrr::map to read in all data at once
  purrr::map(
    ~read_csv(.x)) %>% 
  
  # assign names to list objects
  purrr::set_names('bobcat_collection',
                   'bobcat_necropsy',
                   'bobcat_age')

## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl  (1): Necropsy
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

str(bobcat_data)

## List of 3
##  $ bobcat_collection: spc_tbl_ [121 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##   ..$ Bobcat_ID#    : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
##   ..$ County        : chr [1:121] "Athens" "Athens" "Morgan" "Perry" ...
##   ..$ Township      : chr [1:121] "Dover" "Canaan" "Hower" "Reading" ...
##   ..$ CollectionDate: chr [1:121] "12/30/18" "12/14/18" "12/7/18" "2/10/19" ...
##   ..$ Month         : num [1:121] 12 12 12 2 1 3 2 2 2 3 ...
##   ..$ Coordinates_N : num [1:121] 39.4 39.3 39.5 39.8 40.4 ...
##   ..$ Coordinates_W : num [1:121] -82.1 -82 -82 -82.3 -81.2 ...
##   ..- attr(*, "spec")=
##   .. .. cols(
##   .. ..   `Bobcat_ID#` = col_character(),
##   .. ..   County = col_character(),
##   .. ..   Township = col_character(),
##   .. ..   CollectionDate = col_character(),
##   .. ..   Month = col_double(),
##   .. ..   Coordinates_N = col_double(),
##   .. ..   Coordinates_W = col_double()
##   .. .. )
##   ..- attr(*, "problems")=<externalptr> 
##  $ bobcat_necropsy  : spc_tbl_ [121 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##   ..$ Bobcat_ID#       : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
##   ..$ Necropsy         : num [1:121] 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ NecropsyDate     : chr [1:121] "3/6/19" "3/9/19" "3/10/19" "3/13/19" ...
##   ..$ Dissector        : chr [1:121] "SP,_MD" "SP_MD_CH" "MD_CH" "MD_CH_HK" ...
##   ..$ ApproxAge        : chr [1:121] "Ad" "Ad" "Ad" "Juv" ...
##   ..$ Sex              : chr [1:121] "M" "F" "M" "F" ...
##   ..$ Fecundity_Females: chr [1:121] "na" "na" "na" "0" ...
##   ..$ RearFoot_cm      : chr [1:121] "17.9" "16" "17.1" "14.7" ...
##   ..$ Tail_cm          : chr [1:121] "11.5" "11.5" "11.4" "11.3" ...
##   ..$ Ear_cm           : chr [1:121] "6.5" "6.5" "6.4" "6.5" ...
##   ..$ Body_w/Tail_cm   : chr [1:121] "89.5" "82" "92" "71.5" ...
##   ..$ Body             : chr [1:121] "78" "70.5" "80.6" "60.2" ...
##   ..$ Weight_kg        : chr [1:121] "13.6" "6.33" "9.98" "4.62" ...
##   ..$ Condition        : chr [1:121] "17.43589744" "8.978723404" "12.382134" "7.674418605" ...
##   ..- attr(*, "spec")=
##   .. .. cols(
##   .. ..   `Bobcat_ID#` = col_character(),
##   .. ..   Necropsy = col_double(),
##   .. ..   NecropsyDate = col_character(),
##   .. ..   Dissector = col_character(),
##   .. ..   ApproxAge = col_character(),
##   .. ..   Sex = col_character(),
##   .. ..   Fecundity_Females = col_character(),
##   .. ..   RearFoot_cm = col_character(),
##   .. ..   Tail_cm = col_character(),
##   .. ..   Ear_cm = col_character(),
##   .. ..   `Body_w/Tail_cm` = col_character(),
##   .. ..   Body = col_character(),
##   .. ..   Weight_kg = col_character(),
##   .. ..   Condition = col_character()
##   .. .. )
##   ..- attr(*, "problems")=<externalptr> 
##  $ bobcat_age       : spc_tbl_ [121 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##   ..$ Bobcat_ID#: chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
##   ..$ Age       : chr [1:121] "3" "1" "2" "0" ...
##   ..- attr(*, "spec")=
##   .. .. cols(
##   .. ..   `Bobcat_ID#` = col_character(),
##   .. ..   Age = col_character()
##   .. .. )
##   ..- attr(*, "problems")=<externalptr>

Now we can look at each list element using $:

# look at the structure of the necropsy data
str(bobcat_data$bobcat_necropsy)

Format/manipulate data with Purrr

We can also use Purrr to manipulate our data. Recall that I recommend always using lowercase for column names, object names, etc. The bobcat data were not entered following that rule, so we’ll want to change the names after importing them into R.

Let’s look at the non-Purrr way first and then see how much code repetition Purrr saves us. We would likely want to do this as we read in the data, so on your own, copy the non-Purrr code from above where we read in the data, and set the column names to lowercase for each data set.

Use the head() function to look at the first few rows of each data set.

bobcat_collection_data <- read_csv('data/raw/bobcat_collection_data.csv') %>% 
  
  # set names to lowercase
  set_names(
    names(.) %>% 
      tolower())

## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(bobcat_collection_data)

## # A tibble: 6 × 7
##   `bobcat_id#` county  township collectiondate month coordinates_n coordinates_w
##   <chr>        <chr>   <chr>    <chr>          <dbl>         <dbl>         <dbl>
## 1 5/18/01      Athens  Dover    12/30/18          12          39.4         -82.1
## 2 5/18/02      Athens  Canaan   12/14/18          12          39.3         -82.0
## 3 58-18-03     Morgan  Hower    12/7/18           12          39.5         -82.0
## 4 64-19-04     Perry   Reading  2/10/19            2          39.8         -82.3
## 5 34-15-05     Harris… Monroe   1/11/15            1          40.4         -81.2
## 6 71-19-06     Ross    Jeffers… 3/7/19             3          39.2         -82.8

bobcat_necropsy_only_data <- read_csv('data/raw/bobcat_necropsy_only_data.csv') %>% 
  
  # set names to lowercase
  set_names(
    names(.) %>% 
      tolower())

## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl  (1): Necropsy
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(bobcat_necropsy_only_data)

## # A tibble: 6 × 14
##   `bobcat_id#` necropsy necropsydate dissector approxage sex   fecundity_females
##   <chr>           <dbl> <chr>        <chr>     <chr>     <chr> <chr>            
## 1 5/18/01             1 3/6/19       SP,_MD    Ad        M     na               
## 2 5/18/02             2 3/9/19       SP_MD_CH  Ad        F     na               
## 3 58-18-03            3 3/10/19      MD_CH     Ad        M     na               
## 4 64-19-04            4 3/13/19      MD_CH_HK  Juv       F     0                
## 5 34-15-05            5 3/23/19      MD_CH_JG  Ad        M     na               
## 6 71-19-06            6 3/24/19      MD_CH     Ad        M     na               
## # ℹ 7 more variables: rearfoot_cm <chr>, tail_cm <chr>, ear_cm <chr>,
## #   `body_w/tail_cm` <chr>, body <chr>, weight_kg <chr>, condition <chr>

bobcat_age_data <- read_csv('data/raw/bobcat_age_data.csv') %>% 
  
  # set names to lowercase
  set_names(
    names(.) %>% 
      tolower())

## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(bobcat_age_data)

## # A tibble: 6 × 2
##   `bobcat_id#` age  
##   <chr>        <chr>
## 1 5/18/01      3    
## 2 5/18/02      1    
## 3 58-18-03     2    
## 4 64-19-04     0    
## 5 34-15-05     1    
## 6 71-19-06     X

That’s a lot of repetition in our code, which we generally want to avoid whenever possible. With Purrr we can do just that.

We are going to reuse the code from above that read in the data files using a file path, and add a step that sets the column names to lowercase.
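As an aside, the set_names(names(.) %>% tolower()) step has a dplyr equivalent, rename_with(), which applies a function to every column name. A minimal sketch on a made-up tibble shows that the two give the same names; either can be used inside map().

```r
library(dplyr)
library(purrr)

# made-up tibble with mixed-case names like the raw bobcat files
raw <- tibble(`Bobcat_ID#` = c("5/18/01", "5/18/02"),
              County       = c("Athens", "Athens"))

# approach used in the text: replace all names with their lowercase versions
lower_1 <- set_names(raw, tolower(names(raw)))

# dplyr alternative: apply tolower() to every column name
lower_2 <- rename_with(raw, tolower)

names(lower_2)
```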

To apply several functions to each element within the same purrr::map() call, we have to change one thing. Instead of writing ~read_csv(.x), we start the anonymous function with ~.x (the current list element) and then pipe it through each function we want to apply, in order.

See below:

# assign object name to environment and provide file path for the data
bobcat_data <- file.path('data/raw', 
                         
                         # provide the names of each data frame
                         c('bobcat_collection_data.csv',
                           'bobcat_necropsy_only_data.csv',
                           'bobcat_age_data.csv')) %>% 
  
  # use purrr::map to read in all data at once
  purrr::map(
    
    # reference list elements with ~
    ~.x %>% 
      
      # read in list elements   
      read_csv() %>% 
      
      # set names to lower case
      set_names(
        names(.) %>% 
          tolower())
  ) %>% 
  
  # assign names to list objects
  purrr::set_names('bobcat_collection',
                   'bobcat_necropsy',
                   'bobcat_age')

## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl  (1): Necropsy
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(bobcat_data$bobcat_collection)

## # A tibble: 6 × 7
##   `bobcat_id#` county  township collectiondate month coordinates_n coordinates_w
##   <chr>        <chr>   <chr>    <chr>          <dbl>         <dbl>         <dbl>
## 1 5/18/01      Athens  Dover    12/30/18          12          39.4         -82.1
## 2 5/18/02      Athens  Canaan   12/14/18          12          39.3         -82.0
## 3 58-18-03     Morgan  Hower    12/7/18           12          39.5         -82.0
## 4 64-19-04     Perry   Reading  2/10/19            2          39.8         -82.3
## 5 34-15-05     Harris… Monroe   1/11/15            1          40.4         -81.2
## 6 71-19-06     Ross    Jeffers… 3/7/19             3          39.2         -82.8
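If the anonymous function inside map() keeps growing, one alternative is to move the steps into a named function and pass that to map(). The sketch below defines a hypothetical helper, read_lowercase(), and runs it on two tiny made-up CSV files written to temporary paths; naming the path vector also names the output list, so no separate set_names() step is needed. It assumes readr 2.0 or later for the readr.show_col_types option.

```r
library(purrr)
library(readr)

# silence readr's column-specification messages (readr >= 2.0)
options(readr.show_col_types = FALSE)

# hypothetical helper: read one CSV and lowercase its column names
read_lowercase <- function(path) {
  data <- read_csv(path)
  names(data) <- tolower(names(data))
  data
}

# two tiny made-up CSV files at named temporary paths
paths <- c(collection = tempfile(fileext = ".csv"),
           age        = tempfile(fileext = ".csv"))
write_csv(data.frame(`Bobcat_ID#` = "5/18/01", County = "Athens",
                     check.names = FALSE), paths[["collection"]])
write_csv(data.frame(`Bobcat_ID#` = "5/18/01", Age = "3",
                     check.names = FALSE), paths[["age"]])

# map() now only needs the function name; list names come from `paths`
toy_data <- map(paths, read_lowercase)
names(toy_data$age)
```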

I also don’t like the column name for ‘bobcat ID number’. Notice that it prints wrapped in backticks (`bobcat_id#`): someone typed ‘#’ instead of ‘number’, and because ‘#’ is not allowed in a standard R name, R has to quote it. Let’s change this in all the data sets using Purrr.
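To see the problem in isolation, here is a made-up two-column tibble with the same awkward name. Backticks are how you refer to a non-syntactic name, in rename() and elsewhere; make.names() shows what base R would convert it to.

```r
library(dplyr)

# "#" is not allowed in a syntactic R name, so this column must be backtick-quoted
toy <- tibble(`bobcat_id#` = c("5/18/01", "5/18/02"), age = c("3", "1"))

# base R's idea of a valid version of the name: "#" becomes "."
make.names("bobcat_id#")

# rename(new_name = old_name); backticks let us refer to the awkward old name
toy <- rename(toy, bobcat_id = `bobcat_id#`)
names(toy)
```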

For this example we won’t go through the non-Purrr way, to save time.

# assign object name to environment and provide file path for the data
bobcat_data <- file.path('data/raw', 
                         
                         # provide the names of each data frame
                         c('bobcat_collection_data.csv',
                           'bobcat_necropsy_only_data.csv',
                           'bobcat_age_data.csv')) %>% 
  
  # use purrr::map to read in all data at once
  purrr::map(
    
    # reference list elements with ~
    ~.x %>% 
      
      # read in list elements   
      read_csv() %>% 
      
      # set names to lower case
      set_names(
        names(.) %>% 
          tolower()) %>% 
      
      # give the ID column a syntactic name
      rename(bobcat_id = `bobcat_id#`) # new name = old name
  ) %>% 
  
  # assign names to list objects
  purrr::set_names('bobcat_collection',
                   'bobcat_necropsy',
                   'bobcat_age')

## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl  (1): Necropsy
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(bobcat_data$bobcat_collection)

Much better!
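If you want to try the same read-and-clean pattern without the course files, the sketch below writes two small made-up CSVs to a temporary folder and then reads and cleans both in one map() call. It is a variation, not the module’s exact code: it swaps the set_names()/tolower() step for dplyr’s rename_with(tolower) and passes show_col_types = FALSE to quiet readr’s messages, both of which assume reasonably recent versions (dplyr 1.0+, readr 2.0+). The file names and values are hypothetical.

```r
library(readr)
library(dplyr)
library(purrr)
library(tibble)

# write two small hypothetical CSVs that share the awkward "Bobcat_ID#" column
tmp <- tempdir()
write_csv(tibble(`Bobcat_ID#` = c("5/18/01", "5/18/02"), County = c("Athens", "Athens")),
          file.path(tmp, "demo_collection.csv"))
write_csv(tibble(`Bobcat_ID#` = c("5/18/01", "5/18/02"), Age = c("3", "1")),
          file.path(tmp, "demo_age.csv"))

demo_data <- file.path(tmp, c("demo_collection.csv", "demo_age.csv")) %>%
  # ~ builds an anonymous function; .x is the current file path
  map(~ .x %>%
        read_csv(show_col_types = FALSE) %>%
        # lower-case every column name
        rename_with(tolower) %>%
        # new name = old name
        rename(bobcat_id = `bobcat_id#`)) %>%
  # name each list element
  set_names("collection", "age")

# check the cleaned column names of each data set
map(demo_data, names)
```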

Purrr imap

Saving data

Now that we have tidy data sets, we may want to save them to our hard drive so we don’t have to rerun the reformatting code every time we read in the data.

In your script, type code that will save each of the data sets in the bobcat_data list as a csv to the data/processed folder.

# save each data set as a csv
write_csv(bobcat_data$bobcat_collection,
          'data/processed/bobcat_collection.csv')

write_csv(bobcat_data$bobcat_age,
          'data/processed/bobcat_age.csv')

write_csv(bobcat_data$bobcat_necropsy,
          'data/processed/bobcat_necropsy.csv')

Now the Purrr way!

We can use the Purrr function imap() here because, along with each list element (.x), it passes that element’s name (.y) to the function, so we can use the name to build each file name.
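To see exactly what imap() hands to the function, here is a toy sketch on a made-up named list that stands in for bobcat_data (the values are placeholders). It uses imap_chr(), the variant that returns a character vector, so the results are easy to inspect:

```r
library(purrr)

# hypothetical named list standing in for bobcat_data
toy <- list(bobcat_collection = 121, bobcat_age = 121)

# for each element, .x is the value and .y is the element's name
paths <- imap_chr(toy, ~ paste0("data/processed/", .y, ".csv"))
paths

# with an unnamed list, .y is the element's position instead of a name
imap_chr(list(10, 20), ~ paste(.y, .x))
```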

# save each data set as a csv
purrr::imap(
  bobcat_data,
  ~write_csv(.x,
             file = paste0("data/processed/",
                           .y,
                           '.csv')))

## $bobcat_collection
## # A tibble: 121 × 7
##    bobcat_id county    township collectiondate month coordinates_n coordinates_w
##    <chr>     <chr>     <chr>    <chr>          <dbl>         <dbl>         <dbl>
##  1 5/18/01   Athens    Dover    12/30/18          12          39.4         -82.1
##  2 5/18/02   Athens    Canaan   12/14/18          12          39.3         -82.0
##  3 58-18-03  Morgan    Hower    12/7/18           12          39.5         -82.0
##  4 64-19-04  Perry     Reading  2/10/19            2          39.8         -82.3
##  5 34-15-05  Harrison  Monroe   1/11/15            1          40.4         -81.2
##  6 71-19-06  Ross      Jeffers… 3/7/19             3          39.2         -82.8
##  7 16-19-07  Coshocton Jeffers… 2/28/19            2          40.4         -82.0
##  8 40-19-08  Jackson   Madison  2/27/19            2          38.9         -82.4
##  9 61-19-09  Noble     Elk      2/20/19            2          39.7         -81.3
## 10 27-19-10  Gallia    Gallipo… 3/1/19             3          38.8         -82.2
## # ℹ 111 more rows
## 
## $bobcat_necropsy
## # A tibble: 121 × 14
##    bobcat_id necropsy necropsydate dissector approxage sex   fecundity_females
##    <chr>        <dbl> <chr>        <chr>     <chr>     <chr> <chr>            
##  1 5/18/01          1 3/6/19       SP,_MD    Ad        M     na               
##  2 5/18/02          2 3/9/19       SP_MD_CH  Ad        F     na               
##  3 58-18-03         3 3/10/19      MD_CH     Ad        M     na               
##  4 64-19-04         4 3/13/19      MD_CH_HK  Juv       F     0                
##  5 34-15-05         5 3/23/19      MD_CH_JG  Ad        M     na               
##  6 71-19-06         6 3/24/19      MD_CH     Ad        M     na               
##  7 16-19-07         7 3/31/19      MD_HK     Ad        M     na               
##  8 40-19-08         8 4/7/19       MD_CH_HK  Ad        M     na               
##  9 61-19-09         9 4/13/19      MD_CH     Juv       M     na               
## 10 27-19-10        10 4/14/19      MD_HK     Ad        M     na               
## # ℹ 111 more rows
## # ℹ 7 more variables: rearfoot_cm <chr>, tail_cm <chr>, ear_cm <chr>,
## #   `body_w/tail_cm` <chr>, body <chr>, weight_kg <chr>, condition <chr>
## 
## $bobcat_age
## # A tibble: 121 × 2
##    bobcat_id age  
##    <chr>     <chr>
##  1 5/18/01   3    
##  2 5/18/02   1    
##  3 58-18-03  2    
##  4 64-19-04  0    
##  5 34-15-05  1    
##  6 71-19-06  X    
##  7 16-19-07  X    
##  8 40-19-08  1    
##  9 61-19-09  0    
## 10 27-19-10  3    
## # ℹ 111 more rows
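The long printout above appears because imap() collects whatever write_csv() returns (each data set) into a list, and that list is printed. When a function is called only for its side effect, such as writing files, Purrr’s iwalk() may be a closer fit: it supplies the same .x and .y but returns its input invisibly, so nothing is printed. A minimal sketch, writing a made-up data set to a temporary folder (for real use you would swap in bobcat_data and "data/processed/"):

```r
library(readr)
library(purrr)
library(tibble)

# hypothetical stand-in for bobcat_data
demo_list <- list(bobcat_age = tibble(bobcat_id = c("5/18/01", "5/18/02"),
                                      age = c("3", "1")))

out_dir <- tempdir()

# iwalk() calls write_csv() on each element; .y supplies the file name
iwalk(demo_list,
      ~ write_csv(.x, file = file.path(out_dir, paste0(.y, ".csv"))))

# confirm the file was written
file.exists(file.path(out_dir, "bobcat_age.csv"))
```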

That’s far less repetitive code, especially when you have many data sets to save.

We’ve barely scratched the surface of what Purrr can do, but given our limited time, this is where we will stop. Just remember: any time you find yourself repeating the same operations on multiple objects of the same type, consider using Purrr to reduce repetition in your code.

Practice problems

There is no formal assignment for this module, but you can complete the problems below for some extra practice.
