Click here to download the script! Save the script to the project directory you set up in the previous module.
Load your script in RStudio. To do this, open RStudio and click on the folder icon in the toolbar at the top to load your script.
Often you will want to apply the same function, or several functions, to multiple objects or inputs. This is called iteration. In Base R, iteration is usually done with for-loops, which can be cumbersome and not very intuitive. If you’ve taken an R class before and covered for-loops, you are probably already nervous. But do not fear: the Tidyverse includes a package for iteration that is much more user-friendly, and it has a cute, inviting name: Purrr.
Let’s go through a few examples of how the Purrr package can be used.
As you can see from the Purrr cheat sheet, the Purrr package has a lot of uses. For the purposes of this course we will only skim the surface and mostly focus on applying functions with Purrr (upper left of the cheat sheet).
To apply the same function(s) to a number of objects we use the map() function.
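As a minimal sketch of what map() does, compare a Base R for-loop with the equivalent map() call. The `measurements` list below is made up for illustration and is not part of the course data.

```r
library(purrr)

# hypothetical toy data: three numeric vectors stored in a named list
measurements <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# Base R: create an empty list, then fill it one element at a time
means_loop <- vector("list", length(measurements))
names(means_loop) <- names(measurements)
for (i in seq_along(measurements)) {
  means_loop[[i]] <- mean(measurements[[i]])
}

# Purrr: apply mean() to every element in a single call
means_map <- map(measurements, mean)

# compare the two results
identical(means_loop, means_map)
```

Both approaches should give the same named list. The map() version simply skips the bookkeeping of creating and filling the output list.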
Let’s go through an example WITHOUT using Purrr first and then apply the tidyverse alternative.
One of the most common ways I use Purrr is when I want to read in multiple data sets for the same analysis. Often times landscape data, weather data, species data, etc. will be entered separately but you need all the data files for an analysis.
Start by reading in the 3 bobcat data files for this module. Leave them named as they are.
bobcat_collection_data <- read_csv('data/raw/bobcat_collection_data.csv')
## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
bobcat_necropsy_only_data <- read_csv('data/raw/bobcat_necropsy_only_data.csv')
## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl (1): Necropsy
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
bobcat_age_data <- read_csv('data/raw/bobcat_age_data.csv')
## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Now let’s do this using Purrr!
First we need to assign our object to the environment with a name. When we read in multiple data frames at a time they will be stored in a list object. Lists can be confusing to work with at first but they follow a similar structure to a data frame. Let’s name this list bobcat_data.
And just like when we read in an individual data frame, we need to provide the path for the data and the file name with its extension (e.g. .csv, .txt).
Then we reference the purrr::map() function AND the function we want to apply to all of the data. This is where things can get a bit confusing at first: we use the read_csv() function, BUT we must preface this function with a ‘~’. The ‘~’ is part of the syntax Purrr uses, and .x stands for the current element of the list. Let’s look at an example.
# assign the object as a list and provide the file path and name of each data frame
bobcat_data <- list('data/raw/bobcat_collection_data.csv',
                    'data/raw/bobcat_necropsy_only_data.csv',
                    'data/raw/bobcat_age_data.csv') %>%
  # use purrr::map to read in all data at once
  purrr::map(
    ~read_csv(.x)
  )
## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl (1): Necropsy
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# look at the list structure
str(bobcat_data)
## List of 3
## $ : spc_tbl_ [121 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID# : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ County : chr [1:121] "Athens" "Athens" "Morgan" "Perry" ...
## ..$ Township : chr [1:121] "Dover" "Canaan" "Hower" "Reading" ...
## ..$ CollectionDate: chr [1:121] "12/30/18" "12/14/18" "12/7/18" "2/10/19" ...
## ..$ Month : num [1:121] 12 12 12 2 1 3 2 2 2 3 ...
## ..$ Coordinates_N : num [1:121] 39.4 39.3 39.5 39.8 40.4 ...
## ..$ Coordinates_W : num [1:121] -82.1 -82 -82 -82.3 -81.2 ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. County = col_character(),
## .. .. Township = col_character(),
## .. .. CollectionDate = col_character(),
## .. .. Month = col_double(),
## .. .. Coordinates_N = col_double(),
## .. .. Coordinates_W = col_double()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ : spc_tbl_ [121 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID# : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ Necropsy : num [1:121] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ NecropsyDate : chr [1:121] "3/6/19" "3/9/19" "3/10/19" "3/13/19" ...
## ..$ Dissector : chr [1:121] "SP,_MD" "SP_MD_CH" "MD_CH" "MD_CH_HK" ...
## ..$ ApproxAge : chr [1:121] "Ad" "Ad" "Ad" "Juv" ...
## ..$ Sex : chr [1:121] "M" "F" "M" "F" ...
## ..$ Fecundity_Females: chr [1:121] "na" "na" "na" "0" ...
## ..$ RearFoot_cm : chr [1:121] "17.9" "16" "17.1" "14.7" ...
## ..$ Tail_cm : chr [1:121] "11.5" "11.5" "11.4" "11.3" ...
## ..$ Ear_cm : chr [1:121] "6.5" "6.5" "6.4" "6.5" ...
## ..$ Body_w/Tail_cm : chr [1:121] "89.5" "82" "92" "71.5" ...
## ..$ Body : chr [1:121] "78" "70.5" "80.6" "60.2" ...
## ..$ Weight_kg : chr [1:121] "13.6" "6.33" "9.98" "4.62" ...
## ..$ Condition : chr [1:121] "17.43589744" "8.978723404" "12.382134" "7.674418605" ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. Necropsy = col_double(),
## .. .. NecropsyDate = col_character(),
## .. .. Dissector = col_character(),
## .. .. ApproxAge = col_character(),
## .. .. Sex = col_character(),
## .. .. Fecundity_Females = col_character(),
## .. .. RearFoot_cm = col_character(),
## .. .. Tail_cm = col_character(),
## .. .. Ear_cm = col_character(),
## .. .. `Body_w/Tail_cm` = col_character(),
## .. .. Body = col_character(),
## .. .. Weight_kg = col_character(),
## .. .. Condition = col_character()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ : spc_tbl_ [121 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID#: chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ Age : chr [1:121] "3" "1" "2" "0" ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. Age = col_character()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
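The ‘~’ shorthand can look cryptic, so here is a small sketch of what it stands for. The numbers are made up for illustration. The point is that `~ .x * 2` is shorthand for an ordinary one-argument function, and that a function needing no extra arguments can be passed by name alone.

```r
library(purrr)

# hypothetical inputs, not part of the course data
inputs <- list(1, 2, 3)

# formula shorthand: .x is the current list element
doubled_formula <- map(inputs, ~ .x * 2)

# the same operation written as a regular anonymous function
doubled_function <- map(inputs, function(x) x * 2)

# when no extra arguments are needed, pass the function itself
roots <- map(inputs, sqrt)

# compare the two ways of doubling
identical(doubled_formula, doubled_function)
```

By the same logic, the `~read_csv(.x)` call in the example above can also be written as `purrr::map(read_csv)`. The ‘~’ form becomes useful once you need to pass extra arguments or chain several steps.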
Since all of the data are stored in the same folder, we can reduce our typing even further by supplying the folder path once with file.path().
# assign object name to environment and provide file path for the data
bobcat_data <- file.path('data/raw',
                         # provide the names of each data frame
                         c('bobcat_collection_data.csv',
                           'bobcat_necropsy_only_data.csv',
                           'bobcat_age_data.csv')) %>%
  # use purrr::map to read in all data at once
  purrr::map(
    ~read_csv(.x))
## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl (1): Necropsy
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(bobcat_data)
## List of 3
## $ : spc_tbl_ [121 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID# : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ County : chr [1:121] "Athens" "Athens" "Morgan" "Perry" ...
## ..$ Township : chr [1:121] "Dover" "Canaan" "Hower" "Reading" ...
## ..$ CollectionDate: chr [1:121] "12/30/18" "12/14/18" "12/7/18" "2/10/19" ...
## ..$ Month : num [1:121] 12 12 12 2 1 3 2 2 2 3 ...
## ..$ Coordinates_N : num [1:121] 39.4 39.3 39.5 39.8 40.4 ...
## ..$ Coordinates_W : num [1:121] -82.1 -82 -82 -82.3 -81.2 ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. County = col_character(),
## .. .. Township = col_character(),
## .. .. CollectionDate = col_character(),
## .. .. Month = col_double(),
## .. .. Coordinates_N = col_double(),
## .. .. Coordinates_W = col_double()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ : spc_tbl_ [121 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID# : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ Necropsy : num [1:121] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ NecropsyDate : chr [1:121] "3/6/19" "3/9/19" "3/10/19" "3/13/19" ...
## ..$ Dissector : chr [1:121] "SP,_MD" "SP_MD_CH" "MD_CH" "MD_CH_HK" ...
## ..$ ApproxAge : chr [1:121] "Ad" "Ad" "Ad" "Juv" ...
## ..$ Sex : chr [1:121] "M" "F" "M" "F" ...
## ..$ Fecundity_Females: chr [1:121] "na" "na" "na" "0" ...
## ..$ RearFoot_cm : chr [1:121] "17.9" "16" "17.1" "14.7" ...
## ..$ Tail_cm : chr [1:121] "11.5" "11.5" "11.4" "11.3" ...
## ..$ Ear_cm : chr [1:121] "6.5" "6.5" "6.4" "6.5" ...
## ..$ Body_w/Tail_cm : chr [1:121] "89.5" "82" "92" "71.5" ...
## ..$ Body : chr [1:121] "78" "70.5" "80.6" "60.2" ...
## ..$ Weight_kg : chr [1:121] "13.6" "6.33" "9.98" "4.62" ...
## ..$ Condition : chr [1:121] "17.43589744" "8.978723404" "12.382134" "7.674418605" ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. Necropsy = col_double(),
## .. .. NecropsyDate = col_character(),
## .. .. Dissector = col_character(),
## .. .. ApproxAge = col_character(),
## .. .. Sex = col_character(),
## .. .. Fecundity_Females = col_character(),
## .. .. RearFoot_cm = col_character(),
## .. .. Tail_cm = col_character(),
## .. .. Ear_cm = col_character(),
## .. .. `Body_w/Tail_cm` = col_character(),
## .. .. Body = col_character(),
## .. .. Weight_kg = col_character(),
## .. .. Condition = col_character()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ : spc_tbl_ [121 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID#: chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ Age : chr [1:121] "3" "1" "2" "0" ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. Age = col_character()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
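The file.path() call above works because file.path() is vectorized: given one folder and several file names, it returns one full path per file. Here is a quick sketch you can run without the data files, since nothing is read from disk.

```r
# one folder, three file names
paths <- file.path('data/raw',
                   c('bobcat_collection_data.csv',
                     'bobcat_necropsy_only_data.csv',
                     'bobcat_age_data.csv'))

# paths is a character vector with one full path per file
paths

# basename() strips the folder again, which is handy for checking
basename(paths)
```

Printing `paths` before passing it to map() is an easy way to catch a typo in a folder or file name.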
We could also format it like this, if that makes more sense to you, using a period as the placeholder for the data file names.
# assign object name to environment and provide list of data file names
bobcat_data <- list('bobcat_collection_data.csv',
                    'bobcat_necropsy_only_data.csv',
                    'bobcat_age_data.csv') %>%
  # provide file path for the data
  file.path('data/raw', .) %>%
  # use purrr::map to read in all data at once
  purrr::map(
    ~read_csv(.x))
## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl (1): Necropsy
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(bobcat_data)
## List of 3
## $ : spc_tbl_ [121 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID# : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ County : chr [1:121] "Athens" "Athens" "Morgan" "Perry" ...
## ..$ Township : chr [1:121] "Dover" "Canaan" "Hower" "Reading" ...
## ..$ CollectionDate: chr [1:121] "12/30/18" "12/14/18" "12/7/18" "2/10/19" ...
## ..$ Month : num [1:121] 12 12 12 2 1 3 2 2 2 3 ...
## ..$ Coordinates_N : num [1:121] 39.4 39.3 39.5 39.8 40.4 ...
## ..$ Coordinates_W : num [1:121] -82.1 -82 -82 -82.3 -81.2 ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. County = col_character(),
## .. .. Township = col_character(),
## .. .. CollectionDate = col_character(),
## .. .. Month = col_double(),
## .. .. Coordinates_N = col_double(),
## .. .. Coordinates_W = col_double()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ : spc_tbl_ [121 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID# : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ Necropsy : num [1:121] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ NecropsyDate : chr [1:121] "3/6/19" "3/9/19" "3/10/19" "3/13/19" ...
## ..$ Dissector : chr [1:121] "SP,_MD" "SP_MD_CH" "MD_CH" "MD_CH_HK" ...
## ..$ ApproxAge : chr [1:121] "Ad" "Ad" "Ad" "Juv" ...
## ..$ Sex : chr [1:121] "M" "F" "M" "F" ...
## ..$ Fecundity_Females: chr [1:121] "na" "na" "na" "0" ...
## ..$ RearFoot_cm : chr [1:121] "17.9" "16" "17.1" "14.7" ...
## ..$ Tail_cm : chr [1:121] "11.5" "11.5" "11.4" "11.3" ...
## ..$ Ear_cm : chr [1:121] "6.5" "6.5" "6.4" "6.5" ...
## ..$ Body_w/Tail_cm : chr [1:121] "89.5" "82" "92" "71.5" ...
## ..$ Body : chr [1:121] "78" "70.5" "80.6" "60.2" ...
## ..$ Weight_kg : chr [1:121] "13.6" "6.33" "9.98" "4.62" ...
## ..$ Condition : chr [1:121] "17.43589744" "8.978723404" "12.382134" "7.674418605" ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. Necropsy = col_double(),
## .. .. NecropsyDate = col_character(),
## .. .. Dissector = col_character(),
## .. .. ApproxAge = col_character(),
## .. .. Sex = col_character(),
## .. .. Fecundity_Females = col_character(),
## .. .. RearFoot_cm = col_character(),
## .. .. Tail_cm = col_character(),
## .. .. Ear_cm = col_character(),
## .. .. `Body_w/Tail_cm` = col_character(),
## .. .. Body = col_character(),
## .. .. Weight_kg = col_character(),
## .. .. Condition = col_character()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ : spc_tbl_ [121 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID#: chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ Age : chr [1:121] "3" "1" "2" "0" ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. Age = col_character()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
We can look at each list element by using the $ operator, similar to a data frame.
Let’s see what our list elements are named using the names() function.
# first see what the list elements are named
names(bobcat_data)
## NULL
Hmmmm, that’s not ideal: our list elements don’t have names. We can fix this by adding a function after purrr::map() to name each object in the list.
# assign object name to environment and provide file path for the data
bobcat_data <- file.path('data/raw',
                         # provide the names of each data frame
                         c('bobcat_collection_data.csv',
                           'bobcat_necropsy_only_data.csv',
                           'bobcat_age_data.csv')) %>%
  # use purrr::map to read in all data at once
  purrr::map(
    ~read_csv(.x)) %>%
  # assign names to list objects
  purrr::set_names('bobcat_collection',
                   'bobcat_necropsy',
                   'bobcat_age')
## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl (1): Necropsy
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(bobcat_data)
## List of 3
## $ bobcat_collection: spc_tbl_ [121 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID# : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ County : chr [1:121] "Athens" "Athens" "Morgan" "Perry" ...
## ..$ Township : chr [1:121] "Dover" "Canaan" "Hower" "Reading" ...
## ..$ CollectionDate: chr [1:121] "12/30/18" "12/14/18" "12/7/18" "2/10/19" ...
## ..$ Month : num [1:121] 12 12 12 2 1 3 2 2 2 3 ...
## ..$ Coordinates_N : num [1:121] 39.4 39.3 39.5 39.8 40.4 ...
## ..$ Coordinates_W : num [1:121] -82.1 -82 -82 -82.3 -81.2 ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. County = col_character(),
## .. .. Township = col_character(),
## .. .. CollectionDate = col_character(),
## .. .. Month = col_double(),
## .. .. Coordinates_N = col_double(),
## .. .. Coordinates_W = col_double()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ bobcat_necropsy : spc_tbl_ [121 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID# : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ Necropsy : num [1:121] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ NecropsyDate : chr [1:121] "3/6/19" "3/9/19" "3/10/19" "3/13/19" ...
## ..$ Dissector : chr [1:121] "SP,_MD" "SP_MD_CH" "MD_CH" "MD_CH_HK" ...
## ..$ ApproxAge : chr [1:121] "Ad" "Ad" "Ad" "Juv" ...
## ..$ Sex : chr [1:121] "M" "F" "M" "F" ...
## ..$ Fecundity_Females: chr [1:121] "na" "na" "na" "0" ...
## ..$ RearFoot_cm : chr [1:121] "17.9" "16" "17.1" "14.7" ...
## ..$ Tail_cm : chr [1:121] "11.5" "11.5" "11.4" "11.3" ...
## ..$ Ear_cm : chr [1:121] "6.5" "6.5" "6.4" "6.5" ...
## ..$ Body_w/Tail_cm : chr [1:121] "89.5" "82" "92" "71.5" ...
## ..$ Body : chr [1:121] "78" "70.5" "80.6" "60.2" ...
## ..$ Weight_kg : chr [1:121] "13.6" "6.33" "9.98" "4.62" ...
## ..$ Condition : chr [1:121] "17.43589744" "8.978723404" "12.382134" "7.674418605" ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. Necropsy = col_double(),
## .. .. NecropsyDate = col_character(),
## .. .. Dissector = col_character(),
## .. .. ApproxAge = col_character(),
## .. .. Sex = col_character(),
## .. .. Fecundity_Females = col_character(),
## .. .. RearFoot_cm = col_character(),
## .. .. Tail_cm = col_character(),
## .. .. Ear_cm = col_character(),
## .. .. `Body_w/Tail_cm` = col_character(),
## .. .. Body = col_character(),
## .. .. Weight_kg = col_character(),
## .. .. Condition = col_character()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ bobcat_age : spc_tbl_ [121 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID#: chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ Age : chr [1:121] "3" "1" "2" "0" ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. Age = col_character()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
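Typing the names by hand works, but the names must be listed in the same order as the files. One alternative is to name the vector of paths before calling map(), because map() carries the names of its input through to its output. This sketch assumes the file names themselves make good element names. The two small csv files are written to a temporary folder purely for illustration, and `tools::file_path_sans_ext()` is a helper shipped with R that drops the file extension.

```r
library(purrr)
library(readr)

# write two tiny example files to a temporary folder (illustration only)
tmp_dir <- tempfile('bobcat_demo_')
dir.create(tmp_dir)
write_csv(data.frame(id = 1:2), file.path(tmp_dir, 'demo_age.csv'))
write_csv(data.frame(id = 3:4), file.path(tmp_dir, 'demo_collection.csv'))

paths <- file.path(tmp_dir, c('demo_age.csv', 'demo_collection.csv'))

demo_data <- paths %>%
  # name each path after its file, minus the .csv extension
  set_names(tools::file_path_sans_ext(basename(paths))) %>%
  # map() keeps those names on the list it returns
  map(~read_csv(.x, show_col_types = FALSE))

# the list elements are now named after the files
names(demo_data)
```

With this pattern a name can never be attached to the wrong file, at the cost of less control over what the names look like.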
Now we can look at each list element by using the $ operator and the name we assigned to it.
# look at the structure of the necropsy data
str(bobcat_data$bobcat_necropsy)
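Keep in mind that $ returns NULL, without an error, when the name you type does not match any element, so a NULL result is usually a sign of a mistyped or incomplete name. A small sketch with a made-up list (not the bobcat data) shows the behaviour and one way to check the available names first.

```r
# hypothetical named list standing in for a list of data frames
demo_list <- list(bobcat_age = 1:3,
                  bobcat_necropsy = c('Ad', 'Juv'))

# the full element name returns the element
demo_list$bobcat_age

# [[ ]] with a quoted name does the same and also accepts a name stored in a variable
demo_list[['bobcat_necropsy']]

# 'necropsy' is not the name of any element, so $ quietly returns NULL
is.null(demo_list$necropsy)

# checking names() first avoids the surprise
'necropsy' %in% names(demo_list)
```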
We can also use Purrr to manipulate our data. Recall that I recommend always using lowercase for column names, objects, etc. The bobcat data as entered do not follow that rule, so we want to change the column names after we import the data into R.
Let’s look at the non-purrr way first and then see how much code repetition we avoid when we use Purrr. We would likely want to do this when we read in the data, so on your own, copy the non-purrr code from above where we read in the data and set the column names to lowercase for each data set.
Use the head() function to look at the first few rows of each data set.
bobcat_collection_data <- read_csv('data/raw/bobcat_collection_data.csv') %>%
  # set names to lowercase
  set_names(
    names(.) %>%
      tolower())
## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(bobcat_collection_data)
## # A tibble: 6 × 7
## `bobcat_id#` county township collectiondate month coordinates_n coordinates_w
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 5/18/01 Athens Dover 12/30/18 12 39.4 -82.1
## 2 5/18/02 Athens Canaan 12/14/18 12 39.3 -82.0
## 3 58-18-03 Morgan Hower 12/7/18 12 39.5 -82.0
## 4 64-19-04 Perry Reading 2/10/19 2 39.8 -82.3
## 5 34-15-05 Harris… Monroe 1/11/15 1 40.4 -81.2
## 6 71-19-06 Ross Jeffers… 3/7/19 3 39.2 -82.8
bobcat_necropsy_only_data <- read_csv('data/raw/bobcat_necropsy_only_data.csv') %>%
  # set names to lowercase
  set_names(
    names(.) %>%
      tolower())
## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl (1): Necropsy
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(bobcat_necropsy_only_data)
## # A tibble: 6 × 14
## `bobcat_id#` necropsy necropsydate dissector approxage sex fecundity_females
## <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 5/18/01 1 3/6/19 SP,_MD Ad M na
## 2 5/18/02 2 3/9/19 SP_MD_CH Ad F na
## 3 58-18-03 3 3/10/19 MD_CH Ad M na
## 4 64-19-04 4 3/13/19 MD_CH_HK Juv F 0
## 5 34-15-05 5 3/23/19 MD_CH_JG Ad M na
## 6 71-19-06 6 3/24/19 MD_CH Ad M na
## # ℹ 7 more variables: rearfoot_cm <chr>, tail_cm <chr>, ear_cm <chr>,
## # `body_w/tail_cm` <chr>, body <chr>, weight_kg <chr>, condition <chr>
bobcat_age_data <- read_csv('data/raw/bobcat_age_data.csv') %>%
  # set names to lowercase
  set_names(
    names(.) %>%
      tolower())
## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(bobcat_age_data)
## # A tibble: 6 × 2
## `bobcat_id#` age
## <chr> <chr>
## 1 5/18/01 3
## 2 5/18/02 1
## 3 58-18-03 2
## 4 64-19-04 0
## 5 34-15-05 1
## 6 71-19-06 X
That’s a lot of repetition in our code, which we generally want to avoid whenever possible. With Purrr we can do just that.
We are going to use the same code from above that we used to read in the data files using a file path, and we will add a function to set the column names to lower case.
To apply multiple functions within the same purrr::map() call we have to change one thing. Instead of typing a ‘~’ before the read_csv() function and referencing our list element (.x) inside read_csv(), we reference the list element first (~.x) and then pipe it through each of the functions we want to apply.
See below.
# assign object name to environment and provide file path for the data
bobcat_data <- file.path('data/raw',
                         # provide the names of each data frame
                         c('bobcat_collection_data.csv',
                           'bobcat_necropsy_only_data.csv',
                           'bobcat_age_data.csv')) %>%
  # use purrr::map to read in all data at once
  purrr::map(
    # reference list elements with ~
    ~.x %>%
      # read in list elements
      read_csv() %>%
      # set names to lower case
      set_names(
        names(.) %>%
          tolower())
  ) %>%
  # assign names to list objects
  purrr::set_names('bobcat_collection',
                   'bobcat_necropsy',
                   'bobcat_age')
## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl (1): Necropsy
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(bobcat_data$bobcat_collection)
## # A tibble: 6 × 7
## `bobcat_id#` county township collectiondate month coordinates_n coordinates_w
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 5/18/01 Athens Dover 12/30/18 12 39.4 -82.1
## 2 5/18/02 Athens Canaan 12/14/18 12 39.3 -82.0
## 3 58-18-03 Morgan Hower 12/7/18 12 39.5 -82.0
## 4 64-19-04 Perry Reading 2/10/19 2 39.8 -82.3
## 5 34-15-05 Harris… Monroe 1/11/15 1 40.4 -81.2
## 6 71-19-06 Ross Jeffers… 3/7/19 3 39.2 -82.8
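If the chain of pipes inside map() feels hard to read, another option is to wrap the steps in a small named function and hand that function to map(). This is only a sketch: `read_lowercase()` is a hypothetical helper name, and the example file is written to a temporary location so the code runs on its own.

```r
library(purrr)
library(readr)

# hypothetical helper: read one csv file and lowercase its column names
read_lowercase <- function(path) {
  data <- read_csv(path, show_col_types = FALSE)
  names(data) <- tolower(names(data))
  data
}

# write a tiny example file with mixed-case column names (illustration only)
demo_path <- tempfile(fileext = '.csv')
write_csv(data.frame(County = 'Athens', Month = 12), demo_path)

# map() applies the helper to every path it is given
demo_data <- map(demo_path, read_lowercase)

# the column names are now lowercase
names(demo_data[[1]])
```

A named helper is also easier to test on a single file before you run it over the whole list.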
I also don’t like the column name for ‘bobcat ID number’. See how it is printed wrapped in backticks (`bobcat_id#`): someone used ‘#’ instead of typing ‘number’, and ‘#’ is not allowed in a standard R name. Let’s change this in all the data sets using Purrr.
For this example we won’t go through the non-purrr way, to save time.
# assign object name to environment and provide file path for the data
bobcat_data <- file.path('data/raw',
                         # provide the names of each data frame
                         c('bobcat_collection_data.csv',
                           'bobcat_necropsy_only_data.csv',
                           'bobcat_age_data.csv')) %>%
  # use purrr::map to read in all data at once
  purrr::map(
    # reference list elements with ~
    ~.x %>%
      # read in list elements
      read_csv() %>%
      # set names to lower case
      set_names(
        names(.) %>%
          tolower()) %>%
      # change bobcat_id# to a better name
      rename(.,
             'bobcat_id' = 'bobcat_id#') # new name = old name
  ) %>%
  # assign names to list objects
  purrr::set_names('bobcat_collection',
                   'bobcat_necropsy',
                   'bobcat_age')
## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl (1): Necropsy
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(bobcat_data$bobcat_collection)
Much better!
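The backticks around `bobcat_id#` in the earlier output appear because ‘#’ is not allowed in a regular R name, so any code that refers to the column has to quote it. A short sketch with a made-up two-row table (not read from the course files) shows the rename() pattern of new name = old name.

```r
library(dplyr)

# check.names = FALSE keeps the '#' instead of letting R alter the name
demo <- data.frame(`bobcat_id#` = c('58-18-03', '64-19-04'),
                   age = c('2', '0'),
                   check.names = FALSE)

# the non-syntactic name must be wrapped in backticks (or quotes)
renamed <- demo %>%
  rename(bobcat_id = `bobcat_id#`)

# the new name no longer needs quoting
names(renamed)
```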
Now that we have tidy data sets we may want to save these to our hard drive so we don’t have to run code to reformat them every time we read in the data.
In your script, type code that will save each of the data sets in the bobcat_data list as a csv to the data/processed folder.
# save each data set as a csv
write_csv(bobcat_data$bobcat_collection,
          'data/processed/bobcat_collection.csv')
write_csv(bobcat_data$bobcat_age,
          'data/processed/bobcat_age.csv')
write_csv(bobcat_data$bobcat_necropsy,
          'data/processed/bobcat_necropsy.csv')
Now the Purrr way!
We can use the Purrr function imap() because it passes the name of each list element (.y) along with the element itself (.x), so we can use the name when we save each file.
# save each data set as a csv
purrr::imap(
  bobcat_data,
  ~write_csv(.x,
             file = paste0('data/processed/',
                           .y,
                           '.csv')))
## $bobcat_collection
## # A tibble: 121 × 7
## bobcat_id county township collectiondate month coordinates_n coordinates_w
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 5/18/01 Athens Dover 12/30/18 12 39.4 -82.1
## 2 5/18/02 Athens Canaan 12/14/18 12 39.3 -82.0
## 3 58-18-03 Morgan Hower 12/7/18 12 39.5 -82.0
## 4 64-19-04 Perry Reading 2/10/19 2 39.8 -82.3
## 5 34-15-05 Harrison Monroe 1/11/15 1 40.4 -81.2
## 6 71-19-06 Ross Jeffers… 3/7/19 3 39.2 -82.8
## 7 16-19-07 Coshocton Jeffers… 2/28/19 2 40.4 -82.0
## 8 40-19-08 Jackson Madison 2/27/19 2 38.9 -82.4
## 9 61-19-09 Noble Elk 2/20/19 2 39.7 -81.3
## 10 27-19-10 Gallia Gallipo… 3/1/19 3 38.8 -82.2
## # ℹ 111 more rows
##
## $bobcat_necropsy
## # A tibble: 121 × 14
## bobcat_id necropsy necropsydate dissector approxage sex fecundity_females
## <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 5/18/01 1 3/6/19 SP,_MD Ad M na
## 2 5/18/02 2 3/9/19 SP_MD_CH Ad F na
## 3 58-18-03 3 3/10/19 MD_CH Ad M na
## 4 64-19-04 4 3/13/19 MD_CH_HK Juv F 0
## 5 34-15-05 5 3/23/19 MD_CH_JG Ad M na
## 6 71-19-06 6 3/24/19 MD_CH Ad M na
## 7 16-19-07 7 3/31/19 MD_HK Ad M na
## 8 40-19-08 8 4/7/19 MD_CH_HK Ad M na
## 9 61-19-09 9 4/13/19 MD_CH Juv M na
## 10 27-19-10 10 4/14/19 MD_HK Ad M na
## # ℹ 111 more rows
## # ℹ 7 more variables: rearfoot_cm <chr>, tail_cm <chr>, ear_cm <chr>,
## # `body_w/tail_cm` <chr>, body <chr>, weight_kg <chr>, condition <chr>
##
## $bobcat_age
## # A tibble: 121 × 2
## bobcat_id age
## <chr> <chr>
## 1 5/18/01 3
## 2 5/18/02 1
## 3 58-18-03 2
## 4 64-19-04 0
## 5 34-15-05 1
## 6 71-19-06 X
## 7 16-19-07 X
## 8 40-19-08 1
## 9 61-19-09 0
## 10 27-19-10 3
## # ℹ 111 more rows
That is so much less repetition and code, especially if you have a lot of data sets to save.
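Because we only call write_csv() for its side effect of saving a file, Purrr also offers iwalk(), which works like imap() but returns its input invisibly instead of printing every data set to the console. Here is a minimal sketch, using a made-up list and a temporary folder in place of data/processed.

```r
library(purrr)
library(readr)

# hypothetical named list of small data frames
demo_data <- list(demo_age = data.frame(age = c(3, 1)),
                  demo_collection = data.frame(county = 'Athens'))

# temporary output folder (stands in for data/processed)
out_dir <- tempfile('processed_')
dir.create(out_dir)

# .x is the data frame, .y is its name in the list
iwalk(
  demo_data,
  ~write_csv(.x,
             file = file.path(out_dir, paste0(.y, '.csv'))))

# one csv file per list element
list.files(out_dir)
```

Swapping imap() for iwalk() in the bobcat example should save the same files without the long printout.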
We’ve only barely scratched the surface of what Purrr can do, but given our limited time that is where we will end. Just remember anytime you find yourself repeating the same operations for multiple objects of the same type you may want to consider using Purrr instead to reduce repetition in your code.
There is no formal assignment for this module, but you can complete the problems below for some extra practice.