Click here to download the script! Save the script to the project directory you set up in the previous module.
Load your script in RStudio. To do this, open RStudio and click on the folder icon in the toolbar at the top to load your script.
Often you will want to apply the same function, or several functions, to multiple objects or inputs. This is called iteration. In base R, iteration is usually done with for-loops, which can be cumbersome and not very intuitive. If you’ve taken an R class before and covered for-loops, you are probably already nervous. But do not fear: the Tidyverse includes a package for iteration that is much more user-friendly… and it has a cute, inviting name: Purrr.
Let’s go through a few examples of how the Purrr package can be used.
As you can see from the Purrr cheat sheet, the Purrr package has a lot of uses. For the purposes of this course we will just skim the surface and mostly focus on applying functions with Purrr (upper left of the cheat sheet).
To apply the same function/s to a number of objects we use the map() function.
Let’s go through an example WITHOUT using Purrr first and then apply the tidyverse alternative.
First let’s generate a few random objects for us to work with
test_object_1 <- c(1:10)
test_object_2 <- c(11:20)
test_object_3 <- c(21:30)
We just generated three vectors of 10 numbers each
Now let’s say we want to extract the mean from each of these
The non-purrr way would be as follows
# for first object
mean(test_object_1)
## [1] 5.5
# object 2
mean(test_object_2)
## [1] 15.5
# object 3
mean(test_object_3)
## [1] 25.5
While this is relatively easy to do, you can imagine that if you had many more objects to work with, copying and pasting this code would get annoying, and the more times you have to do it the more likely you are to introduce errors.
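For comparison, here is a minimal sketch of how the same task might look with a base R for-loop. The names test_objects and object_means are just illustrative and are not part of the course script.

```r
# Three example vectors, matching the ones generated above
test_object_1 <- c(1:10)
test_object_2 <- c(11:20)
test_object_3 <- c(21:30)

# Collect the vectors in a list so we can loop over them
test_objects <- list(test_object_1, test_object_2, test_object_3)

# Pre-allocate a numeric vector to hold one mean per object
object_means <- numeric(length(test_objects))

# Loop over the list positions and store each mean
for (i in seq_along(test_objects)) {
  object_means[i] <- mean(test_objects[[i]])
}

# One mean per vector: 5.5, 15.5, 25.5
object_means
```

It works, but you have to manage the counter and the output vector yourself, which is the bookkeeping that map() takes care of for you.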
Instead we can use Purrr to calculate the mean of all the objects at once
#first supply all the objects in a list
list(test_object_1,
test_object_2,
test_object_3) %>%
# then use the map function with ~.x as a placeholder for all the objects before the last pipe (similar to '.')
map(~.x %>%
# then supply the function/s
mean())
## [[1]]
## [1] 5.5
##
## [[2]]
## [1] 15.5
##
## [[3]]
## [1] 25.5
Voilà!
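Notice that map() always returns a list. If you would rather get a plain numeric vector back, purrr also has map_dbl(), which is not used elsewhere in this module. A small sketch, assuming the same three vectors as above:

```r
library(purrr)

# The same three vectors as above, stored in a list
test_objects <- list(c(1:10), c(11:20), c(21:30))

# map() returns a list with one element per input
mean_list <- map(test_objects, ~ mean(.x))

# map_dbl() returns a numeric vector instead of a list
mean_vector <- map_dbl(test_objects, ~ mean(.x))

# 5.5, 15.5, 25.5
mean_vector
```

Which one you want depends on what you plan to do next: a list is handy for data frames, while a vector is usually easier for single numbers.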
### The purrr process
Using Purrr might seem confusing at first, but it becomes second nature once you have some practice and follow these simple steps.
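Step 1: Write the code for a single object.
Step 2: Rewrite that code in pipe format.
Step 3: Supply a list of objects instead of one object and wrap the code in the map() function, providing ~.x as a placeholder for your list. (If only one operation is being applied you can use ~function(.x) instead. I’ll show you both approaches.)
Let’s try this process for a simple object manipulation.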
Step 1
# code for one object
mean(test_object_1)
## [1] 5.5
Step 2
# now let's adjust this code so it's in pipe (%>%) format
test_object_1 %>%
mean()
## [1] 5.5
This step is crucial! If your existing code isn’t already in pipe format it will be much more difficult to transition to purrr.
Step 3
# instead of one object we supply a list
list(test_object_1,
test_object_2,
test_object_3) %>%
# then use the map function with ~.x as a placeholder for all the objects before the last pipe (similar to '.')
map(~.x %>%
# then supply the function/s
mean())
## [[1]]
## [1] 5.5
##
## [[2]]
## [1] 15.5
##
## [[3]]
## [1] 25.5
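Step 3 used the piped form, ~.x %>% mean(). As a quick check, here is a sketch showing that the shorthand form, ~mean(.x), and passing the bare function name give the same result. The object names are only for illustration.

```r
library(magrittr)  # provides the %>% pipe
library(purrr)

test_objects <- list(c(1:10), c(11:20), c(21:30))

# piped form: handy when several functions are chained
piped_version <- test_objects %>% map(~ .x %>% mean())

# shorthand form: fine when only one function is applied
shorthand_version <- test_objects %>% map(~ mean(.x))

# bare function name: also works when no extra arguments are needed
bare_version <- test_objects %>% map(mean)

# both comparisons return TRUE
identical(piped_version, shorthand_version)
identical(piped_version, bare_version)
```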
One of the most common ways I use Purrr is to read in multiple data sets for the same analysis. Often landscape data, weather data, species data, etc. will be entered separately, but you need all the data files for an analysis. Importing data with purrr isn’t as straightforward as other operations because it isn’t intuitive how to accomplish step 2.
Start by reading in the 3 bobcat data files for this module, leaving them named as they are.
bobcat_collection_data <- read_csv('data/raw/bobcat_collection_data.csv')
## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
bobcat_necropsy_data <- read_csv('data/raw/bobcat_necropsy_only_data.csv')
## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl (1): Necropsy
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
bobcat_age_data <- read_csv('data/raw/bobcat_age_data.csv')
## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Now let’s do this using Purrr!
First we need to assign our object to the environment with a name. When we read in multiple data frames at a time they will be stored in a list object. Lists can be confusing to work with at first but they follow a similar structure to a data frame. Let’s name this list bobcat_data.
And just like when we read in an individual data frame, we need to provide the path for the data along with the file name and extension (e.g. .csv, .txt).
Then we reference our purrr::map function AND the function we want to apply to all of the data. This is where things can get a bit confusing at first: we use the read_csv() function, BUT we must preface this function with a ‘~’. The ‘~’ is part of the syntax Purrr uses.
Let’s look at an example.
# assign objects as list provide the names and file path of each data frame
bobcat_data <- list('data/raw/bobcat_collection_data.csv',
'data/raw/bobcat_necropsy_only_data.csv',
'data/raw/bobcat_age_data.csv') %>%
# use purrr::map to read in all data at once
purrr::map(
# this is alternative syntax you might see when only working with one function where the function is supplied after the '~' and '.x' is included in the function ()
~read_csv(.x)
)
## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl (1): Necropsy
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# look at the list structure
str(bobcat_data)
## List of 3
## $ : spc_tbl_ [121 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID# : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ County : chr [1:121] "Athens" "Athens" "Morgan" "Perry" ...
## ..$ Township : chr [1:121] "Dover" "Canaan" "Hower" "Reading" ...
## ..$ CollectionDate: chr [1:121] "12/30/18" "12/14/18" "12/7/18" "2/10/19" ...
## ..$ Month : num [1:121] 12 12 12 2 1 3 2 2 2 3 ...
## ..$ Coordinates_N : num [1:121] 39.4 39.3 39.5 39.8 40.4 ...
## ..$ Coordinates_W : num [1:121] -82.1 -82 -82 -82.3 -81.2 ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. County = col_character(),
## .. .. Township = col_character(),
## .. .. CollectionDate = col_character(),
## .. .. Month = col_double(),
## .. .. Coordinates_N = col_double(),
## .. .. Coordinates_W = col_double()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ : spc_tbl_ [121 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID# : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ Necropsy : num [1:121] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ NecropsyDate : chr [1:121] "3/6/19" "3/9/19" "3/10/19" "3/13/19" ...
## ..$ Dissector : chr [1:121] "SP,_MD" "SP_MD_CH" "MD_CH" "MD_CH_HK" ...
## ..$ ApproxAge : chr [1:121] "Ad" "Ad" "Ad" "Juv" ...
## ..$ Sex : chr [1:121] "M" "F" "M" "F" ...
## ..$ Fecundity_Females: chr [1:121] "na" "na" "na" "0" ...
## ..$ RearFoot_cm : chr [1:121] "17.9" "16" "17.1" "14.7" ...
## ..$ Tail_cm : chr [1:121] "11.5" "11.5" "11.4" "11.3" ...
## ..$ Ear_cm : chr [1:121] "6.5" "6.5" "6.4" "6.5" ...
## ..$ Body_w/Tail_cm : chr [1:121] "89.5" "82" "92" "71.5" ...
## ..$ Body : chr [1:121] "78" "70.5" "80.6" "60.2" ...
## ..$ Weight_kg : chr [1:121] "13.6" "6.33" "9.98" "4.62" ...
## ..$ Condition : chr [1:121] "17.43589744" "8.978723404" "12.382134" "7.674418605" ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. Necropsy = col_double(),
## .. .. NecropsyDate = col_character(),
## .. .. Dissector = col_character(),
## .. .. ApproxAge = col_character(),
## .. .. Sex = col_character(),
## .. .. Fecundity_Females = col_character(),
## .. .. RearFoot_cm = col_character(),
## .. .. Tail_cm = col_character(),
## .. .. Ear_cm = col_character(),
## .. .. `Body_w/Tail_cm` = col_character(),
## .. .. Body = col_character(),
## .. .. Weight_kg = col_character(),
## .. .. Condition = col_character()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ : spc_tbl_ [121 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID#: chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ Age : chr [1:121] "3" "1" "2" "0" ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. Age = col_character()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
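If you want to try this pattern without the bobcat files, here is a self-contained sketch. It writes two tiny made-up CSV files to a temporary folder (the folder, file names, and columns are all hypothetical) and then reads them back in with map(). It also sets show_col_types = FALSE, which the readr message above suggests for quieting the column specification output.

```r
library(magrittr)  # provides the %>% pipe
library(purrr)
library(readr)

# Hypothetical stand-in data written to a temporary folder
tmp_dir <- tempfile("bobcat_demo")
dir.create(tmp_dir)
write_csv(data.frame(id = c("a", "b"), county = c("Athens", "Perry")),
          file.path(tmp_dir, "collection.csv"))
write_csv(data.frame(id = c("a", "b"), age = c(3, 1)),
          file.path(tmp_dir, "age.csv"))

# Supply the file paths as a list, then read every file with map()
demo_data <- list(file.path(tmp_dir, "collection.csv"),
                  file.path(tmp_dir, "age.csv")) %>%
  map(~ read_csv(.x, show_col_types = FALSE))

# One data frame per file
length(demo_data)
```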
We can reduce our typing even further from the earlier example since all of the data are stored in the same folder by referencing a file path.
# assign object name to environment and provide file path for the data
bobcat_data <- file.path('data/raw',
# provide the names of each data frame
c('bobcat_collection_data.csv',
'bobcat_necropsy_only_data.csv',
'bobcat_age_data.csv')) %>%
# use purrr::map to read in all data at once
purrr::map(
~read_csv(.x))
## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl (1): Necropsy
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(bobcat_data)
## List of 3
## $ : spc_tbl_ [121 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID# : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ County : chr [1:121] "Athens" "Athens" "Morgan" "Perry" ...
## ..$ Township : chr [1:121] "Dover" "Canaan" "Hower" "Reading" ...
## ..$ CollectionDate: chr [1:121] "12/30/18" "12/14/18" "12/7/18" "2/10/19" ...
## ..$ Month : num [1:121] 12 12 12 2 1 3 2 2 2 3 ...
## ..$ Coordinates_N : num [1:121] 39.4 39.3 39.5 39.8 40.4 ...
## ..$ Coordinates_W : num [1:121] -82.1 -82 -82 -82.3 -81.2 ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. County = col_character(),
## .. .. Township = col_character(),
## .. .. CollectionDate = col_character(),
## .. .. Month = col_double(),
## .. .. Coordinates_N = col_double(),
## .. .. Coordinates_W = col_double()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ : spc_tbl_ [121 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID# : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ Necropsy : num [1:121] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ NecropsyDate : chr [1:121] "3/6/19" "3/9/19" "3/10/19" "3/13/19" ...
## ..$ Dissector : chr [1:121] "SP,_MD" "SP_MD_CH" "MD_CH" "MD_CH_HK" ...
## ..$ ApproxAge : chr [1:121] "Ad" "Ad" "Ad" "Juv" ...
## ..$ Sex : chr [1:121] "M" "F" "M" "F" ...
## ..$ Fecundity_Females: chr [1:121] "na" "na" "na" "0" ...
## ..$ RearFoot_cm : chr [1:121] "17.9" "16" "17.1" "14.7" ...
## ..$ Tail_cm : chr [1:121] "11.5" "11.5" "11.4" "11.3" ...
## ..$ Ear_cm : chr [1:121] "6.5" "6.5" "6.4" "6.5" ...
## ..$ Body_w/Tail_cm : chr [1:121] "89.5" "82" "92" "71.5" ...
## ..$ Body : chr [1:121] "78" "70.5" "80.6" "60.2" ...
## ..$ Weight_kg : chr [1:121] "13.6" "6.33" "9.98" "4.62" ...
## ..$ Condition : chr [1:121] "17.43589744" "8.978723404" "12.382134" "7.674418605" ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. Necropsy = col_double(),
## .. .. NecropsyDate = col_character(),
## .. .. Dissector = col_character(),
## .. .. ApproxAge = col_character(),
## .. .. Sex = col_character(),
## .. .. Fecundity_Females = col_character(),
## .. .. RearFoot_cm = col_character(),
## .. .. Tail_cm = col_character(),
## .. .. Ear_cm = col_character(),
## .. .. `Body_w/Tail_cm` = col_character(),
## .. .. Body = col_character(),
## .. .. Weight_kg = col_character(),
## .. .. Condition = col_character()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ : spc_tbl_ [121 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID#: chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ Age : chr [1:121] "3" "1" "2" "0" ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. Age = col_character()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
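This works because file.path() is vectorised: give it one folder and a vector of file names and it pastes the folder onto each name. A quick sketch you can run on its own (data_paths is just an illustrative name, and the files do not need to exist for this step):

```r
# Build one full path per file name
data_paths <- file.path('data/raw',
                        c('bobcat_collection_data.csv',
                          'bobcat_necropsy_only_data.csv',
                          'bobcat_age_data.csv'))

# A character vector of three paths, e.g. "data/raw/bobcat_collection_data.csv"
data_paths
```

map() then treats each of those three paths as a separate input to read_csv().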
We could also format it like this if that makes more sense to you, using a period as the placeholder for the data file names
# assign object name to environment and provide list of data file names
bobcat_data <- list('bobcat_collection_data.csv',
'bobcat_necropsy_only_data.csv',
'bobcat_age_data.csv') %>%
# provide file path for the data
file.path('data/raw', .) %>%
# use purrr::map to read in all data at once
purrr::map(
~read_csv(.x))
## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl (1): Necropsy
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(bobcat_data)
## List of 3
## $ : spc_tbl_ [121 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID# : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ County : chr [1:121] "Athens" "Athens" "Morgan" "Perry" ...
## ..$ Township : chr [1:121] "Dover" "Canaan" "Hower" "Reading" ...
## ..$ CollectionDate: chr [1:121] "12/30/18" "12/14/18" "12/7/18" "2/10/19" ...
## ..$ Month : num [1:121] 12 12 12 2 1 3 2 2 2 3 ...
## ..$ Coordinates_N : num [1:121] 39.4 39.3 39.5 39.8 40.4 ...
## ..$ Coordinates_W : num [1:121] -82.1 -82 -82 -82.3 -81.2 ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. County = col_character(),
## .. .. Township = col_character(),
## .. .. CollectionDate = col_character(),
## .. .. Month = col_double(),
## .. .. Coordinates_N = col_double(),
## .. .. Coordinates_W = col_double()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ : spc_tbl_ [121 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID# : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ Necropsy : num [1:121] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ NecropsyDate : chr [1:121] "3/6/19" "3/9/19" "3/10/19" "3/13/19" ...
## ..$ Dissector : chr [1:121] "SP,_MD" "SP_MD_CH" "MD_CH" "MD_CH_HK" ...
## ..$ ApproxAge : chr [1:121] "Ad" "Ad" "Ad" "Juv" ...
## ..$ Sex : chr [1:121] "M" "F" "M" "F" ...
## ..$ Fecundity_Females: chr [1:121] "na" "na" "na" "0" ...
## ..$ RearFoot_cm : chr [1:121] "17.9" "16" "17.1" "14.7" ...
## ..$ Tail_cm : chr [1:121] "11.5" "11.5" "11.4" "11.3" ...
## ..$ Ear_cm : chr [1:121] "6.5" "6.5" "6.4" "6.5" ...
## ..$ Body_w/Tail_cm : chr [1:121] "89.5" "82" "92" "71.5" ...
## ..$ Body : chr [1:121] "78" "70.5" "80.6" "60.2" ...
## ..$ Weight_kg : chr [1:121] "13.6" "6.33" "9.98" "4.62" ...
## ..$ Condition : chr [1:121] "17.43589744" "8.978723404" "12.382134" "7.674418605" ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. Necropsy = col_double(),
## .. .. NecropsyDate = col_character(),
## .. .. Dissector = col_character(),
## .. .. ApproxAge = col_character(),
## .. .. Sex = col_character(),
## .. .. Fecundity_Females = col_character(),
## .. .. RearFoot_cm = col_character(),
## .. .. Tail_cm = col_character(),
## .. .. Ear_cm = col_character(),
## .. .. `Body_w/Tail_cm` = col_character(),
## .. .. Body = col_character(),
## .. .. Weight_kg = col_character(),
## .. .. Condition = col_character()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ : spc_tbl_ [121 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID#: chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ Age : chr [1:121] "3" "1" "2" "0" ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. Age = col_character()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
We can look at each list element by using the $, similar to a data frame.
Let’s see what our list elements are named using the names() function.
# first see what the list elements are named
names(bobcat_data)
## NULL
Hmmmm, that’s not ideal: our list elements don’t have names. We can fix this by adding a function after the purrr::map() call to name each object in the list.
# assign object name to environment and provide file path for the data
bobcat_data <- file.path('data/raw',
# provide the names of each data frame
c('bobcat_collection_data.csv',
'bobcat_necropsy_only_data.csv',
'bobcat_age_data.csv')) %>%
# use purrr::map to read in all data at once
purrr::map(
~read_csv(.x)) %>%
# assign names to list objects
purrr::set_names('collection',
'necropsy',
'age')
## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl (1): Necropsy
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(bobcat_data)
## List of 3
## $ collection: spc_tbl_ [121 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID# : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ County : chr [1:121] "Athens" "Athens" "Morgan" "Perry" ...
## ..$ Township : chr [1:121] "Dover" "Canaan" "Hower" "Reading" ...
## ..$ CollectionDate: chr [1:121] "12/30/18" "12/14/18" "12/7/18" "2/10/19" ...
## ..$ Month : num [1:121] 12 12 12 2 1 3 2 2 2 3 ...
## ..$ Coordinates_N : num [1:121] 39.4 39.3 39.5 39.8 40.4 ...
## ..$ Coordinates_W : num [1:121] -82.1 -82 -82 -82.3 -81.2 ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. County = col_character(),
## .. .. Township = col_character(),
## .. .. CollectionDate = col_character(),
## .. .. Month = col_double(),
## .. .. Coordinates_N = col_double(),
## .. .. Coordinates_W = col_double()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ necropsy : spc_tbl_ [121 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID# : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ Necropsy : num [1:121] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ NecropsyDate : chr [1:121] "3/6/19" "3/9/19" "3/10/19" "3/13/19" ...
## ..$ Dissector : chr [1:121] "SP,_MD" "SP_MD_CH" "MD_CH" "MD_CH_HK" ...
## ..$ ApproxAge : chr [1:121] "Ad" "Ad" "Ad" "Juv" ...
## ..$ Sex : chr [1:121] "M" "F" "M" "F" ...
## ..$ Fecundity_Females: chr [1:121] "na" "na" "na" "0" ...
## ..$ RearFoot_cm : chr [1:121] "17.9" "16" "17.1" "14.7" ...
## ..$ Tail_cm : chr [1:121] "11.5" "11.5" "11.4" "11.3" ...
## ..$ Ear_cm : chr [1:121] "6.5" "6.5" "6.4" "6.5" ...
## ..$ Body_w/Tail_cm : chr [1:121] "89.5" "82" "92" "71.5" ...
## ..$ Body : chr [1:121] "78" "70.5" "80.6" "60.2" ...
## ..$ Weight_kg : chr [1:121] "13.6" "6.33" "9.98" "4.62" ...
## ..$ Condition : chr [1:121] "17.43589744" "8.978723404" "12.382134" "7.674418605" ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. Necropsy = col_double(),
## .. .. NecropsyDate = col_character(),
## .. .. Dissector = col_character(),
## .. .. ApproxAge = col_character(),
## .. .. Sex = col_character(),
## .. .. Fecundity_Females = col_character(),
## .. .. RearFoot_cm = col_character(),
## .. .. Tail_cm = col_character(),
## .. .. Ear_cm = col_character(),
## .. .. `Body_w/Tail_cm` = col_character(),
## .. .. Body = col_character(),
## .. .. Weight_kg = col_character(),
## .. .. Condition = col_character()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
## $ age : spc_tbl_ [121 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## ..$ Bobcat_ID#: chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## ..$ Age : chr [1:121] "3" "1" "2" "0" ...
## ..- attr(*, "spec")=
## .. .. cols(
## .. .. `Bobcat_ID#` = col_character(),
## .. .. Age = col_character()
## .. .. )
## ..- attr(*, "problems")=<externalptr>
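Here is the naming step on its own with a toy list, as a sketch you can run without the data files. The list contents and names are made up for illustration.

```r
library(magrittr)  # provides the %>% pipe
library(purrr)

# A list without names
unnamed_list <- list(c(1:10), c(11:20), c(21:30))
names(unnamed_list)   # NULL

# Name the elements in order; purrr:: is spelled out because
# other packages also have a function called set_names()
named_list <- unnamed_list %>%
  purrr::set_names('first', 'second', 'third')

names(named_list)

# Named elements can be pulled out with $ or with [[ ]]
named_list$second
named_list[['second']]
```

The names are matched by position, so they need to be in the same order as the files you read in.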
Now we can look at each list element by using the $.
# look at the structure of the necropsy data
str(bobcat_data$necropsy)
## spc_tbl_ [121 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Bobcat_ID# : chr [1:121] "5/18/01" "5/18/02" "58-18-03" "64-19-04" ...
## $ Necropsy : num [1:121] 1 2 3 4 5 6 7 8 9 10 ...
## $ NecropsyDate : chr [1:121] "3/6/19" "3/9/19" "3/10/19" "3/13/19" ...
## $ Dissector : chr [1:121] "SP,_MD" "SP_MD_CH" "MD_CH" "MD_CH_HK" ...
## $ ApproxAge : chr [1:121] "Ad" "Ad" "Ad" "Juv" ...
## $ Sex : chr [1:121] "M" "F" "M" "F" ...
## $ Fecundity_Females: chr [1:121] "na" "na" "na" "0" ...
## $ RearFoot_cm : chr [1:121] "17.9" "16" "17.1" "14.7" ...
## $ Tail_cm : chr [1:121] "11.5" "11.5" "11.4" "11.3" ...
## $ Ear_cm : chr [1:121] "6.5" "6.5" "6.4" "6.5" ...
## $ Body_w/Tail_cm : chr [1:121] "89.5" "82" "92" "71.5" ...
## $ Body : chr [1:121] "78" "70.5" "80.6" "60.2" ...
## $ Weight_kg : chr [1:121] "13.6" "6.33" "9.98" "4.62" ...
## $ Condition : chr [1:121] "17.43589744" "8.978723404" "12.382134" "7.674418605" ...
## - attr(*, "spec")=
## .. cols(
## .. `Bobcat_ID#` = col_character(),
## .. Necropsy = col_double(),
## .. NecropsyDate = col_character(),
## .. Dissector = col_character(),
## .. ApproxAge = col_character(),
## .. Sex = col_character(),
## .. Fecundity_Females = col_character(),
## .. RearFoot_cm = col_character(),
## .. Tail_cm = col_character(),
## .. Ear_cm = col_character(),
## .. `Body_w/Tail_cm` = col_character(),
## .. Body = col_character(),
## .. Weight_kg = col_character(),
## .. Condition = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
Many of the columns were read in as the wrong type (for example, the measurements were read in as character). We could tackle this individually for each data frame, or we could include specifications for the various column types in our map() function.
# assign object name to environment and provide file path for the data
bobcat_data <- file.path('data/raw',
# provide the names of each data frame
c('bobcat_collection_data.csv',
'bobcat_necropsy_only_data.csv',
'bobcat_age_data.csv')) %>%
# use purrr::map to read in all data at once
purrr::map(
~read_csv(.x,
col_types = cols(RearFoot_cm = col_number(),
Tail_cm = col_number(),
Ear_cm = col_number(),
'Body_w/Tail_cm' = col_number(),
Body = col_number(),
Weight_kg = col_number(),
Condition = col_number(),
Age = col_number(),
.default = col_factor())
)) %>%
# assign names to list objects
purrr::set_names('collection',
'necropsy',
'age')
str(bobcat_data$necropsy)
## spc_tbl_ [121 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Bobcat_ID# : Factor w/ 121 levels "5/18/01","5/18/02",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Necropsy : Factor w/ 121 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ NecropsyDate : Factor w/ 68 levels "3/6/19","3/9/19",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Dissector : Factor w/ 35 levels "SP,_MD","SP_MD_CH",..: 1 2 3 4 5 3 6 4 3 6 ...
## $ ApproxAge : Factor w/ 4 levels "Ad","Juv","na",..: 1 1 1 2 1 1 1 1 2 1 ...
## $ Sex : Factor w/ 3 levels "M","F","na": 1 2 1 2 1 1 1 1 1 1 ...
## $ Fecundity_Females: Factor w/ 7 levels "na","0","2","4",..: 1 1 1 2 1 1 1 1 1 1 ...
## $ RearFoot_cm : num [1:121] 17.9 16 17.1 14.7 16.5 16 17 18.2 14.6 16.4 ...
## $ Tail_cm : num [1:121] 11.5 11.5 11.4 11.3 14 12.5 12.5 13 10 14.5 ...
## $ Ear_cm : num [1:121] 6.5 6.5 6.4 6.5 5.7 6.5 7 6.5 7 7.2 ...
## $ Body_w/Tail_cm : num [1:121] 89.5 82 92 71.5 95.2 ...
## $ Body : num [1:121] 78 70.5 80.6 60.2 81.2 ...
## $ Weight_kg : num [1:121] 13.6 6.33 9.98 4.62 11.53 ...
## $ Condition : num [1:121] 17.44 8.98 12.38 7.67 14.19 ...
## - attr(*, "spec")=
## .. cols(
## .. .default = col_factor(),
## .. `Bobcat_ID#` = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. Necropsy = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. NecropsyDate = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. Dissector = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. ApproxAge = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. Sex = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. Fecundity_Females = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
## .. RearFoot_cm = col_number(),
## .. Tail_cm = col_number(),
## .. Ear_cm = col_number(),
## .. `Body_w/Tail_cm` = col_number(),
## .. Body = col_number(),
## .. Weight_kg = col_number(),
## .. Condition = col_number()
## .. )
## - attr(*, "problems")=<externalptr>
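To see what cols() and .default are doing in isolation, here is a small sketch with a made-up three-row data set supplied as literal text through I(), which assumes readr 2.0 or later. The column names only imitate the bobcat data.

```r
library(readr)

# Hypothetical mini data set supplied as literal text
demo_csv <- I("ID,Sex,Weight_kg\nb1,M,13.6\nb2,F,6.33\nb3,M,9.98\n")

# Weight_kg is read as a number; every other column falls back to .default
demo_data <- read_csv(demo_csv,
                      col_types = cols(Weight_kg = col_number(),
                                       .default = col_factor()))

str(demo_data$Weight_kg)  # numeric
str(demo_data$Sex)        # factor
```

Anything you do not list by name picks up the .default type, which is why only the numeric columns need to be spelled out.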
If your data frames don’t all have the same column names, which they usually don’t, you will likely get a warning about parsing issues. You can ignore this; just be sure to check the structure of each data frame in your list before proceeding, and make sure the variables you need later were read in properly.
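One way to do that check for every data frame at once is to map over the list and pull out the class of each column. This is only a sketch with a made-up list (demo_list and its columns are hypothetical); with your own data you would supply your list in its place.

```r
library(magrittr)  # provides the %>% pipe
library(purrr)

# Hypothetical list of two small data frames
demo_list <- list(
  collection = data.frame(id = c("a", "b"), month = c(12, 2),
                          stringsAsFactors = FALSE),
  age = data.frame(id = c("a", "b"), age = c("3", "X"),
                   stringsAsFactors = FALSE)
)

# For each data frame, return the first class of every column
column_classes <- demo_list %>%
  map(~ map_chr(.x, function(column) class(column)[1]))

column_classes
```

Here the age column comes back as character because of the "X" value, which is the kind of thing you want to catch before moving on.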
We can also use Purrr to manipulate our data. Recall that I recommend always using lowercase for column names, objects, etc. The bobcat data as entered do not follow those rules, so we’d want to change that after we import the data into R.
Let’s look at the non-purrr way first and then see how much code repetition we avoid when we use Purrr. We would likely want to do this when we read in the data, so on your own, copy the non-purrr code from above where we read in the data and set the column names to lowercase for each data set.
Use the head() function to look at the first few rows of each data set.
bobcat_collection_data <- read_csv('data/raw/bobcat_collection_data.csv') %>%
# set names to lowercase
set_names(
names(.) %>%
tolower())
## Rows: 121 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Bobcat_ID#, County, Township, CollectionDate
## dbl (3): Month, Coordinates_N, Coordinates_W
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(bobcat_collection_data)
## # A tibble: 6 × 7
## `bobcat_id#` county township collectiondate month coordinates_n coordinates_w
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 5/18/01 Athens Dover 12/30/18 12 39.4 -82.1
## 2 5/18/02 Athens Canaan 12/14/18 12 39.3 -82.0
## 3 58-18-03 Morgan Hower 12/7/18 12 39.5 -82.0
## 4 64-19-04 Perry Reading 2/10/19 2 39.8 -82.3
## 5 34-15-05 Harris… Monroe 1/11/15 1 40.4 -81.2
## 6 71-19-06 Ross Jeffers… 3/7/19 3 39.2 -82.8
bobcat_necropsy_data <- read_csv('data/raw/bobcat_necropsy_only_data.csv')%>%
# set names to lowercase
set_names(
names(.) %>%
tolower())
## Rows: 121 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): Bobcat_ID#, NecropsyDate, Dissector, ApproxAge, Sex, Fecundity_Fem...
## dbl (1): Necropsy
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(bobcat_necropsy_data)
## # A tibble: 6 × 14
## `bobcat_id#` necropsy necropsydate dissector approxage sex fecundity_females
## <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 5/18/01 1 3/6/19 SP,_MD Ad M na
## 2 5/18/02 2 3/9/19 SP_MD_CH Ad F na
## 3 58-18-03 3 3/10/19 MD_CH Ad M na
## 4 64-19-04 4 3/13/19 MD_CH_HK Juv F 0
## 5 34-15-05 5 3/23/19 MD_CH_JG Ad M na
## 6 71-19-06 6 3/24/19 MD_CH Ad M na
## # ℹ 7 more variables: rearfoot_cm <chr>, tail_cm <chr>, ear_cm <chr>,
## # `body_w/tail_cm` <chr>, body <chr>, weight_kg <chr>, condition <chr>
bobcat_age_data <- read_csv('data/raw/bobcat_age_data.csv')%>%
# set names to lowercase
set_names(
names(.) %>%
tolower())
## Rows: 121 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Bobcat_ID#, Age
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(bobcat_age_data)
## # A tibble: 6 × 2
## `bobcat_id#` age
## <chr> <chr>
## 1 5/18/01 3
## 2 5/18/02 1
## 3 58-18-03 2
## 4 64-19-04 0
## 5 34-15-05 1
## 6 71-19-06 X
That’s a lot of repetition in our code, which we generally want to avoid whenever possible. With Purrr we can do just that.
We are going to use the same code from above that we used to read in the data files using a file path, and we will add a function to set the column names to lower case.
To apply multiple functions within the same purrr::map() call we have to change one thing. Instead of typing a ‘~’ directly before read_csv() and referencing our list element (.x) inside read_csv(), we reference the list element first (~.x) and then pipe it through each of the functions we want to apply.
See below
# assign object name to environment and provide file path for the data
bobcat_data <- file.path('data/raw',
# provide the names of each data frame
c('bobcat_collection_data.csv',
'bobcat_necropsy_only_data.csv',
'bobcat_age_data.csv')) %>%
# use purrr::map to read in all data at once
purrr::map(
# reference list elements with ~
~.x %>%
# read in list elements
read_csv(col_types = cols(RearFoot_cm = col_number(),
Tail_cm = col_number(),
Ear_cm = col_number(),
'Body_w/Tail_cm' = col_number(),
Body = col_number(),
Weight_kg = col_number(),
Condition = col_number(),
Age = col_number(),
.default = col_factor())
) %>%
# set names to lower case
set_names(
names(.) %>%
tolower())
) %>%
# assign names to list objects
purrr::set_names('collection',
'necropsy',
'age')
## Warning: The following named parsers don't match the column names: RearFoot_cm,
## Tail_cm, Ear_cm, Body_w/Tail_cm, Body, Weight_kg, Condition, Age
## Warning: The following named parsers don't match the column names: Age
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
## Warning: The following named parsers don't match the column names: RearFoot_cm,
## Tail_cm, Ear_cm, Body_w/Tail_cm, Body, Weight_kg, Condition
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
# the list element is named 'collection'; bobcat_data$bobcat_collection does not exist and returns NULL
head(bobcat_data$collection)
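If the pattern above is hard to follow with the real files, here is a minimal sketch of the same idea on toy data. The two small csv files, their columns, and the object name toy_data are made up for illustration and are written to a temporary folder, so nothing in your project directory is touched.

```r
library(purrr)
library(readr)
library(dplyr)

# hypothetical toy files written to a temporary folder (not the course data)
tmp_dir <- tempdir()
write_csv(tibble(`Bobcat_ID#` = c("a1", "a2"), Weight_kg = c(4.2, 6.1)),
          file.path(tmp_dir, "toy_necropsy.csv"))
write_csv(tibble(`Bobcat_ID#` = c("a1", "a2"), Age = c(3, 1)),
          file.path(tmp_dir, "toy_age.csv"))

toy_data <- file.path(tmp_dir,
                      c("toy_necropsy.csv",
                        "toy_age.csv")) %>%
  purrr::map(
    # reference each file path first, then pipe it through every step
    ~ .x %>%
      read_csv(show_col_types = FALSE) %>%
      # set names to lower case
      set_names(
        names(.) %>%
          tolower())
  ) %>%
  # assign names to list objects
  purrr::set_names(c("necropsy", "age"))

# column names of one element are now lower case
names(toy_data$necropsy)
```

Each step after ~.x is applied to one file at a time, so you can keep adding pipes inside map() without writing the steps again for each file.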
I also don’t like the column name for ‘bobcat ID number’. See how it reads in wrapped in backticks (`bobcat_id#`) because someone used the ‘#’ instead of typing number, and R doesn’t allow ‘#’ in a regular column name. Let’s change this in all the data sets using Purrr.
For this example we won’t go through the non-purrr way to save time
# assign object name to environment and provide file path for the data
bobcat_data <- file.path('data/raw',
# provide the names of each data frame
c('bobcat_collection_data.csv',
'bobcat_necropsy_only_data.csv',
'bobcat_age_data.csv')) %>%
# use purrr::map to read in all data at once
purrr::map(
# reference list elements with ~
~.x %>%
# read in list elements
read_csv(col_types = cols(RearFoot_cm = col_number(),
Tail_cm = col_number(),
Ear_cm = col_number(),
'Body_w/Tail_cm' = col_number(),
Body = col_number(),
Weight_kg = col_number(),
Condition = col_number(),
Age = col_number(),
.default = col_factor())
) %>%
# set names to lower case
set_names(
names(.) %>%
tolower()) %>%
# change bobcats id# to better name
rename(.,
'bobcat_id' = 'bobcat_id#') # new name = old name
) %>%
# assign names to list objects
purrr::set_names('collection',
'necropsy',
'age')
## Warning: The following named parsers don't match the column names: RearFoot_cm,
## Tail_cm, Ear_cm, Body_w/Tail_cm, Body, Weight_kg, Condition, Age
## Warning: The following named parsers don't match the column names: Age
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
## Warning: The following named parsers don't match the column names: RearFoot_cm,
## Tail_cm, Ear_cm, Body_w/Tail_cm, Body, Weight_kg, Condition
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
head(bobcat_data$collection)
## # A tibble: 6 × 7
## bobcat_id county township collectiondate month coordinates_n coordinates_w
## <fct> <fct> <fct> <fct> <fct> <fct> <fct>
## 1 5/18/01 Athens Dover 12/30/18 12 39.38911 -82.14085
## 2 5/18/02 Athens Canaan 12/14/18 12 39.30614 -81.957962
## 3 58-18-03 Morgan Hower 12/7/18 12 39.494746 -81.988452
## 4 64-19-04 Perry Reading 2/10/19 2 39.822288 -82.286951
## 5 34-15-05 Harrison Monroe 1/11/15 1 40.420816 -81.216981
## 6 71-19-06 Ross Jefferson 3/7/19 3 39.23797 -82.7881
Much better!
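To see the renaming step on its own, here is a small sketch that applies rename() to every data frame in a list. The list toy_list and its contents are invented for illustration; only the column name `bobcat_id#` is taken from the data above.

```r
library(purrr)
library(dplyr)

# hypothetical list of small data frames that share a badly named id column
toy_list <- list(
  collection = tibble(`bobcat_id#` = c("a1", "a2"),
                      county = c("Athens", "Perry")),
  age = tibble(`bobcat_id#` = c("a1", "a2"),
               age = c(3, 1))
)

toy_renamed <- toy_list %>%
  # rename the id column in every data frame; new name = old name
  map(~ .x %>%
        rename(bobcat_id = `bobcat_id#`))

# check the column names of each element
map(toy_renamed, names)
```

Because map() keeps the names of the list it was given, the elements are still called collection and age after the rename.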
Now that we have tidy data sets we may want to save these to our hard drive so we don’t have to run code to reformat them every time we read in the data.
In your script, type code that will save each of the data sets in the bobcat_data list as a csv to the data/processed folder.
# save each data set as a csv
write_csv(bobcat_data$collection,
'data/processed/bobcat_collection.csv')
write_csv(bobcat_data$age,
'data/processed/bobcat_age.csv')
write_csv(bobcat_data$necropsy,
'data/processed/bobcat_necropsy.csv')
Now the Purrr way!
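We can use the Purrr function imap() because it passes the name of each list element (.y) along with the element itself (.x), so we can use the names to build the file names when we save them. Note that the files are now named after the list elements (e.g., collection.csv).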
# save each data set as a csv
purrr::imap(
bobcat_data,
~write_csv(.x,
file = paste0("data/processed/",
.y,
'.csv')))
## $collection
## # A tibble: 121 × 7
## bobcat_id county township collectiondate month coordinates_n coordinates_w
## <fct> <fct> <fct> <fct> <fct> <fct> <fct>
## 1 5/18/01 Athens Dover 12/30/18 12 39.38911 -82.14085
## 2 5/18/02 Athens Canaan 12/14/18 12 39.30614 -81.957962
## 3 58-18-03 Morgan Hower 12/7/18 12 39.494746 -81.988452
## 4 64-19-04 Perry Reading 2/10/19 2 39.822288 -82.286951
## 5 34-15-05 Harrison Monroe 1/11/15 1 40.420816 -81.216981
## 6 71-19-06 Ross Jeffers… 3/7/19 3 39.23797 -82.7881
## 7 16-19-07 Coshocton Jeffers… 2/28/19 2 40.3524 -82.018
## 8 40-19-08 Jackson Madison 2/27/19 2 38.903 -82.4497
## 9 61-19-09 Noble Elk 2/20/19 2 39.65858 -81.28657
## 10 27-19-10 Gallia Gallipo… 3/1/19 3 38.8425 -82.1815
## # ℹ 111 more rows
##
## $necropsy
## # A tibble: 121 × 14
## bobcat_id necropsy necropsydate dissector approxage sex fecundity_females
## <fct> <fct> <fct> <fct> <fct> <fct> <fct>
## 1 5/18/01 1 3/6/19 SP,_MD Ad M na
## 2 5/18/02 2 3/9/19 SP_MD_CH Ad F na
## 3 58-18-03 3 3/10/19 MD_CH Ad M na
## 4 64-19-04 4 3/13/19 MD_CH_HK Juv F 0
## 5 34-15-05 5 3/23/19 MD_CH_JG Ad M na
## 6 71-19-06 6 3/24/19 MD_CH Ad M na
## 7 16-19-07 7 3/31/19 MD_HK Ad M na
## 8 40-19-08 8 4/7/19 MD_CH_HK Ad M na
## 9 61-19-09 9 4/13/19 MD_CH Juv M na
## 10 27-19-10 10 4/14/19 MD_HK Ad M na
## # ℹ 111 more rows
## # ℹ 7 more variables: rearfoot_cm <dbl>, tail_cm <dbl>, ear_cm <dbl>,
## # `body_w/tail_cm` <dbl>, body <dbl>, weight_kg <dbl>, condition <dbl>
##
## $age
## # A tibble: 121 × 2
## bobcat_id age
## <fct> <dbl>
## 1 5/18/01 3
## 2 5/18/02 1
## 3 58-18-03 2
## 4 64-19-04 0
## 5 34-15-05 1
## 6 71-19-06 NA
## 7 16-19-07 NA
## 8 40-19-08 1
## 9 61-19-09 0
## 10 27-19-10 3
## # ℹ 111 more rows
So much less repetition and code if you have a lot of data sets to save
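Notice that imap() also printed every data set to the console, because write_csv() returns the data it saved. If you only care about the saved files, purrr's iwalk() is the side-effect version of imap() and returns its input invisibly. Here is a minimal sketch on toy data; the list toy_list, the temporary output folder, and the 'bobcat_' file prefix are assumptions for illustration.

```r
library(purrr)
library(readr)
library(dplyr)

# hypothetical named list of small data frames
toy_list <- list(
  collection = tibble(bobcat_id = c("a1", "a2"),
                      county = c("Athens", "Perry")),
  age = tibble(bobcat_id = c("a1", "a2"),
               age = c(3, 1))
)

# temporary output folder so the example does not touch your project
out_dir <- file.path(tempdir(), "processed")
dir.create(out_dir, showWarnings = FALSE)

# .x is the data frame, .y is its name in the list
iwalk(
  toy_list,
  ~ write_csv(.x,
              file = file.path(out_dir,
                               paste0("bobcat_", .y, ".csv"))))

# confirm the files were written
list.files(out_dir)
```

Adding a prefix inside paste0() is one way to keep file names such as bobcat_collection.csv consistent with files saved one at a time.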
### Figures
If you’ve done assignment 4, you may have been annoyed with how much code repetition there was to generate all the histograms for your explanatory variables. But not to fear, Purrr is here!
We can use Purrr’s map() and imap() functions to quickly generate several of the same plot without having to copy and paste a bunch of code.
In your script attempt to make histograms for all the numeric variables in the necropsy data following the 3 steps I outlined earlier
# step 1: code for 1 histogram
hist(bobcat_data$necropsy$rearfoot_cm)
# step 2: translate code to dplyr pipe format
bobcat_data$necropsy$rearfoot_cm %>%
hist()
# step 3: provide list and pipe into `map()`
bobcat_data$necropsy %>%
# select only numeric vars
select(where(is.numeric)) %>%
# apply purrr::map
map(~.x %>%
#provide histograms
hist()
)
## $rearfoot_cm
## $breaks
## [1] 11 12 13 14 15 16 17 18 19
##
## $counts
## [1] 2 1 10 17 25 31 16 6
##
## $density
## [1] 0.018518519 0.009259259 0.092592593 0.157407407 0.231481481 0.287037037
## [7] 0.148148148 0.055555556
##
## $mids
## [1] 11.5 12.5 13.5 14.5 15.5 16.5 17.5 18.5
##
## $xname
## [1] "."
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
##
## $tail_cm
## $breaks
## [1] 7 8 9 10 11 12 13 14 15 16 17
##
## $counts
## [1] 1 1 6 8 18 26 20 21 6 1
##
## $density
## [1] 0.009259259 0.009259259 0.055555556 0.074074074 0.166666667 0.240740741
## [7] 0.185185185 0.194444444 0.055555556 0.009259259
##
## $mids
## [1] 7.5 8.5 9.5 10.5 11.5 12.5 13.5 14.5 15.5 16.5
##
## $xname
## [1] "."
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
##
## $ear_cm
## $breaks
## [1] 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
##
## $counts
## [1] 2 5 17 15 30 16 9 3
##
## $density
## [1] 0.04123711 0.10309278 0.35051546 0.30927835 0.61855670 0.32989691 0.18556701
## [8] 0.06185567
##
## $mids
## [1] 4.25 4.75 5.25 5.75 6.25 6.75 7.25 7.75
##
## $xname
## [1] "."
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
##
## $`body_w/tail_cm`
## $breaks
## [1] 55 60 65 70 75 80 85 90 95 100 105
##
## $counts
## [1] 3 2 5 14 22 20 23 13 5 1
##
## $density
## [1] 0.005555556 0.003703704 0.009259259 0.025925926 0.040740741 0.037037037
## [7] 0.042592593 0.024074074 0.009259259 0.001851852
##
## $mids
## [1] 57.5 62.5 67.5 72.5 77.5 82.5 87.5 92.5 97.5 102.5
##
## $xname
## [1] "."
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
##
## $body
## $breaks
## [1] 45 50 55 60 65 70 75 80 85 90
##
## $counts
## [1] 3 2 11 16 26 24 19 5 2
##
## $density
## [1] 0.005555556 0.003703704 0.020370370 0.029629630 0.048148148 0.044444444
## [7] 0.035185185 0.009259259 0.003703704
##
## $mids
## [1] 47.5 52.5 57.5 62.5 67.5 72.5 77.5 82.5 87.5
##
## $xname
## [1] "."
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
##
## $weight_kg
## $breaks
## [1] 0 2 4 6 8 10 12 14
##
## $counts
## [1] 2 10 27 31 24 9 5
##
## $density
## [1] 0.009259259 0.046296296 0.125000000 0.143518519 0.111111111 0.041666667
## [7] 0.023148148
##
## $mids
## [1] 1 3 5 7 9 11 13
##
## $xname
## [1] "."
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
##
## $condition
## $breaks
## [1] 2 4 6 8 10 12 14 16 18
##
## $counts
## [1] 1 9 17 26 25 20 6 4
##
## $density
## [1] 0.00462963 0.04166667 0.07870370 0.12037037 0.11574074 0.09259259 0.02777778
## [8] 0.01851852
##
## $mids
## [1] 3 5 7 9 11 13 15 17
##
## $xname
## [1] "."
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
Your histograms likely print without the variable name in the title, which we probably want so we know what we are looking at.
We can accomplish this with imap(), which gives us access to the name of each column as .y.
First figure out how you would add a main title to a single plot.
Once you’ve done that, try adapting this for purrr.
bobcat_data$necropsy %>%
# select only numeric vars
select(where(is.numeric)) %>%
# apply purrr::imap
imap(~.x %>%
#provide histograms
hist(main = paste('Histogram of', .y))
)
## $rearfoot_cm
## $breaks
## [1] 11 12 13 14 15 16 17 18 19
##
## $counts
## [1] 2 1 10 17 25 31 16 6
##
## $density
## [1] 0.018518519 0.009259259 0.092592593 0.157407407 0.231481481 0.287037037
## [7] 0.148148148 0.055555556
##
## $mids
## [1] 11.5 12.5 13.5 14.5 15.5 16.5 17.5 18.5
##
## $xname
## [1] "."
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
##
## $tail_cm
## $breaks
## [1] 7 8 9 10 11 12 13 14 15 16 17
##
## $counts
## [1] 1 1 6 8 18 26 20 21 6 1
##
## $density
## [1] 0.009259259 0.009259259 0.055555556 0.074074074 0.166666667 0.240740741
## [7] 0.185185185 0.194444444 0.055555556 0.009259259
##
## $mids
## [1] 7.5 8.5 9.5 10.5 11.5 12.5 13.5 14.5 15.5 16.5
##
## $xname
## [1] "."
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
##
## $ear_cm
## $breaks
## [1] 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
##
## $counts
## [1] 2 5 17 15 30 16 9 3
##
## $density
## [1] 0.04123711 0.10309278 0.35051546 0.30927835 0.61855670 0.32989691 0.18556701
## [8] 0.06185567
##
## $mids
## [1] 4.25 4.75 5.25 5.75 6.25 6.75 7.25 7.75
##
## $xname
## [1] "."
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
##
## $`body_w/tail_cm`
## $breaks
## [1] 55 60 65 70 75 80 85 90 95 100 105
##
## $counts
## [1] 3 2 5 14 22 20 23 13 5 1
##
## $density
## [1] 0.005555556 0.003703704 0.009259259 0.025925926 0.040740741 0.037037037
## [7] 0.042592593 0.024074074 0.009259259 0.001851852
##
## $mids
## [1] 57.5 62.5 67.5 72.5 77.5 82.5 87.5 92.5 97.5 102.5
##
## $xname
## [1] "."
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
##
## $body
## $breaks
## [1] 45 50 55 60 65 70 75 80 85 90
##
## $counts
## [1] 3 2 11 16 26 24 19 5 2
##
## $density
## [1] 0.005555556 0.003703704 0.020370370 0.029629630 0.048148148 0.044444444
## [7] 0.035185185 0.009259259 0.003703704
##
## $mids
## [1] 47.5 52.5 57.5 62.5 67.5 72.5 77.5 82.5 87.5
##
## $xname
## [1] "."
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
##
## $weight_kg
## $breaks
## [1] 0 2 4 6 8 10 12 14
##
## $counts
## [1] 2 10 27 31 24 9 5
##
## $density
## [1] 0.009259259 0.046296296 0.125000000 0.143518519 0.111111111 0.041666667
## [7] 0.023148148
##
## $mids
## [1] 1 3 5 7 9 11 13
##
## $xname
## [1] "."
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
##
## $condition
## $breaks
## [1] 2 4 6 8 10 12 14 16 18
##
## $counts
## [1] 1 9 17 26 25 20 6 4
##
## $density
## [1] 0.00462963 0.04166667 0.07870370 0.12037037 0.11574074 0.09259259 0.02777778
## [8] 0.01851852
##
## $mids
## [1] 3 5 7 9 11 13 15 17
##
## $xname
## [1] "."
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
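All of that console output is the list of histogram objects that map() and imap() return. If you only want the plots, one option is purrr's iwalk(), which calls the function for its side effects (here, drawing the plot) and returns its input invisibly. Below is a minimal sketch on a made-up data frame; toy_necropsy and its values are hypothetical, not the course data.

```r
library(purrr)
library(dplyr)

# hypothetical measurements for illustration
toy_necropsy <- tibble(
  sex = c("M", "F", "M", "F", "M"),
  tail_cm = c(12, 13.5, 11, 14, 12.5),
  weight_kg = c(7.1, 5.4, 8.0, 4.9, 6.6)
)

toy_necropsy %>%
  # select only numeric vars
  select(where(is.numeric)) %>%
  # draw one histogram per column; .x is the column, .y is its name
  iwalk(~ hist(.x,
               main = paste("Histogram of", .y),
               xlab = .y))
```

Passing .x directly to hist() instead of piping it in also lets us set the x-axis label ourselves, which would otherwise show up as a dot.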
Purrr also has handy functions to row-bind and column-bind data when you read it in. This is particularly useful when working with large data sets or data collected over several years.
For example, if I had two years of bobcat necropsy data that were entered and saved as separate csv files, I could read each file in individually, row-bind them together, and save this new data frame to my environment to work with later, as I’ve done below.
# read 2019 data
bobcat_necropsy_2019 <- read_csv('data/raw/sample_bobcat_necropsy_data_2019.csv')
## Rows: 67 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): Bobcat_ID#, NecropsyDate, Dissector, County, Township, CollectionD...
## dbl (4): Necropsy, Month, Coordinates_N, Coordinates_W
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# read 2020 data
bobcat_necropsy_2020 <- read_csv('data/raw/sample_bobcat_necropsy_data_2020.csv')
## Rows: 44 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (12): Bobcat_ID#, NecropsyDate, Dissector, County, Township, CollectionD...
## dbl (10): Necropsy, Month, Coordinates_N, Coordinates_W, RearFoot_cm, Tail_c...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
bobcat_necropsy_rbind <- rbind(bobcat_necropsy_2019,
bobcat_necropsy_2020)
head(bobcat_necropsy_rbind)
## # A tibble: 6 × 22
## `Bobcat_ID#` Necropsy NecropsyDate Dissector County Township CollectionDate
## <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 5/18/01 1 3/6/19 SP,_MD Athens Dover 12/30/18
## 2 5/18/02 2 3/9/19 SP_MD_CH Athens Canaan 12/14/18
## 3 58-18-03 3 3/10/19 MD_CH Morgan Hower 12/7/18
## 4 64-19-04 4 3/13/19 MD_CH_HK Perry Reading 2/10/19
## 5 34-15-05 5 3/23/19 MD_CH_JG Harrison Monroe 1/11/15
## 6 71-19-06 6 3/24/19 MD_CH Ross Jefferson 3/7/19
## # ℹ 15 more variables: Month <dbl>, Coordinates_N <dbl>, Coordinates_W <dbl>,
## # ApproxAge <chr>, Age <chr>, Sex <chr>, Fecundity_Females <chr>,
## # RearFoot_cm <chr>, Tail_cm <chr>, Ear_cm <chr>, `Body_w/Tail_cm` <chr>,
## # Body <chr>, Weight_kg <chr>, Condition <chr>, Notes <chr>
As we can see by looking at the objects in our environment or viewing the data this indeed joined the 2019 data with 2020. But there is a much faster way, especially if you have lots of data files.
Here we use the map_dfr()
function in Purrr to
simultaneously rbind our data frames when we read them in. This code
mimics how you would read in multiple data frames into a list except the
function will automatically try to rowbind them instead.
For this to work, the matching columns in each data frame must have the same column type (e.g., character, number, factor). So I’ve also added code to specify how to read in the various columns; otherwise we will get an error message.
bobcat_data_dfr <- file.path('data/raw',
# provide the names of each data frame
c('sample_bobcat_necropsy_data_2019.csv',
'sample_bobcat_necropsy_data_2020.csv')) %>%
# use purrr::map_dfr to read in all data at once and rowbind them
map_dfr(~.x %>%
read_csv(.,
col_types = cols(RearFoot_cm = col_number(),
Tail_cm = col_number(),
Ear_cm = col_number(),
'Body_w/Tail_cm' = col_number(),
Body = col_number(),
Weight_kg = col_number(),
Condition = col_number(),
.default = col_factor())))
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
## One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
Now instead of individual data frames or a list object that we have to rbind, in one chunk of code we’ve read in the data and row-bound it so we have a single data frame to work with!
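One thing to be aware of: in purrr 1.0.0 and later, map_dfr() is marked as superseded, and the suggested replacement is map() followed by list_rbind(). A handy extra is the names_to argument, which stores the name of each list element in a new column so you can tell which rows came from which year. Here is a minimal sketch, assuming purrr 1.0.0 or later is installed; the list toy_years and its values are made up for illustration.

```r
library(purrr)
library(dplyr)

# hypothetical yearly data frames held in a named list
toy_years <- list(
  `2019` = tibble(bobcat_id = c("a1", "a2"),
                  weight_kg = c(7.1, 5.4)),
  `2020` = tibble(bobcat_id = c("b1"),
                  weight_kg = c(6.3))
)

toy_bound <- toy_years %>%
  # row-bind the list and keep the list names in a 'year' column
  list_rbind(names_to = "year")

toy_bound
```

With files, the same idea would be to name the vector of file paths first, read them in with map(), and then pipe the resulting list into list_rbind().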
If you have several data files that you want to join via column-bind
you can also use map_dfc()
with similar coding as above to
join data frames in this way. But be careful, this function
assumes that the rows in each file are in the same order and nothing is
missing or mismatched, so you can easily get errors if that isn’t the
case. The join functions we covered earlier are a much
safer option because you can specify a ‘key’ to ensure the rows are
matched up properly, but it does take more coding.
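As a sketch of the safer route, purrr's reduce() can apply a join across a whole list of data frames, two at a time, matching rows on a key. The list toy_list and its values are hypothetical; note that the rows in the second data frame are deliberately in a different order and one bobcat is missing.

```r
library(purrr)
library(dplyr)

# hypothetical data frames that share a key column
toy_list <- list(
  collection = tibble(bobcat_id = c("a1", "a2", "a3"),
                      county = c("Athens", "Perry", "Ross")),
  age = tibble(bobcat_id = c("a3", "a1"),
               age = c(2, 3))
)

toy_joined <- toy_list %>%
  # join the running result (.x) to the next data frame (.y) by the key
  reduce(~ left_join(.x, .y, by = "bobcat_id"))

toy_joined
```

Because the rows are matched on bobcat_id, the bobcat with no age record gets an NA instead of being paired with the wrong row, which is what a plain column-bind would risk.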
We’ve only barely scratched the surface of what Purrr can do, but given our limited time that is where we will end. Just remember anytime you find yourself repeating the same operations for multiple objects of the same type you may want to consider using Purrr instead to reduce repetition in your code.