Below are some exercises for your first stats assignment, based on the
material we just covered. For all exercises, the results will be
available on this website after I’ve graded the assignments. These exercises are
meant to test your knowledge of the material we’ve covered and help you
learn to work with real data. They are also meant to mimic working in R in
the real world: you will have to modify code from the module and
possibly learn to use a new function to complete these assignments, just
like in real life. Remember that you can always look up information for a
particular function by typing ? followed by the function name (for example, ?rbind) to
open the help window.
Submit your R script as a .R (or .Rmd if using markdown) file to Brightspace.
You should always follow best coding practices (see Intro to R module 1), but especially for assignment submissions.
To receive full credit for each assignment
Create a vector called ‘myvec’ using any of the methods you learned with numbers 1 to 10. Note there are multiple ways to do this.
# (1 pt)
# answer 1 (the most parsimonious) using : to specify the range of numbers for a vector
myvec <- 1:10
# answer 2 using the c() function to provide the range of numbers
myvec <- c(1:10)
# answer 3 (time consuming) using the c() function to type out all the numbers for the vector
myvec <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
# answer 4 (unnecessarily complicated for this exercise but works) using the seq() function, which is great for many things but probably overkill for this question
myvec <- seq(1, 10, by = 1)
# print the vector
myvec
## [1] 1 2 3 4 5 6 7 8 9 10
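Approaches such as 1:10 and typing out every number give the same values, but R does not necessarily store them the same way. The sketch below compares the two; the object names v_colon and v_typed are made up for illustration.

```r
# the colon operator creates an integer vector
v_colon <- 1:10
# typing the numbers out inside c() creates a double (numeric) vector
v_typed <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# check how each vector is stored
typeof(v_colon)
typeof(v_typed)

# the values match element by element even though the storage types differ
all(v_colon == v_typed)
# identical() is stricter and also compares the storage type
identical(v_colon, v_typed)
```

For this assignment either storage type is fine; the difference only matters when a function is strict about integers versus doubles.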
Create a 3 row by 2 column matrix named ‘mymat’. Use the rbind() function to bind the following three rows/vectors together:
c(1,4)
c(2,5)
c(3,6)
# (1 pt)
# create matrix with rbind(), using the c() function inside rbind() to avoid assigning extra objects to the environment
mymat <- rbind(
c(1,4),
c(2,5),
c(3,6)
)
# print matrix
mymat
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
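As a quick check on how rbind() arranges its inputs, the sketch below builds the same 3 by 2 matrix by binding rows and by binding columns with cbind(). The names row_bound and col_bound are made up for illustration and are not part of the assignment.

```r
# bind three length-2 vectors as rows, giving 3 rows and 2 columns
row_bound <- rbind(c(1, 4), c(2, 5), c(3, 6))
# confirm the dimensions (rows, then columns)
dim(row_bound)

# bind two length-3 vectors as columns to get the same layout
col_bound <- cbind(c(1, 2, 3), c(4, 5, 6))

# both approaches produce the same matrix
identical(row_bound, col_bound)
```

Which function to use depends on whether your data naturally arrive as rows (one observation at a time) or as columns (one variable at a time).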
Get the names of the columns in the data frame you created earlier,
‘mydf’. Hint: see the R functions to explore data section. Then
extract all rows for column 5 by name, and do the same thing using the
element position, e.g. [].
# code to create mydf from the lab script
# create a data.frame object and assign to environment as my.data
my.data <- data.frame(
# add observation ids with numbers 1 to 100
Obs.Id = 1:100,
# add treatment with repeating characters A to E to create 5 treatment groups with 20 observations each
Treatment = rep(c("A","B","C","D","E"),
each = 20),
# create a block variable with numbers 1:20 that repeats 5 times (1 for each treatment group)
Block = rep(1:20,
times = 5),
# create germination variable that is a random number drawn from the poisson distribution with specified means for each group
Germination = rpois(100,
# lambda specifies the mean for the draws; in this case group 1 mean = 1, group 2 mean = 5, etc.
lambda = rep(c(1,5,4,7,1),
# this specifies the number of draws so 20 random draws from each of the means specified above
each = 20)),
# create average height variable with 100 draws from the normal distribution
AvgHeight = rnorm(100,
# the mean argument specifies the mean of the normal distribution that the numbers are drawn from; here we've specified 5 different means for the 5 groups
mean = rep(c(10,30,31,25,35),
# it will do 20 draws from each of the means specified above
each = 20))
)
# subset data to just rows 21:30 with all columns of data
mydf <- my.data[21:30, ]
# answer for problem 1
# (3 pts)
# print the column names for mydf
names(mydf)
## [1] "Obs.Id" "Treatment" "Block" "Germination" "AvgHeight"
# get all rows for the average height column using the column name
mydf$AvgHeight
## [1] 30.30209 27.91465 29.75724 30.17672 29.41516 29.08773 31.55113 31.74733
## [9] 28.74467 29.91922
# get all rows for the average height column using the column's position in the data frame (e.g., 5)
mydf[ , 5]
## [1] 30.30209 27.91465 29.75724 30.17672 29.41516 29.08773 31.55113 31.74733
## [9] 28.74467 29.91922
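There are a few more ways to pull a single column out of a data frame than the two shown in the answer. The sketch below uses a small stand-in data frame (toy_df and its values are made up, not the assignment data) to show that they return the same vector.

```r
# a small stand-in data frame; the name toy_df and its values are made up
toy_df <- data.frame(
  Block = 1:3,
  Germination = c(6, 4, 7),
  AvgHeight = c(30.3, 27.9, 29.8)
)

# extract the column by name in three different ways
by_dollar <- toy_df$AvgHeight
by_double <- toy_df[["AvgHeight"]]
by_name <- toy_df[ , "AvgHeight"]

# extract the same column by its position
by_position <- toy_df[ , 3]

# single brackets without a comma keep the result as a one-column data frame
as_df <- toy_df["AvgHeight"]
class(as_df)
```

Extracting by name is usually safer than by position, because the code keeps working if columns are later added or reordered.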
Create a new matrix called ‘mymat2’ that includes all the data from
columns 3 to 5 of data frame mydf. HINT: use the as.matrix() function to coerce a data frame into a matrix.
Since we didn’t cover this function, you may need to look it up in the
help files.
Note: your values for some columns may be slightly different since the code to create mydf uses random number generators.
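If you want random draws to be repeatable from one run to the next, one option is to call set.seed() before drawing. This is a minimal sketch and not part of the lab script; the seed value 42 and the object names are arbitrary choices for illustration.

```r
# set the seed, then draw 5 values from a Poisson distribution
# the seed value 42 is arbitrary; any integer works
set.seed(42)
first_draw <- rpois(5, lambda = 5)

# resetting the same seed reproduces exactly the same draws
set.seed(42)
second_draw <- rpois(5, lambda = 5)
identical(first_draw, second_draw)

# without resetting the seed, the next draws will generally differ
third_draw <- rpois(5, lambda = 5)
```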
# (1 pt)
# create matrix of all rows for columns 3 to 5 of mydf
mymat2 <- as.matrix(mydf[ , 3:5])
# print matrix
mymat2
## Block Germination AvgHeight
## 21 1 6 30.30209
## 22 2 4 27.91465
## 23 3 7 29.75724
## 24 4 5 30.17672
## 25 5 6 29.41516
## 26 6 5 29.08773
## 27 7 4 31.55113
## 28 8 4 31.74733
## 29 9 3 28.74467
## 30 10 6 29.91922
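Columns 3 to 5 are all numeric, so as.matrix() returns a numeric matrix. A matrix can only hold one data type, so including a character column changes the result. The sketch below shows this with a made-up data frame (toy_df is hypothetical, not the assignment data).

```r
# a small stand-in data frame with one character and two numeric columns
toy_df <- data.frame(
  Treatment = c("A", "B", "C"),
  Block = 1:3,
  AvgHeight = c(10.2, 30.1, 31.4)
)

# coercing only the numeric columns gives a numeric matrix
num_mat <- as.matrix(toy_df[ , 2:3])
is.numeric(num_mat)

# coercing all columns, including the character one, gives a character matrix
mixed_mat <- as.matrix(toy_df)
is.character(mixed_mat)
```

This is why the exercise selects only columns 3 to 5: including Treatment would turn every value into text.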
Create a list named ‘mylist’ that is composed of
- a vector: 1:3,
- a matrix: matrix(1:6, nrow = 3, ncol = 2),
- and a data frame: data.frame(x = c(1, 2, 3), y = c(TRUE, FALSE, TRUE),
z = c("a", "a", "b")).
# create an empty list
mylist <- list()
# add a vector of 1 to 3 to the list
mylist[[1]] <- 1:3
# add a matrix to the list
mylist[[2]] <- matrix(1:6,
nrow = 3,
ncol = 2)
# add a data frame to the list
mylist[[3]] <- data.frame(x = c(1, 2, 3),
y = c(TRUE, FALSE, TRUE),
z = c("a", "a", "b"))
# print the list
mylist
## [[1]]
## [1] 1 2 3
##
## [[2]]
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
##
## [[3]]
## x y z
## 1 1 TRUE a
## 2 2 FALSE a
## 3 3 TRUE b
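Filling an empty list one element at a time works, but the same list can also be built in a single call to list(). The sketch below compares the two approaches on a shortened example; the names step_list and one_call are made up for illustration.

```r
# approach 1: start with an empty list and add elements by position
step_list <- list()
step_list[[1]] <- 1:3
step_list[[2]] <- matrix(1:6, nrow = 3, ncol = 2)

# approach 2: pass all elements to list() at once
one_call <- list(1:3, matrix(1:6, nrow = 3, ncol = 2))

# both approaches give the same list
identical(step_list, one_call)

# str() gives a compact summary of what each element contains
str(one_call)
```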
Extract the second and third observation from the 1st column of the data frame in ‘mylist’ (the list created above).
# multiple ways to do this
# answer 1: call mylist, then reference the position of the data frame in the list [[3]], then the rows and columns you want from that element [2:3, 1]
mylist[[3]][2:3, 1]
## [1] 2 3
# answer 2: call mylist, then reference the position of the data frame in the list [[3]], then the column in the data frame [[1]], and finally the observations within that column c(2, 3)
mylist[[3]][[1]][c(2, 3)]
## [1] 2 3
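If the list elements are named, the same extraction can be written with names instead of positions, which can be easier to read. This is a sketch under that assumption: the list in the exercise is unnamed, and named_list, vec, and df are hypothetical names.

```r
# a named list holding a vector and a data frame
named_list <- list(
  vec = 1:3,
  df = data.frame(x = c(1, 2, 3), y = c(TRUE, FALSE, TRUE))
)

# use $ to reach the data frame, then the column, then the observations
by_dollar <- named_list$df$x[2:3]

# use [[ ]] with the element name, then rows and column name
by_brackets <- named_list[["df"]][2:3, "x"]

# single brackets on a list return a smaller list, not the element itself
class(named_list["df"])
class(named_list[["df"]])
```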