Below are some exercises for your first stats assignment based on the material we just covered. For all exercises the results will be available on this website after I’ve graded the assignments. These are meant to test your knowledge of the material we’ve covered and help you learn to work with real data. These are meant to mimic working in R in the real world and you will have to modify code from the module and possibly learn to use a new function to complete these assignment, just like in real life. Remember you can always look up information for a particular function using ?function nameto open the help window.

Submission

Submit you R script as a .R (or .Rmd if using markdown) file to Brightspace

Grading

You should always be following best coding practices (see Intro to R module 1) but especially for assingment submissions.

To receive full credit for each assignment

  • Please make sure each problem has its own header so that I can easily navigate to your answers
  • Ensure you have comments that explain what you are doing
  • Long code chunks should be broken up with spaces and comments to explain what is happening at each step
  • Object names should be lowercase and short but descriptive enough that they aren’t confused with other objects (For example data1 and data2 are not good names for dataframes you are working with)
  • Just because your code runs doesn’t mean it did what you think it did, always check your data/objects to ensure any functions were performed correctly (there are several ways to do this)

1 Create a vector

Create a vector called ‘myvec’ using any of the methods you learned with numbers 1 to 10. Note there are multiple ways to do this.

# (1 pt)

# answer 1 (the most parsimonious) using : to specify the range of numbers for a vector
myvec <- 1:10

# answer 2 using the c() function to provide the range of numbers
myvec <- c(1, 10)

# answer 3 (time consuming) using the c() function to type out all the numbers for the vector
myvec <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# answer 4 (unecessarily complicated for this exercise but works) using the seq() function which is great for many things but probably overkill for this question
myvec <- seq(1, 10, by = 1)


# print the vector
myvec
##  [1]  1  2  3  4  5  6  7  8  9 10

2 Create a matrix with rbind()

Create a 3 row by 2 column matrix named ‘mymat’. Use the rbind() function to bind the following three rows/vectors together:

c(1,4) 
c(2,5) 
c(3,6)
# (1 pt)

# create matrix with rbind, using the c() function inside rbind() avoid assigning extra objects to the environemnt
mymat <- rbind(
  c(1,4),
  c(2,5),
  c(3,6)
)

# print matrix
mymat
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6

3 Extracting objects from data

Get the names of columns in the data frame you created earlier ‘mydf’. Hint see the R functions to explore data section. Then extract all rows for column 5 by name, do the same thing using the element position e.g. []

# code to create mydf from the lab script

# create a data.frame object and assign to environemnt as my.data
my.data <- data.frame(
  
  # add observation ids with numbers 1 to 100
  Obs.Id = 1:100,
  
  # add treatment with repeating characters A to E to create 5 treatment groups with 20 observations each
  Treatment = rep(c("A","B","C","D","E"),
                  each = 20),
  
  # create a block variable with numbers 1:20 that repeats 5 times (1 for each treatemnt group)
  Block = rep(1:20,
              times = 5),
  
  # create germination variable that is a random number drawn from the poisson distribution with specified means for each group
  Germination = rpois(100,
                      # lamda specifies the mean for the draws in this case group 1 mean = 1, group 2 mean =5 etc.
                      lambda = rep(c(1,5,4,7,1),
                                   # this specifies the number of draws so 20 random draws from each of the means specified above
                                   each = 20)),  
  
  # create average height variable with 100 draws from the normal distribution
  AvgHeight = rnorm(100,
                    
                    # the mean specifies what the mean of normal distibution will be when it draws numbers from it, here we've specified 5 different means for the 5 groups
                    mean = rep(c(10,30,31,25,35),
                               # it will do 20 draws from each of the means specified above
                               each = 20))
)

# subset data to just rows 21:30 with all columns of data
mydf <-  my.data[21:30, ]  

# answer for problem 1

# (3 pts)

# print the column names for mydf
names(mydf)
## [1] "Obs.Id"      "Treatment"   "Block"       "Germination" "AvgHeight"
# get all rows for the average height column using the column name 
mydf$AvgHeight
##  [1] 30.30209 27.91465 29.75724 30.17672 29.41516 29.08773 31.55113 31.74733
##  [9] 28.74467 29.91922
# get all rows for the average height column using the columns position in the data frame (e.g., 5)
mydf[ , 5]
##  [1] 30.30209 27.91465 29.75724 30.17672 29.41516 29.08773 31.55113 31.74733
##  [9] 28.74467 29.91922

4 Create a matrix

Create a new matrix called ‘mymat2’ that includes all the data from columns 3 to 5 of data frame mydf. HINT: use the as.matrix() function to coerce a data frame into a matrix. Since we didn’t cover this function you may need to look it up in the help files.

Note your values for some columns may be slighly different since the code to create mydf uses random number generators.

# (1 pt)

# create matrix of all rows for columns 3 to 5 of mydf
mymat2 <- as.matrix(mydf[ , 3:5])

# print matrix
mymat2
##    Block Germination AvgHeight
## 21     1           6  30.30209
## 22     2           4  27.91465
## 23     3           7  29.75724
## 24     4           5  30.17672
## 25     5           6  29.41516
## 26     6           5  29.08773
## 27     7           4  31.55113
## 28     8           4  31.74733
## 29     9           3  28.74467
## 30    10           6  29.91922

5 Create a list

Create a list named ‘mylist’ that is composed of a
- vector: 1:3,
- a matrix: matrix(1:6, nrow = 3, ncol = 2),
- and a data frame: data.frame(x =c (1, 2, 3), y = c(TRUE, FALSE, TRUE), z = c(“a”, “a”, “b”)).

#create an empty list
mylist <- list()

# add a vector of 1 to 3 to the list
mylist[[1]] <- 1:3

# add a matrix to the list
mylist[[2]] <- matrix(1:6,
                      nrow = 3,
                      ncol = 2)

# add a data frame to the list
mylist[[3]] <- data.frame(x = c(1, 2, 3),
                          y = c(TRUE, FALSE, TRUE),
                          z = c("a", "a", "b"))

# print the list
mylist
## [[1]]
## [1] 1 2 3
## 
## [[2]]
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
## 
## [[3]]
##   x     y z
## 1 1  TRUE a
## 2 2 FALSE a
## 3 3  TRUE b

6 Extracting objects from lists

Extract the second and third observation from the 1st column of the data frame in ‘mylist’ (the list created above).

# multiple ways to do this

# answer 1 -call mylist then reference the position of the data frame in the list [[3], then the rows and columns you want from that element [2:3, 1]
mylist[[3]][2:3, 1]
## [1] 2 3
# answer 2 call mylist then reference the position of the data frame in the list [[3]], then the column in the data frame [[1]], and finally the observations within that columns c(2, 3)
mylist[[3]][[1]][c(2, 3)]
## [1] 2 3