Today we’re practicing how to handle missing values in a data set. Before starting the exercises, please first read section 2.5 of An Introduction to R.
Solutions are available here.
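As a quick refresher on the behaviour section 2.5 describes, here is a small sketch using made-up vectors (x and y are illustrative only and do not come from the exercises). Comparing anything with NA gives NA rather than TRUE or FALSE, which is why is.na() is the tool for detecting missing values. NaN also counts as missing for is.na(), while is.nan() picks out only genuine NaN values.

```r
# Illustrative vectors only; not taken from the exercises below
x <- c(5, NA, 8)

x == NA      # NA NA NA: a comparison with NA is itself unknown
is.na(x)     # FALSE TRUE FALSE: the reliable test for missing values

# NaN ("not a number") is also treated as missing by is.na(),
# but is.nan() is TRUE only for genuine NaN values
y <- c(0 / 0, NA, 1)
is.na(y)     # TRUE TRUE FALSE
is.nan(y)    # TRUE FALSE FALSE
```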
Exercise 1
If X <- c(22, 3, 7, NA, NA, 67), what will be the output of the R statement length(X)?
Exercise 2
If X = c(NA,3,14,NA,33,17,NA,41), which R statement will remove all occurrences of NA from X?
a. X[!is.na(X)]
b. X[is.na(X)]
c. X[X==NA]= 0
Exercise 3
If Y = c(1,3,12,NA,33,7,NA,21), which R statement will replace all occurrences of NA with 11?
a. Y[Y==NA]= 11
b. Y[is.na(Y)]= 11
c. Y[Y==11] = NA
Exercise 4
If X = c(34,33,65,37,89,NA,43,NA,11,NA,23,NA), which statement will count the number of occurrences of NA in X?
a. sum(X==NA)
b. sum(X == NA, is.na(X))
c. sum(is.na(X))
Exercise 5
Consider the vector W <- c(11, 3, 5, NA, 6). Write some R code that will return TRUE for each value of W that is missing.
Exercise 6
Load the Orange dataset from R using the command data(Orange). Replace all values of age equal to 118 with NA.
Exercise 7
Consider the vector A <- c(33, 21, 12, NA, 7, 8). Write some R code that will calculate the mean of A, ignoring the missing value.
Exercise 8
Let:
c1 <- c(1,2,3,NA)
c2 <- c(2,4,6,89)
c3 <- c(45,NA,66,101)
If X <- rbind(c1,c2,c3, deparse.level=1), write some R code that will display all rows of X that contain missing values.
Exercise 9
Consider the data frame created by:
df <- data.frame(Name = c(NA, "Joseph", "Martin", NA, "Andrea"), Sales = c(15, 18, 21, 56, 60), Price = c(34, 52, 21, 44, 20), stringsAsFactors = FALSE)
Write some R code that will return a data frame with all rows that have NA in the Name column removed.
Exercise 10
Consider the data frame created by:
df <- data.frame(Name = c(NA, "Joseph", "Martin", NA, "Andrea"), Sales = c(15, 18, 21, NA, 60), Price = c(34, 52, 33, 44, NA), stringsAsFactors = FALSE)
Write some R code that will remove all rows containing NA values and give the following output:
Name Sales Price
2 Joseph 18 52
3 Martin 21 33
Answers for those of us too ignorant to determine the correct answers? Do they exist? If so, where? Got stuck on NAA’s questions 9 & 10.
For some reason, when copy/pasting the df <-… section into RStudio it doesn't work, but when you type it in there's no problem; it must be some formatting issue. Not sure if that was your question? And you've seen the link to the solutions at the beginning of the page, right?
The double quotes seem to be smart quotes.
Copy the df section into a text editor and replace the smart quotes with standard double quotes.
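If you would rather stay inside R, a possible alternative is to do the replacement in code. This sketch assumes the copied line has been pasted as a string inside ordinary straight quotes (the names snippet and clean are made up for illustration). It swaps the curly quotes U+201C and U+201D for ASCII double quotes and then evaluates the repaired line.

```r
# Hypothetical snippet as pasted from the page; \u201c and \u201d are the
# curly "smart" quotes that R's parser does not accept as string delimiters
snippet <- "df <- data.frame(Name = c(NA, \u201cJoseph\u201d, \u201cMartin\u201d), stringsAsFactors = FALSE)"

# Replace left and right curly quotes with plain ASCII double quotes
clean <- gsub("\u201c", "\"", snippet, fixed = TRUE)
clean <- gsub("\u201d", "\"", clean, fixed = TRUE)

# Parse and run the repaired line, which now creates df
eval(parse(text = clean))
df$Name
```

Evaluating arbitrary text with eval(parse()) is only sensible for snippets you have read yourself; for anything longer, fixing the quotes in an editor is the safer route.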
df[complete.cases(df$Name),] should help you out for 9 & 6
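To show the difference between checking one column and checking every column, here is a sketch using the Exercise 10 data frame (retyped with straight quotes). complete.cases() applied to df$Name keeps rows whose Name is present. Applied to the whole data frame, it keeps only rows with no NA in any column, and na.omit() returns those same rows.

```r
# Exercise 10 data, typed with straight quotes
df <- data.frame(Name = c(NA, "Joseph", "Martin", NA, "Andrea"),
                 Sales = c(15, 18, 21, NA, 60),
                 Price = c(34, 52, 33, 44, NA),
                 stringsAsFactors = FALSE)

# Only the Name column is checked: rows 2, 3 and 5 survive
by_name <- df[complete.cases(df$Name), ]

# Every column is checked: only rows 2 and 3 have no NA anywhere
all_cols <- df[complete.cases(df), ]

# na.omit() keeps the same rows as complete.cases(df) and records
# the dropped row numbers in an "na.action" attribute
omitted <- na.omit(df)

by_name
all_cols
```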
Is the Orange dataset anywhere to be found?
Use the command given in the instructions: data(Orange). If you would like to see the data, type Orange at the console and it will be printed.
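For reference, Orange is one of the example datasets in R's built-in datasets package, so no extra package is needed. A short sketch for taking a first look at it without printing all 35 rows:

```r
data(Orange)     # load the built-in Orange dataset (package: datasets)
head(Orange)     # first six rows
str(Orange)      # column names and types: Tree, age, circumference
nrow(Orange)     # number of rows (5 trees measured at 7 ages each)
```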
How can I view this data? I tried the command view(Orange) and received the error “view doesn’t exist”. I also tried the same command with the iris data and got the same error. Then I created two tables and used the same command with no issues. It worked. Not sure what I am doing wrong.
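One likely cause is that R is case-sensitive: the data viewer function is View(), with a capital V, from the utils package that R attaches by default. A minimal sketch follows; the interactive() check is only there so the same lines also run as a script, where no viewer window is available.

```r
data(Orange)

# R is case-sensitive: the viewer is View() (capital V) from the utils package;
# base R has no function called view()
if (interactive()) {
  View(Orange)          # opens a spreadsheet-style viewer, e.g. a tab in RStudio
} else {
  print(head(Orange))   # non-interactive fallback: print the first rows instead
}
```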
How to impute missing values dynamically
https://predictivehacks.com/how-to-impute-missing-values-in-r/
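The linked article covers imputation in more depth. As a much simpler illustration, and not necessarily the approach the article takes, here is a sketch of mean imputation: each NA in a numeric column is replaced by the mean of that column's observed values. The helper impute_mean() is a hypothetical name, and the numbers reuse the Sales and Price columns from Exercise 10.

```r
# Sales and Price columns from Exercise 10 (Name dropped: mean imputation needs numbers)
df <- data.frame(Sales = c(15, 18, 21, NA, 60),
                 Price = c(34, 52, 33, 44, NA))

# Hypothetical helper: fill NAs with the mean of the non-missing values
impute_mean <- function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}

# Apply the helper to every column; assigning into df[] keeps the data.frame structure
df[] <- lapply(df, impute_mean)
df   # Sales[4] becomes 28.5 and Price[5] becomes 40.75
```

Mean imputation keeps every row but shrinks the spread of the filled column, so it is best treated as a baseline rather than a final answer.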