34 R Questions You Must Prepare Before a Data Science Interview
- Which data structures in R help with statistical analysis and graphical representation?
Ans.) The following data structures are widely used in R:
a.) Array
b.) Matrix
c.) Vector
d.) Data frame
e.) List
f.) Table
- What is the class() function in R?
Ans.) class() is a frequently used function in R. It returns a character vector giving the names of the classes from which the object inherits.
Example.
> x <- 1:10
> class(x)
[1] "integer"
- What is a vector?
Ans.) A vector is a sequence of data elements of the same basic type. The members of a vector are called components.
Example.
> vector_example <- c(2,3,4,5)
> print(vector_example)
[1] 2 3 4 5
> print(length(vector_example))
[1] 4
- How can you combine 2 vectors?
Ans.) Two vectors can be combined into one by using the c() function.
Example.
> first <- c(1,2,3,4)
> second <- c("a", "b", "c")
> third <- c(first, second)
> print(third)
[1] "1" "2" "3" "4" "a" "b" "c"
The numbers are also shown in double quotes. R coerces them to character so that every element of the new vector has the same basic data type.
- How to perform arithmetic operations on vectors? Show with an example.
Ans.) R has many arithmetic operators. Remember, R applies these operators component by component. Let's look at it with some common operators.
> x <- c(1,2,3,4)
> y <- c(4,5,6,7)
> x+y
[1]  5  7  9 11
> x-y
[1] -3 -3 -3 -3
> z <- c(4,4,4,4,4,4,4)
> x+z
[1] 5 6 7 8 5 6 7
When two vectors of unequal length are used in an operation, the shorter vector is recycled (reused from its start) until it matches the length of the longer one. In the last example the length of z (7) is not a multiple of the length of x (4), so R also prints a warning.
- What is an index in a vector?
Ans.) An index is used to get the element at a given position of a vector. Some programming languages start the index at 0 and others at 1. R counts the index from 1. An index can take several forms:
a. positive and in-range index
> x <- c(1,3,4,5)
> x[2]
[1] 3
b. out-of-range index
> x <- c(2,3,4,5)
> x[110]
[1] NA
c. negative index: it removes the element at that position and returns all the remaining elements
> x <- c(3,4,5,6,7)
> x[-3]
[1] 3 4 6 7
d. range of values
> x <- c(3,4,5,6,7,8)
> x[2:5]
[1] 4 5 6 7
e. duplicate index
> x <- c(3,4,5,6,7)
> x[c(2,1,2,3)]
[1] 4 3 4 5
f. logical index: to select a particular group of positions, use a logical vector of TRUE and FALSE values
> x <- c(2,3,4,5,6)
> x[c(TRUE,FALSE,FALSE,TRUE,TRUE)]
[1] 2 5 6
- What is a list?
Ans.) A list, as the name suggests, is a collection of objects that need not share a type. Suppose you have a numeric vector, a character vector, a Boolean vector and some numbers. You want to combine them into one object, which obviously won't have a single data type. So you need to create a list.
> n = c(2, 3, 5)
> s = c("a", "b", "c", "d", "e")
> b = c(TRUE, FALSE, TRUE, FALSE, FALSE)
> x = list(n, s, b, 3)
> print(x)
[[1]]
[1] 2 3 5

[[2]]
[1] "a" "b" "c" "d" "e"

[[3]]
[1]  TRUE FALSE  TRUE FALSE FALSE

[[4]]
[1] 3

- What is a matrix?
Ans.) A matrix is a two-dimensional rectangular data set in which every element has the same type. It can be created by passing a vector to the matrix() function.
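A minimal sketch of how the input vector is laid out, assuming a made-up vector 1:6 (the names v, by_col and by_row are only for illustration). By default matrix() fills column by column; byrow = TRUE fills row by row.

```r
# Hypothetical values, chosen only for illustration
v <- 1:6

by_col <- matrix(v, nrow = 2, ncol = 3)                # default: fill column by column
by_row <- matrix(v, nrow = 2, ncol = 3, byrow = TRUE)  # fill row by row

print(by_col)
print(by_row)
print(dim(by_row))  # number of rows, then number of columns
```

Both matrices hold the same six values; only their positions differ.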
Example.
# Matrix creation
> M = matrix(c(1,2,3,4,5,6), nrow=2, ncol=3, byrow=TRUE)
> print(M)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
where
nrow = number of rows in the matrix
ncol = number of columns in the matrix
byrow = TRUE fills the matrix row by row; FALSE (the default) fills it column by column
- What is an array?
Ans.) An array is a generalization of a matrix. A matrix always has 2 dimensions, but an array can have any number of dimensions.
Example.
> a <- array(c("car","bike"), dim=c(3,3,2))
> print(a)
, , 1

     [,1]   [,2]   [,3]
[1,] "car"  "bike" "car"
[2,] "bike" "car"  "bike"
[3,] "car"  "bike" "car"

, , 2

     [,1]   [,2]   [,3]
[1,] "bike" "car"  "bike"
[2,] "car"  "bike" "car"
[3,] "bike" "car"  "bike"

> my_array <- array(1:24, dim=c(3,4,2))
> my_array
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

, , 2

     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24

- What is a factor?
Ans.) Factors are R objects created from a vector. A factor is stored as a vector of integer codes with a corresponding set of character labels (the levels) that are used when the factor is displayed. The factor() function creates a factor; its only required argument is a vector of values, which is returned as a vector of factor values. The distinct values of the input vector become the levels.
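To make the "integer codes plus labels" idea concrete, here is a small sketch. The vector sizes is made up for illustration; levels() and as.integer() are standard base R functions.

```r
# Hypothetical data for illustration
sizes <- c("small", "large", "medium", "small", "large")
f <- factor(sizes)

print(levels(f))                 # distinct values, sorted alphabetically by default
print(as.integer(f))             # integer codes that index into levels(f)
print(levels(f)[as.integer(f)])  # rebuilds the original character values
```

Each integer code is simply the position of that element's label within levels(f).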
Factors are created using the factor() function. The nlevels() function gives the count of levels.
Example.
# First let's create a vector
> vector_example <- c('a','b','c','a','a')
# Now create a factor object
> factor_example <- factor(vector_example)
> print(factor_example)
[1] a b c a a
Levels: a b c
> print(nlevels(factor_example))
[1] 3
nlevels() gives you the number of distinct values in the vector.
- What is the difference between a matrix and an array?
Ans.) A matrix can have only 2 dimensions, whereas an array can have as many dimensions as you want. A matrix is defined by its data, the number of rows, the number of columns, and whether the elements are filled in row-wise or column-wise.
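A quick way to see the difference is to inspect dim(). This is only a sketch with made-up sizes; the names m, a and slice are illustrative.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)  # always 2 dimensions
a <- array(1:24, dim = c(3, 4, 2))    # here 3 dimensions

print(dim(m))
print(dim(a))
print(is.array(m))    # a matrix is a 2-dimensional array
print(is.matrix(a))   # a 3-dimensional array is not a matrix

slice <- a[, , 1]     # fixing the third index leaves a 3x4 matrix
print(is.matrix(slice))
```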
For an array you give the dimensions directly as a vector. An array can have any number of dimensions. For example, a 3x3x2 array can be viewed as 2 matrices, each of dimension 3×3.
- What is a data frame?
Ans.) A data frame is a list of vectors of equal length. Each vector becomes a column and keeps its own data type, so a data frame can hold a logical vector alongside a numeric one. The only condition is that all the vectors have the same length.
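The sketch below checks both claims on a made-up data frame: it is a list underneath, and columns of incompatible lengths are rejected. The column names and the text returned by the error handler are assumptions for illustration. stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
# Hypothetical columns for illustration
df <- data.frame(
  id     = c(1, 2, 3),
  passed = c(TRUE, FALSE, TRUE),
  grade  = c("A", "C", "B"),
  stringsAsFactors = FALSE
)
print(is.list(df))        # a data frame is a list under the hood
print(sapply(df, class))  # each column keeps its own type

# Columns whose lengths cannot be recycled to match are rejected
bad <- tryCatch(
  data.frame(id = c(1, 2, 3), passed = c(TRUE, FALSE)),
  error = function(e) "error: differing number of rows"
)
print(bad)
```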
Example.
# This is how the data frame is created
> student_profile <- data.frame(
    name = c("Amit", "Sumit", "Ajay"),
    age = c(22,23,24),
    class = c(6,7,8)
  )
> print(student_profile)
The above code will create 3 columns with the column names name, age and class.
- What is the difference between a matrix and a data frame?
Ans.) A data frame can contain columns of different types and a matrix cannot. (You can have a data frame with character columns, integer columns and even other data frames, but you can't do that with a matrix. A matrix must be all the same type.)
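A small sketch of what "all the same type" means in practice, using made-up values. Binding a numeric and a character vector into a matrix coerces everything to character, while a data frame keeps one type per column.

```r
# Mixing numbers and text in a matrix coerces everything to character
m <- cbind(num = c(1, 2), chr = c("a", "b"))
print(typeof(m))

# A data frame keeps a separate type per column
df <- data.frame(num = c(1, 2), chr = c("a", "b"), stringsAsFactors = FALSE)
print(class(df$num))
print(class(df$chr))
```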
So a data frame can hold different vectors of characters, numbers, logicals, etc. and it is still cool. But a matrix needs a single data type.
- Define repeat loop.
Ans.) A repeat loop executes a sequence of statements multiple times. It has no condition next to the repeat keyword; the loop ends only when a break statement inside the body is reached.
Example.
> name <- c("Pappu", "John")
> temp <- 5
> repeat {
    print(name)
    temp <- temp+2
    if(temp > 11) {
      break
    }
  }
So, this will print the name vector 4 times. Each pass prints the names and then increases temp by 2 (7, 9, 11, 13); the loop breaks once temp exceeds 11.
- Define while loop.
Ans.) In a while loop the condition is tested first, and control enters the body only when the condition is true.
Example.
> name <- c("Pappu", "John")
> temp <- 5
> while (temp < 11) {
    print(name)
    temp <- temp+2
  }
The name vector will be printed 3 times (for temp = 5, 7 and 9); when temp reaches 11 the condition fails and the loop stops.
- Define the for loop.
Ans.) A for loop iterates over the elements of a sequence, and it is not limited to integers. You can pass character vectors, logical vectors, lists or expressions.
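As a rough illustration of looping over something other than integers, the sketch below iterates over a logical vector and over a list. All names and values are made up for the example.

```r
# Hypothetical inputs for illustration
flags <- c(TRUE, FALSE, TRUE)
true_count <- 0
for (flag in flags) {   # loop over a logical vector
  if (flag) true_count <- true_count + 1
}
print(true_count)

items <- list(1:3, "text", FALSE)
classes <- character(0)
for (item in items) {   # loop over a list: each element keeps its own type
  classes <- c(classes, class(item))
}
print(classes)
```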
Example.
> x <- LETTERS[1:2]
> for (i in x) {
    print(i)
  }
[1] "A"
[1] "B"
- What is the use of the sort() function? How to use the function to sort in descending order?
Ans.) Elements in a vector can be sorted using the function sort(). Pass decreasing = TRUE to sort in descending order.
Example.
> temp <- c(3,5,2,6,7,1)
> sort_temp <- sort(temp)
> print(sort_temp)
[1] 1 2 3 5 6 7
> rev_sort <- sort(temp, decreasing = TRUE)
> print(rev_sort)
[1] 7 6 5 3 2 1
This function also works on character vectors.
- Create a list which holds a vector, a matrix and a list.
Ans.)
example_list <- list(c("Kamal","Nitin"), matrix(c(1,2,3,4,5,6), nrow = 2), list("red",1))
- Determine the output of the following function call, f(2).
b <- 4
f <- function(a)
{
b <- 3
b^3 + g(a)
}
g <- function(a)
{
a*b
}
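Ans.) The global variable b has the value 4. The function f is called with the argument 2, and its body has a local variable b with the value 3. So f(2) returns 3^3 + g(2). R uses lexical scoping: g was defined in the global environment, so inside g the name b refers to the global b, not to the local b of f. Hence g(2) gives 2*4 = 8, where 4 is the value of the global b.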
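The scoping argument can be checked by running the code from the question at the top level of an R session. The variable result is the only name added here.

```r
b <- 4
f <- function(a) {
  b <- 3      # local to f; invisible to g
  b^3 + g(a)
}
g <- function(a) {
  a * b       # b is looked up where g was defined: the global environment
}
result <- f(2)
print(result) # 27 + 8
```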
Thus, the answer is 27 + 8 = 35.
- What is the output of runif(10)?
Ans.) The runif() function generates random values from a uniform distribution, and its first argument gives the number of values required. So runif(10) will generate 10 random values between 0 and 1.
- Get all the data of the person having maximum salary.
Ans.)
max_salary_person <- subset(data, salary == max(salary))
print(max_salary_person)
- Get all the people who work with TCS and have a salary of more than 300000.
Ans.)
TCS_data_salary <- subset(data, company == "TCS" & salary > 300000)
- How is data reshaping done in R?
Ans.) Data reshaping involves various techniques which are used according to need. It's not a fixed procedure, but a set of independent methods to remould the data set. The following methods are used:
a. cbind()
b. rbind()
c. new_column_name.data_frame_name
d. merge()
e. melt()
f. cast()
- How to get outer join, left join, right join, inner join and cross join?
Ans.)
outer join – merge(x=df1, y=df2, by = "id", all = TRUE)
left join – merge(x=df1, y=df2, by = "id", all.x = TRUE)
right join – merge(x=df1, y=df2, by = "id", all.y = TRUE)
inner join – merge(x=df1, y=df2, by = "id")
cross join – merge(x=df1, y=df2, by = NULL)
- When you are reshaping the data, you sometimes need to melt the data. Explain the melt() function.
Ans.) Suppose you have a data set which has the columns company_name, age, salary and children, and you want every row identified by company_name and age, with the remaining columns stacked into variable and value pairs. Converting the data from this wide layout to a long one is called melting the data, and it is performed with the melt() function (from the reshape or reshape2 package).
Example.
new_data_set.previous_data_set <- melt(previous_data_set, id=c("company_name","age"))
- What is the lapply() function in R?
Ans.) The lapply() function is used when you want to apply a function to each element of a list in turn and get a list back.
Example.
x <- list(a=1, b=1:3, c=10:100)
lapply(x, FUN=length)
$a
[1] 1

$b
[1] 3

$c
[1] 91

You can use other functions like max, min, sum, etc.
- What is the sapply() function in R?
Ans.) The sapply() function is used when you want to apply a function to each element of a list in turn, but you want a vector back, rather than a list.
A vector is often more convenient because it gives you a plain set of values on which you can directly perform further operations.
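A short sketch of why the vector result is handy, using a made-up list x. The sapply() result can be passed straight to sum(), while the lapply() result has to be flattened first, for instance with unlist().

```r
x <- list(a = 1, b = 1:3, c = 10:100)

as_list   <- lapply(x, length)  # list of three length-1 integers
as_vector <- sapply(x, length)  # named integer vector

# Arithmetic works directly on the sapply() result
total_from_vector <- sum(as_vector)

# The lapply() result has to be flattened first, e.g. with unlist()
total_from_list <- sum(unlist(as_list))

print(total_from_vector)
print(total_from_list)
```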
Example.
x <- list(a = 1, b = 1:3, c = 10:100)
# Compare with above; a named vector, not a list
sapply(x, FUN = length)
 a  b  c
 1  3 91
sapply(x, FUN = sum)
   a    b    c
   1    6 5005
- What is the difference between lapply and sapply?
Ans.) If the programmer wants the output simplified to a vector or matrix, then sapply() is used, whereas if the programmer wants the output to be a list, then lapply() is used.
- How to apply the mean function in R?
Ans.) The mean is calculated by taking the sum of the numbers and dividing it by the total number of elements. The function mean() is used to compute it in R.
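The definition can be checked against mean() directly. This is just a sketch on made-up numbers; manual and builtin are illustrative names.

```r
# Hypothetical values for illustration
x <- c(4, 8, 15, 16, 23, 42)

manual  <- sum(x) / length(x)  # sum of the numbers divided by their count
builtin <- mean(x)

print(manual)
print(builtin)
```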
Syntax
mean(x, trim=0, na.rm=FALSE)
The mean() function has 3 arguments:
a.) x contains the vector on which the mean is to be computed
b.) trim (default 0) is the fraction of observations to drop from each end of the sorted vector before the mean is computed
c.) na.rm is used to remove the missing values from the input vector
If there are missing values in the vector then mean() will return NA as a result. To drop the missing values and get a mean, you should put na.rm=TRUE, which means remove the missing values.
- How to make a scatterplot in R?
Ans.) A scatterplot is a graph which shows many points plotted in the Cartesian plane. Each point holds 2 values, one on the x axis and one on the y axis. A simple scatterplot is drawn using the plot() function.
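A runnable sketch, assuming the data comes from the built-in mtcars data set (the source of input is not stated here, so this is a guess made for illustration). pdf(NULL) opens a null graphics device so the code also runs without a display.

```r
# mtcars ships with base R; using it here is an assumption for illustration
input <- mtcars[, c("wt", "mpg")]

pdf(NULL)  # draw to a null device so the sketch also runs without a display
plot(x = input$wt, y = input$mpg,
     xlab = "Weight", ylab = "Mileage",
     main = "Weight vs Mileage")
invisible(dev.off())

n_points <- nrow(input)  # one point per row of the data
print(n_points)
```

In an interactive session, drop the pdf(NULL) and dev.off() lines to see the plot on screen.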
The syntax for a scatterplot is:
plot(x, y, main, xlab, ylab, xlim, ylim, axes)
Where
x is the data set whose values are the horizontal coordinates
y is the data set whose values are the vertical coordinates
main is the title of the graph
xlab and ylab are the labels of the horizontal and vertical axes
xlim and ylim are the limits of the values of x and y used in the plotting
axes indicates whether both axes should be drawn on the plot
plot(x=input$wt, y=input$mpg, xlab="Weight", ylab="Mileage", xlim=c(2.5,5), ylim=c(15,30), main="Weight vs Mileage")
- Bonus Question
How to write a countdown function in R?
Ans.)
countdown <- function(time) {
  print(time)
  while (time != 0) {
    Sys.sleep(1)
    time <- time - 1
    print(time)
  }
}
countdown(5)
[1] 5
[1] 4
[1] 3
[1] 2
[1] 1
[1] 0
- Vector v is c(1,2,3,4) and list x is list(5:8), what is the output of v*x[1]?
Ans.)
Error. x[1] is still a list, and a list cannot be multiplied by a numeric vector.
- Vector v is c(1,2,3,4) and list x is list(5:8), what is the output of v*x[[1]]?
Ans.)
[1]  5 12 21 32
x[[1]] extracts the vector 5:8 itself, so the multiplication is done component by component.
- What are some of the functions that R has?
Ans.) Some of the statistical functions and models available in R are:
a. Mean
b. Median
c. Distribution
d. Covariance
e. Regression
f. GAM
g. GLM
h. Non-linear
i. Mixed Effects etc
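A few of the items above can be tried with base R and the bundled stats package alone. The sketch below uses the built-in mtcars data set, which is an assumption made for illustration. GAM and mixed-effects models are left out because they are typically fitted with add-on packages.

```r
# Base R and the bundled stats package only; mtcars is a built-in data set
avg_mpg    <- mean(mtcars$mpg)
median_mpg <- median(mtcars$mpg)
cov_wt_mpg <- cov(mtcars$wt, mtcars$mpg)  # covariance

fit_lm  <- lm(mpg ~ wt, data = mtcars)                     # linear regression
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)  # GLM (logistic)

print(c(mean = avg_mpg, median = median_mpg, covariance = cov_wt_mpg))
print(coef(fit_lm))
print(coef(fit_glm))
```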