Import and Export into Googlesheet and AWS using R

Import and Export Google Sheets in R using the googlesheets4 package
Recently I was working on syncing some data from R to an AWS S3 bucket and a Google Sheet.

Problem Statement – The googlesheets package has been retired, and with its replacement you don’t have to go through Google authentication each time you sync a sheet in R for reporting.

Solution – Use the googlesheets4 package

Problem 2 – There is very little documentation for the googlesheets4 package

Let’s understand how to import and export a Google Sheet and an AWS S3 bucket in R using the googlesheets4 package.

Here is a tried and tested solution from my side.

Below are the packages you need for Google Sheets

install.packages("googlesheets4")
library(googlesheets4)
# Read the 'All' tab of the sheet into a data frame
Master_Sheet <- read_sheet("https://docs.google.com/spreadsheets/******", sheet = "All")

Once you have read the sheet, you can do your computation. Make sure that the final set you want to upload to your Google Sheet or AWS S3 bucket is a data frame.

Creating my final set
Marketing_Campaign_Cost <- rbind(A,B,S,C,V,A)

Rule 1
Change everything into character; this is where you might get frustrated

Marketing_Campaign_Cost <- apply(Marketing_Campaign_Cost,2,as.character)

Rule 2
Convert the above into a data frame

MCC <- data.frame(Marketing_Campaign_Cost)
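
A small sketch of Rules 1 and 2 on a made-up data frame (the `toy` data and its column names are hypothetical, not taken from the original sheet). It shows that apply() hands back a character matrix, which is why the data.frame() step is needed afterwards. The lapply() variant is an alternative that never leaves the data frame; apply() goes through as.matrix(), which can format numbers (for example pad them with spaces) when the columns have mixed types.

```r
# Hypothetical stand-in for the final data set
toy <- data.frame(campaign = c("A", "B"),
                  cost = c(120.5, 80),
                  live = c(TRUE, FALSE))

# Rule 1: apply() coerces every column to character and returns a matrix
as_matrix <- apply(toy, 2, as.character)

# Rule 2: turn the character matrix back into a data frame
MCC_toy <- data.frame(as_matrix, stringsAsFactors = FALSE)

# Alternative: convert column by column without leaving the data frame
MCC_alt <- toy
MCC_alt[] <- lapply(toy, as.character)

str(MCC_toy)
str(MCC_alt)
```

Either result has only character columns, which is the shape the upload step expects here.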

Rule 3
If you want to create a new sheet every time you run the code, use
gs4_create
But you will rarely do that; most of the time you will be updating an existing sheet

xx <- write_sheet(
MCC,
ss = "https://docs.google.com/spreadsheets/d/TargetsheetLink",
sheet = "Global")
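
The choice in Rule 3 can be wrapped in one helper. This is only a sketch: it needs the googlesheets4 package and an authenticated session to actually run, and `push_to_sheet`, its arguments, and the default spreadsheet name are hypothetical names introduced for illustration.

```r
# Sketch only: requires googlesheets4 and Google credentials when called.
# push_to_sheet() and its argument names are hypothetical.
push_to_sheet <- function(df, ss = NULL, sheet = "Global",
                          new_name = "Marketing report") {
  if (is.null(ss)) {
    # No target given: create a brand-new spreadsheet with one tab
    googlesheets4::gs4_create(name = new_name,
                              sheets = stats::setNames(list(df), sheet))
  } else {
    # Target given: write (or overwrite) the tab in the existing spreadsheet
    googlesheets4::sheet_write(df, ss = ss, sheet = sheet)
  }
}

# Usage (not run):
# googlesheets4::gs4_auth(email = "you@example.com")  # reuse a cached token
# push_to_sheet(MCC, ss = "https://docs.google.com/spreadsheets/d/TargetsheetLink")
```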

The above will write your data frame to the desired Google Sheet.
You can also write it to a local file

write.csv(MCC, file = "/Users/oyo/abc.csv")

What about AWS?

library(aws.s3)
Sys.setenv("AWS_ACCESS_KEY_ID" = "Your AWS Key ID","AWS_SECRET_ACCESS_KEY" = "Your AWS Secret Key Id")

s3write_using(MCC, FUN = write.csv, bucket = "Bucket Name", object = "Link to folder/abc.csv")
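
For the import direction, a possible counterpart reads the CSV back from the bucket. Again this is a sketch under assumptions: it needs the aws.s3 package and valid credentials in the environment when called, and `read_from_s3`, the bucket name, and the object key are hypothetical.

```r
# Sketch only: requires aws.s3 and AWS credentials when called.
# read_from_s3() and the example bucket/object names are hypothetical.
read_from_s3 <- function(bucket, object) {
  # Fetches the object and parses it with read.csv
  aws.s3::s3read_using(FUN = utils::read.csv, bucket = bucket, object = object)
}

# Usage (not run):
# MCC_back <- read_from_s3("my-bucket", "reports/abc.csv")
```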

This way you will be able to import and export data between R, Google Sheets and AWS S3

Try to solve these questions, other members will evaluate your answer and provide sufficient support.

Walmart Labs – https://thedatamonk.com/walmart-data-analyst-questions/
American Express – https://thedatamonk.com/american-express-data-analyst-questions/
Barclays – https://thedatamonk.com/barclays-data-analyst-interview-questions/
Lowes – https://thedatamonk.com/lowes-data-science-interview-questions/
Factspan – https://thedatamonk.com/factspan-interview-questions/
Oracle – https://thedatamonk.com/oracle-data-analyst-interview-questions/
SAP – https://thedatamonk.com/sap-data-science-interview-questions/
Big Basket – https://thedatamonk.com/big-basket-data-analyst-interview-questions/
Swiggy – https://thedatamonk.com/swiggy-data-analyst-interview-questions/
Accenture – https://thedatamonk.com/accenture-business-analyst-interview-question/
Deloitte – https://thedatamonk.com/deloitte-data-scientist-interview-questions/
Amazon – https://thedatamonk.com/amazon-data-science-interview-questions/
Myntra – https://thedatamonk.com/myntra-data-science-interview-questions-2/
Flipkart – https://thedatamonk.com/flipkart-business-analyst-interview-questions/

The Data Monk Interview Books – Don’t Miss

Now we are also available on our website where you can directly download the PDF of the topic you are interested in. At Amazon, each book costs ~299, on our website we have put it at a 60-80% discount. There are ~4000 solved interview questions prepared for you.

10 e-book bundle with 1400 interview questions spread across SQL, Python, Statistics, Case Studies, and Machine Learning Algorithms – Ideal for 0-3 years experienced candidates

23 E-book with ~2000 interview questions spread across AWS, SQL, Python, 10+ ML algorithms, MS Excel, and Case Studies – Complete Package for someone between 0 to 8 years of experience (The above 10 e-book bundle has a completely different set of e-books)

12 E-books for 12 Machine Learning algorithms with 1000+ interview questions – For those candidates who want to include any Machine Learning Algorithm in their resume and to learn/revise the important concepts. These 12 e-books are a part of the 23 e-book package

Individual 50+ e-books on separate topics

Important Resources to crack interviews (Mostly Free)

There are a few things which might be very useful for your preparation

The Data Monk Youtube channel – Here you will get only those videos that are asked in interviews for Data Analysts, Data Scientists, Machine Learning Engineers, Business Intelligence Engineers, Analytics managers, etc.
Go through the watchlist which makes you uncomfortable:-

All the list of 200 videos
Complete Python Playlist for Data Science
Company-wise Data Science Interview Questions – Must Watch
All important Machine Learning Algorithm with code in Python
Complete Python Numpy Playlist
Complete Python Pandas Playlist
SQL Complete Playlist
Case Study and Guesstimates Complete Playlist
Complete Playlist of Statistics

R Data Science Interview Questions | Day 39

Practice all the R Data Science Interview Questions. You can try to solve the same questions in Python also. Create your account by clicking on the Login area above and start practicing R Data Science Interview Questions.
Once you are good to go with these questions, you can try the questions on SQL and Statistics
SQL – https://thedatamonk.com/sql-data-science-interview-questions/
Statistics – https://thedatamonk.com/statistics-data-science-interview-questions/

R Data Science Interview Questions

SAP interview Questions | Various Variables

SAP interview Questions | Bundles

Zomato Interview Questions | Fuzzy Merging

Accenture Interview Questions | Data Frame vs Data Table

Accenture Interview Questions | What is lapply and sapply?

What are the different types of sorting algorithms available?

What is lapply() function in R?

What do you understand by element recycling?

How can you resample statistical tests in R language?

What will be the result of multiplying two vectors in R ?

How will you convert a factor variable to numeric in R language ?

Is it possible to model transitioning data like time-zone or a working directory?

Output Question | Interview Question in R | Myntra Interview Question

Get the output | R Interview Question | Zomato

Is Array a matrix or matrix an array? OLA interview Question

What do you understand by element recycling in R?

How to combine multiple vectors in one data frame in R?

How will you explain R programming to person from a non-technical background?

What is the difference between subset() and sample() function in R?

Give the output of the following function | Amazon Interview Question

What is the difference between Matrix and an array?

What is the difference between a matrix and a dataframe in R?

Explain general syntax to create Matrices in R?


R interview Questions

Today is Day 10 and we will try to solve the top 10 R Interview questions. These questions will revolve around statistics and algorithms with case studies. Try to answer these R interview Questions

Before we get started, do create your profile on the website. The ‘Login’ area is at your extreme left.

Here are our Questions on R with a couple of case studies

Amazon Case Study – https://thedatamonk.com/question/case-study-amazon-data-science-interview/
Resample in R – https://thedatamonk.com/question/how-can-you-resample-statistical-tests-in-r-language/
Multiply two vectors – https://thedatamonk.com/question/what-will-be-the-result-of-multiplying-two-vectors-in-r/
Convert Factor variable to numeric – https://thedatamonk.com/question/how-will-you-convert-a-factor-variable-to-numeric-in-r-language/
Dict and List – https://thedatamonk.com/question/dictionary-and-list/
Chi-Squared Test – https://thedatamonk.com/question/what-is-a-chi-squared-test/
Delete and Truncate – https://thedatamonk.com/question/what-is-the-difference-between-delete-and-truncate-commands/
Missing Value Treatment with Median – https://thedatamonk.com/question/missing-value-treatment-with-median/
R-Output Question – https://thedatamonk.com/question/get-the-output-r-interview-question-zomato/
Is Array a matrix or Matrix an array – https://thedatamonk.com/question/is-array-a-matrix-or-matrix-an-array-ola-interview-question/

These questions will not take more than 20 minutes to answer, but if you are able to find a solution to these questions and jot it down in the comment box then you will surely be able to answer these in the interviews.

So, start answering.

You can get previous days questions here

Day 1- Overview – https://thedatamonk.com/data-science-interview-question-day1/
Day 2- SQL – https://thedatamonk.com/sql-interview-questions/
Day 3- Joins in SQL – https://thedatamonk.com/joins-in-sql/
Day 4 – Statistics – https://thedatamonk.com/statistics-interview-question/
Day 5 – Machine Learning – https://thedatamonk.com/machine-learning-interview-question/
Day 6 – Forecasting – https://thedatamonk.com/forecasting-interview-questions/
Day 7 – ARIMA – https://thedatamonk.com/arima-interview-questions/
Day 8 – Python – https://thedatamonk.com/python-interview-questions/
Day 9 – Machine Learning – https://thedatamonk.com/machine-learning-interview-questions/

Try to answer all these questions by yourself or with the help of Google.


100 Questions in R to crack any Data Science Interview

R is one of the two most popular Data Science programming languages. If you are new to this domain, then we would always recommend you start with R because of its easier installation steps, minimal version control, and libraries.

I used to write down all the questions that I was asked in my interviews. Since I had R on my resume, the questions used to revolve around functions, loops, regular expressions, etc. I jotted down all the questions with their explanations and output and combined them in this book.

The book contains questions on :-
-Apply
-Plot
-Library functions
-User defined functions
-Regular Expression
-Data Type and Data Structure of R
And many more

We highly, highly recommend you to cover this book end to end in order to completely destroy interviews in R.

The book is available for free on 18th and 19th April’20 from the link below

https://www.amazon.in/dp/B0876F4JP7

Do check our other Data Science posts!

Keep Learning

The Data Monk



Janta Hack – Analytics Vidhya R code

install.packages("stringr")
library(stringr)

data = read.csv("C:/Users/User/Desktop/Hackathon/JantaHack/train.csv")
head(data)
str(data)

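# Number of products in a session = number of ";" separators + 1
data$product <- str_count(data$ProductList, ";") + 1
head(data)
# Parse the timestamps first; the dd/mm/yy HH:MM format is an assumption based on values such as '16/12/14 14:41'
data$start <- as.POSIXct(data$startTime, format = "%d/%m/%y %H:%M")
data$end <- as.POSIXct(data$endTime, format = "%d/%m/%y %H:%M")
data$hours <- difftime(data$end, data$start, units = "hours")
data$min <- difftime(data$end, data$start, units = "mins")
data$x <- as.double(data$end - data$start, units = "mins")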

table(data$product)
hist(data$product)

table(data$gender)
count <- table(data$gender,data$product)
barplot(count)
str(data)
head(data$ET)
head(data$endTime)

# The format string has to match the layout of the text being parsed
date1 = as.POSIXlt('16/12/14 14:41', format = "%d/%m/%y %H:%M")
date2 = as.POSIXlt('2015-10-05T22:43:00.000', format = "%Y-%m-%dT%H:%M:%OS")
install.packages("lubridate")
library(lubridate)
year(date1)
month(date1)
day(date1)
hour(date1)

data$date <- substr(data$startTime,1,2)
head(data)

#Merge Train and Test Data Set
test <- read.csv("C:/Users/User/Desktop/Hackathon/JantaHack/test.csv")
df_test <- as.data.frame(append(test,list(gender=0),after = 4))
head(df_test)

# Encode gender as 1 for male and 0 otherwise
data$gender_num <- ifelse(data$gender == 'male', 1, 0)

data_x <- read.csv("C:/Users/User/Desktop/Hackathon/JantaHack/train.csv")
data_x$G <- ifelse(data_x$gender == 'male', 1, 0)
head(data_x)
data_x = subset(data_x,select=-c(gender))

data_test <- read.csv("C:/Users/User/Desktop/Hackathon/JantaHack/test.csv")
df_data_test <- as.data.frame(append(data_test,list(G=0),after=4))
head(df_data_test)

df_Janta <- rbind(data_x,df_data_test)
df_Janta$Product <- str_count(df_Janta$ProductList, ";") + 1
head(df_Janta)

#In the training dataset we have 8192 females and 2308 males
table(data_x$G)

#In total there have been 7934 single-product purchases
table(df_Janta$Product)

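# substr() takes start and stop positions; (21, 6) always returns "", so six characters from position 21 are assumed here
df_Janta$first <- substr(df_Janta$ProductList,21,26)
str(df_Janta)
df_Janta$ProductList <- as.character(df_Janta$ProductList)
df_Janta$x <- substr(df_Janta$ProductList,21,26)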
df_Janta$x
str(df_Janta)

first = sapply(df_Janta$ProductList,function(x) {
if(substr(x,1,6) != ""){
return(substr(x,1,6))
}
else {

return("Null")

}

}
)

table(first)
table(first,df_Janta$G)

second = sapply(df_Janta$ProductList,function(x){

# stop must not be smaller than start; six characters from position 7 are assumed here
return(substr(x,7,12))

})
table(second)

train_f <- read.csv("C:/Users/User/Desktop/Hackathon/JantaHack/train.csv")
head(train_f)
test_f <- read.csv("C:/Users/User/Desktop/Hackathon/JantaHack/test.csv")
head(test_f)

str(train_f)
test_f <- as.data.frame(append(test_f,list(gender=0),after=4))
str(test_f)

both <- rbind(train_f,test_f)

#Adding number of products
both$no_prod <- str_count(both$ProductList, ";") + 1
str(both)

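# gb_p, gb_p2, gb_1, sum and sum_gb are engineered columns assumed to already exist in `both`; their construction is not shown here
both$gender <- as.factor(both$gender)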
both$gb_p <- as.factor(both$gb_p)
both$gb_p2 <- as.factor(both$gb_p2)
both$gb_1 <- as.factor(both$gb_1)
both$sum <- as.factor(both$sum)
both$sum_gb <- as.factor(both$sum_gb)
str(both)

traindata <- both[1:10500,]
testdata <- both[10501:15000,]

model_log <- glm(gender ~ gb_p+gb_p2+gb_1+sum+sum_gb+no_prod,data = traindata,family = binomial)
summary(model_log)

x <- predict(model_log,testdata)
sub <- cbind(testdata$session_id,x)

write.csv(sub, "C:/Users/User/Desktop/Hackathon/JantaHack/submit_lm.csv")

install.packages("caret")
install.packages("e1071")
library(caret)
library(e1071)
set.seed(101)

tuned = tune.svm(gender~ gb_p+gb_p2+gb_1+sum+sum_gb+no_prod , data = traindata, gamma = seq(.1,0.5,0.1), cost = seq(1,60,10))
tuned$best.parameters

model_svm <- svm(gender~ gb_p+gb_p2+gb_1+sum+sum_gb+no_prod , data = traindata, gamma = 0.1, cost = 1, type = "C-classification")

summary(model_svm)

svm_pred <- predict(model_svm,testdata,type="response")

fin_svm <- cbind(testdata$session_id,svm_pred)
write.csv(fin_svm, "C:/Users/User/Desktop/Hackathon/JantaHack/submit_svm.csv")

model_lin <- lm(gender~ no_prod , data = traindata)
summary(model_lin)
lm_pred <- predict(model_lin,testdata)
head(testdata)
pred_lm <- cbind(testdata$session_id,lm_pred)
head(pred_lm)
table(lm_pred)
write.csv(pred_lm, "C:/Users/User/Desktop/Hackathon/JantaHack/submit_lin.csv")

install.packages("randomForest")
library(randomForest)

model_rf <- randomForest(gender~ gb_p+gb_p2+gb_1+sum+sum_gb+no_prod , data = traindata)
model_rf

pred_rf <- predict(model_rf,testdata)
sub_rf <- cbind(testdata$session_id,pred_rf)

write.csv(sub_rf, "C:/Users/User/Desktop/Hackathon/JantaHack/submit_rf.csv")

head(train_f)

# as.character() is the base R conversion to a string column
train_f$Str <- as.character(train_f$ProductList)

library(xgboost)
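
The product count used throughout the script (number of ";" separators plus one) can be checked on a toy vector without the hackathon files. The `product_list` values below are made up for illustration, and the sketch uses only base R instead of stringr.

```r
# Toy ProductList values; the codes are invented for illustration
product_list <- c("A1/B1/C1/D1/",
                  "A1/B1/C1/D1/;A2/B2/C2/D2/",
                  "A3/B3/C3/D3/;A1/B1/C1/D1/;A2/B2/C2/D2/")

# Base R: split on ";" and count the pieces
n_products <- lengths(strsplit(product_list, ";", fixed = TRUE))

# Same idea as str_count(x, ";") + 1: count the separators, then add one
n_sep <- lengths(regmatches(product_list,
                            gregexpr(";", product_list, fixed = TRUE)))

print(n_products)
print(n_sep + 1L)
print(table(n_products))
```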

R – Cheat Sheet

This is a cheat sheet which aims to give all the important concepts in a very crisp manner. Just give it a read before starting a new project in R or interviewing for a Data Science, Data Analyst or Business Analyst post.
There are various advantages and disadvantages of using R over Python, but we will not dig deep into it.
This is a cheat sheet, so if you need more help, there is this awesome website www.google.com

We will start directly with Data Types which are the building blocks of a programming language.

There are 6 object types supported in R:-
1. Vectors
2. Lists
3. Matrices
4. Arrays
5. Factors
6. Data Frames

There are 6 data types these objects can hold:-
1. Logical – TRUE, FALSE
2. Numeric – 56.4, 45.3
3. Integer – 1L, 2L, 3L, 4L
4. Complex – 6+2i
5. Character – "the", "data", "monk"
6. Raw – raw bytes, e.g. charToRaw("monk")
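
A quick way to see the six data types is to ask class() about one value of each. This is a minimal sketch; the example values are arbitrary.

```r
values <- list(
  logical   = TRUE,
  numeric   = 56.4,
  integer   = 4L,              # the L suffix makes an integer, not a double
  complex   = 6 + 2i,
  character = "monk",
  raw       = charToRaw("R")   # raw vectors hold bytes
)

# class() of each value, collected into a named character vector
types <- sapply(values, class)
print(types)
```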

Let’s briefly look into each object type:-
1. Vector

To create a vector with more than one element, use c() to combine the values. If the values have different data types, R coerces them to a single common type.

The only thing worth mentioning here is that a negative index drops the element at that position and returns the rest of the vector.
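
A minimal sketch of vector creation and negative indexing; the values are arbitrary.

```r
v <- c(10, 20, 30, 40)     # c() combines values into one vector
mixed <- c(1, "a", TRUE)   # mixed inputs are coerced to a common type

print(class(mixed))        # all elements became character
print(v[2])                # second element
print(v[-2])               # everything except the second element
print(v[c(-1, -4)])        # drop the first and the last element
```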

2. List
A vector was the simplest object in R; next in the list is a List

Let’s create a list which will include a list, a vector, and a matrix. If you don’t know much about matrices, just remember that a matrix is a 2-dimensional object; x <- matrix(c(1,2,3,4,5,6), nrow=2) creates a matrix with 2 rows.

Give names to the elements of the list using names().

Merge two lists using c().

Convert a list to a vector, i.e. unlist it, using unlist().


Basically all you need to know about lists is:-
1. How to create a list (x <- list())
2. What all can a list include? (Anything ranging from a vector to arrays, matrices, etc.)
3. Giving a name to each element of the list (use the function names())
4. Accessing elements of the list (Use [] for a sub-list and [[]] for a single element)
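
The list operations described above (creating, naming, accessing, merging and unlisting) can be sketched as follows. The element names such as `a_vector` are hypothetical.

```r
inner <- list("red", 1)
vec <- c(2, 3, 5)
mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

# A list holding a list, a vector and a matrix
x <- list(inner, vec, mat)
names(x) <- c("a_list", "a_vector", "a_matrix")

print(x$a_vector)        # access by name
print(x[["a_matrix"]])   # [[ ]] returns the element itself
print(x["a_vector"])     # [ ] returns a list of length one

merged <- c(list(1, 2), list("a", "b"))   # merge two lists
flat <- unlist(list(1:3, 4:6))            # list to vector
print(length(merged))
print(flat)
```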

Matrix
A matrix is a 2-dimensional object. You specify the data and the number of rows and columns, and optionally dimnames, while declaring a matrix. Here dimnames are the names given to the rows and columns (the names used below are placeholders)

x <- matrix(c(1,2,3,4,56,23), nrow=3, ncol=2, dimnames=list(c("r1","r2","r3"), c("c1","c2")))

I think you must have got the gist of a matrix. You can create two matrices of the same dimensions and apply arithmetic operations like addition, subtraction, etc. element by element.
Matrix multiplication is also simple: use the %*% operator.

Accessing elements in a matrix: use x[row, column]
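
A short sketch of matrix arithmetic and element access. The second matrix `B` and the row and column names are made up for illustration.

```r
A <- matrix(c(1, 2, 3, 4, 56, 23), nrow = 3, ncol = 2,
            dimnames = list(c("r1", "r2", "r3"), c("c1", "c2")))
B <- matrix(1:6, nrow = 3, ncol = 2)

print(A + B)         # element-wise addition
print(A * B)         # element-wise product
print(A %*% t(B))    # matrix product: (3 x 2) times (2 x 3) gives 3 x 3

print(A[2, 1])         # row 2, column 1
print(A["r3", "c2"])   # the same idea, by name
print(A[, 2])          # the whole second column
```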

Arrays
Arrays can store data in more than 2 dimensions. A vector is one-dimensional, a matrix is 2-dimensional, and an array can have more than 2 dimensions. God knows where this programming language is going

Give names to the rows, columns, and matrices using the dimnames argument
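
A minimal sketch of a 3-dimensional array with named rows, columns and matrices; the names are placeholders.

```r
arr <- array(1:24, dim = c(3, 4, 2),
             dimnames = list(c("r1", "r2", "r3"),
                             c("c1", "c2", "c3", "c4"),
                             c("m1", "m2")))

print(dim(arr))                # 3 rows, 4 columns, 2 matrices
print(arr[, , "m2"])           # the second 3 x 4 matrix
print(arr["r2", "c3", "m1"])   # a single element, addressed by name
```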

34 R Questions you must prepare before Data Science Interview

  1. What are the data structures in R which help in statistical analysis and graphical representation?
    Ans.)
    The following data structures are widely used in R:-
    a.) Array
    b.) Matrix
    c.) Vector
    d.) Data frame
    e.) List
    f.) Tables
  2. What is class() function in R?
    Ans.)
    This is a very important function in R. It returns a character vector giving the names of the classes from which the object inherits.
    Example.
    > x <- 1:10
    > class(x)
    [1] "integer"
  3. What is a vector?
    Ans.) A vector is a sequence of data elements of the same basic type. Members of a vector are called components.
    Example.
    > vector_example <- c(2,3,4,5)
    > print(vector_example)
    [1] 2 3 4 5
    > print(length(vector_example))
    [1] 4
  4. How can you combine 2 vectors?
    Ans.)
    Two vectors can be combined into one by using the c() function
    Example.
    > first <- c(1,2,3,4)
    > second <- c("a", "b", "c")
    > third <- c(first, second)
    > print(third)
    [1] "1" "2" "3" "4" "a" "b" "c"
    The numbers are also shown in double quotes because every element of a vector must have the same basic type, so the numbers are coerced to character in the new vector.
  5. How to perform arithmetic operations on Vectors? Show with some example
    Ans.)
    There are many arithmetic operators in R. Remember, R applies the operators component by component. Let’s look at it with some common operators.
    > x <- c(1,2,3,4)
    > y <- c(4,5,6,7)
    > x+y
    [1] 5 7 9 11
    > x-y
    [1] -3 -3 -3 -3
    > z <- c(4,4,4,4,4,4,4)
    > x+z
    [1] 5 6 7 8 5 6 7
    When the 2 vectors have unequal lengths, the shorter vector is recycled, i.e. used again and again, to match the length of the longer one. R also gives a warning here because 7 is not a multiple of 4.
  6. Define Index in Vector?
    Ans.) An index in a vector is used to get the element at that position of the vector. Some programming languages start the index at 0 and others at 1. R counts the index from 1. There are many possibilities while putting an index number i.e.
    a. positive and in range index
    > x <- c(1,3,4,5)
    > x[2]
    [1] 3
    b. out of range
    > x <- c(2,3,4,5)
    > x[110]
    [1] NA
    c. negative index – It removes this element and returns all the remaining numbers
    > x <- c(3,4,5,6,7)
    > x[-3]
    [1] 3 4 6 7
    d. range of values
    > x <- c(3,4,5,6,7,8)
    > x[2:5]
    [1] 4 5 6 7
    e. duplicate index
    > x <- c(3,4,5,6,7)
    > x[c(2,1,2,3)]
    [1] 4 3 4 5
    f. logical index – If you want to select a particular group of index numbers, then you should use logical values i.e. TRUE and FALSE
    > x <- c(2,3,4,5,6)
    > x[c(TRUE,FALSE,FALSE,TRUE,TRUE)]
    [1] 2 5 6
  7. What is a list?
    Ans.)
    A list, as the name suggests, is a number of vectors collected together. Suppose you have a numeric vector, a character vector, a Boolean vector and some numbers. You want to combine them into one object, which obviously won’t have a single data type. So you need to create a list
    > n = c(2, 3, 5)
    > s = c("a", "b", "c", "d", "e")
    > b = c(TRUE, FALSE, TRUE, FALSE, FALSE)
    > x = list(n, s, b, 3)
    > print(x)
    [[1]]
    [1] 2 3 5

    [[2]]
    [1] "a" "b" "c" "d" "e"

    [[3]]
    [1]  TRUE FALSE  TRUE FALSE FALSE

    [[4]]
    [1] 3
  8. What is a Matrix?
    Ans.)
    A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the matrix function.
    Example.
    # Matrix creation
    > M = matrix(c(1,2,3,4,5,6), nrow=2, ncol=3, byrow=TRUE)
    > print(M)
         [,1] [,2] [,3]
    [1,]    1    2    3
    [2,]    4    5    6
    where
    nrow = number of rows in the matrix
    ncol = number of columns in the matrix
    byrow = TRUE fills the matrix row by row; FALSE (the default) fills it column by column
  9. What is an Array?
    Ans.)
    An array is a generalisation of a matrix. A matrix has exactly 2 dimensions, whereas an array can have any number of dimensions.
    Example.
    > a <- array(c("car","bike"), dim=c(3,3,2))
    > print(a)
    , , 1
         [,1]   [,2]   [,3]
    [1,] "car"  "bike" "car"
    [2,] "bike" "car"  "bike"
    [3,] "car"  "bike" "car"
    , , 2
         [,1]   [,2]   [,3]
    [1,] "bike" "car"  "bike"
    [2,] "car"  "bike" "car"
    [3,] "bike" "car"  "bike"
    > my_array <- array(1:24, dim=c(3,4,2))
    > my_array
    , , 1
         [,1] [,2] [,3] [,4]
    [1,]    1    4    7   10
    [2,]    2    5    8   11
    [3,]    3    6    9   12
    , , 2
         [,1] [,2] [,3] [,4]
    [1,]   13   16   19   22
    [2,]   14   17   20   23
    [3,]   15   18   21   24
  10. What is a factor?
    Ans.)
    Factors are R objects which are created from a vector. Factors in R are stored as a vector of integer values with a corresponding set of character values to use when the factor is displayed. The only required argument to factor() is a vector of values, which will be returned as a vector of factor values. It stores the vector along with the distinct values of the elements in the vector as labels.
    Factors are created using the factor() function. The nlevels() function gives the count of levels.
    Example
    #First let’s create a vector
    > vector_example <- c('a','b','c','a','a')
    #Now create a factor object
    > factor_example <- factor(vector_example)
    > print(factor_example)
    [1] a b c a a
    Levels: a b c
    > print(nlevels(factor_example))
    [1] 3
    nlevels gives you the number of distinct values in the vector.
  11. What is the difference between Matrix and an array ?
    Ans.)
    A matrix can have only 2 dimensions, whereas an array can have as many dimensions as you want. A matrix is defined with the help of data, number of rows, number of columns and whether the elements are to be put in row wise or column wise.
    For an array you give the vector of dimensions instead. For example, a 3x3x2 array can be viewed as 2 matrices, each of dimension 3×3.
  12. What is a data frame?
    Ans.)
    A data frame is a list of vectors of equal length. Each vector can have its own data type, so a data frame can have one logical column and another numeric one. The only condition is that all the vectors have the same length.
    Example.
    #This is how the data frame is created
    > student_profile <- data.frame(
    name = c("Amit", "Sumit", "Ajay"),
    age = c(22,23,24),
    class = c(6,7,8)
    )
    > print(student_profile)
    The above code will create 3 columns named name, age and class.
  13. What is the difference between a matrix and a dataframe?
    Ans.)
    A dataframe can contain vectors of different types and a matrix cannot. (You can have a dataframe of characters, integers, and even other dataframes, but you can’t do that with a matrix. A matrix must be all the same type.)
    So, a data frame can have different columns of characters, numbers, logicals, etc. and it is still fine. But a matrix can hold only one data type.
  14. Define repeat loop.
    Ans. )
    A repeat loop executes a sequence of statements again and again. The repeat keyword itself takes no condition; the loop stops only when a break statement inside the body is reached.
    Example
    > name <- c("Pappu", "John")
    > temp <- 5
    > repeat {
    print(name)
    temp <- temp+2
    if(temp > 11) {
    break
    }
    }
    So, this will print the name vector 4 times. First it prints the name and then increases temp to 7, and so on until temp reaches 13.
  15. Define while loop.
    Ans.)
    In the while loop the condition is tested first and the control goes into the body only when the condition is true

    Example

    > name <- c("Pappu", "John")
    > temp <- 5
    > while (temp<11) {
    print(name)
    temp <- temp+2
    }
    The name will be printed 3 times (for temp = 5, 7 and 9)
  16. Define the for loop.
    Ans.)
    The for loop is not limited to integers. You can loop over character vectors, logical vectors, lists or expressions.
    Example.
    > x <- LETTERS[1:2]
    > for (i in x) {
    print(i)
    }
    [1] "A"
    [1] "B"
  17. What is the use of sort() function? How to use the function to sort in descending order?
    Ans.)
    Elements in a vector can be sorted using the function sort()
    Example.
    > temp <- c(3,5,2,6,7,1)
    > sort_temp <- sort(temp)
    > print(sort_temp)
    [1] 1 2 3 5 6 7
    > rev_sort <- sort(temp, decreasing = TRUE)
    > print(rev_sort)
    [1] 7 6 5 3 2 1
    This function also works with character vectors.
  18. Create a list which holds a vector, a matrix and a list.
    Ans.)
    example_list <- list(c("Kamal","Nitin"), matrix(c(1,2,3,4,5,6), nrow = 2), list("red",1))
  19. Determine the output of the following function f(2).
    b <- 4
    f <- function(a)
    {
    b <- 3
    b^3 + g(a)
    }
    g <- function(a)
    {
    a*b
    }
    Ans.)
    The global variable b has the value 4. The function f is called with the argument 2, and its body has a local variable b with the value 3. So f(2) returns 3^3 + g(2). Because g is defined at the top level, it looks up b in the global environment, not inside f, so g(2) gives 2*4 = 8.
    Thus, the answer is 27 + 8 = 35
  20. What is the output of runif(10)?
    Ans.)
    runif() generates random values from a uniform distribution, and its first argument gives the number of values required. So runif(10) will generate 10 random values between 0 and 1.
  21. Get all the data of the person having maximum salary.
    Ans.)

    max_salary_person<- subset(data, salary == max(salary))
    print(max_salary_person)
  22. Get all the people who work with TCS and have a salary of more than 300000
    Ans.)

    TCS_data_salary <- subset(data, company == "TCS" & salary > 300000)
  23. How is data reshaping done in R?
    Ans.)
    Data reshaping involves various techniques which are used according to the need. It is not a fixed procedure you need to follow, but a set of independent methods to remould the data set. Following are the methods used:-
    a. cbind()
    b. rbind()
    c. data_frame_name$new_column_name
    d. merge()
    e. melt()
    f. cast()
  24. How to get outer join, left join, right join, inner join and cross join?
    Ans.)
    outer join – merge(x=df1, y=df2, by = "id", all = TRUE)
    left join – merge(x=df1, y=df2, by = "id", all.x = TRUE)
    right join – merge(x=df1, y=df2, by = "id", all.y = TRUE)
    inner join – merge(x=df1, y=df2, by = "id")
    cross join – merge(x=df1, y=df2, by = NULL)
  25. When you are reshaping the data, you sometimes need to melt the data. Explain melt() function
    Ans.)
    Suppose you have a data set which has the columns company_name, age, salary and children, and you want it in long format. melt() keeps the id columns (here company_name and age) and stacks the remaining columns (salary and children) into two columns, variable and value, so that each row holds one measurement for one company_name and age. This whole process is called melting the data and it is performed with the melt() function from the reshape2 package
    Example. new_data_set.previous_data_set <- melt(previous_data_set, id=c("company_name","age"))
  26. What is lapply() function in R?
    Ans.)
    lapply() function is used when you want to apply a function to each element of a list in turn and get a list back.
    Example.
    > x <- list(a=1, b=1:3, c=10:100)
    > lapply(x, FUN=length)
    $a
    [1] 1

    $b
    [1] 3

    $c
    [1] 91

    You can use other functions like max, min, sum, etc.
  27. What is sapply() function in R?
    Ans.)
    sapply() function is used when you want to apply a function to each element of a list in turn, but you want a vector back, rather than a list.
    A vector is useful sometimes because it gets you a set of values on which you can easily perform an operation.
    Example.
    > x <- list(a = 1, b = 1:3, c = 10:100)
    #Compare with above; a named vector, not a list
    > sapply(x, FUN = length)
     a  b  c
     1  3 91
    > sapply(x, FUN = sum)
       a    b    c
       1    6 5005
  28. What is the difference between lapply and sapply?
    Ans.)
    If the programmer wants the output to be a vector or a matrix, then sapply is used, whereas if the programmer wants the output to be a list, then lapply is used. sapply() calls lapply() internally and then simplifies the result where possible.
  29. How to apply mean function in R?
    Ans.)
    Mean is calculated by taking the sum of the numbers and dividing it by the total number of elements. The function mean() is used to apply this in R.
    Syntax
    mean(x, trim=0, na.rm=FALSE)
    The mean() function has 3 arguments
    a.) x contains the vector on which mean is to be applied
    b.) trim is the fraction (0 to 0.5) of observations to drop from each end of the sorted vector before the mean is computed; the default 0 drops nothing.
    c.) na.rm is used to remove the missing values from the input vector. If there are missing values in the vector then mean will return NA as a result, so in order to drop the missing values and get a mean, you should put na.rm=TRUE.
  30. How to make scatterplot in R?
    Ans.)
    A scatterplot is a graph which shows many points plotted in the Cartesian plane. Each point holds 2 values, one on the x axis and one on the y axis. A simple scatterplot is plotted using the plot() function.
    The syntax for scatterplot is:-

    plot(x, y, main, xlab, ylab, xlim, ylim, axes)

    Where
    x is the data set whose values are the horizontal coordinates
    y is the data set whose values are the vertical coordinates
    main is the title of the graph
    xlab and ylab are the labels of the horizontal and vertical axes
    xlim and ylim are the limits of values of x and y used in the plotting
    axes indicates whether both axes should be drawn on the plot

    Example.
    plot(x = input$wt, y = input$mpg, xlab = "Weight", ylab = "Mileage", xlim = c(2.5,5), ylim = c(15,30), main = "Weight vs Mileage")
  31. Bonus Question
    How to write a countdown function in R?
    Ans.)

    countdown <- function(time){
    print(time)
    while(time != 0) {
    Sys.sleep(1)
    time <- time - 1
    print(time)
    }
    }
    > countdown(5)
    [1] 5
    [1] 4
    [1] 3
    [1] 2
    [1] 1
    [1] 0
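  32. Vector v is c(1,2,3,4) and list x is list(5:8), what is the output of v*x[1]?
    Ans.)

    Error, because x[1] is still a list and a list cannot be multiplied by a numeric vector
  33. Vector v is c(1,2,3,4) and list x is list(5:8), what is the output of v*x[[1]]?
    Ans.)

    [1] 5 12 21 32
    Here x[[1]] extracts the vector 5:8 itself, so the multiplication is done element by element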
  34. What are some of the functions that R has?
    Ans.)
    R provides functions for:-
    a. Mean
    b. Median
    c. Distribution
    d. Covariance
    e. Regression
    f. GAM
    g. GLM
    h. Non-linear
    i. Mixed Effects etc
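
For the joins in question 24, here is a small sketch on two made-up data frames (`df1`, `df2` and their values are hypothetical). Counting the rows of each result is a quick way to see how the joins differ.

```r
df1 <- data.frame(id = c(1, 2, 3), name = c("Amit", "Sumit", "Ajay"))
df2 <- data.frame(id = c(2, 3, 4), salary = c(300, 400, 500))

inner <- merge(df1, df2, by = "id")                 # ids present in both
left  <- merge(df1, df2, by = "id", all.x = TRUE)   # every row of df1
right <- merge(df1, df2, by = "id", all.y = TRUE)   # every row of df2
outer <- merge(df1, df2, by = "id", all = TRUE)     # every id from either side
cross <- merge(df1, df2, by = NULL)                 # all row combinations

# Number of rows produced by each join
row_counts <- sapply(list(inner = inner, left = left, right = right,
                          outer = outer, cross = cross), nrow)
print(row_counts)
```

Rows without a match on the other side get NA in the missing columns, for example the salary of id 1 in the left join.
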
TheDataMonk has compiled the top 100 R questions which you must prepare before any R interview focusing on a Data Science role. You can buy it on Amazon. Links below
i. Amazon
ii. Amazon India

Or you can mail me at nitinkamal132@gmail.com and get your copy free