Most Asked Supervised Learning Interview Questions

Supervised Learning is one of the most important part of Data Science realm. It is the part of Data Science where you supervise or have the output for a given set of inputs and you have to create a model to learn from these input-output combinations using different features(columns/attributes). The final aim is to predict the value of a new input.

Following are some of the top Supervised Learning interview questions.

OyoRooms Interview Questions

OYO Rooms, commonly known as OYO, is India’s largest hospitality company, consisting mainly of budget hotels. It was founded in 2013 by Ritesh Agarwal and has since grown to over 8,500 hotels in 337 cities in India, Malaysia, UAE, Nepal, China and Indonesia.

Location – Bangalore
Job Title – Data Analyst
Experience Required – 2-3 Years
Number of Rounds – 4 Round 1 – Written SQL & ExcelTest

There were 30 SQL and MS Excel questions (MCQ) in 30 minutes. The questions were definitely doable. It was mostly around getting the output of the SQL query, some common syntax, and Excel formulas. The cut-off was around 15 in our drive. Some of the questions are given below:-
1. Order of execution of a SQL query?
A. SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY

2. Syntax of ROW_NUMBER() function

SELECT Column_Name, ROW_NUMBER() OVER (ORDER BY Column_To_Order_On DESC) as ROW_NUM
FROM Table_Name

3. What is the syntax of the SUBSTITUTE function in MS Excel?

SUBSTITUTE(text, oldText, newText, [instanceNumber])

4. How to fill a column with “greater than 10” and “less than 10” text, where numbers to compare is in another column?

=IF (Column_Index>10, “Greater than ten, “Less than ten”)

5. Select the correct syntax of VLOOKUP()

=VLOOKUP(lookup_value, table_array, col_index_num, range_lookup)

6. What is the order of precedence in EXCEL formulas?

A. Parenthesis (Brackets)

     Exponentiation (^)
     Multiplication or Division
     Addition or Subtraction 

7. Find the name of employees starting with ‘C’

SELECT * FROM Employees WHERE EmpName like ‘C%’  

8. How to find the second highest salary from an employee table(Most asked question in an interview)

SELECT name, MAX(salary) AS salary
FROM employee
WHERE salary < (SELECT MAX(salary) FROM employee); 9. Difference between HAVING and WHERE command.
Standard Answer below

WHERE clause is used for filtering rows and it applies on each and every row while HAVING clause is used to filter groups in SQL

10. Difference between TRUNCATE and DELETE
Standard Answer below

The DROP command removes a table from the database. All the tables’ rows, indexes, and privileges will also be removed. … DROP and TRUNCATE are DDL commands, whereas DELETE is a DML command. DELETE operations can be rolled back (undone), while DROP and TRUNCATE operations cannot be rolled back.

Round 2 – Case Study round (Non-Elimination Round)

For OyoRooms Case study, do visit this page

Round 3 – Technical Interview (Python and Machine Learning)

In this round, the questions were mostly around my project and the project for which the team is recruiting. My interview was mostly around Supervised Learning part of Data Science. Following questions were asked, I would suggest you look for answers on this page.

1. What is Supervised Learning?
2. What is reinforcement learning?
3. Example of Supervised Learning algorithm.
4. Difference between Classification and regression.
5. What is Lasso Regression?
6. What is Loss Function?
7. What is Cross Validation?
8. What is regularization?

Since I worked on Python, so some of the syntaxes were asked

9. How to create a train and test data set?
10. What is random_state?
11. How to convert all the exclamation marks in NaN?
12. How to set pipeline in Python?
13. How to create a 5-fold cross-validation method in Python? from sklearn.model_selection import cross_val_score
reg = linear_model.LinearRegression()
cv_results = cross_val_score(reg,X,y,cv=5)

14. What is a hyperparameter?

You are not expected to answer each and every question, but you should have some knowledge about each of these. The Python questions were asked to see if I am good with the basic coding

Round 4 – HR Round

Basic HR Questions and Salary Negotiation. You can ask questions about the team which you will work for.

Salary Offered – Best in the industry (4.5/5)

 

Exploring Python’s Numpy package

Numpy is a Python package which is well known for its data-crunching functions. It is one of the most used Python packages for Data Science. We will look into the package in brief below:-

To start with, import the numpy package

import numpy as np

Few striking points when you use numpy functions/objects:-
1. When you use traditional list, then you can include elements with different data types. But, numpy array can’t have elements of different data types

2. You can perform operation on each element of Numpy array, but you can’t do the same with a list. Example below:-

x = [1,2,3]
y = numpy.array([1,2,3])
x*3 will throw an error
y*3 
will give you output as 3,6,9

3. How to store a list in a numpy array?

x = [4,6,7]
z = numpy.array(x)

4. How to create a 2 Dimensional numpy array?

x = numpy.array([[1,2,3],[9,8,7]])

Output
[[1 2 3] [9 8 7]]

5. The first cell in a two-dimensional array will have the address [0,0]. Thus to access the 2nd row and 3 rd column of a 2D array, you should give the address as x[1][2]

6. You can directly convert a list of list into 2D array by passing the list in numpy.array() function

7. To access all the columns of 14th row, use the below command

x[13,:]

8. If you want to add two 2D array, then you can do it by ‘+’ operator. Example below:-

x = numpy.array([[1,2,3],[9,8,7]])
y = numpy.array([[4,5,6],[3,4,5]])
print(x+y)

Output

[[ 5 7 9] [12 12 12]]

9. Different built-in functions in Numpy package

a. sum()
b. sort()
c. mean()
d. median()
e. corrcoef()
f. std()

10. Generate sample data from

x = numpy.random.normal((3,0.5,100),2)

The above code will get you 100 sample with distribution mean of 3 and standard deviation of 0.5.

Numpy contains a very very diverse set of mathematical function. I could have copied it from somewhere !! But here is a website which seems to have all the built-in mathematical functions of numpy. Do check out.

 

Important Built-in functions in Python

We all know what a function is and how helpful it is?
Just iterating what we already know, Functions are nothing but a reusable piece of code which often takes some value, processes it and returns the result. The best thing about a function is that you can directly call the function anywhere in the code. Following are some of the important built-in functions which are mostly used in our projects. We are not giving an extensive and boring list of all the functions present in this Universe, the below list is hand-picked 😛

1. Typecasting – You can change the data type of a variable by typecasting it

x = False
z = int(x)

2. Round() – Round function is used to round off any decimal number

    round(2.54,1)
    2.5

3.Sorted(list, reverse = TRUE) – IT will sort the list in descending order

4.list_name.index(“element”) – It will get you the index of the specific element

5.  list_name.append(“‘Saurav”) – It will append the element at the very end of the list.

6. list_name.upper() will change every element in the upper case

7. x.count(“data”) – This will count the number of occurrences of the word “data” in  x

8. x.reverse() will reverse the order of the list x

9. max() and min() functions – Gets you maximum and minimum value from the list

10. len() – Returns the length of an object

11. abs() – Returns absolute value of a number

12. print() – Prints anything 😛

Above are all the built-in functions which we generally use. If you want the complete list, then go here. I just gave them a backlink 😛

Starting with Python

Python is undoubtedly the best language to perform any data science operation. The best part of using Python is that the Python community is very active, so whenever you are stuck in a piece of code or concept, then you can easily find it on the internet. Without wasting much time, let’s start with the basics of Python.

P.S. – This tutorial will be fast-paced

Google and download Anaconda Navigator from here. We will work on the Jupyter Notebook, Open the Anaconda Navigator and launch the Jupyter Notebook.

Very Basic concepts in a nutshell (You don’t have to give 4-6 hours for the basics, we will keep covering the topics as and when it comes)
1. You can directly define a variable, Self-understandable code below # Create a variable multi
multi = 3.6 # Create a variable x
x = 100 # Feed the variables in your formula
result = x * multi**10 # Print the result
print (result)

2. There are 4 important data types in Python – Integer, Float, Boolean and String. There are other data types also, but we don’t really need them. So chuck !!

3. List – A list is an important data structure in Python. Basically, it’s an array where you keep multiple data types at one place.

x = [[‘Rahul’,10156],[‘Nitin’,8382],[‘Ajit’,9876]]
print(x)
print(type(x))

O/P

[[‘Rahul’, 10156], [‘Nitin’, 8382], [‘Ajit’, 9876]] <class ‘list’>

4. List’s element starts with index 0 i.e. if there are 8 elements in a list, you can access the elements from 0 to 7. You can also access the last element with -1, second last with -2, etc.

print(x[-1])
[‘Ajit’, 9876]

5. To access the range of list, use print(x[1:3]), it will help you access 2nd and 3rd element. The higher range value is excluded and the lower one is included.

6. You can directly modify a list with the index of the element you want to change ex. x[2] = [‘Ankit’,9765]

7. Add an element to the list, simply use ‘+’ sign. 
Example – x + [‘Kamal’+4321]

8. Delete an element from the list, simply use the function del()

9. Don’t use “=” sign for copying a list into another. Instead, use
new_list = list(old_list)
or
new_list = old_list[:]

These were the basics of data structure and data types. If you want to learn “How to write basic functions in Python” then click on the link.

 

 

 

 

 

 

 

 

Day 1 – What is Data?

Welcome to the first day of “100 Days Data Science Challenge”. In this series, we will try to make you understand with one topic at a time which should not take more than 2 hours. So, in total, you need to devote 150-200 hours in the next 100 days to understand various data science topics, ranging from beginner to intermediate level.

We will start with the basics.

What is data?
In Latin, Data is the plural of datum. In English Data is both singular and plural. Data is any sort of information, it could be in the form of facts or figures,  numbers, text, images, sound, etc.

Example- “Sachin”,”51″,”greatest”

What is

Regular Expression

We will start with Regular Expressions. RegEx is a way of creating a rule to filter what you want from your data. We will to keep Python as a consistent coding language for this complete exercise.

Many of you must have come across SQL questions where you need to get the data of customers whose name starts with A and in the WHERE condition you write something like,

WHERE Customer_Name LIKE ‘A%’

Well !! This is the basic Regular Expression where you request the query to get you a specific result. The way we write Regular Expression in Python is a bit different. Check out the table below:-

To use Regular Expression, first, you need to “import re” package
And the following 4 functions quite useful for using your regex 1. findall – It returns a complete list of all the matches
2. search – It returns a match object
3. split – Splits the string wherever there is a match
4. sub – It replaces one or many matches of the regex

Following are some important metacharacter and special sequence

RegEx Description
w+ Get all the words
d Digits
s Spaces
S Anything but white spaces
+ One or more occurrences
^ Starts with
$ Ends with
* Zero or more occurences
+ One or more occurrences
| Either Or
[] A set of Character
Special sequence

Let’s get down on some questions to understand the basics of how to write a regex

1. re.split(‘s+’,’My name is Data Monk’)
‘My’ ‘name’ ‘is’ ‘Data’ ‘Monk’ – The above function took the regex s+ to get all the words from the given string and split it 2. end_Sentence = r'[.?!]’
print(re.split(end_Sentence, String)
The above line of codes will split the document wherever a sentence is ending with a full stop, question mark, or an exclamation mark
3. [a-z A-Z 0-9 -.]
This will match all the upper case, lower case, digits, – and . 4. r”[.*]”
Since it contains an asterisk, so it will match anything and everything You can find many more RegEx exercise questions on different websites. Do practice a few 🙂
Let’s continue with our NLP

 

Natural Language Processing Using Python

What is NLP?

NLP stands for Natural Language Processing. To put it in a simple way, NLP is a way of developing applications which can understand human language. Millions of Gigabytes of data is being generated every day and there is a high demand to understand this data in a simpler way. We have different analytical methods to fetch insights from numerical data, and NLP comes into the picture when we are dealing with textual data.

What are the applications of NLP?

NLP is being used heavily in the following domain:-
1. Search Engines – Be it Google or Bing, we write a query and rely on the NLP part of the algorithm to fetch us the best results.
2. Speech Engines – Almost all mobile phones come with an inbuilt speech engine which translates your voice feed into a query and gets you the search results. Example – Siri
3. Social Media – Have you ever wondered how you are receiving relevant ads in your social media feed? Well, that is NLP, decoding your interest and pushing relevant ads on your feed
4. Survey Analysis – You must have filled some surveys, and there are billions of survey copies in the digital world, you need to go through every survey answer to get a gist of customer sentiments, but going through millions of copies is quite cumbersome. So, what do you do? You build an NLP model, feed the content and boom..It’s done.
5. Artificial Intelligence Chat Bot – This is one important application of NLP, where you chat with a bot and it gets what you ask for
6. Spam filters – For every relevant email, there are at least 2 spam emails, but you never worry about it because you never got to see one. All those spam filters are powered by NLP

What are we going to learn in this post?

We will use Python to understand the basics of NLP, regular expression and few algorithms. So, be with us for another 30-40 minutes to understand Natural Language Processing using Python.

Before starting NLP, Please go through Regular Expression.

 

 

 

 

 

Sapient Business Analyst Interview Questions

Publicis Sapient, formerly Sapient, is a digital transformation partner helping established organizations get to their future, digitally-enabled state by fusing strategy, consulting and customer experience with agile engineering. It was founded in 1990.
Location – Bangalore
Job Title – Business Analyst
Experience required – 1-3 years
Number of Rounds – 4
Round 1 – Telephonic Round
The telephonic interview lasted for ~45 minutes where the questions were mostly on the tools and technologies I have worked on in my previous organization. Slowly, the questions shifted to SQL and statistics. Following are the questions which were asked:-
  1. What is the output of SELECT NULL+0?
    NULL
  2. What are the ranking functions in SQL?
    There are mainly 3 types of ranking functions:-
    Rank()
    Row_Number()
    Dense_Rank()
  3. What is a partition by clause and how is it used?
    Partition by clause is used to create to divide the whole data in different parts depending on the column on which it is partitioned. Suppose the data contains 50 rows and have data for 6 States, then if you do a partition by on state, the whole data set will be treated differently on all the 6 partitions. The syntax for partition by in ROW_NUMBER() is given below SELECT *, ROW_NUMBER() OVER (PARTITION BY State ORDER BY population DESC) AS row_num
    FROM Table_Name So, a new column will be added in the result as row_num and it will give a row number to all the state row starting from 1. Once the rows of a particular state is over, then it will again take up another state and will start the counting from there
  4. What is A/B Testing?
    A/B testing is a form of statistical hypothesis testing with two variants leading to the technical term, two-sample hypothesis testing, used in the field of statistics. In simple words, A/B Testing in web analytics is used to compare the performance of 2 web design to get a better design. Suppose you have 2 designs to display an advertisement on your website, one being a picture and other a text or link. So, you can compare the performance of the two design by A/B Testing.
  5. What is regression?
    Regression is a form of predictive modeling technique to determine the strength of the relationship between a dependent and independent variable. One of these variables is called a predictor variable whose value is gathered through experiments. The other variable is called the response variable whose value is derived from the predictor variable.Y=aX+b – Linear regression (X is predictor variable and Y is response variable)
  6. Give some example of regression?
    Regression is used for forecasting, time series modeling and finding the casual effect relationship between the variables. For example, the relationship between rash driving and the number of road accidents by a driver is best studied through regression.
  7. What is a multiple regression?
    Multiple regression is an extension of linear regression into the relationship between more than two variables. In simple linear relation we have one predictor and one response variable, but in multiple regression, we have more than one predictor variable and one response variable.Y=a1x1+a2x2+..+b
  8. What is DENSE_RANK() function?
    DENSE_RANK() again is a ranking function which is very similar to RANK() function. The only difference is that it does not miss any rank even if there are duplicates in the table.
  9. Syntax of DENSE_RANK() function
    SELECT *, DENSE_RANK() OVER (PARTITION BY Column1 ORDER BY Column2 DESC)
    FROM Table_Name
  10. There was a question on self-join where you have to get the employee name and manager name from a table having 3 columns, EmployeeID, EmployeeName, ManagerID
    SELECT e1.Name AS EmployeeName, e2.Name AS ManagerName
    FROM Employee AS e1
    INNER JOIN Employee AS e2
    ON e1.ManagerID = e2.EmplyeeID

There were a few questions on the project you are working on right now

Round 2 – Case Study

The Case Study topic was to recommend two food items to a customer who is new to the restaurant. You can find the complete analysis of this case study and other case studies here

Round 3 – Face to Face Technical Round
This round was mostly about past projects. I had a Natural Language Processing project, so the interview revolved around the same topic. Following questions were asked in this round:- 1. What was the project for?
A. The project was to do sentiment analysis on the survey data filled by online customers. 2. What algorithms/methods did you try?
A. We tried multiple algorithms, starting from TF-IDF, Part-Of-Speech tagging, n-gram, Lemmatization, Stemming, Tokenization, Latent Semantic Indexing, Sentiment Analysis. 3. What all methods do you need to perform in order to convert a keyword into its base form(Normalization)?
A. Lemmatization and Stemming 4. What is N-gram?
A. N-grams are simply all combinations of adjacent words or letters of length n that you can find in your text file.
For example
This is a sentence
N-grams = This is, is a, a sentence 5. What is the use of TF-IDF?
A. TF-IDF stands for Term Frequency and Inverse Document Frequency. TF-IDF is numerical statistics that help to understand the importance of a particular word in a document. Term frequency gets you the number of times a particular word has occurred in a document and Inverse Document Frequency gets you the importance of the words. It helps out in filtering out the most common words like a, an, the, was, etc.. So, you get only the important terms. 6. What is Lemmatization?
Lemmatization takes into account the morphological analysis of the word. It converts a word into its pure root form by looking into the morphological information studies – Third person, singular number, present tense of verb study
Lemma – study
studying – Gerund of the verb study
Lemma – study

As you can see, both the words studies and studying has been narrowed down to the lemma study. 7.

7. Explain the complete flow of your NLP project
A. The brief of the process is given below with some coding examples:-
Step 1 – Get the text dataset
Step 2 – Tokenize the text using get_text() in Python
Step 3 – Split the text using
tokens = [t for t in text.split()]
Step 4 – Get the count of the word frequency using the NLTK package in Python
freq = nltk.FreqDist(tokens)
Step 5 – Remove stop words. Code below
for token in tokens:
          if token in stopwords.words(‘english’):
             clean_tokens.remove(token) Step 6 – Tokenize non-English words
Step 7 – Get synonyms and antonyms using WordNet package from NLTK in Python
Step 8 – Stemming of words. I used PorterStemmer algorithm
stem_Word = PorterStemmer
Step 9 – Once we are done with stemming, go for Lemmatization. WordNet package
lemma = WordNetLemmatizer()
Step 10 – Build a classifier. We can you Logistic Regression to create a baseline model. Later we used Naive Bayes Classification.

There were questions only on the logical part of the process and not on the code implementation. But, it’s always better to infuse coding example wherever you can. The interview lasted for around 1 hour.

Round 4 – Human Resource
Basic questions, like:-
1. Why are you quitting your present job?
2. What are your expectations with the company? and the company’s expectation
3. Salary negotiation
4. Have you ever lead a team?

Salary offered – Best in the industry (5/5) 

34 R Questions you must prepare before Data Science Interview

  1. What are the data structure in R which helps in statistical analysis and graphical representation?
    Ans.)
    The following are the data structure in R which are widely used:-
    a.) Array
    b.)Matrix
    c.)Vector
    d.) Data frame
    e.) List
    f.) Tables
  2. What is class() function in R?
    Ans.)
    This is a very important function in R which is a character vector giving the names of the classes from which the object inherits.
    Example. > x<- 1:10
    > class(x)
    [1] “integer”
  3. What is a vector?
    Ans.) A vector is a sequence of data elements of the same basic type. Members in a vector are called components.
    Example. >vector_example<- c(2,3,4,5)
    > print(vector_example)
    [1] 2,3,4,5 > print(length(vector_exmple)
    [1] 4
  4. How can you combine 2 vectors?
    Ans.)
    Vectors can be combined from 2 to 1 by using the c() function
    Example.
    > first <- c(1,2,3,4)
    > second <- (“a”, “b”, “c”)
    > third <- c(first, second)
    > print(third)
    [1] “1” “2” “3” “4” “a” “b” “c”
    The numbers are also shown in the double quote, this is done to maintain the same primitive data type for the new vector being created J
  5. How to perform arithmetic operations on Vectors? Show with some example
    Ans.)
    There are many arithmetic operators which are being used in R. Remember, R uses the operators component by component. Let’s look at it with some common operators. >x <- c(1,2,3,4)
    >y<- c(4,5,6,7)
    >x+y
    [1] 5 7 9 11
    >x-y
    [1] -3 -3 -3 -3
    >z <- (4,4,4,4,4,4,4)
    >x+z
    [1] 5 6 7 8 5 6 7
    When you have 2 vectors with unequal length and you need to perform an operation on both, then the shorter vector will be used again and again to match the length of both the vectors
  6. Define Index in Vector?
    Ans.) Vector in index is used to give the element at that position of the vector. Few programming language starts the index with 0 and other starts with 1. R counts the index from 1. There are many possibilities while putting an index number i.e.
    a. positive and in range index
    > x<- (1,3,4,5)
    >x[2]
    [1] “3” b. out of range
    > x <- (2,3,4,5)
    > x[110]
    [1] NA c. negative index – It removes this element and replies back with all the left numbers
    >x<- (3,4,5,6,7)
    >x[-3]
    [1] “3” “4” “6” “7” d. range of values
    >x <- (3,4,5,6,7,8)
    >x[2:5]
    [1] “4” “5” “6” “7” e. duplicate index
    > x<- (3,4,5,6,7)
    > s[c(2,1,2,3)]
    [1] “4” “3” “4” “5” f. logical index – If you want to select a particular group of index number, then you should use logical operators i.e. TRUE and FALSE
    >x<- (2,3,4,5,6)
    > s[c(TRUE,FALSE,FALSE,TRUE,TRUE)
    [1] “2” “5” “6”
  7. A list, as the name suggests is a number of vectors collected together. Suppose, you have a number vector, a character vector, a Boolean vector and some numbers. You want to combine it into one which obviously won’t have the same data type. So you need to create a list > n = c(2, 3, 5)
    > s = c(“a”, “b”, “c”, “d”, “e”)
    > b = c(TRUE, FALSE, TRUE, FALSE, FALSE)
    > x = list(n, s, b, 3)
    > print(x)
    [[1]]
    [1] 2 3 5 [[2]]
    [1] “a” “b” “c” “d” “e” [[3]]
    [1] TRUE FALSE TRUE FALSE FALSE [[4]]
    [1] 3
  8. What is a Matrices ?
    Ans. )
    A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the matrix function.
    Example.
    # Matrix creation
    > M=matrix(c(1,2,3,4,5,6), nrow=2, ncol=3, byrow=TRUE)
    print(M)
    [1]                    [2]              [3]
    [1]  1                      2                    3
    [2]  4                      5                    6 where
    nrow = number of rows in the matrix
    ncol = number of columns in the matrix
    byrow = TRUE/FALSE will get you value first by row or column
  9. What is an Array?
    Ans.)
    Array is a super set of Matrices. On one hand the matrices can be of 2 dimension but array can be of any number of dimensions.
    Example.
    > a <- array(c(“car”,”bike”), dim(3,3,2))
    > print(a) , , 1
    [,1]     [,2]     [,3]
    [1,] “car”  “bike” “car”
    [2,] “bike” “car”  “bike”
    [3,] “car”  “bike” “car”
    , , 2
    [,1]     [,2]     [,3]
    [1,] “bike” “car”  “bike”
    [2,] “car”  “bike” “car”
    [3,] “bike” “car”  “bike” >my_array<- array(1:24, dim=c(3,4,2))
    >my_array
    , , 1
    [,1] [,2] [,3] [,4]
    [1,]  1     4     7    10
    [2,]  2     5     8    11
    [3,]  3     6     9    12
    , , 2
    [,1] [,2]  [,3] [,4]
    [1,]  13   16   19   22
    [2,]  14   17   20   23
    [3,]  15   18   21   24
  10. What is a factor?
    Ans.)
    Factors are the r-objects which are created using a vector. Factors in R are stored as a vector of integer values with a corresponding set of character values to use when the factor is displayed. The factor function is used to create a factor. The only required argument to factor is a vector of values which will be returned as a vector of factor values. It stores the vector along with the distinct values of the elements in the vector as labels.
    Factors are created using the factor() function. The nlevelsfunctions gives the count of levels. Example
    #First let’s create a vector
    >vector_example<- c(‘a’,’b’,’c’,’a’,’a’)
    #Now create a factor object
    >factor_example<- factor(vector_example)
    >print(factor_example)
    [1] a b c a a
    >print(nlevels(factor_example))
    [1] 3 nlevels gives you the number of distinct values in the vector.
  11. What is the difference between Matrix and an array ?
    Ans.)
    Matrix can have only 2 dimensions where as an array can have as many dimensions as you want. Matrix is defined with the help of data, number of rows, number of columns and whether the elements are to be put in row wise or column wise.
    In array you need to give the dimension of the array. An array can be of any number of dimensions and each dimension is a matrix. For example a 3x3x2 array represents 2 matrices each of dimension 3×3.
  12. What is a data frame?
    Ans.)
      Data frame is a list of vectors  of equal length. It can consist of any vector with a particular data type and can combine it into one. So, a data frame can have a vector of logical and another of numeric. The only condition being that all the vectors should have the same length.
    Example. #This is how the data frame is created
    >student_profile<- data.frame(
    name<-c(“Amit”, “Sumit”, “Ajay”)
    age <- c(22,23,24)
    class <- c(6,7,8)
    )
    print(student_profile) The above code will create 3 columns with the column name as name, age and class.
  13. What is the difference between a matrix and a dataframe?
    Ans.)
    A dataframe can contain vectors with different inputs and a matrix cannot. (You can have a dataframe of characters, integers, and even other dataframes, but you can’t do that with a matrix.A matrix must be all the same type.)
    So, the data frame can have different vector of character, numbers, logical, etc. and it is still cool. But, for matrix you need only one type of data type. Phewww !!
  14. Define repeat loop.
    Ans. )
    Repeat loop executes a sequence of statement multiple times. It don’t put the condition at the same place where we put the keyword repeat.
    Example > name <- c(“Pappu”, “John”)
    > temp <- 5
    > repeat {
    print(name)
    temp <- temp+2 if(temp > 11) {
    break
    }
    } So, this will return the name vector 4 times. First it prints the name and then increase the temp to 7 and so on.
  15. Define while loop.
    Ans.)
    In the while loop the condition is tested and the control goes into the body only when the condition is true

    Example

    > name <- c(“Pappu”, “John”)
    > temp <- 5
    > repeat (temp<11) {
    print(name)
    temp <- temp+2
    } The name will be printed 4 times
  16. Define the for loop.
    Ans.)
    The for loop are not limited to integers. You can pass character vectors, logical vectors, lists or expressions.
    Example.
    > x<- LETTERS[1:2]
    for ( i in x) {
    print(i)
    }
    [1] “A”
    [2] “B”
  17. What is the use of sort() function? How to use the function to sort in descending order?
    Ans.)
    Elements in a vector can be sorted using the function sort()
    Example. > temp <- c(3,5,2,6,7,1)
    >sort_temp<- sort(temp)
    > print(sort_temp)
    [1] 1 2 3 5 6 7
    >rev_sort<- sort(temp, decreasing = TRUE)
    [1] 7,6,5,3,2,1 This function also works with the words.
  18. Create a list which holds a vector, a matrix and a list.
    Ans.)
    example_list<- list(c(“Kamal”,”Nitin”), matrix(c(1,2,3,4,5,6), nrow = 2), list(“red”,1))
  19. Determine the output of the following function f(2). b <- 4
    f <- function(a)
    {
    b <- 3
    b^3 + g(a)
    }
    g <- function(a)
    {
    a*b
    }
    Ans.)
    The global variable b has a value 4. The function f has an argument 2 and the function’s body has the local variable b with the value 3. So function f(2) will return 3^3 + g(2) and g(2) will give the value 2*4 = 8 where 4 is the value of b.
    Thus, the answer is 35
  20. What is the output of runif(10)?
    Ans.)
    runif() function is used to generate random values and the argument gives the number of values required. So the above function will generate 10 random values between 0 and 1.
  21. Get all the data of the person having maximum salary.
    Ans.)

    max_salary_person<- subset(data, salary == max(salary))
    print(max_salary_person)
  22. Get all the people who works with TCS and have salary more than 300000
    Ans.)

    TCS_data_salary<- subset(data, company == “TCS”  & salary > 300000)
  23. How is data reshaping done in R?
    Ans.)
    Data reshaping involves various techniques which is used according to the need. It’s not a procedure you need to follow, but independent methods to remould the data set. Following are the methods used:-
    a. cbind()
    b. rbind()
    c. new_column_name.data_frame_name
    d. merge()
    e. melt()
    f. cast()
  24. How to get outer join, left join, right join, inner join and cross join?
    Ans.)
    outer join – merge(x=df1, y=df2, by = “id”, all = TRUE)
    left join – merge(x=df1, y=df2, by=”id”, all.x = TRUE)
    right join – merge(x=df1, y=df2, by = “id”, all.y = TRUE)
    inner join – merge(x=df1, y=df2, by = “id”)
    cross join – merge(x=df1, y=df2, by = NULL)
  25. When you are reshaping the data, you sometimes need to melt the data. Explain melt() function
    Ans.)
    Suppose you have a data set which havecompany_name, age, salary, children. So when you want to have the data where you need the data grouped by company_name and then under company_name grouped by age. This whole process is called melting the data and it is performed with melt() function
    Example. new_data_set.previous_data_set<- melt(previous_data_set, id=c(“company_name”,”age”))
  26. What is lapply() function in R?
    Ans.)
    lapply() function is used when you want to apply a function to each element of a list in turn and get a list back.
    Example.
    x<- list(a=1, b=1:3, c=10:100)
    lapply(x,FUN=length)
    $a
    [1] 1
    $b[1] 3

    $c

    [1] 91

    You can use other functions like max, min, sum, etc.
  27. What is sapply() function in R?
    Ans.)
    sapply() function is used when you want to apply a function to each element of a list in turn, but you want a vector back, rather than a list.
    Vector
     is useful sometimes because it will get you a set of values and you can easily perform an operation on it.
    Example.
    x <-list(a =1, b =1:3, c =10:100)#Compare with above; a named vector, not a list sapply(x, FUN = length)a  b  c   1391 sapply(x, FUN = sum)a    b    c    165005
  28. What is the difference between lapply and sapply?
    Ans.)
    If the programmers want the output to be a data frame or a vector, then sapply function is used whereas if a programmer wants the output to be a list then lapply is used
  29. How to apply mean function in R?
    Ans.)
    Mean is calculated by taking the sum of numbers and dividing it with the total number of elements. The function mean() is used to apply this in R.
    Syntax
    mean(x, trim=0,na.rm=FALSE) The mean() function have 3 arguments
    a.) x contains the vector on which mean is to be applied
    b.) trim = 0, It is used to drop some observations from each end of the sorted array.
    c.) na.rm is used to remove the missing values from the input vector If there are missing values in the vector then mean will return NA as a result, so in order to drop the missing values to get a mean, you should put na.rm=TRUE which means remove the missing values.
  30. How to make scatterplot in R?
    Ans.)
    Scatterplot is a graph which shows many points plotted in the Cartesian plane. Each point holds 2 values which are present on the x and y axis. The simple scatterplot is plotted using plot() function.
    The syntax for scatterplot is:-

    plot(x, y ,main, xlab, ylab, xlim, ylim, axes)

    Where
    x is the data set whose values are the horizontal coordinates
    y is the data set whose values are the vertical coordinates
    main is the tile in the graph
    xlab and ylab is the label in the horizontal and vertical axis
    xlim and ylim are the limits of values of x and y used in the plotting
    axes indicates whether both axis should be there on the plot plot(x =input$wt,y=input$mpg,xlab=”Weight”,ylab=”Milage”,xlim= c(2.5,5),ylim= c(15,30),main=”Weight vsMilage”)
  31. Bonus Question
    How to write a countdown function in R?
    Ans.)

    timer<- function(time){print(time)while(time!=0)  {Sys.sleep(1)time<- time – 1print(time)  }}countdown(5)
    [1] 5
    [2] 4
    [3] 3
    [4] 2
    [5] 1
  32. Vector v is c(1,2,3,4) and list x is list(5:8), what is the output of v*x[1]?
    Ans.)

    Error
  33. Vector v is c(1,2,3,4) and list x is list(5:8), what is the output of v*x[[1]]?
    Ans.)

    [1] 5 12 21 32
  34. What are some of the functions that R have?
    Ans.)
    The functions present in R are:-
    a. Mean
    b. Median
    c. Distribution
    d. Covariance
    e. Regression
    f. GAM
    g. GLM
    h. Non-linear
    i. Mixed Effects etc
TheDataMonk has compiled the top 100 R questions which you must prepare before any R interview focusing on a Data Science role. You can buy it on Amazon. Links below
i. Amazon
ii. Amazon India

Or you can mail me at nitinkamal132@gmail.com and get your copy free 🙂