Publicis Sapient, formerly Sapient, is a digital transformation partner helping established organizations get to their future, digitally-enabled state by fusing strategy, consulting and customer experience with agile engineering. It was founded in 1990.
Location – Bangalore
Job Title – Business Analyst
Experience required – 1-3 years
Number of Rounds – 4
Round 1 – Telephonic Round
The telephonic interview lasted for ~45 minutes, and the questions were mostly on the tools and technologies I had worked on in my previous organization. Slowly, the questions shifted to SQL and statistics. The following questions were asked:
- What is the output of SELECT NULL+0?
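For reference, arithmetic on NULL yields NULL in standard SQL, so this expression returns NULL and not 0. A minimal sketch that checks this with Python's built-in sqlite3 module (SQLite is used here only for illustration; it is an assumption, not the engine discussed in the interview):

```python
import sqlite3

# In-memory SQLite database, chosen only for illustration.
conn = sqlite3.connect(":memory:")

# Arithmetic with NULL propagates NULL; sqlite3 maps SQL NULL to Python None.
result = conn.execute("SELECT NULL + 0").fetchone()[0]
print(result)

# COALESCE substitutes a default value when its first argument is NULL.
fallback = conn.execute("SELECT COALESCE(NULL, 0) + 0").fetchone()[0]
print(fallback)

conn.close()
```

If a 0 is wanted instead of NULL, wrapping the nullable value in COALESCE, as in the second query, is the usual way to get it.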
- What are the ranking functions in SQL?
There are mainly 3 ranking functions: ROW_NUMBER(), RANK() and DENSE_RANK().
- What is a partition by clause and how is it used?
The PARTITION BY clause divides the result set into groups based on the values of the column it is applied to, and the window function is then evaluated separately within each group. Suppose the data contains 50 rows covering 6 states; if you partition by State, the rows are split into 6 partitions, one per state. The syntax for PARTITION BY with ROW_NUMBER() is given below (table_name is a placeholder)
SELECT *, ROW_NUMBER() OVER (PARTITION BY State ORDER BY population DESC) AS row_num
FROM table_name
So, a new column, row_num, is added to the result, numbering the rows of each state from 1 in descending order of population. Once the rows of one state are exhausted, the numbering restarts from 1 for the next state
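To see the numbering restart per partition, here is a small sketch using Python's sqlite3 module. The table name cities, the City column and the rows are made up for illustration, and window functions need SQLite 3.25 or newer, which I am assuming is what your Python build ships with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, used only to demonstrate the clause.
conn.execute("CREATE TABLE cities (State TEXT, City TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [
        ("A", "a1", 300),
        ("A", "a2", 500),
        ("B", "b1", 200),
        ("B", "b2", 100),
        ("B", "b3", 400),
    ],
)

# ROW_NUMBER() restarts at 1 for every distinct State.
rows = conn.execute(
    """
    SELECT State, City, population,
           ROW_NUMBER() OVER (PARTITION BY State ORDER BY population DESC) AS row_num
    FROM cities
    ORDER BY State, row_num
    """
).fetchall()

for row in rows:
    print(row)

conn.close()
```

State A gets row numbers 1 and 2, and State B starts again at 1 and runs to 3.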
- What is A/B Testing?
A/B testing is a form of statistical hypothesis testing with two variants, which is why it is technically known as two-sample hypothesis testing in statistics. In simple words, A/B testing in web analytics compares the performance of 2 web designs to find the better one. Suppose you have 2 designs for displaying an advertisement on your website, one a picture and the other a text link. You show each design to a separate group of visitors and compare a metric such as the click-through rate to decide which design performs better.
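One common way to run such a comparison is a two-proportion z-test on the conversion counts of the two designs. The sketch below is only one possible approach, and the visitor and click numbers are hypothetical, not real results.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both designs perform equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical data: design A (picture) vs design B (text link).
z, p_value = two_proportion_z_test(conv_a=120, n_a=2000, conv_b=160, n_b=2000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

A small p-value (for example below 0.05) suggests the difference between the two designs is unlikely to be due to chance alone.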
- What is regression?
Regression is a predictive modeling technique used to estimate the relationship between a dependent variable and one or more independent variables. The independent variable is called the predictor variable; its value is gathered through experiments or observation. The dependent variable is called the response variable; its value is estimated from the predictor variable. Linear regression: Y = aX + b (X is the predictor variable and Y is the response variable)
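The coefficients a and b in Y = aX + b are usually estimated by ordinary least squares. A minimal sketch in plain Python, with toy data chosen so that the true line is Y = 2X + 1 (the function name fit_line and the data are assumptions for illustration):

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    a = s_xy / s_xx
    b = mean_y - a * mean_x
    return a, b


# Toy data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_line(xs, ys)
print(a, b)
```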
- Give some examples of regression?
Regression is used for forecasting, time series modeling and finding cause-and-effect relationships between variables. For example, the relationship between rash driving and the number of road accidents by a driver is best studied through regression.
- What is a multiple regression?
Multiple regression extends linear regression to the relationship between more than two variables. In simple linear regression we have one predictor and one response variable, but in multiple regression we have more than one predictor variable and one response variable: Y = a1x1 + a2x2 + ... + b
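With more than one predictor, the least squares coefficients can be obtained by solving the normal equations (X^T X) w = X^T y. The sketch below does this in plain Python with a small Gauss-Jordan solver; in practice a library such as NumPy or scikit-learn would normally be used. The helper names and the toy data (generated from Y = 2x1 + 3x2 + 1) are assumptions for illustration.

```python
def solve_linear_system(matrix, vector):
    """Solve matrix * x = vector by Gauss-Jordan elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def fit_multiple_regression(rows, ys):
    """Return [a1, a2, ..., b] minimising squared error for y = a1*x1 + ... + b."""
    # Append a constant 1 to each row so the last coefficient is the intercept b.
    design = [list(r) + [1.0] for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy data generated from y = 2*x1 + 3*x2 + 1.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [2 * x1 + 3 * x2 + 1 for x1, x2 in rows]
a1, a2, b = fit_multiple_regression(rows, ys)
print(round(a1, 6), round(a2, 6), round(b, 6))
```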
- What is DENSE_RANK() function?
DENSE_RANK() is a ranking function that is very similar to RANK(). The only difference is that it does not skip any rank values after ties: where RANK() would produce 1, 1, 3, DENSE_RANK() produces 1, 1, 2.
- Syntax of DENSE_RANK() function
SELECT *, DENSE_RANK() OVER (PARTITION BY Column1 ORDER BY Column2 DESC) AS dense_rnk
FROM table_name
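The difference between RANK() and DENSE_RANK() is easiest to see side by side on data with ties. A small sketch with Python's sqlite3 module; the scores table and its rows are hypothetical, and window functions are assumed to be available (SQLite 3.25 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with a tie on the top score.
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("p", 90), ("q", 90), ("r", 80), ("s", 70)],
)

rows = conn.execute(
    """
    SELECT name, score,
           RANK() OVER (ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
    FROM scores
    ORDER BY score DESC, name
    """
).fetchall()

for row in rows:
    print(row)

conn.close()
```

RANK() jumps from 1 to 3 after the tie, while DENSE_RANK() continues with 2.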
- There was a question on self-join, where you have to get the employee name and manager name from a table with 3 columns: EmployeeID, EmployeeName, ManagerID
SELECT e1.EmployeeName AS EmployeeName, e2.EmployeeName AS ManagerName
FROM Employee AS e1
INNER JOIN Employee AS e2
ON e1.ManagerID = e2.EmployeeID
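The self-join can be tried out end to end with Python's sqlite3 module. The employee names below are made up for illustration; the table layout follows the three columns named in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (EmployeeID INTEGER, EmployeeName TEXT, ManagerID INTEGER)"
)
# Hypothetical rows; Asha has no manager (ManagerID is NULL).
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "Asha", None), (2, "Ravi", 1), (3, "Meera", 1), (4, "John", 2)],
)

# Join the table to itself: e1 is the employee, e2 is that employee's manager.
pairs = conn.execute(
    """
    SELECT e1.EmployeeName AS EmployeeName, e2.EmployeeName AS ManagerName
    FROM Employee AS e1
    INNER JOIN Employee AS e2
        ON e1.ManagerID = e2.EmployeeID
    ORDER BY e1.EmployeeID
    """
).fetchall()

for pair in pairs:
    print(pair)

conn.close()
```

Note that the INNER JOIN drops employees without a manager (Asha here); a LEFT JOIN would keep them with a NULL manager name.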
There were also a few questions on the project I was working on at the time
Round 2 – Case Study
The Case Study topic was to recommend two food items to a customer who is new to the restaurant. You can find the complete analysis of this case study and other case studies here
Round 3 – Face to Face Technical Round
This round was mostly about past projects. I had a Natural Language Processing project, so the interview revolved around that topic. The following questions were asked in this round:
1. What was the project for?
A. The project was to do sentiment analysis on the survey data filled by online customers.
2. What algorithms/methods did you try?
A. We tried multiple techniques, including TF-IDF, Part-Of-Speech tagging, n-grams, Lemmatization, Stemming, Tokenization, Latent Semantic Indexing and Sentiment Analysis.
3. Which methods do you use to convert a word into its base form (Normalization)?
A. Lemmatization and Stemming
4. What is N-gram?
A. N-grams are all the contiguous sequences of n adjacent words (or letters) that you can find in your text. For example, with n = 2:
Sentence = This is a sentence
N-grams = This is, is a, a sentence
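Word-level n-grams can be generated with a sliding window over the token list. A minimal sketch (the function name word_ngrams is my own, and whitespace splitting is assumed to be a good enough tokenizer for this example):

```python
def word_ngrams(text, n):
    """Return all runs of n adjacent words in text, in order of appearance."""
    tokens = text.split()
    # Slide a window of size n across the tokens; empty if the text is too short.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


bigrams = word_ngrams("This is a sentence", 2)
print(bigrams)  # ['This is', 'is a', 'a sentence']

trigrams = word_ngrams("This is a sentence", 3)
print(trigrams)  # ['This is a', 'is a sentence']
```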
5. What is the use of TF-IDF?
A. TF-IDF stands for Term Frequency-Inverse Document Frequency. It is a numerical statistic that reflects how important a word is to a document within a collection of documents. Term frequency measures how often a word occurs in a document, while inverse document frequency lowers the weight of words that occur in many documents. This helps in filtering out very common words like a, an, the, was, etc., so that only the important terms stand out.
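To make the weighting concrete, here is a small sketch that computes TF-IDF by hand. It uses the plain variant tf * log(N / df) with no smoothing, which is only one of several formulas in use (libraries such as scikit-learn apply a smoothed version), and the three example sentences are made up.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append(
            {
                term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return scores


# Hypothetical survey comments.
docs = ["the food was great", "the service was slow", "the food was cold"]
scores = tf_idf(docs)
for term, score in sorted(scores[0].items(), key=lambda item: -item[1]):
    print(f"{term}: {score:.3f}")
```

Words that occur in every document ("the", "was") get a score of 0, while "great", which occurs in only one document, scores highest in the first comment.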
6. What is Lemmatization?
A. Lemmatization reduces a word to its dictionary base form (the lemma) by using a morphological analysis of the word.
studies – Third person, singular number, present tense of the verb study
Lemma – study
studying – Gerund of the verb study
Lemma – study
As you can see, both words, studies and studying, have been reduced to the lemma study.
7. Explain the complete flow of your NLP project
A. A brief outline of the process is given below, with some coding examples:
Step 1 –
Get the text dataset
Step 2 –
Extract the raw text using get_text() in Python (this is BeautifulSoup's method for pulling text out of HTML; it does not tokenize by itself)
Step 3 –
Split the text using
tokens = [t for t in text.split()]
Step 4 –
Get the count of the word frequency using the NLTK package in Python
freq = nltk.FreqDist(tokens)
Step 5 –
Remove stop words. Code below
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
tokens = [token for token in tokens if token not in stop_words]
Step 6 –
Tokenize non-English words
Step 7 –
Get synonyms and antonyms using WordNet package from NLTK in Python
Step 8 –
Stemming of words. I used PorterStemmer algorithm
stem_Word = PorterStemmer()
Step 9 –
Once we are done with stemming, go for lemmatization using the WordNet lemmatizer from NLTK
lemma = WordNetLemmatizer()
Step 10 –
Build a classifier. We used Logistic Regression to create a baseline model and later moved to Naive Bayes classification.
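To illustrate the classification step, below is a minimal multinomial Naive Bayes sketch in plain Python with add-one (Laplace) smoothing. It is not the project code: the function names and the four training comments are hypothetical, and a real project would more likely rely on NLTK or scikit-learn.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """samples is a list of (tokens, label) pairs; returns the counts needed to predict."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary


def predict(model, tokens):
    """Return the label with the highest log posterior for the given tokens."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Log prior plus the sum of log likelihoods with add-one smoothing.
        score = math.log(label_counts[label] / total_docs)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokens:
            if token in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, already tokenized survey comments.
training = [
    ("great food loved it".split(), "positive"),
    ("friendly staff great service".split(), "positive"),
    ("slow service cold food".split(), "negative"),
    ("rude staff hated it".split(), "negative"),
]
model = train_naive_bayes(training)
first = predict(model, "great service".split())
second = predict(model, "cold food".split())
print(first, second)
```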
The questions were only on the logical part of the process and not on the code implementation. Still, it's always better to include coding examples wherever you can. The interview lasted for around 1 hour.
Round 4 – Human Resource
Basic questions, like:-
1. Why are you quitting your present job?
2. What are your expectations from the company, and what do you think the company expects from you?
3. Salary negotiation
4. Have you ever led a team?
Salary offered – Best in the industry (5/5)