Company: Factspan
Designation: Data Scientist / Data Analyst
Years of Experience Required: 0 to 4 years
Technical Expertise: SQL, Python/R, Statistics, Machine Learning, Case Studies
Salary Range: 12 LPA – 25 LPA
Factspan is a pure-play analytics company that partners with organizations to build analytics centers of excellence. By generating insights and solutions from data, Factspan helps businesses solve challenges, make strategic recommendations, and implement processes for success. If you’re preparing for a Data Science or Data Analyst role at Factspan, here’s a detailed breakdown of their interview process and the types of questions you can expect.
Interview Process
The Factspan interview process typically consists of 5 rounds, each designed to evaluate different aspects of your technical and analytical skills:
Round 1 – Telephonic Screening
Focus: Basic understanding of Data Science concepts, SQL, and Python/R.
Format: You’ll be asked to explain your projects and solve a few coding or SQL problems.
Round 2 – Walk-in/Face-to-Face Technical Round
Focus: Advanced SQL, coding, and problem-solving.
Format: You’ll solve problems on a whiteboard or shared document.
Round 3 – Project Analysis
Focus: Deep dive into your past projects.
Format: You’ll be asked to explain your approach, tools used, and the impact of your work.
Round 4 – Case Studies
Focus: Business problem-solving and data-driven decision-making.
Format: You’ll be given a real-world scenario and asked to propose solutions.
Round 5 – Hiring Manager Round
Focus: Cultural fit, communication skills, and long-term career goals.
Format: Behavioral questions and high-level discussions about your experience.
Difficulty of Questions
SQL – 8/10
1) How do you retrieve the top 3 most frequently ordered products?
SELECT product_id, COUNT(*) AS order_count
FROM order_details
GROUP BY product_id
ORDER BY order_count DESC
LIMIT 3;
2) How do you find customers who placed orders on two consecutive days?
SELECT customer_id, order_id, order_date, prev_order_date
FROM (
SELECT customer_id, order_id, order_date,
LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_order_date
FROM orders
) AS order_comparison
WHERE DATEDIFF(order_date, prev_order_date) = 1;
3) How can you find products where total ordered quantity exceeds available stock?
SELECT p.product_id, p.name, p.stock, SUM(od.quantity) AS total_ordered
FROM products p
JOIN order_details od ON p.product_id = od.product_id
GROUP BY p.product_id, p.name, p.stock
HAVING SUM(od.quantity) > p.stock;
4) How do you find employees who have been working for the longest time in a company?
SELECT employee_id, name, joining_date
FROM employees
WHERE joining_date = (SELECT MIN(joining_date) FROM employees);
5) How do you find customers who have ordered at least one product from every product category?
SELECT customer_id
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY customer_id
HAVING COUNT(DISTINCT p.category_id) = (SELECT COUNT(DISTINCT category_id) FROM products);
R/Python – 7/10
1) Write a Python function to reverse the order of words in a sentence without reversing the characters in each word.
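One possible answer is sketched below. It assumes words are separated by whitespace and that runs of spaces can be collapsed to a single space; the function name reverse_words is only illustrative.

```python
def reverse_words(sentence: str) -> str:
    """Return the sentence with its word order reversed.

    The characters inside each word are left untouched.
    """
    # split() with no argument splits on any run of whitespace
    # and drops leading/trailing blanks.
    words = sentence.split()
    return " ".join(reversed(words))


print(reverse_words("Factspan hires data scientists"))
# scientists data hires Factspan
```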

2) Write a Python function to find the missing number in a list of numbers from 1 to n.
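A minimal sketch, assuming the list holds every integer from 1 to n exactly once except for a single missing value. It compares the expected sum n(n+1)/2 with the actual sum; the function name is hypothetical.

```python
def find_missing_number(numbers: list[int]) -> int:
    """Return the one value missing from a list that should hold 1..n."""
    # One value is missing, so the full range has len(numbers) + 1 elements.
    n = len(numbers) + 1
    expected_total = n * (n + 1) // 2
    return expected_total - sum(numbers)


print(find_missing_number([1, 2, 4, 5]))  # 3
```

The sum-based approach runs in O(n) time and needs no sorting, but it gives a wrong answer if the input contains duplicates or more than one gap.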

3) Write a Python function to find and remove duplicate rows from a Pandas DataFrame.
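A possible answer, assuming a row counts as a duplicate when every column matches an earlier row. The sketch returns both the duplicate rows and the cleaned frame; the function name is illustrative.

```python
import pandas as pd


def remove_duplicate_rows(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (duplicate_rows, deduplicated_frame)."""
    # duplicated(keep="first") flags every repeat after the first occurrence.
    duplicate_mask = df.duplicated(keep="first")
    duplicates = df[duplicate_mask]
    deduplicated = df[~duplicate_mask].reset_index(drop=True)
    return duplicates, deduplicated


orders = pd.DataFrame({"order_id": [1, 1, 2], "product": ["pen", "pen", "ink"]})
duplicates, clean = remove_duplicate_rows(orders)
print(duplicates)
print(clean)
```

If only the cleaned frame is needed, df.drop_duplicates() does the same removal in one call.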

4) Write a Python function to convert a column in a Pandas DataFrame to a categorical type and display the categories.
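One way to answer this, sketched under the assumption that the original DataFrame should not be modified in place. The function and column names are hypothetical.

```python
import pandas as pd


def to_categorical(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of df with `column` cast to the category dtype."""
    result = df.copy()
    result[column] = result[column].astype("category")
    # .cat.categories lists the distinct category labels.
    print(result[column].cat.categories.tolist())
    return result


customers = pd.DataFrame({"age_group": ["18-25", "26-35", "18-25"]})
converted = to_categorical(customers, "age_group")
```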

5) Write a Python function to find common elements between two lists without using set operations.
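A simple sketch that avoids set operations entirely. It assumes each common element should be reported once, in the order it first appears in the first list; the function name is illustrative.

```python
def common_elements(first: list, second: list) -> list:
    """Return the elements present in both lists, without using sets."""
    common = []
    for item in first:
        # Skip items already recorded so each common value appears once.
        if item in second and item not in common:
            common.append(item)
    return common


print(common_elements([1, 2, 2, 3], [2, 3, 4]))  # [2, 3]
```

The membership checks make this quadratic in the worst case. For large inputs, an interviewer may accept a dictionary of seen values as a faster alternative that still avoids sets.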

Statistics/ML
1) Given that you have WiFi data in your office, how would you determine which rooms and areas are underutilized and overutilized?
To analyze WiFi data for office space utilization, follow these steps:
- Collect WiFi Connection Data – Track devices connected to access points, including timestamps and locations.
- Analyze Traffic Patterns – Identify peak usage times and areas with consistently high or low connections.
- Compare with Seating Capacity – Check if rooms are overused (more devices than capacity) or underused (few or no connections).
- Use Heatmaps – Visualize usage trends with heatmaps to spot patterns in space utilization.
- Track Duration of Connections – Rooms with long connection durations indicate prolonged usage, while short durations suggest people only pass through or leave quickly.
This approach helps optimize office space, improving efficiency and reducing congestion.
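The steps above could be sketched in pandas roughly as follows. The column names (room, device_id, timestamp), the capacity mapping, and the 30% and 90% thresholds are all assumptions made for illustration, not part of any real WiFi log format.

```python
import pandas as pd


def room_utilization(
    connections: pd.DataFrame,
    capacity: dict[str, int],
    low: float = 0.3,
    high: float = 0.9,
) -> pd.DataFrame:
    """Label each room by average hourly devices relative to seating capacity."""
    # Count distinct devices seen in each room during each hour.
    hourly = (
        connections.assign(hour=connections["timestamp"].dt.floor("h"))
        .groupby(["room", "hour"])["device_id"]
        .nunique()
        .rename("devices")
        .reset_index()
    )
    # Average over observed hours only; hours with no connections are absent
    # from the log, so this overstates usage for rarely used rooms.
    summary = (
        hourly.groupby("room")["devices"].mean().rename("avg_devices").reset_index()
    )
    summary["utilization"] = summary["avg_devices"] / summary["room"].map(capacity)
    summary["status"] = "normal"
    summary.loc[summary["utilization"] < low, "status"] = "underutilized"
    summary.loc[summary["utilization"] > high, "status"] = "overutilized"
    return summary
```

In practice, one person often carries several devices, so device counts would need to be scaled or de-duplicated per user before comparing them with seating capacity.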
2) Your linear regression didn’t run and communicates that there is an infinite number of best estimates for the regression coefficients. What could be wrong?
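This error points to perfect multicollinearity: at least one independent variable is an exact linear combination of the others, so the design matrix is rank-deficient and infinitely many coefficient vectors fit the data equally well. High but imperfect correlation only inflates the variance of the estimates; it does not make them non-unique.
Possible reasons:
- Duplicate Variables – Two or more predictors contain identical values, or one is a rescaled copy of another (for example, the same length in metres and in centimetres).
- One Variable is a Combination of Others – Example: If X3 = X1 + X2, the model cannot differentiate their effects. Including a dummy variable for every category alongside an intercept has the same effect.
- Not Enough Data Points – If the number of predictors is greater than the number of observations, the model cannot find unique estimates.
Solution:
- Remove Redundant Variables – Check the rank of the design matrix or look for pairwise correlations of exactly ±1 to locate the dependent columns; the Variance Inflation Factor (VIF) is infinite for such columns.
- Use Principal Component Analysis (PCA) – Replaces the predictors with uncorrelated components, so the redundant directions can be dropped.
- Increase Sample Size – Helps only when the cause is having more predictors than observations; it cannot fix an exact dependence between columns.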
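A small NumPy sketch makes the problem concrete. It builds synthetic predictors where X3 = X1 + X2 (the data and the helper name are made up for illustration), checks the rank of the design matrix, and shows two different coefficient vectors that give identical predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
x3 = x1 + x2  # exact linear combination of the other two predictors

# Design matrix: intercept column plus the three predictors.
X = np.column_stack([np.ones(50), x1, x2, x3])


def has_perfect_multicollinearity(design: np.ndarray) -> bool:
    """True when the columns are linearly dependent (rank < number of columns)."""
    return np.linalg.matrix_rank(design) < design.shape[1]


print(has_perfect_multicollinearity(X))  # True

# Two different coefficient vectors that predict exactly the same values,
# which is why no unique least-squares solution exists.
beta_a = np.array([0.0, 1.0, 1.0, 0.0])
beta_b = np.array([0.0, 0.0, 0.0, 1.0])
print(np.allclose(X @ beta_a, X @ beta_b))  # True
```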
3) Now you have a feasible number of predictors, but you’re fairly sure that you don’t need all of them. How would you perform feature selection on the dataset?
To select the most relevant features, use these techniques:
- Filter Methods
  - Use Correlation Analysis to remove one variable from each highly correlated pair.
  - Apply Chi-Square Tests (categorical feature vs. categorical target) or ANOVA F-tests (numerical feature vs. categorical target).
- Wrapper Methods
  - Recursive Feature Elimination (RFE) removes the least important features step by step.
  - Forward or Backward Selection adds/removes predictors based on model performance.
- Embedded Methods
  - LASSO Regression shrinks the coefficients of less important variables to exactly zero.
  - Decision Tree-based Methods rank features by importance.
By applying these methods, you can improve model accuracy and efficiency.
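As an illustration of the filter approach, the sketch below drops features that are highly correlated with one already kept. The 0.9 threshold, the greedy left-to-right order, and the function name are assumptions; wrapper and embedded methods would normally be run with a modelling library such as scikit-learn.

```python
import numpy as np


def drop_correlated_features(
    X: np.ndarray, names: list[str], threshold: float = 0.9
) -> list[str]:
    """Greedily keep features whose |correlation| with every kept feature is below threshold."""
    # rowvar=False treats each column of X as one variable.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept: list[int] = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]


rng = np.random.default_rng(42)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = 2 * a + 0.01 * rng.normal(size=200)  # almost a rescaled copy of a
features = np.column_stack([a, b, c])
print(drop_correlated_features(features, ["a", "b", "c"]))  # ['a', 'b']
```

The greedy order matters: the feature that appears first in a correlated pair is the one that survives, so columns are best ordered by domain importance beforehand.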
4) What is the role of trial and error in data analysis? What is the role of making a hypothesis before diving in?
Role of Trial and Error:
- Helps identify unexpected patterns in data.
- Allows iterative testing of models and algorithms.
- Improves model performance by experimenting with different parameters.
Role of Hypothesis Before Analysis:
- Provides a structured approach to analysis.
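- Guards against data dredging: committing to a hypothesis up front makes it harder to mistake a chance pattern found after the fact for a real effect.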
- Helps in statistical validation through hypothesis testing.
A balanced approach—forming hypotheses first, then refining them through trial and error—leads to more reliable insights.
5) What would be the hazards of letting users sneak a peek at the other bucket in an A/B test?
- Contamination: Users who see both versions may switch between them, so their behavior can no longer be attributed to a single variant.
- Selection Bias: If users can pick the version they prefer, assignment is no longer random and engagement metrics are skewed.
- False Positives: Results might suggest an effect that doesn’t exist because users changed their behavior after seeing the other version.
To prevent this, implement server-side randomization and prevent users from switching versions mid-test.
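One common way to keep assignment stable on the server side is to hash the user ID together with an experiment name, so the same user always lands in the same bucket. The sketch below is a minimal illustration; the function name, the experiment label, and the 50/50 split are assumptions.

```python
import hashlib


def assign_bucket(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'treatment' or 'control'."""
    # Salting with the experiment name keeps assignments independent
    # across different experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Map the first 8 hex digits to a number in [0, 1).
    position = int(digest[:8], 16) / 16**8
    return "treatment" if position < treatment_share else "control"


# The same user gets the same bucket on every call.
print(assign_bucket("user-123", "checkout-redesign"))
print(assign_bucket("user-123", "checkout-redesign"))
```

Because the bucket is computed from the ID rather than stored in a cookie, clearing browser state or switching devices while logged in does not move the user to the other version.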
Case Study
Problem Statement:
A large retail chain has approached Factspan for customer segmentation analysis. The company wants to understand customer buying patterns to improve its marketing campaigns and product recommendations.
Your task as a Data Scientist is to analyze transaction data, identify customer segments, and recommend data-driven marketing strategies.
Dataset Overview:
You have access to a dataset containing customer purchase history. The dataset includes:
- Customer_ID – Unique identifier for each customer
- Age_Group – Age category (18-25, 26-35, 36-50, 50+)
- Annual_Income – Estimated annual income of the customer
- Total_Transactions – Number of transactions made in the last year
- Average_Spend_per_Transaction – Average amount spent per purchase
- Product_Category – Most frequently purchased product category
- Discount_Usage (%) – Percentage of purchases made using discounts
- Loyalty_Program_Status – 1 if enrolled in a loyalty program, 0 otherwise
- Online_vs_Offline_Purchase (%) – Percentage of purchases made online; the remainder were made in-store
Key Questions to Answer:
1. How can we segment customers based on purchasing behavior?
- Are high-income customers spending more on average per transaction?
- Do loyalty program members have a higher purchase frequency?
- Are discount-driven customers more likely to shop online?
2. What marketing strategies can be used for different customer segments?
- Should Factspan recommend personalized discounts for price-sensitive customers?
- Can targeted promotions increase retention among low-spending customers?
- How can product recommendations be optimized for different age groups?
3. How can Factspan help the retail company increase revenue?
- Can AI-driven customer lifetime value (CLV) predictions help prioritize high-value customers?
- Should the retail chain expand online shopping options based on customer behavior?
- Can the company improve customer retention through better loyalty programs?
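A first pass at the three sub-questions under question 1 could look like the pandas sketch below. The six rows are made-up illustrative values, not real findings, and the sketch assumes Online_vs_Offline_Purchase (%) holds the online share; the function name is hypothetical.

```python
import pandas as pd


def segmentation_checks(df: pd.DataFrame) -> dict[str, float]:
    """Quick numeric checks for the three purchasing-behavior questions."""
    members = df.loc[df["Loyalty_Program_Status"] == 1, "Total_Transactions"]
    non_members = df.loc[df["Loyalty_Program_Status"] == 0, "Total_Transactions"]
    return {
        # Do higher-income customers spend more per transaction?
        "income_vs_spend_corr": float(
            df["Annual_Income"].corr(df["Average_Spend_per_Transaction"])
        ),
        # Do loyalty members transact more often (difference in means)?
        "loyalty_transaction_gap": float(members.mean() - non_members.mean()),
        # Are discount-driven customers more likely to shop online?
        "discount_vs_online_corr": float(
            df["Discount_Usage (%)"].corr(df["Online_vs_Offline_Purchase (%)"])
        ),
    }


# Made-up sample rows, for illustration only.
sample = pd.DataFrame({
    "Annual_Income": [30000, 45000, 60000, 80000, 100000, 120000],
    "Average_Spend_per_Transaction": [20, 25, 40, 55, 70, 90],
    "Loyalty_Program_Status": [0, 0, 1, 0, 1, 1],
    "Total_Transactions": [5, 8, 20, 6, 25, 30],
    "Discount_Usage (%)": [80, 70, 40, 60, 20, 10],
    "Online_vs_Offline_Purchase (%)": [90, 75, 50, 65, 30, 20],
})
print(segmentation_checks(sample))
```

Checks like these only describe associations. A full answer would follow up with clustering (for example k-means on scaled features) and significance tests before recommending actions.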
Key Insights & Business Recommendations
1. Identifying Customer Segments
- High-Spending Customers Prefer Premium Products: Customers with higher annual incomes and high average transaction values are less price-sensitive and prefer premium brands. The retailer should focus on personalized luxury product recommendations for them.
- Loyalty Program Members Purchase More Often: Customers enrolled in loyalty programs show 35% higher purchase frequency than non-members, which supports expanding rewards and exclusive offers.
- Online-First Shoppers Are Discount-Driven: Customers who shop more than 70% online tend to use discounts frequently, indicating a need for targeted digital promotions.
2. Data-Driven Marketing Strategies
- Personalized Discounts for Price-Sensitive Segments: Customers with low average spend but frequent purchases can be targeted with discount-based promotions to increase order value.
- Product Bundling for Mid-Tier Customers: Offering combo deals and product bundles can increase spending among middle-income customers.
- Omnichannel Engagement for High-Value Customers: High-spending customers should receive personalized emails, early access to sales, and in-store VIP experiences to boost loyalty.
3. Strategic Actions for Revenue Growth
- AI-Based Customer Lifetime Value (CLV) Prediction: Factspan can build a predictive model to prioritize high-value customers for personalized engagement.
- Optimizing Online Shopping Experience: Expanding online features like AI-powered product recommendations and personalized search can increase sales among digital shoppers.
- Loyalty Program Enhancements: Introducing tiered loyalty rewards (Bronze, Silver, Gold) can encourage more sign-ups and long-term engagement.