Most asked Business Analyst Questions

Welcome to the most asked Business Analyst questions. This post covers basic topics in SQL, Python, machine learning, and case studies.


1) Create a Linear Regression Model within 20 Minutes in Python

To quickly build a Linear Regression model in Python, follow these steps:

Step 1: Import Required Libraries

We use pandas for data handling, numpy for numerical operations, sklearn for model building, and matplotlib for visualization.

Step 2: Create a Sample Dataset

You can create a dataset manually in MS Excel with 5 independent variables and 1 dependent variable, then import it using pandas. Alternatively, generate synthetic data using Python.

Step 3: Train the Model

Split the data into training and testing sets.
Fit a Linear Regression model using LinearRegression() from sklearn.

Step 4: Evaluate Performance

Calculate the R² score and Mean Squared Error (MSE) on the test set to measure how well the model fits unseen data.
Visualize predictions vs actual values using matplotlib.

With a small dataset, these four steps comfortably fit within 20 minutes.
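The steps above can be sketched as follows. This is a minimal example rather than a definitive implementation: the synthetic dataset, its coefficients, the noise level, and the 80/20 split are all assumptions chosen for illustration, and the plotting step is left out so the script runs without a display.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Step 2: synthetic data with 5 independent variables and 1 dependent variable.
# The coefficients, intercept, and noise level are illustrative assumptions.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
true_coefs = np.array([3.0, -2.0, 0.5, 1.5, 4.0])
y = X @ true_coefs + 10.0 + rng.normal(scale=0.5, size=200)

# Step 3: split into training and testing sets, then fit the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression()
model.fit(X_train, y_train)

# Step 4: evaluate on the held-out test set.
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print(f"R^2: {r2:.3f}, MSE: {mse:.3f}")
```

To visualize predictions against actual values, a scatter plot such as plt.scatter(y_test, y_pred) from matplotlib.pyplot is one option.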

2) Differentiate Between UNION, MINUS, and INTERSECT in SQL

1. UNION

Combines results from two queries and removes duplicates.
UNION ALL includes duplicates.

2. MINUS (Oracle syntax; PostgreSQL, SQL Server, SQLite, and MySQL 8.0.31+ use EXCEPT)

Returns the distinct records that appear in the first query's result but not in the second's.

3. INTERSECT

Returns only common records present in both queries.

Example:

Assume two tables: employees_2023 and employees_2024.

-- UNION: Get unique employees from both years
SELECT employee_id FROM employees_2023
UNION
SELECT employee_id FROM employees_2024;

-- MINUS: Get employees present in 2023 but not in 2024 (they left)
SELECT employee_id FROM employees_2023
MINUS
SELECT employee_id FROM employees_2024;

-- INTERSECT: Get employees who were present in both years
SELECT employee_id FROM employees_2023
INTERSECT
SELECT employee_id FROM employees_2024;
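To try these operators without a database server, here is a small sketch using Python's built-in sqlite3 module. The employee IDs are made-up sample data, and run is a hypothetical helper written for this example. SQLite spells the difference operator EXCEPT rather than MINUS.

```python
import sqlite3

# In-memory database with made-up sample rows (illustrative assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees_2023 (employee_id INTEGER);
    CREATE TABLE employees_2024 (employee_id INTEGER);
    INSERT INTO employees_2023 VALUES (1), (2), (3);
    INSERT INTO employees_2024 VALUES (2), (3), (4);
""")


def run(set_operator):
    """Apply one set operator to both tables and return the sorted IDs."""
    query = (
        "SELECT employee_id FROM employees_2023 "
        f"{set_operator} "
        "SELECT employee_id FROM employees_2024 "
        "ORDER BY employee_id"
    )
    return [row[0] for row in conn.execute(query)]


union_ids = run("UNION")          # unique IDs from both years
union_all_ids = run("UNION ALL")  # duplicates kept
except_ids = run("EXCEPT")        # in 2023 but not in 2024
intersect_ids = run("INTERSECT")  # in both years

print(union_ids)      # [1, 2, 3, 4]
print(union_all_ids)  # [1, 2, 2, 3, 3, 4]
print(except_ids)     # [1]
print(intersect_ids)  # [2, 3]
```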

3) What is Scope Resolution in Python?

Scope resolution in Python refers to how variables are looked up in different scopes when accessed inside functions or classes.

Python follows the LEGB rule, searching the scopes in this order and stopping at the first match:

Local Scope (L): Variables defined inside a function.
Enclosing Scope (E): Variables in outer functions (for nested functions).
Global Scope (G): Variables at the module level.
Built-in Scope (B): Predefined names like print(), len().

Example:

x = 10  # Global Variable
def outer():
    x = 20  # Enclosing Variable
    def inner():
        x = 30  # Local Variable
        print(x)  # Outputs 30 (Local Scope)
    inner()
    print(x)  # Outputs 20 (Enclosing Scope)
outer()
print(x)  # Outputs 10 (Global Scope)
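The example above shows each scope shadowing the one outside it. A complementary sketch, using made-up variable names, shows the lookup falling through the scopes when a name is not defined locally:

```python
x = "global"  # Global scope


def outer():
    y = "enclosing"  # Enclosing scope for inner()

    def inner():
        z = "local"  # Local scope
        # z is found locally, y in the enclosing function,
        # x at module level, and len in the built-in scope.
        return z, y, x, len(z)

    return inner()


result = outer()
print(result)  # ('local', 'enclosing', 'global', 5)
```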

4) Define Degrees of Freedom with an Example

Degrees of freedom (DOF) are the number of independent values in a dataset that can vary without violating a constraint.

Example in Statistics:

For a dataset of n observations, once the sample mean is fixed, only n-1 values can vary freely. The last value is determined by the mean and the other values, which reduces the DOF by 1.
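A small numeric sketch of this idea, with made-up numbers: fix the mean of five observations, pick four values freely, and the fifth is forced.

```python
n = 5
sample_mean = 10.0

# Any n - 1 values can be chosen freely (illustrative numbers).
free_values = [8.0, 12.0, 9.0, 11.5]

# The last value is forced: all n values must sum to n * sample_mean.
last_value = n * sample_mean - sum(free_values)
values = free_values + [last_value]

print(last_value)       # 9.5
print(sum(values) / n)  # 10.0
```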

DOF in Linear Regression:

In Simple Linear Regression with one independent variable, the residual DOF = n – 2 (one lost to the intercept and one to the slope).
In Multiple Linear Regression with k predictors, the residual DOF = n – (k + 1), since k slopes and one intercept are estimated.

Why is DOF important?

It determines the reference distribution used in statistical tests like the t-test and F-test, and therefore the p-values they report.
Overfitting becomes more likely when the residual DOF is low (too many features, not enough data).

5) Can a Model Have High R² but Low Performance?

Yes! A Linear Regression model can have a high R² (and a Logistic Regression model a high pseudo-R²) but still perform poorly due to:

Overfitting – The model fits training data too well but performs poorly on new data.
Multicollinearity – High correlation between independent variables leads to unstable coefficients.
Biased Data – If data is skewed or not representative, R² can be misleading.
Omitted Variable Bias – Missing important factors biases the coefficients, so a high R² can sit alongside misleading estimates.
Heteroscedasticity – Unequal variance in residuals makes standard errors and prediction intervals unreliable.
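The overfitting case is easy to reproduce. In the sketch below, the data, the alternating "noise" pattern, and the degree-9 polynomial are assumptions chosen to keep the example deterministic. The model passes through every training point, so training R² is essentially 1, yet R² on points between them is far below zero.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot


# True relationship is y = 2x; the training targets carry an
# alternating +/-0.3 disturbance that stands in for noise.
x_train = np.linspace(0.0, 1.0, 10)
noise = 0.3 * (-1.0) ** np.arange(10)
y_train = 2 * x_train + noise

# New data: midpoints between training points, with noise-free targets.
x_test = (x_train[:-1] + x_train[1:]) / 2
y_test = 2 * x_test

# A degree-9 polynomial has enough parameters to hit all 10 points.
model = np.polynomial.Polynomial.fit(x_train, y_train, deg=9)

train_r2 = r2(y_train, model(x_train))
test_r2 = r2(y_test, model(x_test))
print(f"train R^2: {train_r2:.4f}, test R^2: {test_r2:.2f}")
```

Comparing the two scores on held-out data is what exposes the problem. The training score alone would suggest a perfect model.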

Solution:

Use Adjusted R² instead of R² to penalize features that do not improve the fit.
Check metrics like MSE, RMSE, AUC (for classification), or Precision-Recall curves.
Perform cross-validation to check how well the model generalizes.
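As a rough illustration of the first point, Adjusted R² can be computed from R², the number of observations n, and the number of predictors k using the standard formula 1 – (1 – R²)(n – 1)/(n – k – 1). The function name and the sample numbers below are made up for this sketch.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted on n observations."""
    residual_dof = n - (k + 1)  # observations minus estimated parameters
    if residual_dof <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1 - (1 - r2) * (n - 1) / residual_dof


# Same R^2, very different verdicts depending on how much data backs it.
plenty_of_data = adjusted_r2(0.90, n=100, k=5)
too_few_rows = adjusted_r2(0.90, n=12, k=8)
print(round(plenty_of_data, 3), round(too_few_rows, 3))
```

With 100 rows and 5 predictors the adjustment is small. With 12 rows and 8 predictors the same R² of 0.90 is discounted heavily.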
