Most Asked Big Data Questions

Welcome to Most Asked Big Data Questions. Today we will cover some basic topics in SQL, Python, machine learning, and case studies. Let’s get started.

1) What is Scope Resolution in Python?

Scope resolution in Python determines where a name is looked up when it is referenced. Python follows the LEGB rule, searching the scopes in this order and stopping at the first match:

Local (L): Variables defined inside a function.
Enclosing (E): Variables in an outer function (for nested functions).
Global (G): Variables defined at the module level.
Built-in (B): Predefined names like print() or len().

Example:

x = "Global Scope"
def outer():
    x = "Enclosing Scope"
    def inner():
        x = "Local Scope"
        print(x)  # Outputs: Local Scope
    inner()
    print(x)  # Outputs: Enclosing Scope
outer()
print(x)  # Outputs: Global Scope
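The example above only reads names. Assigning to a name inside a function creates a new local variable by default, so rebinding an enclosing or global variable needs the nonlocal or global keyword. The following is a minimal sketch of that behaviour; the names make_counter, increment, bump_global, and counter are made up for illustration.

```python
counter = 0  # module-level (global) name


def make_counter():
    count = 0  # lives in the enclosing scope of increment()

    def increment():
        nonlocal count  # rebind the enclosing variable instead of creating a local
        count += 1
        return count

    return increment


def bump_global():
    global counter  # rebind the module-level name
    counter += 1
    return counter


step = make_counter()
first_call = step()
second_call = step()
bumped = bump_global()

print(first_call, second_call, bumped)  # 1 2 1

# len is not defined locally, in an enclosing function, or in this module,
# so the lookup falls through to the built-in scope.
print(len("LEGB"))  # 4
```

Without the nonlocal line, count += 1 would raise UnboundLocalError, because the assignment would make count local to increment().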

2) What is the Output of the Below Program?

def abc():
    print("The")
    return "Data"
    return "Monk"
print(abc())

Output:

The
Data

Explanation:

The function abc() first prints “The”.
It then returns “Data”.
The second return statement (return “Monk”) is never executed because once a return statement runs, the function exits.
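If the intent had been to hand back both strings, one option is a single return statement that returns a tuple. This is only a sketch of that alternative; abc_pair is a hypothetical name, not part of the original question.

```python
def abc_pair():
    print("The")
    # One return statement; the two values are packed into a tuple.
    return "Data", "Monk"


word_one, word_two = abc_pair()  # prints "The", then unpacks the tuple
print(word_one, word_two)        # Data Monk
```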

3) Explain the Working of Map and HashMap in Python

Map in Python

The map() function applies a function to each element of an iterable without an explicit loop. In Python 3 it returns a lazy iterator, which is why the example below wraps it in list().

Example using map() with lambda:

numbers = [1, 2, 3, 4]

squared = list(map(lambda x: x ** 2, numbers))

print(squared) # Output: [1, 4, 9, 16]

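map(lambda x: x ** 2, numbers) applies the squaring lambda to each element.
It can be faster than an explicit for loop, especially when the mapped function is a built-in; with a lambda the gain is usually small, and a list comprehension often performs as well or better.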
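As a small illustration of how map() behaves in Python 3, the sketch below shows that the result is a one-shot iterator and that a list comprehension produces the same values. The variable names are chosen for this example only.

```python
numbers = [1, 2, 3, 4]

lazy_squares = map(lambda x: x ** 2, numbers)  # an iterator; nothing is computed yet
squares_from_map = list(lazy_squares)          # consuming the iterator runs the lambda
leftover = list(lazy_squares)                  # the iterator is already exhausted

squares_from_comprehension = [x ** 2 for x in numbers]

print(squares_from_map)                                # [1, 4, 9, 16]
print(leftover)                                        # []
print(squares_from_map == squares_from_comprehension)  # True
```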

What is a HashMap?

Python has no type named HashMap; the built-in dict (dictionary) is its hash map implementation. It stores key-value pairs and provides O(1) average-time complexity for lookups.

Example of HashMap (Python Dictionary):

student_grades = {"Alice": 85, "Bob": 90, "Charlie": 78}
print(student_grades["Bob"])  # Output: 90

Uses hashing to store and retrieve values efficiently.
Keys must be unique and hashable (for example strings, numbers, or tuples of immutable values); values can be any object.
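The key rules can be checked directly. The sketch below reuses the student_grades dictionary; the extra entries "Dana" and "Eve" are made up for illustration.

```python
student_grades = {"Alice": 85, "Bob": 90, "Charlie": 78}

student_grades["Bob"] = 95    # existing key: the old value is overwritten, no duplicate key
student_grades["Dana"] = 88   # new key: a new pair is inserted

# .get() returns a default instead of raising KeyError for an absent key
missing = student_grades.get("Eve", "no grade")

try:
    student_grades[["Alice", "Bob"]] = 0  # a list is mutable, so it cannot be hashed
    list_key_error = None
except TypeError as exc:
    list_key_error = type(exc).__name__

print(student_grades["Bob"])  # 95
print(len(student_grades))    # 4
print(missing)                # no grade
print(list_key_error)         # TypeError
```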

4) How Do You Create a Sample Data of 1000 Rows From a Population of 1 Million Rows and 100 Columns?

In Python (Using Pandas & NumPy):

import pandas as pd
import numpy as np

# Create a dataset of 1 million rows and 100 columns
# (about 800 MB of float64 values, so this needs enough free memory)
data = pd.DataFrame(np.random.rand(1000000, 100), columns=[f"Col_{i}" for i in range(100)])

# Randomly sample 1000 rows
sampled_data = data.sample(n=1000, random_state=42)

print(sampled_data.shape)  # Output: (1000, 100)

sample(n=1000) draws 1000 rows uniformly at random, without replacement by default.
random_state=42 fixes the random seed, so the same rows are selected on every run.
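The pandas approach assumes the whole population fits in memory. When it does not, one alternative (a different technique from DataFrame.sample) is reservoir sampling, which reads the rows once as a stream and keeps only k of them. The sketch below uses only the standard library; reservoir_sample is a hypothetical helper, and range(1_000_000) stands in for rows read from a file or database cursor.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for index, row in enumerate(rows):
        if index < k:
            reservoir.append(row)      # fill the reservoir with the first k rows
        else:
            j = rng.randint(0, index)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = row     # keep this row with probability k / (index + 1)
    return reservoir


population = range(1_000_000)  # stand-in for rows streamed from a file or cursor
row_sample = reservoir_sample(population, k=1000, seed=42)

print(len(row_sample))       # 1000
print(len(set(row_sample)))  # 1000, because no row is picked twice
```

Only k rows are held in memory at any time, and the seed argument plays the same role as random_state in pandas.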

In SQL (Using ORDER BY RAND()):

SELECT * FROM my_table
ORDER BY RAND()
LIMIT 1000;

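Randomly selects 1000 rows from the table. RAND() is the MySQL function; PostgreSQL and SQLite use RANDOM() instead.

May be slow on large tables, because every row is assigned a random value and the whole table is sorted before LIMIT is applied. If subgroups must stay proportionally represented in the sample, consider stratified sampling.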
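Stratified sampling, mentioned above, draws from each subgroup separately so that small groups are not lost by chance. The following is a minimal standard-library sketch under stated assumptions: stratified_sample is a hypothetical helper, and the "segment" column with a 900/100 split is invented data for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, fraction, seed=None):
    """Sample the same fraction of rows from every group defined by key(row)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    picked_rows = []
    for members in groups.values():
        n = max(1, round(len(members) * fraction))  # keep at least one row per group
        picked_rows.extend(rng.sample(members, n))  # sample without replacement
    return picked_rows


# Invented data: 900 rows in segment "A" and 100 rows in segment "B"
rows = [{"id": i, "segment": "A" if i < 900 else "B"} for i in range(1000)]
picked = stratified_sample(rows, key=lambda r: r["segment"], fraction=0.1, seed=42)

counts = {seg: sum(1 for r in picked if r["segment"] == seg) for seg in ("A", "B")}
print(counts)  # {'A': 90, 'B': 10}
```

The sample keeps the 9:1 ratio of the population exactly, which a plain random sample only achieves on average.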

