## Data leakage

In which situations does cross-validation fall into the trap of data leakage?
## Encoding behavior

Why are tree-based algorithms less likely to be affected by label encoding? If random forests are deep enough, how can they handle categorical variables without one-hot encoding?
## Standardization or Scaling?

How do we decide whether standardization or scaling of the data is better without using cross-validation techniques? Will it depend on the algorithm we are using (distance-based or not), or do we need to dig deeper ...
## accuracy of linear regression model

Can you name a possible method of improving the accuracy of a linear regression model?
## error minimization techniques

How do linear and logistic regression differ in their error minimization techniques?
## linear regression output as probabilities

Why shouldn’t you use linear regression outputs as probabilities?
## Topic Modeling

In the document-term matrix passed to LDA topic modeling, what do the columns refer to?
## Find the 4th Highest employee salary from the following table

Find the 4th highest salary from the given employee table
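
One possible approach is sketched below with Python's built-in `sqlite3`, so the query can be tried without a database server. The table name `employee`, its columns `name` and `salary`, and the sample rows are all assumptions, since the original table is not shown. `DISTINCT` is used so that tied salaries count once; drop it if ties should be ranked separately.

```python
import sqlite3

# Hypothetical employee table; the original post does not show the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("A", 900), ("B", 800), ("C", 800), ("D", 700), ("E", 600), ("F", 500)],
)

# Sort the distinct salaries from high to low, skip the top three, take the next one.
query = """
    SELECT DISTINCT salary
    FROM employee
    ORDER BY salary DESC
    LIMIT 1 OFFSET 3
"""
fourth_highest = conn.execute(query).fetchone()[0]
print(fourth_highest)
```

`LIMIT ... OFFSET` is supported by SQLite, MySQL, and PostgreSQL; other databases may need a window function such as `DENSE_RANK()` instead.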
## linear vs logistic regression

Why can't linear regression be used in place of logistic regression for binary classification?
## logistic regression

What is a decision boundary?
## You have uploaded a dataset in CSV format to Google Sheets and shared it publicly. How can you access it in Python?

You have uploaded a dataset in CSV format to Google Sheets and shared it publicly. How can you access it in Python?
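
A minimal sketch, assuming the sheet is shared so that anyone with the link can view it. The export URL pattern below is the commonly used CSV export link and is an assumption here, as are the helper names `sheet_csv_url`, `parse_csv_text`, and `load_public_sheet`; the sheet ID must be copied from the sheet's own address.

```python
import csv
import io
import urllib.request


def sheet_csv_url(sheet_id: str, gid: str = "0") -> str:
    """Build the CSV export URL for a publicly shared sheet (assumed URL pattern)."""
    return (
        f"https://docs.google.com/spreadsheets/d/{sheet_id}"
        f"/export?format=csv&gid={gid}"
    )


def parse_csv_text(text: str) -> list:
    """Turn CSV text into a list of row dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_public_sheet(sheet_id: str, gid: str = "0") -> list:
    """Download the sheet as CSV and parse it (needs network access)."""
    with urllib.request.urlopen(sheet_csv_url(sheet_id, gid)) as response:
        return parse_csv_text(response.read().decode("utf-8"))
```

If pandas is available, passing the same URL to `pd.read_csv` should give a DataFrame directly, since `read_csv` accepts URLs.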
## How to identify whether given data is structured or unstructured

How do you tell whether given data is structured or unstructured? 1. Let us assume I have been given 20k rows and 3 columns, where the first column is review text, the second column is rating, and the third column is sentiment class. 2. In sentiment ...
## How to convert a string into a datetime value

Write code to convert a string into a datetime value.
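
A minimal sketch using the standard library's `datetime` module. The sample string and its format are made up for illustration; the format codes need to match whatever the real data looks like.

```python
from datetime import datetime

# Example string and format; adjust the format codes to match the real data.
text = "2021-03-15 14:30:00"
parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
print(parsed.year, parsed.month, parsed.day)

# ISO 8601 strings can also be parsed without spelling out a format.
iso_parsed = datetime.fromisoformat("2021-03-15T14:30:00")
```

`strptime` raises `ValueError` when the string does not match the format, which is worth catching when the input is messy.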
## What are precision and recall

What are precision and recall? Explain in detail with a practical scenario.
## Explain Confusion Matrix Machine Learning Algorithm

Explain the confusion matrix in detail with a practical scenario.
## Feature Selection

Two people, A and B, train an algorithm on the same set of data. A trains by selecting the 10 most important features from the dataset using an algorithm C and then randomly splits the dataset into training and testing sets. B ...
## Predict who would visit your mall this month?

A shopping mall owner has month-wise data on all the customers who visited in the year 2019 (say, for January 2019, the data would have the names of everyone who visited the mall at least once). How can you predict how many ...
## Can we use R-squared to validate our model?

Can we use R-squared to validate our model? When does the R2-score metric make sense?
## Decision Trees

Why do decision trees and their ensembles have such amazing predictive power, and why are they prone to overfitting the dataset?
## Which test is used to compare the location parameters of more than two independent populations?

Options:

- F-test (ANOVA)
- Kruskal-Wallis
- Both
## Assume the name and columns of the table. Information is given about different car names and details of each car, such as body_style, average_milage, price, etc. Count the total cars per company.

`df['company'].value_counts()`
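## Does Python support a switch or case statement? Why?

Ans: Python does not have a `switch`/`case` statement like C or Java. Since Python 3.10 it offers a `match`/`case` statement (structural pattern matching); in earlier versions the usual substitutes are `if`/`elif` chains or a dictionary lookup.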
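
As an illustration, a dictionary lookup is one common way to emulate a switch in any Python version. The function name `describe_day` and the day mapping are hypothetical, chosen only for this sketch.

```python
def describe_day(day_number: int) -> str:
    """Map a day number to a name, emulating a switch with a dictionary."""
    cases = {1: "Monday", 2: "Tuesday", 3: "Wednesday"}
    # dict.get supplies the fallback value, playing the role of a "default" branch.
    return cases.get(day_number, "Unknown")


print(describe_day(2))
print(describe_day(9))
```

On Python 3.10 and later, a `match` statement with `case 1:`, `case 2:`, and a final `case _:` can express the same branching.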
## Difference Between Linear and Logistic Regression

Ans: Linear regression is a supervised learning algorithm for regression (it predicts a continuous value). Logistic regression is also supervised learning, but it is a classification algorithm (it predicts the probability of a class).
## List the total numbers of products of each brand.

Ans: `SELECT COUNT(Product_Brand), Product_Brand FROM Product_Master GROUP BY Product_Brand;`
## Is logistic regression a regression technique?

Logistic regression is used to solve classification problems. So what is the reason why it is called a regression? What is the link here?
## Python for data science

Why do we have to use `reshape(-1,1)` or `reshape(1,-1)` for a single feature before fitting it to a model in the sklearn library?
## pandas

    import pandas as pd

    data = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
    series = pd.Series(data)
    dummy = series.copy(deep=True)
    print(dummy)

The above code makes a copy of the series. What happens when the argument `deep` is set to `False`?
## logistic regression

Can we apply logistic regression for this given data?
## linear regression

A linear model has a training error of 3% and a testing error of 20%. Is the model underfitted or overfitted?
## logistic regression

A trained logistic model represents the equation `1/(1 + exp(-(c + a0*x0 + a1*x1 + a2*x2)))`. How many predictors are used to create this model?
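
To make the structure of that equation concrete, here is a small sketch in plain Python. The coefficient values are made up for illustration; only the shape of the formula comes from the question, where `c` is the intercept and `a0`, `a1`, `a2` each multiply one predictor.

```python
import math


def predict_probability(x, coefficients, intercept):
    """Logistic model: 1 / (1 + exp(-(c + a0*x0 + a1*x1 + a2*x2)))."""
    linear_score = intercept + sum(a * xi for a, xi in zip(coefficients, x))
    return 1.0 / (1.0 + math.exp(-linear_score))


# Made-up values purely for illustration.
coefficients = [0.5, -1.0, 2.0]  # a0, a1, a2: one coefficient per predictor
intercept = 0.0                  # c: not attached to any predictor
n_predictors = len(coefficients)
p = predict_probability([0.0, 0.0, 0.0], coefficients, intercept)
print(n_predictors, p)
```

Counting the coefficients that multiply an input (and leaving out the intercept) gives the number of predictors.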
## python

`re.search('^From:', line)`: what will the above line of code do?
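
A quick way to see the behaviour is to run the pattern over a few sample lines. The lines below are invented for this sketch.

```python
import re

lines = ["From: alice@example.com", "Reply-To: From: bob", "from: lower"]

# ^ anchors the pattern at the start of the string, so only lines that
# begin with "From:" match. re.search returns a match object or None.
matches = [line for line in lines if re.search("^From:", line)]
print(matches)
```

The match is case-sensitive unless `re.IGNORECASE` is passed as a flag.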
## Clustering algorithms

How do you cluster unlabeled data where all the attributes and their values are categorical?
## python

`uh = urllib.request.urlopen(url, context=ctx)`: which library module must be imported in the Python code to execute the above line and open a URL?
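
A minimal sketch of the surrounding code that line needs. How `ctx` was built is not shown in the question, so the default SSL context used here is an assumption, and `fetch` is a hypothetical helper name.

```python
import ssl
import urllib.request

# One common way to create ctx; assumed here because the original
# snippet does not show how ctx was built.
ctx = ssl.create_default_context()


def fetch(url: str) -> bytes:
    """Open the URL with the SSL context and return the response body."""
    with urllib.request.urlopen(url, context=ctx) as uh:
        return uh.read()
```

Importing `urllib.request` covers `urlopen`; the `ssl` import is only needed to construct the context object.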
## SQL

Which join is used to join a table with itself: inner join, full join, or self join? Does the virtual table created for joining the table with itself occupy space?
## What is CNN?

Convolution is the simple application of a filter to an input that results in an activation. Repeated application of the same filter to an input results in a map of activations called a feature map, indicating the locations and strength of a ...
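
The idea of sliding one filter across an input to build a feature map can be sketched in plain Python. The toy image, the hand-picked filter, and the function name `feature_map` are all assumptions for illustration; real CNN layers learn their filter values and use optimized library code.

```python
def feature_map(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    result = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # One activation: multiply the filter with the patch elementwise and sum.
            activation = sum(
                kernel[i][j] * image[r + i][c + j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            row.append(activation)
        result.append(row)
    return result


# Toy 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked 2x2 filter that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]
fmap = feature_map(image, kernel)
print(fmap)
```

The non-zero entries of `fmap` sit where the filter overlaps the edge, which is the sense in which a feature map records where a pattern occurs and how strongly.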
## What is the difference between machine learning and deep learning?

| Machine Learning | Deep learning |
| --- | --- |
| Machine Learning is a technique to learn from data and then apply what has been learned to make an informed decision | The main difference between deep and machine learning is, machine learning models ... |
## What is the statistical test for data validation with an example: Chi-square, ANOVA test, Z-statistic, t-statistic, F-statistic, Hypothesis Testing?

Before discussing the different statistical tests, we need to get a clear understanding of what a null hypothesis is. A null hypothesis proposes that no significant difference exists in a given set of observations. Null: Two samples' mean is ...
## SQL Query

What is the correct order for writing an SQL query using the given clauses (SELECT, WHERE, GROUP BY, HAVING, FROM, ORDER BY)?
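
For reference, SQL writes these clauses in the order SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY. The sketch below runs one query that uses all six through Python's built-in `sqlite3`; the `sales` table and its rows are invented for illustration.

```python
import sqlite3

# Hypothetical table, used only to have something to query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("north", 30), ("south", 5),
     ("south", 2), ("east", 50), ("east", -1)],
)

# Clauses appear in the order the SQL grammar requires.
query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE amount > 0
    GROUP BY region
    HAVING SUM(amount) > 10
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```

`WHERE` filters individual rows before grouping, while `HAVING` filters the groups afterwards, which is why both can appear in the same query.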
## SQL Query

| ORG | counts |
| --- | --- |
| whitman.edu | 17 |
| vt.edu | 110 |
| utoronto.ca | 1 |
| unicon.net | 9 |
| umich.edu | 491 |
| ufp.pt | 28 |
| uct.ac.za | ... |
## What do you understand by Type I and Type II errors?

Type I Error: Type I error (False Positive) is an error where the outcome of a test shows the non-acceptance of a true condition. For example, a cricket match is going on and, when a batsman is not out, the umpire ...
## What is a Confusion Matrix?

Confusion matrix is used to explain a model’s performance and gives the summary of predictions on the classification problems. It assists in identifying the uncertainty between classes. A confusion matrix gives the count of correct and incorrect values and also the ...
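
A small worked sketch of those counts for a binary problem, in plain Python. The labels below are invented, and `confusion_counts` is a hypothetical helper; libraries such as scikit-learn provide equivalent functions.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


# Made-up labels for illustration.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, fn, tn = confusion_counts(actual, predicted)

precision = tp / (tp + fp)  # share of predicted positives that were right
recall = tp / (tp + fn)     # share of actual positives that were found
print(tp, fp, fn, tn, precision, recall)
```

Precision and recall both come straight from the four cells, which is why the confusion matrix is usually the first thing to inspect for a classifier.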
## What are Bias and Variance?

Bias is the difference between the average prediction of our model and the correct value. If the bias value is high, then the prediction of the model is not accurate. Variance is the number that gives the difference of prediction over a training ...
## Differentiate between classification and regression in Machine Learning.

In Machine Learning, there are various types of prediction problems based on supervised and unsupervised learning. These are classification, regression, clustering, and association. Here, we will discuss classification and regression. Classification: In classification, we try to create a Machine Learning model ...
## what are the types of Machine Learning?

So basically there are 3 types of techniques: Supervised Learning: In this type of the Machine Learning technique, machines learn under the supervision of labeled data. Unsupervised Learning: Unlike supervised learning, it has unlabeled data. So, there is no supervision under which it works ...
## All things you need to know about Tensorflow.

TensorFlow: TensorFlow is an open-source software library released in 2015 by Google to make it easier for developers to design, build, and train deep learning models. TensorFlow originated as an internal library that the Google developers used to ...
## What will be the output of Following SQL Query

What will be the output of the following SQL query? `SELECT name, last_name, salary FROM Salary WHERE salary > (SELECT AVG(salary) FROM Salary);`
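
One way to check the behaviour is to run the query against a tiny table with Python's built-in `sqlite3`. The three rows below are invented; only the table and column names come from the query itself.

```python
import sqlite3

# Sample rows are made up; the average salary here is 50000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salary (name TEXT, last_name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO Salary VALUES (?, ?, ?)",
    [("Ann", "Lee", 30000), ("Bob", "Ray", 50000), ("Cy", "Fox", 70000)],
)

# The subquery computes the overall average once; the outer query keeps
# only the rows whose salary is strictly greater than that average.
rows = conn.execute(
    "SELECT name, last_name, salary FROM Salary "
    "WHERE salary > (SELECT AVG(salary) FROM Salary)"
).fetchall()
print(rows)
```

Rows exactly equal to the average are excluded because the comparison is `>` rather than `>=`.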
## What are the various aspects of a Machine Learning process?

Here we will discuss the components involved in solving a problem using machine learning. 1. Domain knowledge This is the first step wherein we need to understand how to extract the various features from the data and learn more about the ...
