How do you do n-fold cross validation in Python?

Question

Write self-explanatory code.

Python TheDataMonk 3 years 4 Answers 570 views Grand Master 0

About TheDataMonkGrand Master

I am the Co-Founder of The Data Monk. I have a total of 6+ years of analytics experience: 3+ years at Mu Sigma, 2 years at OYO, and 1 year and counting at The Data Monk. I am an active trader and a logically sarcastic idiot :)

Answers ( 4 )

Shivani agarwal Newbie · Answer 1 · May 28, 2021

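Let df be the original dataset and k the number of folds.

import pandas as pd

# Shuffle the dataset (sample returns a new DataFrame, so reassign it)
df = df.sample(frac=1).reset_index(drop=True)
fold_size = len(df) // k  # number of rows in each fold
print(fold_size)
for i in range(k):
    # Fold i is the test set; all remaining rows form the training set
    test = df[fold_size * i:fold_size * (i + 1)]
    train = pd.concat([df[:fold_size * i], df[fold_size * (i + 1):]])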

Suvam · Answer 2 · May 28, 2021

1. Randomly split your entire dataset into n "folds".
2. For each fold in your dataset, build your model on the other n - 1 folds. Then test the model on the held-out fold to check its effectiveness.
3. Record the error you see on each of the predictions.
4. Repeat this until each of the n folds has served as the test set.
5. The average of your n recorded errors is called the cross-validation error and serves as your performance metric for the model.
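The five steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the "model" is a toy that predicts the mean of the training values, the error is mean squared error, and the function name `n_fold_cv_error` and the sample data are made up for this example.

```python
import random

def n_fold_cv_error(values, n, seed=0):
    """Return per-fold errors and their average for a toy mean-predictor model."""
    indices = list(range(len(values)))
    random.Random(seed).shuffle(indices)        # step 1: random split
    folds = [indices[i::n] for i in range(n)]   # n roughly equal folds
    errors = []
    for test_idx in folds:                      # step 4: each fold is the test set once
        held_out = set(test_idx)
        train = [values[i] for i in indices if i not in held_out]
        prediction = sum(train) / len(train)    # step 2: "fit" on the other n - 1 folds
        mse = sum((values[i] - prediction) ** 2 for i in test_idx) / len(test_idx)
        errors.append(mse)                      # step 3: record the error
    return errors, sum(errors) / n              # step 5: cross-validation error

# Hypothetical data, used only to exercise the function
data = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
fold_errors, cv_error = n_fold_cv_error(data, n=5)
print("Error of each fold:", fold_errors)
print("Cross-validation error:", cv_error)
```

With a real model, only the "fit" and "predict" lines change; the splitting and averaging stay the same.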

khushnuma Newbie · Answer 3 · May 29, 2021

1. Randomly split your entire dataset into n "folds".
2. For each fold in your dataset, build your model on the other n - 1 folds of the dataset. …
3. Record the error you see on each of the predictions.
4. Repeat this until each of the n folds has served as the test set.

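# Implementing cross validation
# Assumes X is a pandas DataFrame of features and y is a pandas Series of labels.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

n = 5
kf = KFold(n_splits=n, random_state=None)
model = LogisticRegression(solver='liblinear')

acc_score = []

for train_index, test_index in kf.split(X):
    # Select the rows of the current fold by position
    X_train, X_test = X.iloc[train_index, :], X.iloc[test_index, :]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

    model.fit(X_train, y_train)
    pred_values = model.predict(X_test)

    # accuracy_score expects the true labels first, then the predictions
    acc = accuracy_score(y_test, pred_values)
    acc_score.append(acc)

avg_acc_score = sum(acc_score) / n
print('accuracy of each fold - {}'.format(acc_score))
print('Avg accuracy : {}'.format(avg_acc_score))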
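The same kind of loop can also be written more compactly with scikit-learn's `cross_val_score` helper. The sketch below is not part of the original answer; it assumes scikit-learn is installed and uses a synthetic dataset from `make_classification` purely as a stand-in for real data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (an assumption for this example, not real data)
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

n = 5
kf = KFold(n_splits=n, shuffle=True, random_state=0)
model = LogisticRegression(solver="liblinear")

# Fits the model n times and returns one accuracy value per fold
scores = cross_val_score(model, X, y, cv=kf, scoring="accuracy")

print("Accuracy of each fold:", scores)
print("Avg accuracy:", scores.mean())
```

Passing `shuffle=True` together with a fixed `random_state` makes the fold assignment random but reproducible.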

dhingra.13 Member · Answer 4 · May 30, 2021

K-fold cross-validation partitions the dataset into k folds and, in each round, uses one fold as the validation set and the remaining folds as the training set. It is mainly used when the sample data is not large enough to split into three separate parts (training, validation and test).

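The code for K-fold is shown below. It assumes data is the full dataset and X and Y are NumPy arrays of features and labels.

from sklearn.model_selection import RepeatedKFold, train_test_split

# Simple hold-out split for comparison: 60% training, 40% validation
train, validation = train_test_split(data, test_size=0.40, random_state=100)

# 5-fold cross-validation repeated 10 times with different random splits
kf = RepeatedKFold(n_splits=5, n_repeats=10, random_state=None)
for train_index, test_index in kf.split(X):
    print("Train:", train_index, "Validation:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    Y_train, Y_test = Y[train_index], Y[test_index]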
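To see how many splits `RepeatedKFold` produces and how large each validation fold is, here is a small check. It assumes scikit-learn and NumPy are installed; the 10-row array is made up for illustration.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

# Made-up feature matrix with 10 rows and 2 columns
X_demo = np.arange(20).reshape(10, 2)

rkf = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
splits = list(rkf.split(X_demo))

# Total number of train/validation splits: n_splits * n_repeats
n_total = len(splits)
# Size of each validation fold: 10 rows / 5 folds
val_sizes = {len(test_index) for _, test_index in splits}

print("Number of splits:", n_total)
print("Validation fold sizes:", val_sizes)
```

Each repeat reshuffles the rows before splitting, so the folds differ between repeats while every row is still validated exactly once per repeat.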
