How do you do n-fold cross validation in Python?
Question
Please write self-explanatory code.
Answers ( 4 )
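Let df be the original dataset and k the number of folds.
import pandas as pd

k = 5  # number of folds (example value)
# Shuffle the dataset; sample() returns a copy, so assign the result
df = df.sample(frac=1).reset_index(drop=True)
fold_size = len(df) // k
print(fold_size)
for i in range(k):
    # Fold i is the test set; all remaining rows form the training set
    test = df[fold_size * i:fold_size * (i + 1)]
    train = pd.concat([df[:fold_size * i], df[fold_size * (i + 1):]])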
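When the number of rows is not divisible by the number of folds, integer division leaves a few rows that never land in a test set. The following is a minimal sketch of one way around that, assuming a hypothetical helper named `kfold_frames` (not part of pandas or scikit-learn) that splits positional indices with `numpy.array_split`, so fold sizes may differ by one row.

```python
import numpy as np
import pandas as pd


def kfold_frames(df, k, seed=0):
    """Yield (train, test) DataFrame pairs for k-fold cross-validation.

    Hypothetical helper for illustration; not part of pandas or scikit-learn.
    """
    # Shuffle once so the folds are random but reproducible.
    shuffled = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    # array_split tolerates row counts that are not divisible by k.
    for test_idx in np.array_split(np.arange(len(shuffled)), k):
        test = shuffled.iloc[test_idx]
        train = shuffled.drop(index=test_idx)
        yield train, test


# Toy data: 10 rows split into 3 folds.
df = pd.DataFrame({"x": range(10), "y": [0, 1] * 5})
fold_sizes = [(len(train), len(test)) for train, test in kfold_frames(df, k=3)]
print(fold_sizes)
```

Each row appears in exactly one test fold, and the training set for that fold is everything else.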
1. Randomly split your entire dataset into n "folds".
2. For each fold, build your model on the other n - 1 folds of the dataset. Then test the model on the held-out fold to check its effectiveness.
3. Record the error you see on each of the predictions.
4. Repeat this until each of the n folds has served as the test set.
5. The average of your n recorded errors is called the cross-validation error and serves as your performance metric for the model.
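The five steps above can be sketched in plain Python without any library. This is only an illustration under stated assumptions: the function name `cross_validation_error` is hypothetical, the "model" is a trivial stand-in that predicts the mean of the training targets (so it needs no features), and the error is the mean squared error.

```python
import random


def cross_validation_error(targets, n_folds, seed=0):
    """Return the average test error over n_folds folds (steps 1 to 5).

    The "model" is a deliberately trivial stand-in: it predicts the mean
    of the training targets. Swap in a real model for actual use.
    """
    # Step 1: randomly split the row indices into n_folds folds.
    indices = list(range(len(targets)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]

    errors = []
    for test_fold in folds:
        # Step 2: "fit" on the other n_folds - 1 folds.
        held_out = set(test_fold)
        train_targets = [targets[i] for i in indices if i not in held_out]
        prediction = sum(train_targets) / len(train_targets)

        # Step 3: record the mean squared error on the held-out fold.
        fold_error = sum((targets[i] - prediction) ** 2 for i in test_fold) / len(test_fold)
        errors.append(fold_error)
        # Step 4: the loop repeats until every fold has been the test set.

    # Step 5: the cross-validation error is the average of the fold errors.
    return sum(errors) / len(errors)


targets = [2.0 * x + 1.0 for x in range(12)]
cv_error = cross_validation_error(targets, n_folds=4)
print(cv_error)
```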
1. Randomly split your entire dataset into n "folds".
2. For each fold in your dataset, build your model on n - 1 folds of the dataset. …
3. Record the error you see on each of the predictions.
4. Repeat this until each of the n folds has served as the test set.
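# Implementing cross-validation
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

# X is assumed to be a pandas DataFrame of features and y a pandas Series of labels
n = 5
kf = KFold(n_splits=n, random_state=None)
model = LogisticRegression(solver='liblinear')
acc_score = []
for train_index, test_index in kf.split(X):
    # Select rows by position for this fold
    X_train, X_test = X.iloc[train_index, :], X.iloc[test_index, :]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    model.fit(X_train, y_train)
    pred_values = model.predict(X_test)
    # accuracy_score expects the true labels first
    acc = accuracy_score(y_test, pred_values)
    acc_score.append(acc)
# Average over the n folds
avg_acc_score = sum(acc_score) / n
print('Accuracy of each fold: {}'.format(acc_score))
print('Avg accuracy: {}'.format(avg_acc_score))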
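The manual loop above can usually be condensed with scikit-learn's `cross_val_score`, which handles the fit/predict/score cycle for each fold. A minimal sketch follows; the iris dataset, `shuffle=True`, `random_state=0`, and `max_iter=1000` are assumptions made only so the example runs on its own, not choices from the answer above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Iris is used here only as stand-in data; substitute your own X and y.
X, y = load_iris(return_X_y=True)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
model = LogisticRegression(max_iter=1000)

# cross_val_score clones the model, fits it on each training split,
# and returns one score per fold.
scores = cross_val_score(model, X, y, cv=kf, scoring="accuracy")
print("Accuracy of each fold:", scores)
print("Average accuracy:", scores.mean())
```

Shuffling matters here because the iris rows are sorted by class; without it, some folds would contain classes the model never saw during training.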
K-fold cross-validation partitions the dataset into k folds; each fold serves once as the validation set while the remaining folds form the training set. It is mainly used when the sample is too small to split into three separate parts (training, validation, and test).
The code for K-fold is shown below.
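from sklearn.model_selection import RepeatedKFold, train_test_split

# Simple 60/40 hold-out split, shown for comparison (this is not K-fold)
train, validation = train_test_split(data, test_size=0.40, random_state=100)

# Repeated K-fold: 5 folds, repeated 10 times with different shuffles
# X and Y are assumed to be NumPy arrays
kf = RepeatedKFold(n_splits=5, n_repeats=10, random_state=None)
for train_index, test_index in kf.split(X):
    print("Train:", train_index, "Validation:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    Y_train, Y_test = Y[train_index], Y[test_index]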
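Note that the snippet above uses `RepeatedKFold` rather than plain `KFold`. The difference is the number of train/validation splits produced: `RepeatedKFold` reruns the whole K-fold procedure `n_repeats` times with a fresh shuffle each time. A small sketch that counts the splits, using toy data and `random_state=0` as assumptions for reproducibility:

```python
import numpy as np
from sklearn.model_selection import KFold, RepeatedKFold

# Ten toy samples, used only to count the splits.
X = np.arange(20).reshape(10, 2)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
rkf = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)

# Each call to split() yields one (train_index, test_index) pair per split.
n_kfold_splits = sum(1 for _ in kf.split(X))
n_repeated_splits = sum(1 for _ in rkf.split(X))
print(n_kfold_splits, n_repeated_splits)
```

Averaging a score over the repeated splits tends to give a steadier estimate on small datasets, at the cost of fitting the model more times.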