How do you do n-fold cross validation in Python?

Question

Write self-explanatory code.

TheDataMonk 55 years 4 Answers 707 views Grand Master 0

Answers ( 4 )

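  1. Let df be the original dataset and n the number of folds.

    import pandas as pd

    # Shuffle the rows; sample() returns a new DataFrame, so reassign it
    df = df.sample(frac=1).reset_index(drop=True)
    fold_size = len(df) // n  # rows per fold (leftover rows are never tested)
    print(fold_size)
    for i in range(n):
        # The i-th block of rows is the test fold
        test = df[fold_size*i:fold_size*(i+1)]
        # All remaining rows form the training set
        train = pd.concat([df[:fold_size*i], df[fold_size*(i+1):]])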

  2. 1. Randomly split your entire dataset into n "folds".
    2. For each fold, build your model on the other n - 1 folds of the dataset. Then test the model on the held-out fold to check its effectiveness.
    3. Record the error you see on each of the predictions.
    4. Repeat this until each of the n folds has served as the test set.
    5. The average of your n recorded errors is called the cross-validation error and will serve as your performance metric for the model.
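
The five steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the "model" is a stand-in that predicts the mean of the training targets, the error is the mean absolute error, and the names `n_fold_cv_error` and `targets` are made up for this example.

```python
import random
from statistics import mean


def n_fold_cv_error(values, n_folds, seed=0):
    """Return (per-fold errors, their average) for a mean-only predictor."""
    indices = list(range(len(values)))
    random.Random(seed).shuffle(indices)        # step 1: random split
    folds = [indices[i::n_folds] for i in range(n_folds)]

    errors = []
    for test_idx in folds:                      # step 4: each fold is the test set once
        held_out = set(test_idx)
        train = [values[i] for i in indices if i not in held_out]
        prediction = mean(train)                # step 2: "fit" on the other n - 1 folds
        fold_error = mean(abs(values[i] - prediction) for i in test_idx)
        errors.append(fold_error)               # step 3: record the error
    return errors, mean(errors)                 # step 5: cross-validation error


# Hypothetical toy targets, only there to make the sketch runnable
targets = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
fold_errors, cv_error = n_fold_cv_error(targets, n_folds=5)
print(fold_errors)
print(cv_error)
```

With a real model you would replace the `mean(train)` line with a fit on the training rows and the error line with whatever metric you care about.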

  3. 1. Randomly split your entire dataset into n "folds".
    2. For each fold in your dataset, build your model on n - 1 folds of the dataset. …
    3. Record the error you see on each of the predictions.
    4. Repeat this until each of the n folds has served as the test set.

    # Implementing cross-validation (X is a pandas DataFrame, y a pandas Series)

    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import KFold

    n = 5
    kf = KFold(n_splits=n, random_state=None)
    model = LogisticRegression(solver='liblinear')

    acc_score = []

    for train_index, test_index in kf.split(X):
        X_train, X_test = X.iloc[train_index, :], X.iloc[test_index, :]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]

        model.fit(X_train, y_train)
        pred_values = model.predict(X_test)

        # accuracy_score expects the true labels first, then the predictions
        acc = accuracy_score(y_test, pred_values)
        acc_score.append(acc)

    avg_acc_score = sum(acc_score) / n
    print('accuracy of each fold - {}'.format(acc_score))
    print('Avg accuracy : {}'.format(avg_acc_score))
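
If you do not need the per-fold train and test sets yourself, scikit-learn's `cross_val_score` runs the same kind of loop in one call. A minimal sketch, assuming synthetic data from `make_classification` (not part of the answer above) and a shuffled `KFold`, which the answer does not use:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data, used only so the sketch runs on its own
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

n = 5
# shuffle=True is an assumption here; it matters when rows are ordered by class
kf = KFold(n_splits=n, shuffle=True, random_state=0)
model = LogisticRegression(solver="liblinear")

# One accuracy score per fold; the model is refit on each training split
scores = cross_val_score(model, X, y, cv=kf, scoring="accuracy")
print("accuracy of each fold - {}".format(scores))
print("Avg accuracy : {}".format(scores.mean()))
```

The returned array has one entry per fold, so its mean plays the role of `avg_acc_score` in the loop version.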

  4. K-fold cross-validation partitions the dataset into k folds; each fold serves once as the validation set while the remaining k - 1 folds form the training set. It is mainly used when the sample data is not large enough to be split into three separate parts (training, validation and test).

    The code for repeated K-fold is shown below (X and Y are NumPy arrays).

    from sklearn.model_selection import RepeatedKFold, train_test_split

    # Simple hold-out split of `data`, independent of the K-fold loop below
    train, validation = train_test_split(data, test_size=0.40, random_state=100)

    kf = RepeatedKFold(n_splits=5, n_repeats=10, random_state=None)
    for train_index, test_index in kf.split(X):
        print("Train:", train_index, "Validation:", test_index)
        X_train, X_test = X[train_index], X[test_index]
        Y_train, Y_test = Y[train_index], Y[test_index]
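
Note that `RepeatedKFold` repeats the whole K-fold procedure with a fresh shuffle each time, so the loop runs `n_splits * n_repeats` times, not `n_splits` times. A small sketch on a toy array (the array and the fixed `random_state` are assumptions made for illustration):

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

X = np.arange(20).reshape(10, 2)   # 10 toy samples, 2 features

rkf = RepeatedKFold(n_splits=5, n_repeats=10, random_state=100)
splits = list(rkf.split(X))

print(len(splits))                 # number of (train, validation) pairs
train_index, test_index = splits[0]
print(len(train_index), len(test_index))
```

Within each repeat, every sample lands in the validation fold exactly once; across repeats only the shuffling differs.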
