# KeepNotes blog

Stay hungry, Stay Foolish.


• Avoids an unlucky random split from `train_test_split`: with cross-validation, every sample appears in a test set exactly once
• Multiple splits tell us how sensitive the model is to the training data (worst-case / best-case performance)
• Makes better use of the data (more samples are available for training as needed)

```python
import numpy as np
from sklearn import datasets
from sklearn import svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)
scores = cross_val_score(clf, iris.data, iris.target, cv=5)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
# Accuracy: 0.98 (+/- 0.03)
```

```python
from sklearn.model_selection import cross_validate

# clf and iris as defined above
scoring = ['precision_macro', 'recall_macro']
scores = cross_validate(clf, iris.data, iris.target, scoring=scoring, cv=5)
print(sorted(scores.keys()))
print(scores)
```

#### Stratified K-fold cross-validation

``skf = StratifiedKFold(n_splits=3)``
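A minimal sketch of what stratification buys you, using toy data assumed here (not from the original post): each test fold preserves the overall class ratio.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels: 9 samples of class 0, 3 of class 1 (a 3:1 imbalance)
X = np.zeros((12, 1))
y = np.array([0] * 9 + [1] * 3)

skf = StratifiedKFold(n_splits=3)
for train_idx, test_idx in skf.split(X, y):
    # Each test fold keeps the 3:1 ratio: 3 of class 0, 1 of class 1
    print(np.bincount(y[test_idx]))  # [3 1]
```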

```python
from sklearn.model_selection import KFold

# e.g. 2-fold, with shuffling
kf = KFold(n_splits=2, shuffle=True, random_state=12345)
```

``rkf = RepeatedKFold(n_splits=2, n_repeats=2, random_state=12345)``
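RepeatedKFold simply reruns K-fold several times with different shuffles; a quick sketch (toy data assumed) showing that the total number of splits is `n_splits * n_repeats`:

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

X = np.arange(8).reshape(4, 2)  # 4 toy samples
rkf = RepeatedKFold(n_splits=2, n_repeats=2, random_state=12345)
# 2 folds x 2 repeats = 4 train/test splits in total
print(rkf.get_n_splits())  # 4
```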

#### Leave-one-out cross-validation (leave-one-out)

```python
from sklearn.model_selection import LeaveOneOut, cross_val_score

# clf and iris as defined above
loo = LeaveOneOut()
scores = cross_val_score(clf, iris.data, iris.target, cv=loo)
```

``lpo = LeavePOut(p=2)``
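To see how many splits these strategies produce, here is a small sketch on toy data (assumed for illustration): LeaveOneOut yields one split per sample, while LeavePOut yields one split per combination of p samples, which grows quickly.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, LeavePOut

X = np.arange(4).reshape(4, 1)  # 4 toy samples
loo = LeaveOneOut()
print(loo.get_n_splits(X))  # 4: each sample is the test set exactly once

lpo = LeavePOut(p=2)
print(lpo.get_n_splits(X))  # C(4, 2) = 6: every pair is left out once
```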

#### Shuffle-split cross-validation (shuffle-split)

``ss = ShuffleSplit(n_splits=5, train_size=0.5, test_size=0.5, random_state=0)``

``sss = StratifiedShuffleSplit(n_splits=5, train_size=0.5, test_size=0.5, random_state=0)``
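Unlike K-fold, each shuffle-split round draws an independent random split of a fixed size, so a sample may land in several test sets or in none. A minimal sketch on toy data (assumed here):

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

X = np.arange(10)  # 10 toy samples
ss = ShuffleSplit(n_splits=5, train_size=0.5, test_size=0.5, random_state=0)
for train_idx, test_idx in ss.split(X):
    # Each of the 5 rounds is an independent random 50/50 split
    print(len(train_idx), len(test_idx))  # 5 5
```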

#### Grouped cross-validation

• GroupKFold
• LeaveOneGroupOut
• LeavePGroupsOut
• GroupShuffleSplit
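These splitters take an extra `groups` array and keep all samples with the same group on the same side of each split. A sketch with GroupKFold on toy data (the groups, e.g. "patients", are assumed for illustration):

```python
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.arange(12).reshape(6, 2)          # 6 toy samples
y = np.array([0, 1, 0, 1, 0, 1])
groups = np.array([1, 1, 2, 2, 3, 3])    # e.g. three patients, two samples each

gkf = GroupKFold(n_splits=3)
for train_idx, test_idx in gkf.split(X, y, groups=groups):
    # A group's samples land in the test fold together, so the model
    # is never evaluated on a group it has already trained on
    print("test groups:", np.unique(groups[test_idx]))
```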

#### Time series splits

``tscv = TimeSeriesSplit(n_splits=5)``
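TimeSeriesSplit respects temporal order: the training window grows over time and the test fold always lies strictly after it, so the model never peeks into the future. A minimal sketch on toy data (assumed here):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(6, 2)  # 6 toy time steps
tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(X):
    # Training indices always precede the test indices
    print("train:", train_idx, "test:", test_idx)
```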

References:

• Cross-validation: evaluating estimator performance (scikit-learn user guide)
• Introduction to Machine Learning with Python (Chinese edition: 《Python机器学习基础教程》)