
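  • Published: 2019-10-22
  • Difficulty: Intermediate
  • Category: Classification and prediction; evaluation of classification results
  • Tags: Python, scikit-learn, decision tree, k-fold cross-validation, confusion matrix, accuracy, precision, recall, F-score, breast cancer dataset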

1. Problem description


2. Implementation

In [1]:
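from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
# Load the breast cancer dataset (target 0 = malignant, 1 = benign)
cancer = load_breast_cancer()
# Hold out 30% of the samples as the test set
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.3, random_state=42)
from sklearn.tree import DecisionTreeClassifier
# Fit an unpruned decision tree on the training set
clf_tree = DecisionTreeClassifier(max_depth=None, min_samples_split=2, random_state=1)
clf_tree = clf_tree.fit(X_train, y_train)
# Predict the labels of the held-out test set
y_pred = clf_tree.predict(X_test)
from sklearn.metrics import confusion_matrix
# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, y_pred))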
[[ 59   4]
 [  7 101]]
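The matrix above can also be read by hand. scikit-learn puts the true classes in rows and the predicted classes in columns, and in this dataset class 0 is malignant and class 1 is benign. As a minimal sketch in plain Python, with the counts copied from the printed output (the helper name `metrics_from_confusion` is hypothetical, not part of scikit-learn), accuracy, precision and recall could be derived like this:

```python
# Counts copied from the confusion matrix printed above.
# Rows = true class, columns = predicted class; 0 = malignant, 1 = benign.
cm = [[59, 4],
      [7, 101]]


def metrics_from_confusion(cm, positive):
    """Return (precision, recall) for class index `positive` (hypothetical helper)."""
    tp = cm[positive][positive]
    predicted_positive = sum(row[positive] for row in cm)  # column sum
    actual_positive = sum(cm[positive])                    # row sum
    return tp / predicted_positive, tp / actual_positive


total = sum(sum(row) for row in cm)
accuracy = (cm[0][0] + cm[1][1]) / total  # correct predictions lie on the diagonal

precision_malignant, recall_malignant = metrics_from_confusion(cm, positive=0)
precision_benign, recall_benign = metrics_from_confusion(cm, positive=1)

print("accuracy: %.4f" % accuracy)
print("malignant: precision %.4f, recall %.4f" % (precision_malignant, recall_malignant))
print("benign:    precision %.4f, recall %.4f" % (precision_benign, recall_benign))
```

Precision divides the diagonal entry by its column sum (everything predicted as that class), while recall divides it by its row sum (everything that truly belongs to that class).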
In [4]:
from sklearn.metrics import accuracy_score
print("accuracy of malignant and benign:%s" % (accuracy_score(y_test,y_pred)))
from sklearn.metrics import precision_score
# pos_label=0 scores the malignant class; the default pos_label=1 scores the benign class
print("precision of malignant:%s" % (precision_score(y_test,y_pred, pos_label=0)))
print("precision of benign:%s" % (precision_score(y_test,y_pred)))
from sklearn.metrics import recall_score
print("recall of malignant:%s" % (recall_score(y_test,y_pred,pos_label=0)))
print("recall of benign:%s" % (recall_score(y_test,y_pred)))
accuracy of malignant and benign:0.93567251462
precision of malignant:0.893939393939
precision of benign:0.961904761905
recall of malignant:0.936507936508
recall of benign:0.935185185185
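The tags list the F-score, which combines precision and recall into one number. With beta = 1 it is their harmonic mean (F1). A minimal sketch, assuming the precision and recall values printed above are simply copied in by hand (the helper name `f_score` is hypothetical):

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall (F1)."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall copied from the output above
f1_malignant = f_score(0.893939393939, 0.936507936508)
f1_benign = f_score(0.961904761905, 0.935185185185)

print("F1 of malignant: %.4f" % f1_malignant)
print("F1 of benign:    %.4f" % f1_benign)
```

For a binary problem, scikit-learn's `scoring='f1'` scores the class labelled 1, which is benign in this dataset, so the benign value is the one comparable to cross-validated F1 scores.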
In [5]:
from sklearn.model_selection import cross_val_score
# 10-fold cross-validation on the full dataset, scored with F1
score = cross_val_score(clf_tree, cancer.data, cancer.target, cv=10, scoring='f1')
print(score)
[ 0.92957746  0.87671233  0.93150685  0.86956522  0.97142857  0.93150685
  0.91891892  0.95652174  0.93939394  0.92307692]
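Ten per-fold scores are usually easier to compare when summarized by their mean and spread. A minimal sketch using only the standard library, with the fold scores copied from the output above (the variable names are illustrative):

```python
from statistics import mean, stdev

# F1 scores of the ten folds, copied from the output above
fold_scores = [0.92957746, 0.87671233, 0.93150685, 0.86956522, 0.97142857,
               0.93150685, 0.91891892, 0.95652174, 0.93939394, 0.92307692]

mean_f1 = mean(fold_scores)
std_f1 = stdev(fold_scores)  # sample standard deviation across folds

print("F1 over %d folds: %.3f +/- %.3f" % (len(fold_scores), mean_f1, std_f1))
```

The spread gives a rough sense of how much the score depends on which samples land in the validation fold.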