Building a Support Vector Machine Classifier on the Breast Cancer Dataset

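  • Published: 2019-10-22
  • Difficulty: Intermediate
  • Category: Classification and Prediction, Support Vector Machines
  • Tags: Python, scikit-learn, support vector machine, breast cancer dataset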

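1. Problem Description

With real-world data, the usual workflow is to train and predict on the full feature set first, and then reduce the dimensionality so that the data can be plotted. The code below applies an SVC classification model to the breast cancer dataset.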
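
Before training, it can help to look at the shape and label encoding of the data. The sketch below only inspects the object returned by `load_breast_cancer`; the variable names (`label_names`, `class_counts`, `largest_value`) are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# Number of samples and number of features per sample
n_samples, n_features = cancer.data.shape

# target_names maps label 0 -> 'malignant' and label 1 -> 'benign'
label_names = [str(name) for name in cancer.target_names]

# Samples per class, indexed by label
class_counts = np.bincount(cancer.target)

# Features live on very different scales, which matters for an RBF kernel
largest_value = cancer.data.max()

print(n_samples, n_features)
print(label_names)
print(class_counts)
print(largest_value)
```

Note the label order: 0 is malignant and 1 is benign, which is easy to get backwards when labeling plots.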

2. Implementation

In [1]:
# Load the dataset
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
# Split into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.3, random_state=42)
# Build the SVC model
from sklearn import svm
clf = svm.SVC()
clf.fit(X_train, y_train)
# Evaluate accuracy on the training and test sets
train_score = clf.score(X_train, y_train)
test_score = clf.score(X_test, y_test)
print(train_score)
print(test_score)
1.0
0.631578947368
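
A training accuracy of 1.0 next to a test accuracy of about 0.63 suggests that the model memorized the training set. One likely cause, stated here as an assumption about the environment that produced the output: scikit-learn releases before 0.22 defaulted to `gamma='auto'`, and the features are not scaled, so the RBF kernel sees almost every pair of samples as far apart. A minimal sketch of one common remedy, standardizing the features inside a pipeline, follows. The names `scaled_clf`, `scaled_train_score`, and `scaled_test_score` are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.3, random_state=42)

# The scaler is fitted on the training split only and then reused
# for the test split, which avoids leaking test statistics.
scaled_clf = make_pipeline(StandardScaler(), SVC())
scaled_clf.fit(X_train, y_train)

scaled_train_score = scaled_clf.score(X_train, y_train)
scaled_test_score = scaled_clf.score(X_test, y_test)
print(scaled_train_score)
print(scaled_test_score)
```

Putting the scaler in the pipeline keeps the preprocessing and the classifier together, so `score` and `predict` apply the same transformation automatically.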
In [2]:
result=clf.predict(X_test)
from sklearn.metrics import classification_report
print(classification_report(y_test,result))
             precision    recall  f1-score   support

          0       0.00      0.00      0.00        63
          1       0.63      1.00      0.77       108

avg / total       0.40      0.63      0.49       171

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\metrics\classification.py:1135: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
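
The warning and the zero row in the report both point to the same thing: the model never predicts class 0, so precision for that class is undefined. A confusion matrix makes this visible directly. The sketch below sets `gamma='auto'` explicitly as an assumption, to mimic the older default; with a newer scikit-learn default (`gamma='scale'`) the numbers may differ. The names `pred` and `cm` are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.3, random_state=42)

# Assumption: gamma='auto' reproduces the pre-0.22 default behavior
clf = SVC(gamma='auto')
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

# Rows are true labels, columns are predicted labels, in the order [0, 1].
# A first column of zeros means class 0 was never predicted.
cm = confusion_matrix(y_test, pred, labels=[0, 1])
print(cm)
```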
In [3]:
# Reduce the test features to two dimensions and plot them, colored by true label
# Dimensionality reduction
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
newData = pca.fit_transform(X_test)
# Plot
import matplotlib.pyplot as plt
# In this dataset, label 0 is malignant and label 1 is benign
malignant = newData[y_test == 0]
benign = newData[y_test == 1]
plt.figure()
plt.scatter(malignant[:, 0], malignant[:, 1], c='#FF0000', label='malignant')
plt.scatter(benign[:, 0], benign[:, 1], c='#000080', label='benign')
plt.legend()
plt.show()
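
PCA is sensitive to feature scale, so on raw data the two components are dominated by the features with the largest numeric range. As a hedged variation on the cell above, the sketch below standardizes the test features before projecting them and reports how much variance the two components retain. The names `projector`, `projected`, and `retained` are illustrative, and the plotting step is left out because it would be the same as above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.3, random_state=42)

# Standardize, then project onto the first two principal components.
# As in the original cell, the projection is fitted on the test split
# because it is used for visualization only.
projector = make_pipeline(StandardScaler(), PCA(n_components=2))
projected = projector.fit_transform(X_test)

# Fraction of total variance kept by the two components
retained = projector.named_steps['pca'].explained_variance_ratio_.sum()
print(projected.shape)
print(retained)
```

The rows of `projected` line up with `y_test`, so the boolean masks `y_test == 0` and `y_test == 1` can be used to split the points by class for plotting.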