A MiniBatchKMeans clustering example in scikit-learn

  • Published: 2019-10-25
  • Difficulty: Intermediate
  • Category: Cluster analysis, k-means clustering application examples
  • Tags: Python, sklearn.cluster.MiniBatchKMeans

1. Problem description

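The program below is an example of the Mini-Batch K-Means algorithm. The dataset consists of 2,000 points generated with make_blobs from sklearn. During clustering, each batch uses 200 points. Because the data are generated around four centers, the optimal number of clusters is four. The program clusters the points into two, three, four, and five clusters in turn and evaluates each result with the Calinski-Harabasz index.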
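To make the "200 points per batch" idea concrete, here is a minimal sketch that feeds the same kind of data to MiniBatchKMeans one batch at a time through partial_fit. The manual batching loop, the shuffling step, and the variable names are illustrative assumptions rather than part of the original program, which simply passes batch_size=200 and lets the estimator draw its own batches.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

# Same synthetic data as described above: 2000 points around 4 centers.
X, _ = make_blobs(n_samples=2000, n_features=2,
                  centers=[[-2, -2], [0, 0], [1, 1], [3, 3]],
                  cluster_std=[0.4, 0.2, 0.2, 0.2], random_state=9)

batch_size = 200  # assumption: mirrors the batch size used in the article
model = MiniBatchKMeans(n_clusters=4, batch_size=batch_size,
                        random_state=9, n_init=3)

# Shuffle once, then update the centroids one batch of 200 points at a time.
order = np.random.RandomState(9).permutation(len(X))
for start in range(0, len(X), batch_size):
    batch = X[order[start:start + batch_size]]
    model.partial_fit(batch)

# Assign every point to its nearest learned centroid.
labels = model.predict(X)
print(model.cluster_centers_)
```

Each call to partial_fit nudges the centroids using only the current batch, which is why the method scales to data that does not fit comfortably in memory. The centroids found this way may differ slightly from those returned by a single fit call.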

2. Implementation

In [4]:
# Import the required packages.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import MiniBatchKMeans
from sklearn import metrics
# make_blobs is imported from sklearn.datasets; the old
# sklearn.datasets.samples_generator module has been removed from scikit-learn.
from sklearn.datasets import make_blobs
# X holds the sample features and y the true cluster labels: 2000 samples with
# 2 features each, drawn from 4 clusters centered at [-2,-2], [0,0], [1,1], [3,3]
# with standard deviations [0.4, 0.2, 0.2, 0.2].
X, y = make_blobs(n_samples=2000, n_features=2,
                  centers=[[-2, -2], [0, 0], [1, 1], [3, 3]],
                  cluster_std=[0.4, 0.2, 0.2, 0.2],
                  random_state=9)
# Plot the raw data in its own figure.
plt.scatter(X[:, 0], X[:, 1], marker='o')
plt.show()
# Cluster the points into 2 to 5 clusters with a batch size of 200 and
# evaluate each result with the Calinski-Harabasz (CH) score.
for index, k in enumerate((2, 3, 4, 5)):
    plt.subplot(2, 2, index + 1)
    y_pred = MiniBatchKMeans(n_clusters=k, batch_size=200, random_state=9).fit_predict(X)
    # calinski_harabaz_score was renamed to calinski_harabasz_score and removed in version 0.23.
    score = metrics.calinski_harabasz_score(X, y_pred)
    plt.scatter(X[:, 0], X[:, 1], c=y_pred)
    plt.text(.99, .01, 'k=%d, score: %.2f' % (k, score),
             transform=plt.gca().transAxes, size=10,
             horizontalalignment='right')
plt.show()
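For readers who want to see what the Calinski-Harabasz score measures, the sketch below recomputes it by hand for k=4 and compares it with scikit-learn's value. The helper name ch_index is hypothetical, introduced only for illustration. It uses the standard definition: between-cluster dispersion divided by (k - 1), over within-cluster dispersion divided by (n - k). A higher value indicates tighter, better separated clusters.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score


def ch_index(X, labels):
    """Hypothetical helper: CH = (B / (k - 1)) / (W / (n - k))."""
    n = X.shape[0]
    clusters = np.unique(labels)
    k = len(clusters)
    overall_mean = X.mean(axis=0)
    between, within = 0.0, 0.0
    for c in clusters:
        members = X[labels == c]
        center = members.mean(axis=0)
        # Between-cluster dispersion: size-weighted squared distance
        # from the cluster mean to the overall mean.
        between += len(members) * np.sum((center - overall_mean) ** 2)
        # Within-cluster dispersion: squared distances to the cluster mean.
        within += np.sum((members - center) ** 2)
    return (between / (k - 1)) / (within / (n - k))


X, _ = make_blobs(n_samples=2000, n_features=2,
                  centers=[[-2, -2], [0, 0], [1, 1], [3, 3]],
                  cluster_std=[0.4, 0.2, 0.2, 0.2], random_state=9)
labels = MiniBatchKMeans(n_clusters=4, batch_size=200,
                         random_state=9, n_init=3).fit_predict(X)

manual = ch_index(X, labels)
library = calinski_harabasz_score(X, labels)
print('manual: %.2f, sklearn: %.2f' % (manual, library))
```

The two numbers should agree up to floating-point error, which confirms that the score printed in each subplot is this ratio of dispersions.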