Parameters of the hierarchy clustering methods in scipy

  • Published: 2019-10-25
  • Difficulty: easy
  • Category: cluster analysis, scipy.cluster.hierarchy parameters
  • Tags: Python, scipy.cluster.hierarchy

1. Problem description

Using the third-party Python library scipy, this note explains the parameters of scipy.cluster.hierarchy.

2. Implementation

In [2]:
# Import the hierarchical clustering module.
from scipy.cluster import hierarchy
import pandas as pd
# A concrete example of calling it.
df = pd.read_excel("drink.xlsx")
Z = hierarchy.linkage(df, method='ward', metric='euclidean')
hierarchy.dendrogram(Z, labels=df.index)
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-2-7a416d2f15c3> in <module>
      3 import pandas as pd
      4 # A concrete example of calling it.
----> 5 df = pd.read_excel("drink.xlsx")
      6 Z = hierarchy.linkage(df, method ='ward',metric='euclidean')
      7 hierarchy.dendrogram(Z,labels = df.index)

FileNotFoundError: [Errno 2] No such file or directory: 'drink.xlsx'
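The cell above fails only because drink.xlsx is not in the working directory. As a minimal sketch of the same call sequence that runs without that file, the example below swaps in a small synthetic DataFrame. The data, the index labels `sample_0` … `sample_5`, and the column names `f1`, `f2`, `f3` are all made up for illustration and say nothing about the contents of the original spreadsheet. `no_plot=True` is used so that the dendrogram layout is computed without needing matplotlib.

```python
import numpy as np
import pandas as pd
from scipy.cluster import hierarchy

# Hypothetical stand-in for drink.xlsx: 6 samples with 3 numeric features.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(6, 3)),
    index=[f"sample_{i}" for i in range(6)],
    columns=["f1", "f2", "f3"],
)

# Build the linkage matrix; each of the n-1 rows records one merge as
# (cluster id a, cluster id b, merge distance, size of the new cluster).
Z = hierarchy.linkage(df.to_numpy(), method="ward", metric="euclidean")

# Compute the dendrogram layout without drawing it.
tree = hierarchy.dendrogram(Z, labels=df.index.tolist(), no_plot=True)

print(Z.shape)      # one row per merge
print(tree["ivl"])  # leaf labels in left-to-right order
```

With the real data, replacing the synthetic DataFrame by `pd.read_excel("drink.xlsx")` (and dropping `no_plot=True` to draw the tree) should give the call from the original cell, provided the file contains only numeric columns.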

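The main parameters of the hierarchy algorithm (hierarchy.linkage) are:

  • method: how the distance between a newly formed cluster u and each remaining cluster is computed. The options are single, complete, average, weighted, centroid, median, and ward; they differ only in how the cluster-to-cluster distance is calculated.
  • metric: the distance measure between individual observations, usually euclidean. The centroid, median, and ward methods are correctly defined only with the Euclidean metric.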
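To make the effect of the two parameters concrete, the following sketch runs every `method` option on the same data and then changes `metric` once. The data are two synthetic, well-separated groups of points invented for this illustration, and the choice of `fcluster` with `criterion="maxclust"` to cut each tree into two clusters is an assumption of the example, not something taken from the original notebook.

```python
import numpy as np
from scipy.cluster import hierarchy

# Two synthetic, clearly separated groups of 5 points each (illustrative data).
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(5, 2)),
    rng.normal(loc=10.0, scale=0.5, size=(5, 2)),
])

methods = ["single", "complete", "average", "weighted",
           "centroid", "median", "ward"]

labels_by_method = {}
for method in methods:
    # Same observations and metric; only the cluster-to-cluster rule changes.
    Z = hierarchy.linkage(X, method=method, metric="euclidean")
    # Cut the tree into at most two flat clusters.
    labels_by_method[method] = hierarchy.fcluster(Z, t=2, criterion="maxclust")
    # The height of the final merge depends on the linkage rule.
    print(method, "last merge height:", round(float(Z[-1, 2]), 3))

# A non-Euclidean metric is fine for single, complete, average and weighted.
Z_city = hierarchy.linkage(X, method="average", metric="cityblock")
print("average/cityblock last merge height:", round(float(Z_city[-1, 2]), 3))
```

On data this cleanly separated, all methods should recover the same two groups and differ only in the merge heights they report; on noisier data the choice of method can change the clusters themselves.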