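Multidimensional Data Visualization

  • Published: 2019-11-27
  • Difficulty: Easy
  • Category: data preprocessing, data visualization
  • Tags: Python, data visualization, Andrews curves, parallel coordinates, principal component analysis, multidimensional scaling

1. Problem Description

This case study uses the iris dataset to implement multidimensional data visualization in Python. It covers four methods: Andrews curves, parallel coordinates, principal component analysis (PCA), and multidimensional scaling (MDS).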

In [7]:
import pandas as pd
import matplotlib.pyplot as plt
# Read the data
data = pd.read_csv('iris.csv')
print(data.head())
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name
0          5.1         3.5          1.4         0.2  Iris-setosa
1          4.9         3.0          1.4         0.2  Iris-setosa
2          4.7         3.2          1.3         0.2  Iris-setosa
3          4.6         3.1          1.5         0.2  Iris-setosa
4          5.0         3.6          1.4         0.2  Iris-setosa
In [12]:
from pandas.plotting import andrews_curves
andrews_curves(data,'Name')
plt.show()
In [13]:
from pandas.plotting import parallel_coordinates
parallel_coordinates(data,'Name')
plt.show()
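
Because a parallel coordinates plot draws every feature against one shared vertical axis, features with larger raw values can visually dominate. One optional preprocessing step, not used in the original notebook, is to rescale each column to [0, 1] first. The sketch below assumes NumPy is available; `min_max_scale` is a hypothetical helper, and the input is the five rows shown by `data.head()`.

```python
import numpy as np

def min_max_scale(values):
    """Rescale each column to [0, 1]; constant columns map to 0."""
    values = np.asarray(values, dtype=float)
    low = values.min(axis=0)
    span = values.max(axis=0) - low
    safe_span = np.where(span == 0, 1.0, span)  # avoid division by zero
    return (values - low) / safe_span

# The five rows printed by data.head() above.
head = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
])
scaled = min_max_scale(head)
print(scaled)
```

The scaled array could then be wrapped in a DataFrame together with the `Name` column and passed to `parallel_coordinates` in the same way as the raw data.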
In [14]:
from sklearn import decomposition
# Principal component analysis; keep 2 components so the result can be plotted
pca = decomposition.PCA(n_components=2)
# .iloc selects every column except the last one (the species name)
X = pca.fit_transform(data.iloc[:, :-1].values)
# Plot the projected points, one color per species
pos = pd.DataFrame(X, columns=['X', 'Y'])
pos['Name'] = data['Name']
ax = pos.loc[pos['Name']=='Iris-virginica'].plot(kind='scatter', x='X', y='Y', color='blue', label='virginica')
ax = pos.loc[pos['Name']=='Iris-setosa'].plot(kind='scatter', x='X', y='Y', color='green', label='setosa', ax=ax)
pos.loc[pos['Name']=='Iris-versicolor'].plot(kind='scatter', x='X', y='Y', color='red', label='versicolor', ax=ax)
plt.show()
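
For readers curious about what the PCA projection does internally, here is a minimal NumPy sketch: center the data, take the singular value decomposition, and project onto the leading axes. It is an illustration under stated assumptions, not scikit-learn's implementation; `pca_project` is a hypothetical helper, and the input is synthetic random data standing in for the four iris feature columns. The coordinates should match a library PCA only up to the sign of each axis.

```python
import numpy as np

def pca_project(values, n_components=2):
    """Project rows onto the top principal axes and report variance ratios."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean(axis=0)  # PCA operates on centered data
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    variance = singular_values ** 2
    explained_ratio = variance[:n_components] / variance.sum()
    return projected, explained_ratio

# Synthetic stand-in data (NOT the iris measurements): 150 rows, 4 features.
rng = np.random.default_rng(0)
demo = rng.normal(size=(150, 4)) * np.array([3.0, 1.0, 0.5, 0.1])

projected, explained_ratio = pca_project(demo)
print(projected.shape)
print(explained_ratio)
```

The share of variance kept by the two components indicates how faithfully the 2-D scatter plot represents the original four-dimensional data.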
In [15]:
from sklearn import manifold
from sklearn.metrics import euclidean_distances
# Pairwise Euclidean distances between samples, used as the precomputed dissimilarity matrix
similarities = euclidean_distances(data.iloc[:, :-1].values)
# Multidimensional scaling; use 2 dimensions so the result can be plotted
mds = manifold.MDS(n_components=2, max_iter=3000, eps=1e-9, dissimilarity="precomputed", n_jobs=1)
X = mds.fit(similarities).embedding_
# Plot the embedded points, one color per species
pos = pd.DataFrame(X, columns=['X', 'Y'])
pos['Name'] = data['Name']
ax = pos.loc[pos['Name']=='Iris-virginica'].plot(kind='scatter', x='X', y='Y', color='blue', label='virginica')
ax = pos.loc[pos['Name']=='Iris-setosa'].plot(kind='scatter', x='X', y='Y', color='green', label='setosa', ax=ax)
pos.loc[pos['Name']=='Iris-versicolor'].plot(kind='scatter', x='X', y='Y', color='red', label='versicolor', ax=ax)
plt.show()
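
MDS with a precomputed dissimilarity expects a square, symmetric matrix with zeros on the diagonal. The sketch below shows, in plain NumPy, how such a pairwise Euclidean distance matrix can be built and checked. `pairwise_euclidean` is a hypothetical helper written for illustration, and the input is the first three rows shown by `data.head()`.

```python
import numpy as np

def pairwise_euclidean(values):
    """Return the matrix of Euclidean distances between all pairs of rows."""
    values = np.asarray(values, dtype=float)
    # Broadcasting yields an (n, n, d) array of coordinate differences.
    diff = values[:, None, :] - values[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# First three iris records from the output of data.head().
rows = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
])
dist = pairwise_euclidean(rows)

# Properties a precomputed dissimilarity matrix should satisfy.
print(np.allclose(dist, dist.T))        # symmetric
print(np.allclose(np.diag(dist), 0.0))  # zero diagonal
```

MDS then looks for 2-D coordinates whose mutual distances approximate the entries of this matrix as closely as possible.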