matplotlib 如何在Python中显示PCA后的散点图?

lvmkulzt  于 2023-10-24  发布在  Python
关注(0)|答案(2)|浏览(153)

我做了一个自己的随机数据,它包含一个18行5列的文本文件,所有的条目都是整数。
我成功地做了PCA,但现在卡住了。我无法做散点图。下面是我的代码:

f=open(r'<path>mydata.txt')
print(f.read()) #reading from a file

with open(r'<path>mydata.txt') as f:
emp= []
for line in f:
    line = line.split() 
    if line:            
        line = [int(i) for i in line]
        emp.append(line)

from sklearn.decomposition import PCA
import pylab as pl
from itertools import cycle
X = emp
pca = PCA(n_components=3, whiten=True).fit(X)
X_pca = pca.transform(X) #regular PCA

现在,PCA已经完成,方差已知,我该如何绘图?
下面是我的数据集中的样本数据:

2    1    2    3    0
2    3    2    3    0
1    3    1    1    0
1    5    2    1    0
2    3    1    1    0
3    3    0    1    0
7    1    1    1    1
7    2    2    1    1
1    1    1    4    1
3    2    3    2    1
2    2    2    2    1
1    3    2    3    1
2    3    2    1    2
2    2    1    1    2
7    5    3    2    2
3    4    2    4    2
2    1    1    1    2
7    1    3    3    2
pvabu6sv

pvabu6sv1#

这就是你想要的吗?

import numpy as np
from matplotlib import pyplot as plt

data1 = [np.random.normal(0,0.1, 10), np.random.normal(0,0.1,10)]
data2 = [np.random.normal(1,0.2, 10), np.random.normal(2,0.3,10)]
data3 = [np.random.normal(-2,0.1, 10), np.random.normal(1,0.5,10)]

plt.scatter(data1[0],data1[1])
plt.scatter(data2[0],data2[1])
plt.scatter(data3[0],data3[1])

plt.show()

三个不同数据集的结果如下所示:

编辑

希望我现在能更好地理解你的问题。下面是新代码:

import numpy as np
from matplotlib import pyplot as plt    

with open(r'mydata.txt') as f:
    emp= []
    for line in f:
        line = line.split() 
        if line:            
            line = [int(i) for i in line]
            emp.append(line)

from sklearn.decomposition import PCA
import pylab as pl
from itertools import cycle
X = emp
pca = PCA(n_components=3, whiten=True).fit(X)
X_pca = pca.transform(X) #regular PCA

jobs = ['A', 'B', 'C']
job_id = np.array([e[4] for e in emp])

fig, axes = plt.subplots(3,3, figsize=(5,5))

for row in range(axes.shape[0]):
    for col in range(axes.shape[1]):
        ax = axes[row,col]
        if row == col:
            ax.tick_params(
                axis='both',which='both',
                bottom='off',top='off',
                labelbottom='off',
                left='off',right='off',
                labelleft='off'
            )
            ax.text(0.5,0.5,jobs[row],horizontalalignment='center')
        else:
            ax.scatter(X_pca[:,row][job_id==0],X_pca[:,col][job_id==0],c='r')
            ax.scatter(X_pca[:,row][job_id==1],X_pca[:,col][job_id==1],c='g')
            ax.scatter(X_pca[:,row][job_id==2],X_pca[:,col][job_id==2],c='b')
fig.tight_layout()
plt.show()

我将作业分别命名为'A', 'B', and 'C',id为0, 1, and 2。从emp的最后一行,我创建了一个numpy数组来保存这些索引。在关键的绘图命令中,我通过作业id屏蔽了数据。希望这能有所帮助。
结果图如下所示:

编辑2

如果你只想要一个图,比如说,X_pca的第一列和第二列相互关联,代码变得简单得多:

import numpy as np
from matplotlib import pyplot as plt

with open(r'mydata.txt') as f:
    emp= []
    for line in f:
        line = line.split() 
        if line:            
            line = [int(i) for i in line]
            emp.append(line)

from sklearn.decomposition import PCA
import pylab as pl
from itertools import cycle
X = emp
pca = PCA(n_components=3, whiten=True).fit(X)
X_pca = pca.transform(X) #regular PCA

jobs = ['A', 'B', 'C']
job_id = np.array([e[4] for e in emp])

row = 0
col = 1

plt.scatter(X_pca[:,row][job_id==0],X_pca[:,col][job_id==0],c='r')
plt.scatter(X_pca[:,row][job_id==1],X_pca[:,col][job_id==1],c='g')
plt.scatter(X_pca[:,row][job_id==2],X_pca[:,col][job_id==2],c='b')

plt.show()

结果如下:

我强烈建议您阅读这些示例中使用的函数的文档。

4ktjp1zp

4ktjp1zp2#

根据你的评论,你想得到这个(https://i.stack.imgur.com/VsicE.jpg),这里是如何使用sklearn库:
在这个例子中,我使用虹膜数据:

第1部分:仅绘制散点图

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA
from numpy import linalg as LA
import pandas as pd
from scipy import stats

iris = datasets.load_iris()
X = iris.data
y = iris.target
#In general a good idea is to scale the data
X = stats.zscore(X)

pca = PCA()
x_new = pca.fit_transform(X)

plt.scatter(x_new[:,0], x_new[:,1], c = y)
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.show()

结果1

第2部分:如果你想绘制著名的双标图

#Create the biplot function
def biplot(score,coeff,labels=None):
    xs = score[:,0]
    ys = score[:,1]
    n = coeff.shape[0]
    scalex = 1.0/(xs.max() - xs.min())
    scaley = 1.0/(ys.max() - ys.min())
    plt.scatter(xs * scalex,ys * scaley, c = y)
    for i in range(n):
        plt.arrow(0, 0, coeff[i,0], coeff[i,1],color = 'r',alpha = 0.5)
        if labels is None:
            plt.text(coeff[i,0]* 1.15, coeff[i,1] * 1.15, "Var"+str(i+1), color = 'g', ha = 'center', va = 'center')
        else:
            plt.text(coeff[i,0]* 1.15, coeff[i,1] * 1.15, labels[i], color = 'g', ha = 'center', va = 'center')
plt.xlim(-1,1)
plt.ylim(-1,1)
plt.xlabel("PC{}".format(1))
plt.ylabel("PC{}".format(2))
plt.grid()

#Call the function. Use only the 2 PCs.
biplot(x_new[:,0:2],np.transpose(pca.components_[0:2, :]))
plt.show()

结果2

相关问题