I've computed a minimum spanning tree (MST) from a distance matrix using NetworkX, and I now want to build a dendrogram from it.
My MST: (figure not shown)
I've tried using the adjacency matrix (obtained with NetworkX's to_pandas_adjacency), where T is my MST:
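import matplotlib.pyplot as plt
import networkx as nx
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

# T is the MST; non-edges come out as 0 in the adjacency matrix
df = nx.to_pandas_adjacency(T)
# Condensed form, see https://stackoverflow.com/questions/18952587/use-distance-matrix-in-scipy-cluster-hierarchy-linkage
dist_array = squareform(df)
plt.figure(figsize=(10, 10))
mergings = linkage(dist_array, method='complete', metric='euclidean')
# distances is defined elsewhere (presumably the original distance matrix)
dendrogram(mergings, labels=distances.index, leaf_rotation=90, leaf_font_size=14)
plt.show()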
Now, since the adjacency matrix is filled with 0s for non-edges, I guess linkage computes Euclidean distances and ends up with a 3-cluster dendrogram (where all the points in a cluster are at distance 0), whereas I'm expecting to get the same linkage as in my original MST!
I tried passing inf or a large value as the nonedge default to to_pandas_adjacency, but I end up with an invalid matrix...
Can anyone help? My best guess is that I'm not understanding or using linkage as I should...
Edit: I know that doing it the other way around (DT and then build the MST) would probably be easier, but I need to reproduce the order of operations to recreate the results of an original study...
Edit 2: Since SciPy's linkage function computes the Euclidean distance between each point (or node here), I guess (but without any certainty) that I need to find a way to convert my adjacency matrix into an array similar to what linkage outputs, i.e. a weighted edge list, but with the sub-cluster size as a fourth column.
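The conversion described in Edit 2 could be sketched roughly as follows. This is only an illustration under stated assumptions: `mst_to_linkage` is a hypothetical helper (not part of NetworkX or SciPy), T is assumed to be a connected tree whose edges carry a `weight` attribute, and the small graph is made-up data. The idea is to merge the tree's edges in increasing weight order with a union-find structure, which mirrors what single-linkage clustering does, and to record each merge as a row `[cluster_a, cluster_b, distance, size]`.

```python
import networkx as nx
import numpy as np


def mst_to_linkage(T, weight="weight"):
    """Hypothetical helper: turn a weighted tree into a SciPy-style linkage matrix."""
    nodes = list(T.nodes())
    n = len(nodes)
    index = {node: i for i, node in enumerate(nodes)}

    parent = list(range(n))      # union-find parent pointers
    cluster_id = list(range(n))  # SciPy cluster id currently held by each root
    size = [1] * n               # number of leaves under each root

    def find(i):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process edges from lightest to heaviest, as single linkage would.
    edges = sorted(T.edges(data=weight), key=lambda e: e[2])
    Z = np.zeros((n - 1, 4))
    for step, (u, v, w) in enumerate(edges):
        ru, rv = find(index[u]), find(index[v])
        a, b = sorted((cluster_id[ru], cluster_id[rv]))
        Z[step] = [a, b, w, size[ru] + size[rv]]
        # Merge the two components; the new cluster gets id n + step.
        parent[rv] = ru
        size[ru] += size[rv]
        cluster_id[ru] = n + step
    return Z, nodes


# Made-up example graph, for illustration only.
G = nx.Graph()
G.add_weighted_edges_from([
    ("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 2.5),
    ("c", "d", 4.0), ("b", "d", 5.0),
])
T = nx.minimum_spanning_tree(G)
Z, labels = mst_to_linkage(T)
print(Z)
```

The resulting `Z` should be usable with `scipy.cluster.hierarchy.dendrogram(Z, labels=labels)`, so the merge heights in the plot are the MST edge weights rather than values derived from the zero-filled adjacency matrix.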
1 answer
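I had a similar problem and tried to find a solution. This is my first time posting an answer, so please let me know if anything is wrong.
I suggest using linkage and dendrogram directly from scipy.cluster.hierarchy instead of the networkx package. First, convert the distance matrix to a condensed distance matrix with scipy.spatial.distance.squareform, then obtain the clustering with scipy.cluster.hierarchy.linkage. Different distance functions can be used in linkage. Finally, plot the clustering with dendrogram. The result should be consistent with the minimum spanning tree from networkx.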
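A minimal sketch of these steps, assuming a small made-up distance matrix `D` (the values and labels are illustrative, not from the question). It uses `method="single"`, since single linkage is the method whose merge heights correspond to MST edge weights; other methods such as `"complete"` generally will not match. Note also that when linkage receives a condensed distance matrix, the `metric` argument is ignored and the given distances are used as they are.

```python
import networkx as nx
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Made-up symmetric distance matrix with a zero diagonal (illustration only).
labels = ["a", "b", "c", "d"]
D = np.array([
    [0.0, 1.0, 2.5, 6.0],
    [1.0, 0.0, 2.0, 5.0],
    [2.5, 2.0, 0.0, 4.0],
    [6.0, 5.0, 4.0, 0.0],
])

# Step 1: convert the square matrix to the 1-D condensed form.
condensed = squareform(D)

# Step 2: single-linkage clustering on the precomputed distances.
Z = linkage(condensed, method="single")

# Consistency check against the MST built by NetworkX from the same matrix.
T = nx.minimum_spanning_tree(nx.from_numpy_array(D))
mst_weights = sorted(w for _, _, w in T.edges(data="weight"))

print(Z[:, 2])       # merge heights of the dendrogram
print(mst_weights)   # MST edge weights, sorted
```

To draw the tree, `Z` can be passed to `scipy.cluster.hierarchy.dendrogram(Z, labels=labels)` followed by `matplotlib.pyplot.show()`.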