matplotlib: Pandas DataFrame to network graph

h22fl7wq  posted on 2023-10-24  in Other

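I have some CSV files with more than 100 rows each, like this table:


I am trying to create a graph that goes from A to B, C, and D, using the weights as the edge distances.

I am currently using pyvis, but the problem is that all the edges come out the same length.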

import pandas as pd
from IPython.display import display, HTML
from pyvis.network import Network 

df = pd.read_csv('sample_data.csv')
got_net = Network(notebook=True, cdn_resources='in_line')

sources = df['from']
targets = df['to']
weights = df['weight']
edge_data = zip(sources, targets, weights)

for e in edge_data:
    src = e[0]
    dst = e[1]
    w = e[2]

    got_net.add_node(src, src, title=src)
    got_net.add_node(dst, dst, title=dst)
    got_net.add_edge(src, dst, label = "weight")

neighbor_map = got_net.get_adj_list()  

for node in got_net.nodes:
    node['title'] += ' Neighbors:' + ''.join(neighbor_map[node['id']])
    node['value'] = len(neighbor_map[node['id']])

got_net.save_graph('graph.html')
display(HTML(filename="graph.html"))

b1uwtaje #1

Don't forget to set the "physics engine" of the network graph, and use the value parameter (not label) when setting the edge weights.
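Note that value scales the drawn width of an edge, not its length. If the weight should also drive how far apart two nodes sit, as the question asks, one possible approach is to map each weight to a spring length first. The sketch below is only an illustration: `weight_to_length`, its pixel bounds, and the sample edge list are hypothetical names and values, not part of pyvis or of the original data.

```python
def weight_to_length(weight, w_min, w_max, min_len=50, max_len=400):
    """Linearly map a weight in [w_min, w_max] to a length in [min_len, max_len].

    Larger weights give longer edges, i.e. the weight is read as a distance.
    """
    if w_max == w_min:
        # All weights are equal: fall back to the midpoint length.
        return (min_len + max_len) / 2
    fraction = (weight - w_min) / (w_max - w_min)
    return min_len + fraction * (max_len - min_len)


# Hypothetical edge list in the same (from, to, weight) shape as the CSV.
edges = [("A", "B", 1), ("A", "C", 5), ("A", "D", 10)]
weights = [w for _, _, w in edges]

lengths = {
    (src, dst): weight_to_length(w, min(weights), max(weights))
    for src, dst, w in edges
}
print(lengths)
```

Each result could then be passed along with the weight, e.g. `got_net.add_edge(src, dst, value=w, length=lengths[(src, dst)])`. As far as I know, pyvis forwards extra keyword arguments of `add_edge` to the vis.js edge options, where `length` is the rest length of the edge's spring, so the physics layout treats it as a target rather than an exact distance. If the weight is a strength rather than a distance (as with the frequencies below), the mapping would need to be inverted.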

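As an example, consider a network visualization that shows the frequency of bigrams in a famous piece of English literature, such as Shakespeare's "To be or not to be" soliloquy from *Hamlet*: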

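from collections import defaultdict
import re

import pandas as pd

# Paste the soliloquy text here [See link above]
hamlet_speech = ""

# Strip punctuation and whitespace, keeping only the upper-cased letters
shakespeare_letters = re.sub(r"[',.;:\-—?\n ]", "", hamlet_speech.upper())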

bigrams = [
    shakespeare_letters[i : i + 2]
    for i in range(len(shakespeare_letters) - 1)
]

freqs = defaultdict(int)

for xy in bigrams:
    freqs[xy] += 1

df = pd.DataFrame(
    [[*xy] + [w] for xy, w in freqs.items()], columns=["from", "to", "weight"]
)
df.sort_values(by="weight", inplace=True, ascending=False)
df = df[df.weight > 3]
df

This gives:

    from to  weight
10     T  H      53
16     H  E      33
35     N  D      19
0      T  O      18
55     O  F      17
..   ... ..     ...
226    R  D       4
241    L  L       4
21     I  O       4
8      T  T       4
166    P  A       4

[99 rows x 3 columns]

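*Note: to keep this example simple, I only included the most frequent consecutive letter pairs (more than 3 occurrences).*

Unsurprisingly, the top results are:

* "TH" (as in "the"...) is the most common bigram,
* followed by "HE" (also as in "the").

Let's see how pyvis represents this visually: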
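The counting step above can be checked on a short phrase. This is a minimal sketch, not the answer's exact code: it uses `collections.Counter` instead of `defaultdict`, strips every non-letter rather than a fixed punctuation list, and the sample phrase is only the opening words of the soliloquy.

```python
from collections import Counter
import re

phrase = "To be, or not to be"

# Keep only the letters A-Z after upper-casing.
letters = re.sub(r"[^A-Z]", "", phrase.upper())

# Every pair of consecutive letters is one bigram.
bigrams = [letters[i : i + 2] for i in range(len(letters) - 1)]

# Counter tallies how often each bigram occurs.
freqs = Counter(bigrams)

# Rows in the same (from, to, weight) shape used for the DataFrame.
rows = [(pair[0], pair[1], count) for pair, count in freqs.most_common()]
print(rows[:3])
```

The `rows` list can be handed to `pd.DataFrame(rows, columns=["from", "to", "weight"])` to get the same layout as the table above.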

import pandas as pd
from IPython.display import display, HTML
from pyvis.network import Network

got_net = Network(
    notebook=True,
    cdn_resources="remote",
    height="500px",
    width="100%",
    bgcolor="white",
    font_color="red",
)

# set the physics layout of the network
got_net.repulsion()
got_data = df

sources = got_data["from"]
targets = got_data["to"]
weights = got_data["weight"]

edge_data = zip(sources, targets, weights)

for e in edge_data:
    src = e[0]
    dst = e[1]
    w = e[2]

    got_net.add_node(src, src, title=src)
    got_net.add_node(dst, dst, title=dst)
    got_net.add_edge(src, dst, value=w)

neighbor_map = got_net.get_adj_list()

# add neighbor data to node hover data
for node in got_net.nodes:
    node["title"] += "\nNeighbors:\n"
    neighbor_distances = {}
    for neighbor in neighbor_map[node["id"]]:
        bigram = node["id"] + neighbor
        dist = freqs[bigram]
        neighbor_distances[neighbor] = dist
    for n, d in sorted(
        neighbor_distances.items(), key=lambda kv: kv[1], reverse=True
    ):
        node["title"] += f"{n}: {d}\n"
    node["value"] = len(neighbor_map[node["id"]])

got_net.show("network.html")

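Obviously, a key consideration with this type of "network" visualization is that a single node can be both a "from" and a "to" location at the same time. So when visualizing the "weight" of any connection between two individual nodes, *both* directions have to be taken into account, and the number of possible pairwise paths grows combinatorially with the number of nodes, which can get quite complicated quickly.
Nevertheless, pyvis handles this quite well: it makes the network interactive, and it visually represents the strength of the node interconnections (i.e. the weight) through the overall position of each node relative to the others in the network and through the *width* of their edges.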
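If a single undirected edge should reflect both directions of a pair, one option is to merge the two weights before adding the edges. This matters because an undirected pyvis `Network` appears to keep only the first edge added between two nodes, so a later reverse pair would otherwise be dropped. The helper name `merge_directions` and the sample weights below are hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict


def merge_directions(directed_weights):
    """Sum the weights of X->Y and Y->X into a single undirected weight."""
    merged = defaultdict(int)
    for (src, dst), weight in directed_weights.items():
        # Sorting the pair makes (X, Y) and (Y, X) share one key.
        key = tuple(sorted((src, dst)))
        merged[key] += weight
    return dict(merged)


# Made-up directed weights, keyed by (from, to).
directed = {("A", "B"): 3, ("B", "A"): 2, ("A", "C"): 4}

merged = merge_directions(directed)
print(merged)
```

Each merged entry can then be added once, with the summed weight as the edge's value. Whether summing is the right way to combine the two directions depends on the data; a maximum or an average would be equally easy to substitute.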
