pandas: how to count the people who are below a position

Posted by pn9klfpd on 2023-03-21 in Other


I want to count how many people are below a given user in a DataFrame.
| employee | manager |
| --- | --- |
| A | - |
| B | A |
| C | A |
| D | A |
| E | A |
| F | B |
| G | B |
| H | C |
| I | C |
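I would like to get this output:
- I, H, G, F, E and D have no employees below them
- C has two employees below (H and I)
- B has two employees below (F and G)
- A has eight employees below (B, C, D and E, plus the employees of B and C)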
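Does anyone have a suggestion? My real DataFrame has many more hierarchy levels and a very large amount of data.
I thought about storing the hierarchy in a dictionary and updating it in a loop, but I don't think that solution is efficient at all, and I would like to know whether there is a more efficient technique for this kind of problem.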

pandas

Source: https://stackoverflow.com/questions/75762216/how-to-count-people-who-are-below-a-position

2 Answers


ac1kyiln1#

As @34jbonz originally mentioned, networkx is the best tool for this task. However, since networkx provides a pandas interface, there is no need to preprocess the data:

import networkx as nx
G = nx.from_pandas_edgelist(temp, source='manager', target='employee', create_using=nx.DiGraph)  # temp: the question's DataFrame

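Also, you should avoid combining apply with descendants, because every call re-traverses the whole subtree and the same computations get performed many times. A post-order depth-first search is the more efficient approach here: each node is processed once, after all of its children.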

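# Post-order DFS from the placeholder root '-': a node is visited only after its children.
for node in nx.dfs_postorder_nodes(G, '-'):
    successors = list(G.successors(node))  # direct reports of this node
    # Everyone below = everyone below each direct report + the direct reports themselves.
    G.nodes[node]['size'] = sum(G.nodes[s]['size'] for s in successors) + len(successors)
    G.nodes[node]['descendants'] = [d for s in successors for d in G.nodes[s]['descendants']] + successors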
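
If only the counts are needed, the same bottom-up idea can be sketched without storing a descendants list on every node. This is a variant, not the code above: it walks `nx.topological_sort` in reverse so that every child is handled before its manager. It assumes the hierarchy is a tree (each employee has exactly one manager), that the columns are named `employee` and `manager`, and that the top-level manager is the placeholder `-`. The helper name `count_reports` is hypothetical.

```python
import networkx as nx
import pandas as pd


def count_reports(df, root="-"):
    """Return {employee: number of people below them}.

    Hypothetical helper: assumes 'employee' and 'manager' columns and that
    the top-level employee's manager is the placeholder `root`.
    """
    G = nx.from_pandas_edgelist(
        df, source="manager", target="employee", create_using=nx.DiGraph
    )
    size = {}
    # Reverse topological order visits every child before its manager.
    for node in reversed(list(nx.topological_sort(G))):
        size[node] = sum(size[child] + 1 for child in G.successors(node))
    size.pop(root, None)  # the placeholder is not a real employee
    return size


# Assumed sample data: the table from the question.
df = pd.DataFrame({
    "employee": list("ABCDEFGHI"),
    "manager": ["-", "A", "A", "A", "A", "B", "B", "C", "C"],
})
counts = count_reports(df)
print(counts)
```

On the question's table this gives 8 for A, 2 for B and C, and 0 for everyone else. Keeping only an integer per node avoids the memory cost of a descendants list on every level of a deep hierarchy.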

Finally, the information can be pulled out of the networkx graph in bulk as a dict and then converted to a DataFrame.

import pandas as pd
pd.DataFrame.from_dict(dict(G.nodes(data=True)), orient='index')
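
Put together, the snippets in this answer can be run end to end as follows. This is a minimal sketch: the DataFrame `temp` is rebuilt here from the question's table as an assumption (lowercase `employee` and `manager` columns, `-` as A's placeholder manager), and dropping the placeholder row at the end is an extra step not in the answer.

```python
import networkx as nx
import pandas as pd

# Assumed sample data: the question's table, with '-' as A's placeholder manager.
temp = pd.DataFrame({
    "employee": list("ABCDEFGHI"),
    "manager": ["-", "A", "A", "A", "A", "B", "B", "C", "C"],
})

G = nx.from_pandas_edgelist(
    temp, source="manager", target="employee", create_using=nx.DiGraph
)

# Post-order: each node is handled only after all of its direct reports.
for node in nx.dfs_postorder_nodes(G, "-"):
    successors = list(G.successors(node))
    G.nodes[node]["size"] = (
        sum(G.nodes[s]["size"] for s in successors) + len(successors)
    )
    G.nodes[node]["descendants"] = [
        d for s in successors for d in G.nodes[s]["descendants"]
    ] + successors

# One row per node, with 'size' and 'descendants' columns.
result = pd.DataFrame.from_dict(dict(G.nodes(data=True)), orient="index")
result = result.drop(index="-")  # the placeholder root is not an employee
print(result.sort_index())
```

For the question's data, `result.loc["A", "size"]` is 8, B and C each have size 2, and the remaining employees have size 0.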

Answered 2023-03-21

nnt7mjpx2#

I would use a directed graph from networkx. It's a super fun Python package.

import networkx as nx
import numpy as np
import pandas as pd

# Set up data
employee = ['A', 'B', 'C','D','E','F','G','H','I']
manager = ['', 'A', 'A','A','A','B','B','C','C']
relations = pd.DataFrame(list(zip(employee,manager)), columns = ['Employee', 'Manager'])

# If there is no manager, make it the employee
relations.Manager = np.where(relations.Manager == '', relations.Employee, relations.Manager)
# or might need depending on data format:
relations.Manager = np.where(relations.Manager.isna(), relations.Employee, relations.Manager)

# Create tuples for 'edges'
relations['edge'] = list(zip(relations.Manager, relations.Employee))

# Create graph
G = nx.DiGraph()
G.add_nodes_from(list(relations.Employee))
G.add_edges_from(list(set(relations.edge)))

#Find all the descendants of nodes/employees
relations['employees_below'] = relations.apply(lambda row: nx.descendants(G,row.Employee), axis = 1)

Returns:

  Employee Manager    edge           employees_below
0        A       A  (A, A)  {C, G, I, D, H, F, E, B}
1        B       A  (A, B)                    {F, G}
2        C       A  (A, C)                    {H, I}
3        D       A  (A, D)                        {}
4        E       A  (A, E)                        {}
5        F       B  (B, F)                        {}
6        G       B  (B, G)                        {}
7        H       C  (C, H)                        {}
8        I       C  (C, I)                        {}

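How it works: a graph consists of nodes and edges. In this case the nodes are the employees and the edges are the relationships between managers and employees. A quick Google image search for "networkx directed graph" will show you what this looks like visually.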

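1. Make sure your data is cleaned so that everyone has a manager (for example, if an employee has no manager, make them their own manager).
2. Create the edges as (manager, employee) tuples and store them somewhere (I chose to keep them as a column in the df called edge).
3. Create a directed graph in networkx. A directed graph is needed because the relationship is hierarchical: every edge points from the manager to the employee.
4. Add each employee to the graph as a "node".
5. Using the (manager, employee) tuples from step 2, add each manager-employee relationship to the graph as an edge.
6. Finally, get the people below an employee by finding all descendants of that node. The descendants are all nodes (employees) that can be reached from a given node (employee). I chose to assign the result to a column by using apply to call the descendants function on each row's employee.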
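
The question asks for a count rather than the set of names, so the steps above can be finished with a `len` over each descendant set. The following is a minimal sketch under a few assumptions: the column name `n_below` is made up for illustration, and rows without a manager are simply skipped when adding edges, as an alternative to making the employee their own manager.

```python
import networkx as nx
import pandas as pd

# Sample data from this answer: A has no manager (empty string).
relations = pd.DataFrame({
    "Employee": list("ABCDEFGHI"),
    "Manager": ["", "A", "A", "A", "A", "B", "B", "C", "C"],
})

G = nx.DiGraph()
G.add_nodes_from(relations["Employee"])

# Assumption: skip rows without a manager instead of adding a self-loop.
has_manager = relations["Manager"].ne("") & relations["Manager"].notna()
G.add_edges_from(
    zip(relations.loc[has_manager, "Manager"],
        relations.loc[has_manager, "Employee"])
)

# nx.descendants returns every node reachable from the employee,
# so its length is the number of people below them.
relations["n_below"] = [
    len(nx.descendants(G, employee)) for employee in relations["Employee"]
]
print(relations[["Employee", "n_below"]])
```

Each call to `nx.descendants` traverses that employee's whole subtree again, which is simple to read but repeats work on deep hierarchies.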

Answered 2023-03-21
