pandas: get a dynamic tail(n) for each subgroup

lhcgjxsq  posted on 2023-04-19  in Other

I have a dataset that looks like this. (I am looking for a DataFrame solution.)

import pandas as pd

df = ({'id':["a","a","a","a","b","b","b"],
      'tail_num' :[2,2,2,2,1,1,1],
      'value':[1,2,3,4,5,6,7]})

df = pd.DataFrame(df)

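For the subgroup with id 'a', I want the latest 2 records, as given by the tail_num column; for 'b', I want the tail 1 value. What is the best way to achieve this? Thanks!
The desired output looks like this (basically, take the tail n values based on tail_num and show all columns):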

df = ({'id':["a","a","b"],
      'tail_num' :[2,2,1],
      'value':[3,4,7]})

df = pd.DataFrame(df)

p4rjhz4m1#

You can group by the id column and apply a custom/mapped tail:

dmap = dict(zip(df["id"], df["tail_num"]))

out = df.groupby("id", group_keys=False).apply(lambda g: g.tail(dmap[g.name]))

Or without the dictionary:

out = df.groupby(["id", "tail_num"], group_keys=False).apply(lambda g: g.tail(g.name[1]))

Output:

print(out)

  id  tail_num  value
2  a         2      3
3  a         2      4
6  b         1      7
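
As an alternative sketch (not part of the answer above), the same rows can be selected without `apply`: number each row from the end of its group with `GroupBy.cumcount(ascending=False)` and keep the rows whose number is below `tail_num`. This assumes `tail_num` is constant within each `id`, as in the question's data; the variable name `pos_from_end` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["a", "a", "a", "a", "b", "b", "b"],
    "tail_num": [2, 2, 2, 2, 1, 1, 1],
    "value": [1, 2, 3, 4, 5, 6, 7],
})

# Position of each row counted from the end of its group (last row is 0).
pos_from_end = df.groupby("id").cumcount(ascending=False)

# Keep rows that fall within the last `tail_num` rows of their group.
out = df[pos_from_end < df["tail_num"]]
print(out)
```

Because this is a boolean mask on the original frame, the original index and row order are preserved.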

ryevplcw2#

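Since it was not originally clear from the question that you were looking for a Pandas solution, here is how to do it in plain Python.
In the future, please don't make answerers guess what you need.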

counts = {x: 0 for x in set(df["id"])}
results = {"id": [], "tail_num": [], "value": []}

# Iterate through the list, in reverse order (the stop value of -1 includes index 0)
for i in range(len(df["id"]) - 1, -1, -1):
    counts[df["id"][i]] += 1
    # Skip if we already have tail_num entries for this id.
    # Otherwise add to the output.
    if counts[df["id"][i]] <= df["tail_num"][i]:
        results["id"] += [df["id"][i]]
        results["tail_num"] += [df["tail_num"][i]]
        results["value"] += [df["value"][i]]

# Reverse again to get back the original order.
results = [list(reversed(results[x])) for x in results.keys()]

print(results)

Output:

[['a', 'a', 'b'], [2, 2, 1], [3, 4, 7]]
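
For reference, here is a minimal self-contained sketch of the same reverse-scan idea that runs on a plain dict of lists, with no pandas involved. The function name `tail_per_group` and the variable `data` are hypothetical, introduced only for illustration; the sketch collects row indices rather than values, so every column is carried through and the column names are kept.

```python
from collections import defaultdict

def tail_per_group(data):
    """Return the last `tail_num` rows of each id from a dict of equal-length lists."""
    counts = defaultdict(int)
    kept = []  # row indices to keep, collected in reverse order
    for i in range(len(data["id"]) - 1, -1, -1):
        key = data["id"][i]
        counts[key] += 1
        # Keep the row only while this id still has room in its tail.
        if counts[key] <= data["tail_num"][i]:
            kept.append(i)
    kept.reverse()  # restore the original row order
    return {col: [vals[i] for i in kept] for col, vals in data.items()}

data = {
    "id": ["a", "a", "a", "a", "b", "b", "b"],
    "tail_num": [2, 2, 2, 2, 1, 1, 1],
    "value": [1, 2, 3, 4, 5, 6, 7],
}

result = tail_per_group(data)
print(result)
# {'id': ['a', 'a', 'b'], 'tail_num': [2, 2, 1], 'value': [3, 4, 7]}
```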
