Pandas - cumulative count (with labels)

eyh26e7m posted on 2022-11-20 in Other

I have a Pandas df that looks like this:

+---------+---------+------------+--------+
| Cluster | Country | Publishers | Assets |
+---------+---------+------------+--------+
| South   | IT      | SS         | Asset1 |
| South   | IT      | SS         | Asset2 |
| South   | IT      | SS         | Asset3 |
| South   | IT      | ML         | Asset1 |
| South   | IT      | ML         | Asset2 |
| South   | IT      | ML         | Asset3 |
| South   | IT      | TT         | Asset1 |
| South   | IT      | TT         | Asset2 |
| South   | IT      | TT         | Asset3 |
| South   | ES      | SS         | Asset1 |
| South   | ES      | SS         | Asset2 |
+---------+---------+------------+--------+
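To experiment with the answer below, the frame above can be rebuilt in code. This is a minimal sketch; the variable name df matches the question, and the values are copied from the table.

```python
import pandas as pd

# Rebuild the sample frame from the table above (11 rows, 4 columns)
df = pd.DataFrame({
    'Cluster': ['South'] * 11,
    'Country': ['IT'] * 9 + ['ES'] * 2,
    'Publishers': ['SS'] * 3 + ['ML'] * 3 + ['TT'] * 3 + ['SS'] * 2,
    'Assets': ['Asset1', 'Asset2', 'Asset3'] * 3 + ['Asset1', 'Asset2'],
})
print(df)
```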

I want to create a new column "Package" that holds a cumulative count based on the following columns:

  • Publishers
  • Assets

The result would look like this:

+---------+---------+------------+--------+---------+
| Cluster | Country | Publishers | Assets | Package |
+---------+---------+------------+--------+---------+
| South   | IT      | SS         | Asset1 | 1       |
| South   | IT      | SS         | Asset2 | 1a      |
| South   | IT      | SS         | Asset3 | 1b      |
| South   | IT      | ML         | Asset1 | 2       |
| South   | IT      | ML         | Asset2 | 2a      |
| South   | IT      | ML         | Asset3 | 2b      |
| South   | IT      | TT         | Asset1 | 3       |
| South   | IT      | TT         | Asset2 | 3a      |
| South   | IT      | TT         | Asset3 | 3b      |
| South   | ES      | SS         | Asset1 | 4       |
| South   | ES      | SS         | Asset2 | 4a      |
+---------+---------+------------+--------+---------+
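Reading the expected table, the number appears to count consecutive publisher blocks (a new block starts whenever Publishers, or Country, changes), and the letter is the row's position inside its block, with no letter for the first row. A minimal plain-Python sketch of that rule, assuming the rows are already ordered in blocks as shown; the helper name package_labels is hypothetical:

```python
from itertools import groupby
from string import ascii_lowercase


def package_labels(keys):
    """Return labels like '1', '1a', '1b', '2', ... for runs of equal adjacent keys."""
    labels = []
    # itertools.groupby only merges *adjacent* equal keys, which gives the blocks
    for block_no, (_, run) in enumerate(groupby(keys), start=1):
        for pos, _ in enumerate(run):
            # first row of a block gets no suffix, later rows get 'a', 'b', ...
            # (raises IndexError for blocks longer than 27 rows)
            suffix = '' if pos == 0 else ascii_lowercase[pos - 1]
            labels.append(f'{block_no}{suffix}')
    return labels


# (Country, Publishers) pairs in the same order as the table rows
keys = [('IT', 'SS')] * 3 + [('IT', 'ML')] * 3 + [('IT', 'TT')] * 3 + [('ES', 'SS')] * 2
print(package_labels(keys))
```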

So far I have tried
df['Package'] = df.groupby(['Cluster','Publishers']).cumcount(), but it doesn't seem to work, because the value resets to 0 after each publisher's rows end.
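A quick way to see the problem: cumcount returns each row's position inside its group, not a number for the group itself, and grouping by ['Cluster','Publishers'] also merges the IT and ES "SS" rows into a single group. A minimal sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Cluster': ['South'] * 11,
    'Country': ['IT'] * 9 + ['ES'] * 2,
    'Publishers': ['SS'] * 3 + ['ML'] * 3 + ['TT'] * 3 + ['SS'] * 2,
    'Assets': ['Asset1', 'Asset2', 'Asset3'] * 3 + ['Asset1', 'Asset2'],
})

# Position of each row inside its (Cluster, Publishers) group, starting at 0
attempt = df.groupby(['Cluster', 'Publishers']).cumcount()
print(attempt.tolist())
# [0, 1, 2, 0, 1, 2, 0, 1, 2, 3, 4]
```

Each publisher restarts at 0, and the ES "SS" rows continue the IT "SS" count (3, 4) instead of starting a new package.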

sczxawaw (answer #1)

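You can use groupby.cumcount, but with a different grouper. You also need the related groupby.ngroup: ngroup numbers the groups themselves, while cumcount numbers the rows inside each group.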
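To see what each method contributes, here is a minimal sketch on a toy column (an assumption for illustration; its values already appear in sorted order, so the default sort=True does not reorder the group numbers):

```python
import pandas as pd
from string import ascii_lowercase

# Toy column whose values already appear in sorted order
s = pd.Series(['ML', 'ML', 'SS', 'SS', 'SS', 'TT'])
g = s.groupby(s)

group_no = g.ngroup()      # which group each row belongs to: 0, 1, 2, ...
position = g.cumcount()    # position of the row inside its group: 0, 1, 2, ...

# '' for position 0, then 'a', 'b', ...
suffixes = dict(enumerate([''] + list(ascii_lowercase)))
labels = (group_no + 1).astype(str) + position.map(suffixes)

print(group_no.tolist())   # [0, 0, 1, 1, 1, 2]
print(position.tolist())   # [0, 1, 0, 1, 2, 0]
print(labels.tolist())     # ['1', '1a', '2', '2a', '2b', '3']
```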

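from string import ascii_lowercase

# group by runs of consecutive identical Publishers values
group = df['Publishers'].ne(df['Publishers'].shift()).cumsum()
# alternatively, you can also group by Cluster/Country/Publishers
# group = ['Cluster', 'Country', 'Publishers']

# sort=False numbers the groups in order of first appearance
# (required for the list grouper above; harmless for the run grouper)
g = df.groupby(group, sort=False)

# '' for the first row of each group, then 'a', 'b', ... (at most 27 rows per group)
suffixes = dict(enumerate([''] + list(ascii_lowercase)))

df['Package'] = (
    g.ngroup().add(1).astype(str)    # 1-based group number
    + g.cumcount().map(suffixes)     # row position within the group -> letter suffix
)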
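The run grouper relies on a common idiom: compare each value with the previous row, then take a running sum of the "changed" flags. A minimal sketch on the Publishers values from the question:

```python
import pandas as pd

publishers = pd.Series(['SS', 'SS', 'SS', 'ML', 'ML', 'ML', 'TT', 'TT', 'TT', 'SS', 'SS'])

# True where the value differs from the previous row
# (the first row is compared with NaN, so it is always True)
starts = publishers.ne(publishers.shift())
# Running total of the True flags gives one id per run of identical values
run_id = starts.cumsum()

print(starts.tolist())
print(run_id.tolist())   # [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4]
```

Note that the trailing "SS" rows get their own id (4) because they are not adjacent to the first "SS" run. The idiom only looks at Publishers, though: if the ES "SS" rows had directly followed IT "SS" rows, they would be merged into one run, which is a case where the Cluster/Country/Publishers grouper is the safer choice.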

Output:

   Cluster Country Publishers  Assets Package
0    South      IT         SS  Asset1       1
1    South      IT         SS  Asset2      1a
2    South      IT         SS  Asset3      1b
3    South      IT         ML  Asset1       2
4    South      IT         ML  Asset2      2a
5    South      IT         ML  Asset3      2b
6    South      IT         TT  Asset1       3
7    South      IT         TT  Asset2      3a
8    South      IT         TT  Asset3      3b
9    South      ES         SS  Asset1       4
10   South      ES         SS  Asset2      4a
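If you use the Cluster/Country/Publishers list as the grouper, note that groupby sorts the group keys by default, so ngroup numbers the groups in sorted key order rather than row order. A minimal sketch comparing both settings on the sample data; passing sort=False keeps the order of first appearance:

```python
import pandas as pd

df = pd.DataFrame({
    'Cluster': ['South'] * 11,
    'Country': ['IT'] * 9 + ['ES'] * 2,
    'Publishers': ['SS'] * 3 + ['ML'] * 3 + ['TT'] * 3 + ['SS'] * 2,
    'Assets': ['Asset1', 'Asset2', 'Asset3'] * 3 + ['Asset1', 'Asset2'],
})
keys = ['Cluster', 'Country', 'Publishers']

# Default sort=True: (South, ES, SS) sorts first, so the ES rows get number 1
sorted_no = df.groupby(keys).ngroup().add(1).tolist()
# sort=False: groups are numbered in the order they first appear
appearance_no = df.groupby(keys, sort=False).ngroup().add(1).tolist()

print(sorted_no)       # [3, 3, 3, 2, 2, 2, 4, 4, 4, 1, 1]
print(appearance_no)   # [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4]
```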
