pandas: count of id with change generates wrong values

Posted by 0lvr5msh on 2023-06-20 in Other


I have a df that looks like this:

import pandas as pd

data = {
    'api_spec_id': [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    'type': ['BR', 'BR', 'NBR', 'NBR', 'NBR', 'NBR', 'BR', 'BR', 'BR', 'BR', 'BR']
}

df = pd.DataFrame(data)

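I'm trying to count four cases: one is where all rows within an api_spec_id have type = BR, and a second is where at least one row within an api_spec_id has type BR (the other two are the same checks for NBR, as the variable names in the code show).
This is the code I'm using, but it seems to be wrong, because it generates the same output for the last two: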

import pandas as pd

at_least_one_breaking_change = df[df['type'] == 'BR']['api_spec_id'].nunique()

all_commits_including_breaking = df.groupby('api_spec_id').apply(lambda x: 'NBR' not in x['type'].unique()) \
                                .sum()

at_least_one_non_breaking_change = df[df['type'] == 'NBR']['api_spec_id'].nunique()

all_commits_including_non_breaking = df.groupby('api_spec_id').apply(lambda x: 'BR' not in x['type'].unique()) \
                                    .sum()

The expected output for the sample df I posted would be:

at_least_one_breaking_change = 3
all_commits_including_breaking = 3
at_least_one_non_breaking_change = 2
all_commits_including_non_breaking = 1

I'm a bit stuck on this, so any suggestions or ideas would be much appreciated.
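As a sanity check on the expected values, the distinct types per api_spec_id can be listed directly. This is only a sketch against the sample data above; `types_per_id` and `all_br_ids` are names introduced here for illustration. Assuming "all rows are BR" means the id has no NBR row, only 123 and 678 qualify, which would make all_commits_including_breaking 2 for this sample rather than 3.

```python
import pandas as pd

df = pd.DataFrame({
    'api_spec_id': [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    'type': ['BR', 'BR', 'NBR', 'NBR', 'NBR', 'NBR', 'BR', 'BR', 'BR', 'BR', 'BR'],
})

# Distinct types seen for each id (illustrative helper name)
types_per_id = df.groupby('api_spec_id')['type'].apply(set)
print(types_per_id.to_dict())

# ids whose only type is BR, converted to plain ints for readable printing
all_br_ids = sorted(int(i) for i, t in types_per_id.items() if t == {'BR'})
print(all_br_ids)  # [123, 678]
```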

pandas

Source: https://stackoverflow.com/questions/76451633/count-of-id-with-change-generates-wrong-values

2 Answers


#1 q35jwt9p

You can use pd.crosstab:

m = pd.crosstab(df['api_spec_id'], df['type']).astype(bool)

at_least_one_breaking_change = sum(m['BR'])
all_commits_including_breaking = sum(m['BR'] & ~m['NBR'])

at_least_one_non_breaking_change = sum(m['NBR'])
all_commits_including_non_breaking = sum(m['NBR'] & ~m['BR'])

Output:

>>> at_least_one_breaking_change
3

>>> all_commits_including_breaking
2

>>> at_least_one_non_breaking_change
2

>>> all_commits_including_non_breaking
1

>>> m
type            BR    NBR
api_spec_id              
123           True  False
213           True   True
345          False   True
678           True  False
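To see why the boolean table works, it may help to look at the raw counts that pd.crosstab produces before .astype(bool): every non-zero count becomes True. The following is a minimal sketch on the sample data from the question; `counts` and `only_br` are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'api_spec_id': [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    'type': ['BR', 'BR', 'NBR', 'NBR', 'NBR', 'NBR', 'BR', 'BR', 'BR', 'BR', 'BR'],
})

# Rows per (api_spec_id, type) pair; ids are the index, types are the columns
counts = pd.crosstab(df['api_spec_id'], df['type'])
print(counts)

# Non-zero counts become True, i.e. "this id has at least one row of that type"
m = counts.astype(bool)

# ids that have BR rows and no NBR rows
only_br = m['BR'] & ~m['NBR']
print(only_br[only_br].index.tolist())  # [123, 678]
```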

Posted 2023-06-20

#2 smdnsysy

I looked at your code and ran it.

The conditions in this code are slightly wrong.
Here is the update:

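import pandas as pd

# Number of ids with at least one BR row
at_least_one_breaking_change = df[df['type'] == 'BR']['api_spec_id'].nunique()

# Number of ids whose rows are all BR (no NBR row in the group)
all_commits_including_breaking = df.groupby('api_spec_id').apply(lambda x: 'NBR' not in x['type'].unique()) \
                                .sum()

# Number of ids with at least one NBR row
at_least_one_non_breaking_change = df[df['type'] == 'NBR']['api_spec_id'].nunique()

# Number of ids whose rows are all NBR (no BR row in the group)
all_commits_including_non_breaking = df.groupby('api_spec_id').apply(lambda x: 'BR' not in x['type'].unique()) \
                                    .sum()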

Also, there is no "Breaking" type in the data; the only values are 'BR' and 'NBR'.
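If the per-group lambda is not needed, the same four counts can probably be obtained with the built-in any/all group aggregations. This is a sketch rather than part of either answer, and it assumes 'BR' and 'NBR' are the only values in type, so that "not all BR" is the same as "at least one NBR"; `is_br` and `per_id` are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'api_spec_id': [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    'type': ['BR', 'BR', 'NBR', 'NBR', 'NBR', 'NBR', 'BR', 'BR', 'BR', 'BR', 'BR'],
})

# True for BR rows, False for NBR rows (assumes only these two values occur)
is_br = df['type'].eq('BR')

# Per id: does any row have BR, and do all rows have BR?
per_id = is_br.groupby(df['api_spec_id']).agg(['any', 'all'])

at_least_one_breaking_change = int(per_id['any'].sum())           # 3
all_commits_including_breaking = int(per_id['all'].sum())         # 2
at_least_one_non_breaking_change = int((~per_id['all']).sum())    # 2
all_commits_including_non_breaking = int((~per_id['any']).sum())  # 1
```

If other type values could appear, the two NBR counts would need their own mask (for example df['type'].eq('NBR')) instead of the negations used here.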

Posted 2023-06-20
