pandas: create a column of unique value counts based on another column

Posted by tyg4sfes on 2023-02-27 in Other

So, I have this DataFrame:

             NAME    TEST
0   Homer Simpson  PASSED
1   Homer Simpson  FAILED
2   Homer Simpson  FAILED
3   Marge Simpson  PASSED
4   Marge Simpson  PASSED
5    Lisa Simpson  PASSED
6    Bart Simpson  FAILED
7  Maggie Simpson  FAILED

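My goal is to create a column that lists the values of the TEST column together with how often each one occurs, grouped by the NAME column.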

NAME            TEST    RESUME
0   Homer Simpson   PASSED  [PASSED: 1, FAILED: 2]
1   Marge Simpson   PASSED  [PASSED: 2]
3   Lisa Simpson    PASSED  [PASSED: 1]
4   Bart Simpson    FAILED  [FAILED: 1]
5   Maggie Simpson  FAILED  [FAILED: 1]

So far I have used:

df.groupby('NAME')['TEST'].nunique()

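But this only returns the number of distinct TEST values per name. What I want is the values themselves together with how many times each one occurs for each name.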

Bart Simpson      1
Homer Simpson     2
Lisa Simpson      1
Maggie Simpson    1
Marge Simpson     1
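
To make the difference concrete, here is a minimal sketch, assuming the DataFrame above is rebuilt by hand (the variable names `distinct` and `counts` are illustrative, not from the question). `nunique` gives only the number of distinct TEST values per name, while `value_counts` on the same group gives each value with its frequency.

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    "NAME": ["Homer Simpson"] * 3 + ["Marge Simpson"] * 2
            + ["Lisa Simpson", "Bart Simpson", "Maggie Simpson"],
    "TEST": ["PASSED", "FAILED", "FAILED", "PASSED", "PASSED",
             "PASSED", "FAILED", "FAILED"],
})

# Number of distinct TEST values per name (what the attempt above computes)
distinct = df.groupby("NAME")["TEST"].nunique()

# Each TEST value with its frequency per name (closer to what is wanted)
counts = df.groupby("NAME")["TEST"].value_counts()

print(distinct)
print(counts)
```

The second result is a Series indexed by (NAME, TEST) pairs, so it still needs to be collapsed to one row per name to match the desired RESUME column.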

Can you help me? Thanks!

w9apscun #1

You can use collections.Counter in GroupBy.agg:

from collections import Counter

# Keep the first TEST value per name and count every TEST value with Counter
out = df.groupby('NAME', as_index=False, sort=False).agg(TEST=('TEST', 'first'),
                                                         RESUME=('TEST', Counter))
print(out)
             NAME    TEST                      RESUME
0   Homer Simpson  PASSED  {'PASSED': 1, 'FAILED': 2}
1   Marge Simpson  PASSED               {'PASSED': 2}
2    Lisa Simpson  PASSED               {'PASSED': 1}
3    Bart Simpson  FAILED               {'FAILED': 1}
4  Maggie Simpson  FAILED               {'FAILED': 1}

For a list of joined "value:count" strings:

from collections import Counter

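# Build one 'VALUE:count' string per distinct TEST value
f = lambda x: [f'{k}:{v}' for k, v in Counter(x).items()]
# Assign to a new name so the original df stays available for the next example
out = df.groupby('NAME', as_index=False, sort=False).agg(TEST=('TEST', 'first'),
                                                         RESUME=('TEST', f))
print(out)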
             NAME    TEST                RESUME
0   Homer Simpson  PASSED  [PASSED:1, FAILED:2]
1   Marge Simpson  PASSED            [PASSED:2]
2    Lisa Simpson  PASSED            [PASSED:1]
3    Bart Simpson  FAILED            [FAILED:1]
4  Maggie Simpson  FAILED            [FAILED:1]

For a single joined string:

from collections import Counter

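# Join the 'VALUE:count' pairs into one comma-separated string
f = lambda x: ', '.join(f'{k}:{v}' for k, v in Counter(x).items())
# Assign to a new name so the original df is not overwritten
out = df.groupby('NAME', as_index=False, sort=False).agg(TEST=('TEST', 'first'),
                                                         RESUME=('TEST', f))
print(out)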
             NAME    TEST              RESUME
0   Homer Simpson  PASSED  PASSED:1, FAILED:2
1   Marge Simpson  PASSED            PASSED:2
2    Lisa Simpson  PASSED            PASSED:1
3    Bart Simpson  FAILED            FAILED:1
4  Maggie Simpson  FAILED            FAILED:1

qxgroojn #2

We can use Counter from the collections module of the standard library:

import pandas as pd
from collections import Counter

df.groupby('NAME', as_index=False).agg({'TEST':Counter})

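This turns each value in the "TEST" column into a Counter (a dict subclass), which we can then use to implement the logic that decides whether a student passed the test.

| NAME | TEST |
| ------- | ------- |
| Bart Simpson | Counter({'FAILED': 1}) |
| Homer Simpson | Counter({'FAILED': 2, 'PASSED': 1}) |
| Lisa Simpson | Counter({'PASSED': 1}) |
| Maggie Simpson | Counter({'FAILED': 1}) |
| Marge Simpson | Counter({'PASSED': 2}) |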
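
As one possible way to act on these Counter values, the sketch below flags a student as having passed if at least one attempt is PASSED. That rule and the `HAS_PASSED` column name are assumptions made for illustration; the question does not define what counts as passing.

```python
from collections import Counter

import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    "NAME": ["Homer Simpson"] * 3 + ["Marge Simpson"] * 2
            + ["Lisa Simpson", "Bart Simpson", "Maggie Simpson"],
    "TEST": ["PASSED", "FAILED", "FAILED", "PASSED", "PASSED",
             "PASSED", "FAILED", "FAILED"],
})

# One Counter of TEST values per name
out = df.groupby("NAME", as_index=False).agg({"TEST": Counter})

# Hypothetical rule: a student has passed if any attempt is PASSED.
# A Counter returns 0 for a missing key, so no KeyError is raised.
out["HAS_PASSED"] = out["TEST"].map(lambda c: c["PASSED"] > 0)

print(out)
```

Because the lookup happens on the Counter itself, a different rule (for example, more PASSED than FAILED attempts) only requires changing the lambda.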

x8goxv8g #3

You can use:

(df[['NAME', 'TEST']].value_counts().reset_index(name='count')
 .assign(TEST=lambda d: d['TEST'].add(': ' + d['count'].astype(str)))
 .groupby('NAME', as_index=False)['TEST'].agg(', '.join)
)

Output:

             NAME                  TEST
0    Bart Simpson             FAILED: 1
1   Homer Simpson  FAILED: 2, PASSED: 1
2    Lisa Simpson             PASSED: 1
3  Maggie Simpson             FAILED: 1
4   Marge Simpson             PASSED: 2

Or:

out = (df[['NAME', 'TEST']]
       .value_counts().reset_index(name='count')
       # Turn the count column into 'VALUE: n' strings
       .assign(count=lambda d: d['TEST'].add(': ' + d['count'].astype(str)))
       .groupby('NAME', as_index=False, sort=False)
       .agg(TEST=('TEST', 'first'),
            RESULT=('count', ', '.join))
      )

Output:

             NAME    TEST                RESULT
0   Homer Simpson  FAILED  FAILED: 2, PASSED: 1
1   Marge Simpson  PASSED             PASSED: 2
2    Bart Simpson  FAILED             FAILED: 1
3    Lisa Simpson  PASSED             PASSED: 1
4  Maggie Simpson  FAILED             FAILED: 1
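
Note that in the output above TEST for Homer Simpson is FAILED, because `value_counts` orders rows by count before `'first'` is applied. If the first value from the original rows is wanted instead, one possible sketch is shown below. The intermediate names `counts`, `result`, and `first` are illustrative, and the approach is a variation on this answer rather than part of it.

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    "NAME": ["Homer Simpson"] * 3 + ["Marge Simpson"] * 2
            + ["Lisa Simpson", "Bart Simpson", "Maggie Simpson"],
    "TEST": ["PASSED", "FAILED", "FAILED", "PASSED", "PASSED",
             "PASSED", "FAILED", "FAILED"],
})

# Count each (NAME, TEST) pair, then build "VALUE: n" strings
counts = df[["NAME", "TEST"]].value_counts().reset_index(name="count")
counts["RESULT"] = counts["TEST"] + ": " + counts["count"].astype(str)

# One comma-separated string per name, indexed by NAME
result = counts.groupby("NAME")["RESULT"].agg(", ".join)

# First TEST value per name, taken from the original row order
first = df.drop_duplicates("NAME")[["NAME", "TEST"]]

# Attach the joined counts to the first row of each name
out = first.assign(RESULT=first["NAME"].map(result)).reset_index(drop=True)

print(out)
```

This keeps the names in their original order of appearance and leaves the counting step itself unchanged.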
