How to create a pandas column that counts instances of a value based on a condition?

Posted by vwkv1x7d on 2023-05-05 in Other

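I have a dataframe of scraped forum posts, where the opening post of each thread is marked with a 1 in the thread_opening column. The post_number column numbers the posts within each unique string of the thread_name column (i.e., within each thread). It looks like this:
| user_name | thread_name | post_text | post_number | thread_opening |
| --------------|--------------|--------------|--------------|--------------|
| BoxCutter | It's Okay When We Do It. | ... | 1 | 1 |
| Docket_33 | It's Okay When We Do It. | ... | 2 | 0 |
| Hearmeout | It's Okay When We Do It. | ... | 3 | 0 |
| yez | Science | ... | 1 | 1 |
| Mainländer | Science | ... | 2 | 0 |
| 400z | Science | ... | 3 | 0 |
| Farnham | Science | ... | 4 | 0 |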
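I'm trying to create a new column, let's call it 'thread_comments_count', that holds the number of posts in each thread. Essentially, for every row where thread_opening = 1, I want the number of posts in that thread (its highest post_number), and for every row where thread_opening = 0, thread_comments_count should be 0.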
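Since the question phrases the target as "counting post_number", one possible reading is to take the highest post_number in each thread. A minimal sketch, assuming post_number is a gap-free, 1-based running count within each thread (so its maximum equals the number of posts); the dataframe is rebuilt from the table above with post_text elided:

```python
import pandas as pd

# Sample data rebuilt from the table above (post_text elided as "...").
df = pd.DataFrame({
    "user_name": ["BoxCutter", "Docket_33", "Hearmeout",
                  "yez", "Mainländer", "400z", "Farnham"],
    "thread_name": ["It's Okay When We Do It."] * 3 + ["Science"] * 4,
    "post_text": ["..."] * 7,
    "post_number": [1, 2, 3, 1, 2, 3, 4],
    "thread_opening": [1, 0, 0, 1, 0, 0, 0],
})

# Highest post_number per thread, broadcast back to every row of that thread.
# Assumes post_number has no gaps, so the maximum equals the post count.
thread_total = df.groupby("thread_name")["post_number"].transform("max")

# Keep the total only on opening posts; every other row gets 0.
df["thread_comments_count"] = thread_total.where(df["thread_opening"].eq(1), 0)

print(df[["thread_name", "post_number", "thread_opening", "thread_comments_count"]])
```

If post_number can have gaps (for example, deleted posts), the size-based answers below are the safer choice.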

Answer #1, by 5lhxktic

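If thread_opening already exists, you can derive post_number from it with groupby.cumcount:

# cumcount is 0-based, so add 1 to match the 1-based post_number
df['post_number'] = df.groupby(df['thread_opening'].cumsum()).cumcount().add(1)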
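To show what the grouping key looks like, here is a small sketch of the same cumsum/cumcount idea, assuming the rows are ordered so that each thread's opening post is immediately followed by its replies:

```python
import pandas as pd

df = pd.DataFrame({
    "thread_name": ["It's Okay When We Do It."] * 3 + ["Science"] * 4,
    "thread_opening": [1, 0, 0, 1, 0, 0, 0],
})

# The running sum of the opening flag gives every row of a thread the same id.
thread_id = df["thread_opening"].cumsum()

# cumcount numbers rows within each group starting at 0, so add 1
# to get the 1-based post_number shown in the question.
df["post_number"] = df.groupby(thread_id).cumcount().add(1)

print(thread_id.tolist())
print(df)
```

Because the id only increments on opening posts, this relies on row order; sort by thread and post time first if the data is shuffled.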

If you want the total number of posts per thread, shown only on rows where thread_opening is 1:

df['thread_comments_count'] = (df.groupby('thread_name')['thread_name']
                                 .transform('size')
                                 .where(df['thread_opening'].eq(1), 0)
                              )

Output:

user_name               thread_name post_text  post_number  thread_opening  thread_comments_count
0   BoxCutter  It's Okay When We Do It.     .....            1               1                      3
1   Docket_33  It's Okay When We Do It.     .....            2               0                      0
2   Hearmeout  It's Okay When We Do It.     .....            3               0                      0
3         yez                   Science     .....            1               1                      4
4  Mainländer                   Science     .....            2               0                      0
5        400z                   Science     .....            3               0                      0
6     Farnham                   Science     .....            4               0                      0
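One caveat with grouping by thread_name: two different threads that happen to share a title would be counted together. A minimal sketch of a variant that groups by the cumsum-based thread id instead, assuming rows are ordered so each opening post is followed by its replies; the data below is hypothetical, built only to show two threads with the same title:

```python
import pandas as pd

# Hypothetical data: two separate threads that both have the title "Science".
df = pd.DataFrame({
    "thread_name": ["Science"] * 5,
    "thread_opening": [1, 0, 0, 1, 0],
})

# Group by a per-thread id derived from the opening flag, not by the title.
thread_id = df["thread_opening"].cumsum()
size_per_row = df.groupby(thread_id)["thread_opening"].transform("size")

# Total on opening posts only, 0 elsewhere.
df["thread_comments_count"] = size_per_row.where(df["thread_opening"].eq(1), 0)

print(df)
```

Grouping this hypothetical frame by thread_name would instead report 5 on both opening rows.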
Answer #2, by nkhmeac6

Use numpy.where together with Series.value_counts and Series.map:

import numpy as np

df['thread_comments_count'] = np.where(df['thread_opening'].eq(1),
                                       df['thread_name'].map(df['thread_name'].value_counts()),
                                       0)
print(df)

     user_name                thread_name post_text  post_number  \
0   BoxCutter   It's Okay When We Do It.     .....             1   
1   Docket_33   It's Okay When We Do It.     .....             2   
2   Hearmeout   It's Okay When We Do It.     .....             3   
3         yez                    Science     .....             1   
4  Mainländer                    Science     .....             2   
5        400z                    Science     .....             3   
6     Farnham                    Science     .....             4   

   thread_opening  thread_comments_count  
0               1                      3  
1               0                      0  
2               0                      0  
3               1                      4  
4               0                      0  
5               0                      0  
6               0                      0
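The one-liner above can be split into its three steps to see the intermediate values. A minimal sketch using a reduced version of the question's data (only the two columns the computation needs):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "thread_name": ["It's Okay When We Do It."] * 3 + ["Science"] * 4,
    "thread_opening": [1, 0, 0, 1, 0, 0, 0],
})

# Step 1: one count per distinct thread title.
counts = df["thread_name"].value_counts()

# Step 2: map looks each row's title up in counts, broadcasting the count to every row.
per_row = df["thread_name"].map(counts)

# Step 3: np.where keeps the count on opening posts and writes 0 elsewhere.
df["thread_comments_count"] = np.where(df["thread_opening"].eq(1), per_row, 0)

print(per_row.tolist())
print(df)
```

Like the first answer, this counts rows per title, so it shares the same caveat about distinct threads with identical titles.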
