Python Pandas: group by a flag and a second, dependent flag

Posted by rkue9o1l on 2023-01-07 in Python


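This is a simplified example of my problem. I have a DataFrame with file names and modification dates. I need a flag that marks the latest file for each file name: latest = 1, not latest = 0.
This is the code I have so far: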

import pandas as pd

df = pd.DataFrame({
    'FileName': ['file1.txt', 'file2.txt', 'file3.txt', 'file1.txt', 'file4.txt', 'file3.txt'],
    'FileModDate': ['2022-02-22 10:28:18', '2022-02-22 11:28:18', '2022-02-22 12:28:18', '2022-02-22 14:28:18', '2022-02-22 08:28:18', '2022-02-22 15:28:18'],
    'DataDate': ['2024-02-22 10:28:18', '2021-02-22 11:28:18', '2021-02-22 12:28:18', '2021-02-22 14:28:18', '2021-02-22 08:28:18', '2021-02-22 15:28:18']
})

# Sort newest first so that first() picks the latest row per file name
df = df.sort_values('FileModDate', ascending=False)

print(df)

# Keep only the newest row per file name and flag it
grouped = df.groupby('FileName').first()
grouped['LatestFile'] = 1
print(grouped)

The result is:

                   FileModDate  LatestFile
FileName
file1.txt  2022-02-22 14:28:18           1
file2.txt  2022-02-22 11:28:18           1
file3.txt  2022-02-22 15:28:18           1
file4.txt  2022-02-22 08:28:18           1

I don't know whether this is the right approach. How can I also get the rows that are not returned by groupby.first() into this DataFrame?
So the result should look like this:

                   FileModDate  LatestFile
FileName
file1.txt  2022-02-22 14:28:18           1
file2.txt  2022-02-22 11:28:18           1
file3.txt  2022-02-22 15:28:18           1
file4.txt  2022-02-22 08:28:18           1
file3.txt  2022-02-22 12:28:18           0
file1.txt  2022-02-22 10:28:18           0

Edit:
I also need a second flag that depends on the first one:
The DataDate flag should be 1 only where LatestFile = 1, so the output is:

    FileName          FileModDate             DataDate  LatestFile  DataDateFlag
0  file1.txt  2022-02-22 10:28:18  2024-02-22 10:28:18           0             0
1  file2.txt  2022-02-22 11:28:18  2021-02-22 11:28:18           1             1
2  file3.txt  2022-02-22 12:28:18  2021-02-22 12:28:18           0             0
3  file1.txt  2022-02-22 14:28:18  2021-02-22 14:28:18           1             1
4  file4.txt  2022-02-22 08:28:18  2021-02-22 08:28:18           1             1
5  file3.txt  2022-02-22 15:28:18  2021-02-22 15:28:18           1             1

I tried something like this:

df["DataDateFlag"] = (
    df
    .groupby("FileName")["DataDate"]
    .transform("max")
    .eq(df["DataDate"])
    .astype(int)
    .filter(df["LatestFile"]==1)
)
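
For reference, `Series.filter` selects by index label rather than by a boolean mask, so the last step above does not restrict the result to the latest files. One possible way to get the second flag is sketched below. It assumes the rule is "largest DataDate per FileName, evaluated only among rows with LatestFile == 1, and 0 everywhere else"; the helper names `latest` and `is_max_data_date` are illustrative and not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({
    "FileName": ["file1.txt", "file2.txt", "file3.txt", "file1.txt", "file4.txt", "file3.txt"],
    "FileModDate": ["2022-02-22 10:28:18", "2022-02-22 11:28:18", "2022-02-22 12:28:18",
                    "2022-02-22 14:28:18", "2022-02-22 08:28:18", "2022-02-22 15:28:18"],
    "DataDate": ["2024-02-22 10:28:18", "2021-02-22 11:28:18", "2021-02-22 12:28:18",
                 "2021-02-22 14:28:18", "2021-02-22 08:28:18", "2021-02-22 15:28:18"],
})

# First flag: 1 where the row carries the newest FileModDate of its FileName.
# The dates are ISO-formatted strings, so string comparison matches date order.
df["LatestFile"] = (
    df.groupby("FileName")["FileModDate"]
    .transform("max")
    .eq(df["FileModDate"])
    .astype(int)
)

# Second flag (assumed rule): only rows with LatestFile == 1 take part.
latest = df[df["LatestFile"] == 1]
is_max_data_date = (
    latest.groupby("FileName")["DataDate"]
    .transform("max")
    .eq(latest["DataDate"])
)

# Rows that were filtered out are missing from the mask; fill them with False.
df["DataDateFlag"] = is_max_data_date.reindex(df.index, fill_value=False).astype(int)

print(df)
```

With the sample data this gives the DataDateFlag column shown in the expected output. If the intended rule is different, only the construction of `is_max_data_date` would need to change.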

pandas

Source: https://stackoverflow.com/questions/75005173/python-pandas-group-by-flags-and-depending-second-flag

3 Answers

ecbunoof1#

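You can transform each group into its maximum date. This removes the need to sort the DataFrame and lets you compare directly against the actual dates: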

import pandas as pd

df = pd.DataFrame({
     'FileName' : ['file1.txt', 'file2.txt', 'file3.txt', 'file1.txt', 'file4.txt', 'file3.txt'],
     'FileModDate' : ['2022-02-22 10:28:18', '2022-02-22 11:28:18', '2022-02-22 12:28:18', '2022-02-22 14:28:18', '2022-02-22 08:28:18', '2022-02-22 15:28:18']
})

df["LatestFile"] = (
    df
    .groupby("FileName")["FileModDate"]
    .transform("max")
    .eq(df["FileModDate"])
    .astype(int)
)

Output (in the original order):

    FileName          FileModDate  LatestFile
0  file1.txt  2022-02-22 10:28:18           0
1  file2.txt  2022-02-22 11:28:18           1
2  file3.txt  2022-02-22 12:28:18           0
3  file1.txt  2022-02-22 14:28:18           1
4  file4.txt  2022-02-22 08:28:18           1
5  file3.txt  2022-02-22 15:28:18           1
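
The comparison above works on strings, which is fine here because the timestamps are in ISO format and therefore sort the same way as dates. A minimal sketch of the same idea with real datetimes follows, assuming the column can be parsed by `pd.to_datetime`; this conversion step is an addition and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "FileName": ["file1.txt", "file2.txt", "file3.txt", "file1.txt", "file4.txt", "file3.txt"],
    "FileModDate": ["2022-02-22 10:28:18", "2022-02-22 11:28:18", "2022-02-22 12:28:18",
                    "2022-02-22 14:28:18", "2022-02-22 08:28:18", "2022-02-22 15:28:18"],
})

# Parse the strings so that "max" is chronological rather than lexicographic.
df["FileModDate"] = pd.to_datetime(df["FileModDate"])

# Same transform-and-compare pattern as in the answer above.
df["LatestFile"] = (
    df.groupby("FileName")["FileModDate"]
    .transform("max")
    .eq(df["FileModDate"])
    .astype(int)
)

print(df)
```

Parsing first matters mainly when the source strings use a format that does not sort chronologically, such as day-first dates.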

2023-01-07

wgx48brx2#

You can use booleans instead of 0/1:

df['LatestFile'] = df['FileModDate'] == df.groupby('FileName')['FileModDate'].transform('max')

Output:

    FileName          FileModDate  LatestFile
5  file3.txt  2022-02-22 15:28:18        True
3  file1.txt  2022-02-22 14:28:18        True
2  file3.txt  2022-02-22 12:28:18       False
1  file2.txt  2022-02-22 11:28:18        True
0  file1.txt  2022-02-22 10:28:18       False
4  file4.txt  2022-02-22 08:28:18        True

2023-01-07

46qrfjad3#

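If your data is already sorted by date (newest first, as in the question), you can use groupby.cumcount to enumerate the items within each group and select the first one (0):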

df['LatestFile'] = df.groupby('FileName').cumcount().eq(0).astype(int)

Output:

    FileName          FileModDate  LatestFile
5  file3.txt  2022-02-22 15:28:18           1
3  file1.txt  2022-02-22 14:28:18           1
2  file3.txt  2022-02-22 12:28:18           0
1  file2.txt  2022-02-22 11:28:18           1
0  file1.txt  2022-02-22 10:28:18           0
4  file4.txt  2022-02-22 08:28:18           1
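
Because `cumcount` depends on row order, a self-contained version needs the sort step from the question. The sketch below adds that sort and then restores the original row order with `sort_index`; that last step is an optional addition, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "FileName": ["file1.txt", "file2.txt", "file3.txt", "file1.txt", "file4.txt", "file3.txt"],
    "FileModDate": ["2022-02-22 10:28:18", "2022-02-22 11:28:18", "2022-02-22 12:28:18",
                    "2022-02-22 14:28:18", "2022-02-22 08:28:18", "2022-02-22 15:28:18"],
})

# Newest first, so the row numbered 0 within each FileName is the latest one.
df = df.sort_values("FileModDate", ascending=False)

# Number the rows within each group and flag the first of each.
df["LatestFile"] = df.groupby("FileName").cumcount().eq(0).astype(int)

# Optional: go back to the original row order.
df = df.sort_index()

print(df)
```

One difference from the transform-based answers: if two rows of the same file share the newest timestamp, `cumcount` flags exactly one of them, while comparing against the group maximum flags both.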

2023-01-07
