I have a dataset similar to the following:
ID DATE TAG
S3800 1999-07-02 D
S1190 1999-07-02 C
S3131 1999-07-02 C
S3131 1999-07-04 C
S3131 1999-07-05 D
I'm trying to calculate the minimum and maximum time gap (in days) between the records of each ID. For example:
ID MIN_TIME_GAP MAX_TIME_GAP
S3131 1 3
The DATE column has dtype datetime64[ns]. How can I do this in pandas?
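A minimal reproducible setup for the sample above might look like this. It assumes the five rows shown are the whole dataset and parses DATE with `pd.to_datetime` so the column gets a datetime64 dtype.

```python
import pandas as pd

# Rebuild the sample data from the question (assumed to be the full sample).
df = pd.DataFrame(
    {
        "ID": ["S3800", "S1190", "S3131", "S3131", "S3131"],
        # pd.to_datetime turns the strings into a datetime64 column
        "DATE": pd.to_datetime(
            ["1999-07-02", "1999-07-02", "1999-07-02", "1999-07-04", "1999-07-05"]
        ),
        "TAG": ["D", "C", "C", "C", "D"],
    }
)
print(df.dtypes)
```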
Try:
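# if they aren't sorted already:
df = df.sort_values(by="DATE")
x = df.groupby("ID").agg(
    # smallest gap between consecutive records (diff() skips nothing, min() ignores NaT)
    MIN_TIME_GAP=("DATE", lambda s: s.diff().min()),
    # span from first to last record (matches the expected output of 3 days)
    MAX_TIME_GAP=("DATE", lambda s: s.max() - s.min()),
)
# IDs with a single record have no gap (NaT), so drop them
x = x.dropna()
print(x)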
Prints:
MIN_TIME_GAP MAX_TIME_GAP
ID
S3131 1 days 3 days
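The same numbers can also be sketched without lambdas: compute each record's gap to the previous record of the same ID with `groupby(...).diff()`, then aggregate. Note that `MAX_TIME_GAP` above is the span from the first to the last record (3 days for S3131, matching the expected output). If "maximum gap" is instead meant as the largest gap between consecutive records, that would be 2 days here. The sketch shows both; the column name `MAX_CONSECUTIVE_GAP` is hypothetical and the sample data is assumed from the question.

```python
import pandas as pd

# Sample data from the question (assumed; replace with your own DataFrame)
df = pd.DataFrame(
    {
        "ID": ["S3800", "S1190", "S3131", "S3131", "S3131"],
        "DATE": pd.to_datetime(
            ["1999-07-02", "1999-07-02", "1999-07-02", "1999-07-04", "1999-07-05"]
        ),
    }
)

df = df.sort_values(["ID", "DATE"])
# Gap to the previous record of the same ID (NaT for each ID's first record)
df["GAP"] = df.groupby("ID")["DATE"].diff()

g = df.groupby("ID")
result = pd.DataFrame(
    {
        "MIN_TIME_GAP": g["GAP"].min(),
        # first-to-last span, same definition as MAX_TIME_GAP above
        "MAX_TIME_GAP": g["DATE"].max() - g["DATE"].min(),
        # hypothetical extra column: largest gap between consecutive records
        "MAX_CONSECUTIVE_GAP": g["GAP"].max(),
    }
).dropna()  # single-record IDs have NaT gaps and are dropped
print(result)
```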
Edit: to convert the timedeltas to days, do the following:
# convert to days:
x["MIN_TIME_GAP"] = x["MIN_TIME_GAP"].dt.days
x["MAX_TIME_GAP"] = x["MAX_TIME_GAP"].dt.days
print(x)
Prints:
       MIN_TIME_GAP  MAX_TIME_GAP
ID
S3131             1             3
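`.dt.days` keeps only the whole-day component of each timedelta. That is exact for dates without a time of day, as in this dataset, but if DATE also carried times, dividing by `pd.Timedelta(days=1)` would give the gap as a float number of days instead. A small illustration with made-up timedeltas (not from the question's data):

```python
import pandas as pd

# Made-up gaps for illustration: one with a half day, one whole
s = pd.Series(pd.to_timedelta(["1 days 12:00:00", "3 days 00:00:00"]))

whole_days = s.dt.days                      # days component only
fractional_days = s / pd.Timedelta(days=1)  # exact length in days as floats
print(whole_days.tolist(), fractional_days.tolist())  # [1, 3] [1.5, 3.0]
```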