pandas: How to get the start and end values for each day in a time series

Posted by gblwokeq on 2023-02-27 in Other

I have a time series with 2 columns and ~10k rows:
| Time | Value |
| ------------ | ------------ |
| 2022-01-01 09:53:34 | 1.9342 |
| 2022-01-01 19:03:21 | 2.3213 |
| 2022-01-02 10:14:32 | 3.4332 |
| 2022-01-02 11:31:51 | 0.387 |
| ... | |
I want to summarize the data by day and get the start, end, min and max for each day:
| Date | Start | Min | Max | End |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| 2022-01-01 | 1.9342 | 1.9342 | 2.3213 | 2.3213 |
| 2022-01-02 | 3.4332 | 0.387 | 3.4332 | 0.387 |
| ... | | | | |
I could probably do this with a bunch of nested for loops iterating over the initial dataframe, but I'm looking for a more elegant solution.
So far, I can get the min and max for each day with the following:

# convert date time to two columns to separate the date and times
df["date_only"] = df["Time"].dt.date
df["time_only"] = df["Time"].dt.time
df.drop(columns="Time", inplace=True)

# group by date, temporarily drop the time, and get the min and max values for each day
min_df = df.drop(columns="time_only").groupby(["date_only"]).min()
max_df = df.drop(columns="time_only").groupby(["date_only"]).max()

#concat the columns afterwards

I'm struggling to find a way to get the start and end values for each day though. If I group by both the date_only and time_only columns, I can get the time but can't seem to reference the value at that time.
I could get the start and end times for each date and then go back to the initial df with .loc inside for loops, but am I missing a much more obvious and elegant solution?

zphenhs4 1#

Here is one possible approach using df.groupby() and pandas.DataFrame.agg:

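import pandas as pd

# Parse the timestamps and sort, so 'first'/'last' follow chronological order
df['Date'] = pd.to_datetime(df['Time'])
df = df.sort_values('Date')

# Group by calendar date and aggregate the Value column with named aggregations
df = df.groupby(df['Date'].dt.date).agg(
    Start=('Value', 'first'),
    Min=('Value', 'min'),
    Max=('Value', 'max'),
    End=('Value', 'last')
).reset_index()
print(df)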
         Date   Start     Min     Max     End
0  2022-01-01  1.9342  1.9342  2.3213  2.3213
1  2022-01-02  3.4332  0.3870  3.4332  0.3870
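
One point worth making explicit: 'first' and 'last' pick values by row order within each group, not by timestamp. The sketch below assumes the rows may arrive out of chronological order and sorts them before grouping. The sample rows are the question's data, shuffled for illustration, and the names `daily` and `Date` are just illustrative choices.

```python
import pandas as pd

# Question's sample data, deliberately shuffled out of chronological order
df = pd.DataFrame({
    "Time": ["2022-01-01 19:03:21", "2022-01-01 09:53:34",
             "2022-01-02 11:31:51", "2022-01-02 10:14:32"],
    "Value": [2.3213, 1.9342, 0.387, 3.4332],
})
df["Time"] = pd.to_datetime(df["Time"])

# Sort by timestamp so 'first'/'last' mean earliest/latest reading of each day
df = df.sort_values("Time")

# Named aggregation on the Value column, grouped by calendar date
daily = (
    df.groupby(df["Time"].dt.date)["Value"]
      .agg(Start="first", Min="min", Max="max", End="last")
      .rename_axis("Date")
)
print(daily)
```

If the data is already known to be sorted by time, the `sort_values` call can be dropped.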

6g8kf2rb 2#

You can use the agg method with a dictionary, as follows:

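# Assumes df["Time"] is already a datetime column (e.g. via pd.to_datetime)
df["date"] = df["Time"].dt.date
df.set_index("date", inplace=True)

# Passing a list of functions yields MultiIndex columns; flatten them afterwards
summary = df.groupby(df.index).agg({"Value": ["first", "min", "max", "last"]})
summary.columns = ["Start", "Min", "Max", "End"]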

az31mfrm 3#

import pandas as pd

df = pd.DataFrame(
    {'Time': ['2022-01-01 09:53:34', '2022-01-01 19:03:21', 
              '2022-01-02 10:14:32', '2022-01-02 11:31:51'],
     'Value': [1.9342, 2.3213, 3.4332, 0.387]}
)

df['Time'] = pd.to_datetime(df['Time'])  # infer_datetime_format is deprecated since pandas 2.0
df.set_index('Time', inplace=True)

Start = df.resample("D").agg({'Value':'first'}).rename(columns={'Value':'Start'})
Min = df.resample("D").agg({'Value':'min'}).rename(columns={'Value':'Min'})
Max = df.resample("D").agg({'Value':'max'}).rename(columns={'Value':'Max'})
End = df.resample("D").agg({'Value':'last'}).rename(columns={'Value':'End'})

print( pd.concat([Start, Min, Max, End], axis=1) )

#              Start     Min     Max     End
# Time                                      
# 2022-01-01  1.9342  1.9342  2.3213  2.3213
# 2022-01-02  3.4332  0.3870  3.4332  0.3870
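
As a possible simplification of the resample approach, the four separate calls can probably be collapsed into a single aggregation. The sketch below is one way to do that, not the answerer's own code. It adds a hypothetical extra row on 2022-01-04 to show that `resample("D")` emits a row for every calendar day in the range, including days without data, which are dropped here with `dropna`.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": ["2022-01-01 09:53:34", "2022-01-01 19:03:21",
             "2022-01-02 10:14:32", "2022-01-02 11:31:51",
             "2022-01-04 08:00:00"],  # hypothetical row, leaves 2022-01-03 empty
    "Value": [1.9342, 2.3213, 3.4332, 0.387, 1.5],
})
df["Time"] = pd.to_datetime(df["Time"])

summary = (
    df.set_index("Time")
      .sort_index()                     # 'first'/'last' rely on chronological order
      .resample("D")["Value"]
      .agg(["first", "min", "max", "last"])
      .rename(columns={"first": "Start", "min": "Min",
                       "max": "Max", "last": "End"})
      .dropna(how="all")                # drop calendar days that had no rows
)
print(summary)
```

Whether to keep or drop the empty days depends on the use case; leaving out `dropna` keeps them as rows of NaN.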
