I have a time series with 2 columns and ~10k rows:
| Time | Value |
| ------------ | ------------ |
| 2022-01-01 09:53:34 | 1.9342 |
| 2022-01-01 19:03:21 | 2.3213 |
| 2022-01-02 10:14:32 | 3.4332 |
| 2022-01-02 11:31:51 | 0.387 |
| ... | |
I want to summarize the data by day and get the start, end, min and max for each day:
| Date | Start | Min | Max | End |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| 2022-01-01 | 1.9342 | 1.9342 | 2.3213 | 2.3213 |
| 2022-01-02 | 3.4332 | 0.387 | 3.4332 | 0.387 |
| ... | | | | |
I could probably do this with a bunch of nested for loops iterating through the initial dataframe, but I'm looking for a more elegant solution.
So far, I can get the min and max for each day by doing the following:
```
# split the datetime into separate date and time columns
df["date_only"] = df["time"].dt.date
df["time_only"] = df["time"].dt.time
df.drop(columns="time", inplace=True)

# group by date, temporarily drop the time, and get the min and max values for each day
min_df = df.drop(columns="time_only").groupby("date_only").min()
max_df = df.drop(columns="time_only").groupby("date_only").max()
# concat the columns afterwards
```
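For that last concat step, one option (a sketch, not from the original snippet) is to join the two grouped frames side by side with `pd.concat`, adding suffixes so the min and max columns stay distinguishable:

```
import pandas as pd

# combine the per-day minima and maxima into one frame, one row per date
min_max = pd.concat(
    [min_df.add_suffix("_min"), max_df.add_suffix("_max")],
    axis=1,
)
```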
I'm struggling to find a way to get the start and end values for each day, though. If I group by both the `date_only` and `time_only` columns, I can get the time but can't seem to reference the value at that time. I could get the start and end time for each date and then go back to the initial `df` with `.loc` in for loops, but am I missing a much more obvious and elegant solution?
3 Answers
Here is one possible approach using `df.groupby()` and `pandas.DataFrame.agg`:
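A minimal sketch of that idea, assuming the `Time` and `Value` column names from the question and that `Time` is already a datetime dtype:

```
import pandas as pd

# sort chronologically so "first"/"last" within each day correspond to start/end
df = df.sort_values("Time")
df["Date"] = df["Time"].dt.date

summary = (
    df.groupby("Date")
      .agg(Start=("Value", "first"),
           Min=("Value", "min"),
           Max=("Value", "max"),
           End=("Value", "last"))
      .reset_index()
)
print(summary)
```

Named aggregation also keeps the output columns in the Start/Min/Max/End order the question asks for.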
You can use the `agg` method with a dictionary, as shown below:
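A sketch of the dictionary form, under the same assumptions about the `Time` and `Value` columns:

```
import pandas as pd

df = df.sort_values("Time")  # so "first"/"last" give each day's first/last value

# map each column to the list of aggregations to compute for it
summary = df.groupby(df["Time"].dt.date).agg({"Value": ["first", "min", "max", "last"]})

# the result has MultiIndex columns such as ("Value", "first"); flatten and rename
summary.columns = ["Start", "Min", "Max", "End"]
summary.index.name = "Date"
print(summary.reset_index())
```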