pandas: can a boolean mask be used to find whether a DateTime value falls between two other DateTime values in a different DataFrame?

Posted by dkqlctbz on 2023-06-20 in Other

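I want to filter my data points so that only the ones recorded while the participant was asleep remain. I have one DataFrame with the DateTime values and the values I'm studying, and a separate DataFrame with the times each participant started and stopped sleeping. Is there a way to do this by iterating over the large DataFrame, or over the DataFrame of sleep start/stop times, instead of writing every start and stop time into the boolean mask by hand? Any other approach that beats manually entering 175 nights would also help.
The start/stop DataFrame looks like this, and I have one for each participant: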
df_sleep1:

date            start       stop
5/30/2023   5/29/2023 22:15 5/30/2023 7:22
5/31/2023   5/30/2023 23:19 5/31/2023 6:46
6/1/2023    6/1/2023 0:02   6/1/2023 8:31

The DataFrame with all the data looks like this. I want to add an "Asleep" column to it:
df:

DateTime            HeartRate        Participant      Asleep
0   2023-05-29 23:44:00 76.0             1
1   2023-05-30 06:44:00 76.0             1
2   2023-05-30 20:45:00 84.0             1
3   2023-05-31 04:45:00 84.0             2
4   2023-06-1 20:46:00  81.0             2
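To make the question reproducible, here is a sketch that rebuilds the two sample tables as DataFrames. The values are transcribed from the tables above, with 2023-06-1 written as 2023-06-01 and the empty Asleep column left out. The text timestamps are then parsed so they can be compared.

```python
import pandas as pd

# Sleep intervals for participant 1, transcribed from the df_sleep1 table
df_sleep1 = pd.DataFrame({
    'date':  ['5/30/2023', '5/31/2023', '6/1/2023'],
    'start': ['5/29/2023 22:15', '5/30/2023 23:19', '6/1/2023 0:02'],
    'stop':  ['5/30/2023 7:22', '5/31/2023 6:46', '6/1/2023 8:31'],
})

# Heart-rate readings, transcribed from the df table
df = pd.DataFrame({
    'DateTime': ['2023-05-29 23:44:00', '2023-05-30 06:44:00',
                 '2023-05-30 20:45:00', '2023-05-31 04:45:00',
                 '2023-06-01 20:46:00'],
    'HeartRate': [76.0, 76.0, 84.0, 84.0, 81.0],
    'Participant': [1, 1, 1, 2, 2],
})

# Parse the text columns into datetime64 so they can be compared
df_sleep1['start'] = pd.to_datetime(df_sleep1['start'])
df_sleep1['stop'] = pd.to_datetime(df_sleep1['stop'])
df['DateTime'] = pd.to_datetime(df['DateTime'])

print(df.dtypes)
```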

What I tried:

dt = df['DateTime'].to_numpy()

start1 = df_sleep1['Start'].to_numpy()[:, None]
end1 = df_sleep1['Stop'].to_numpy()[:, None]
    
mask1 = ((start1 <= dt) & (dt <= end1) & (df['Participant'] == 1))
df['Sleep'] = mask1.any(axis=0)
def sleepFunction(row):
    if (df_sleep1['Start'] <= dt) & (dt <= df_sleep1['Stop']) & (df['Participant'] == 1):
        return True
    else:
        return False

df['sleepState'] = df.apply(lambda row: sleepFunction(row), axis = 1)

Both attempts give similar errors about the DataFrame/array shapes not matching, which is not what I intended.
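The first attempt is close. It most likely fails because df['Participant'] == 1 is a pandas Series, and combining it with the 2-D NumPy array produced by the comparisons is not plain NumPy broadcasting. The apply version has a different problem: inside sleepFunction it compares whole columns (dt, df_sleep1['Start'], df['Participant']) instead of the current row's values, so the shapes disagree there as well. Note also that the code reads 'Start'/'Stop' while df_sleep1 shows lowercase 'start'/'stop'. A minimal sketch of a fix for the first attempt, on sample data transcribed from the tables above, converts the participant condition with .to_numpy() so the whole expression broadcasts to shape (n_nights, n_readings):

```python
import pandas as pd

# Sample data transcribed from the question (lowercase column names as in df_sleep1)
df_sleep1 = pd.DataFrame({
    'start': pd.to_datetime(['2023-05-29 22:15', '2023-05-30 23:19', '2023-06-01 00:02']),
    'stop':  pd.to_datetime(['2023-05-30 07:22', '2023-05-31 06:46', '2023-06-01 08:31']),
})
df = pd.DataFrame({
    'DateTime': pd.to_datetime(['2023-05-29 23:44', '2023-05-30 06:44',
                                '2023-05-30 20:45', '2023-05-31 04:45',
                                '2023-06-01 20:46']),
    'Participant': [1, 1, 1, 2, 2],
})

dt = df['DateTime'].to_numpy()                   # shape (n_readings,)
start1 = df_sleep1['start'].to_numpy()[:, None]  # shape (n_nights, 1)
end1 = df_sleep1['stop'].to_numpy()[:, None]     # shape (n_nights, 1)
is_p1 = (df['Participant'] == 1).to_numpy()      # NumPy array, not a Series

# (n_nights, 1) against (n_readings,) broadcasts to (n_nights, n_readings);
# is_p1 is broadcast along the first axis
mask1 = (start1 <= dt) & (dt <= end1) & is_p1

# A reading counts as asleep if it falls inside any night
df['Asleep'] = mask1.any(axis=0)
print(df)
```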

Answer #1

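If the intervals don't overlap, an efficient approach is pd.merge_asof: match each reading, within the same participant, to the most recent sleep start, then check that the reading is at or before that interval's stop time.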
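To see what merge_asof does here, a toy sketch (data hypothetical, adapted from the question): for each reading it picks the latest interval of the same participant whose start is at or before the reading (direction='backward' is the default), and the stop comparison then decides whether the reading is inside that interval. Because only the latest start is considered, overlapping intervals can be missed, which is why the answer assumes they don't overlap.

```python
import pandas as pd

# Hypothetical toy data: three readings, two sleep intervals for participant 1
readings = pd.DataFrame({
    'DateTime': pd.to_datetime(['2023-05-30 06:44', '2023-05-30 20:45',
                                '2023-05-31 04:45']),
    'Participant': [1, 1, 2],
})
sleep = pd.DataFrame({
    'Participant': [1, 1],
    'start': pd.to_datetime(['2023-05-29 22:15', '2023-05-30 23:19']),
    'stop':  pd.to_datetime(['2023-05-30 07:22', '2023-05-31 06:46']),
})

# Both sides must be sorted on their merge keys
merged = pd.merge_asof(readings.sort_values('DateTime'),
                       sleep.sort_values('start'),
                       left_on='DateTime', right_on='start',
                       by='Participant')

# Readings with no matching interval get NaT, and comparisons with NaT are False
merged['Asleep'] = merged['DateTime'].le(merged['stop'])
print(merged[['DateTime', 'Participant', 'start', 'stop', 'Asleep']])
```

Participant 2 has no intervals in this toy table, so that reading gets NaT for start/stop and is marked False.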

import pandas as pd

# map the Participant ID to the df_sleep DataFrame (the dict key becomes the Participant column)
all_sleep = pd.concat({1: df_sleep1}, names=['Participant']).reset_index(level=0)

# ensure having datetime types
all_sleep[['start', 'stop']] = all_sleep[['start', 'stop']].apply(pd.to_datetime)
df['DateTime'] = pd.to_datetime(df['DateTime'])

# merge by date and participant
df['Asleep'] = (
 pd.merge_asof(df.sort_values(by='DateTime').reset_index(),
               all_sleep.sort_values(by='start'),
               left_on='DateTime', right_on='start',
               by='Participant'
              )
   .assign(Asleep=lambda d: d['DateTime'].le(d['stop']))
   .set_index('index')['Asleep']
)

Output:

             DateTime  HeartRate  Participant  Asleep
0 2023-05-29 23:44:00       76.0            1    True
1 2023-05-30 06:44:00       76.0            1    True
2 2023-05-30 20:45:00       84.0            1   False
3 2023-05-31 04:45:00       84.0            2   False
4 2023-06-01 20:46:00       81.0            2   False
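The answer builds all_sleep for participant 1 only. A sketch extending it to several participants, assuming a df_sleep2 table for participant 2 (its single night is invented for illustration): each dict key passed to pd.concat becomes the Participant value for that table's rows, and the merge itself is unchanged.

```python
import pandas as pd

# Hypothetical per-participant sleep tables; df_sleep2's night is invented
df_sleep1 = pd.DataFrame({'start': ['5/29/2023 22:15', '5/30/2023 23:19'],
                          'stop':  ['5/30/2023 7:22', '5/31/2023 6:46']})
df_sleep2 = pd.DataFrame({'start': ['5/30/2023 23:00'],
                          'stop':  ['5/31/2023 6:00']})

# Dict keys become an outer index level named 'Participant';
# reset_index(level=0) turns that level into an ordinary column
all_sleep = pd.concat({1: df_sleep1, 2: df_sleep2},
                      names=['Participant']).reset_index(level=0)
all_sleep['start'] = pd.to_datetime(all_sleep['start'])
all_sleep['stop'] = pd.to_datetime(all_sleep['stop'])

df = pd.DataFrame({
    'DateTime': pd.to_datetime(['2023-05-30 06:44', '2023-05-31 04:45',
                                '2023-05-31 20:00']),
    'Participant': [1, 2, 2],
})

# Same merge as in the answer, now covering both participants
merged = pd.merge_asof(df.sort_values('DateTime').reset_index(),
                       all_sleep.sort_values('start'),
                       left_on='DateTime', right_on='start',
                       by='Participant')
df['Asleep'] = (merged.assign(Asleep=lambda d: d['DateTime'].le(d['stop']))
                      .set_index('index')['Asleep'])
print(df)
```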

Answer #2

Yes, you can use a boolean mask to find whether a DateTime value falls between two DateTime values stored in a different DataFrame. Here is one approach:

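import pandas as pd

# stack the per-participant sleep tables, recording which participant each row belongs to
# (assumes df_sleep1, df_sleep2, df_sleep3 belong to participants 1, 2, 3)
df_sleep = pd.concat({1: df_sleep1, 2: df_sleep2, 3: df_sleep3},
                     names=['Participant']).reset_index(level=0)

# convert the date columns to datetime format
df_sleep['start'] = pd.to_datetime(df_sleep['start'])
df_sleep['stop'] = pd.to_datetime(df_sleep['stop'])
df['DateTime'] = pd.to_datetime(df['DateTime'])

# build a 2-D boolean mask: one row per sleep interval, one column per reading.
# Series.between() with Series bounds would align them by index and compare
# row by row, instead of testing every reading against every interval.
dt = df['DateTime'].to_numpy()
pid = df['Participant'].to_numpy()
start = df_sleep['start'].to_numpy()[:, None]
stop = df_sleep['stop'].to_numpy()[:, None]
same_participant = df_sleep['Participant'].to_numpy()[:, None] == pid
mask = (start <= dt) & (dt <= stop) & same_participant

# a reading is asleep if it falls inside any interval of its own participant
df['Asleep'] = mask.any(axis=0)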

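This creates an "Asleep" column in df that is True when the participant is asleep and False otherwise. The mask has one row per sleep interval and one column per reading, so its memory use grows with the product of the two; for long recordings the merge_asof answer scales better.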
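Series.between is element-wise and aligns Series bounds by index, so it fits this problem when each call gets scalar bounds. If the 2-D mask gets too large, a sketch that builds the same result one sleep interval at a time, assuming a combined sleep table with a Participant column (the data below is hypothetical), trades memory for a Python loop over the nights:

```python
import pandas as pd

# Hypothetical combined sleep table with a Participant column
df_sleep = pd.DataFrame({
    'Participant': [1, 1, 2],
    'start': pd.to_datetime(['2023-05-29 22:15', '2023-05-30 23:19', '2023-05-30 23:00']),
    'stop':  pd.to_datetime(['2023-05-30 07:22', '2023-05-31 06:46', '2023-05-31 06:00']),
})
df = pd.DataFrame({
    'DateTime': pd.to_datetime(['2023-05-29 23:44', '2023-05-30 20:45',
                                '2023-05-31 04:45', '2023-06-01 20:46']),
    'Participant': [1, 1, 2, 2],
})

# Start with "awake" everywhere, then OR in one interval at a time;
# between() is inclusive on both ends by default
asleep = pd.Series(False, index=df.index)
for pid, start, stop in df_sleep[['Participant', 'start', 'stop']].itertuples(index=False):
    asleep |= (df['Participant'] == pid) & df['DateTime'].between(start, stop)

df['Asleep'] = asleep
print(df)
```

Each iteration only allocates one boolean column of length n_readings, at the cost of one pass over df per night.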
