Pandas time series with a date index: how do I find the hours between values < a number?

Posted by pgx2nnw8 on 2022-12-09 in Other

I am analysing a DataFrame with a datetime index and a column of prices.
Example:

2022-01-01 00:00:00  |  32.21
2022-01-01 01:00:00  |  10.20
2022-01-01 02:00:00  |  42.12
2022-01-01 03:00:00  |  01.05

I want to add another column that lists how many hours have passed since the price was last under a certain constant value. With the example above and a threshold of 30, it would look like this:

2022-01-01 00:00:00  |  32.21 | 4
2022-01-01 01:00:00  |  10.20 | 0
2022-01-01 02:00:00  |  42.12 | 1
2022-01-01 03:00:00  |  01.05 | 0

How can I do this? I thought about putting the index and price into a list of lists or tuples, doing the calculation, and then writing the result back, but I assume there is a better way in Pandas?
Thanks,
Gregersdk

j8yoct9x (answer #1)

I'm not sure I understood correctly what you want, but this might be it:

import datetime
df["constant_since"] = df["date"].apply(lambda x: datetime.datetime.now() - datetime.datetime.strptime(x, "%Y-%m-%d %H:%M:%S"))

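This adds a new column holding the time elapsed between each row's timestamp and the current time. If needed, you can replace 'datetime.datetime.now()' with any other reference time.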
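
The same elapsed-time calculation can also be done without a row-by-row apply. This is only a sketch: the sample data, the "price" column name, the "hours_since" column and the fixed reference time are assumptions for illustration, not part of the answer above.

```python
import pandas as pd

# Hypothetical sample mirroring the question's data, with timestamps stored as strings
df = pd.DataFrame({
    "date": ["2022-01-01 00:00:00", "2022-01-01 01:00:00",
             "2022-01-01 02:00:00", "2022-01-01 03:00:00"],
    "price": [32.21, 10.20, 42.12, 1.05],
})

# Fixed reference time (an assumption; use pd.Timestamp.now() for the current time)
reference = pd.Timestamp("2022-01-01 04:00:00")

# Parse the strings once, then subtract the whole column in a single step
timestamps = pd.to_datetime(df["date"], format="%Y-%m-%d %H:%M:%S")
df["constant_since"] = reference - timestamps

# Express the timedeltas as a number of hours
df["hours_since"] = df["constant_since"].dt.total_seconds() / 3600
```

With this reference time the four rows come out as 4, 3, 2 and 1 hours. Note that this measures the distance to one fixed point in time, not the time since the price was last under the threshold.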

zte4gxcn (answer #2)

With the following content in data.csv:

2022-01-01 00:00:00,32.21
2022-01-01 01:00:00,10.20
2022-01-01 02:00:00,42.12
2022-01-01 03:00:00,01.05

Assuming you have *one entry per hour*, you can try the following:

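import pandas as pd

# data.csv has no header row; the first column (the timestamps) becomes the index
df = pd.read_csv("data.csv", header=None, index_col=0, names=["value"])
df["above_30"] = df.value > 30

res = []
for i, above in enumerate(df.above_30):
    if i == 0:
        res.append(4)  # first row: no earlier data, so use the value from the question
    elif above:
        res.append(res[-1] + 1)  # still above 30: one more hour since the last low price
    else:
        res.append(0)  # not above 30: reset the counter

df["result"] = res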

The result should be:

>>> df
                     value  above_30  result
2022-01-01 00:00:00  32.21      True       4
2022-01-01 01:00:00  10.20     False       0
2022-01-01 02:00:00  42.12      True       1
2022-01-01 03:00:00   1.05     False       0
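
If the one-entry-per-hour assumption does not hold, the hours can be derived from the timestamps themselves. The following is a sketch under stated assumptions, not the method used above: the irregular sample index, the `threshold` variable and the "hours_since_below" column name are made up for illustration.

```python
import pandas as pd

# Hypothetical data with an irregular gap (no row between 01:00 and 03:30)
df = pd.DataFrame(
    {"value": [32.21, 10.20, 42.12, 1.05]},
    index=pd.to_datetime([
        "2022-01-01 00:00:00", "2022-01-01 01:00:00",
        "2022-01-01 03:30:00", "2022-01-01 04:00:00",
    ]),
)
threshold = 30

# True wherever the price is under the threshold
below = df["value"] < threshold

# Keep the timestamp on rows under the threshold, NaT elsewhere,
# then carry the most recent such timestamp forward
times = df.index.to_series()
last_below = times.where(below).ffill()

# Elapsed time since that timestamp, expressed in hours
df["hours_since_below"] = (times - last_below).dt.total_seconds() / 3600
```

Here the rows come out as NaN, 0.0, 2.5 and 0.0. The first row stays NaN because no earlier price under the threshold is known, which is the case the loop above fills with a hard-coded 4.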

y1aodyip (answer #3)

If I understood correctly, here is an example of how to do this with a for loop.

import pandas as pd

# example dataframe
df = pd.DataFrame({
    'date': ['1/1/2022 00:00:00', '1/1/2022 01:00:00', '1/1/2022 02:00:00', '1/1/2022 03:00:00'],
    'value': [30, 10, 40, 10]
})
df.date = pd.to_datetime(df.date)
df.set_index('date', inplace=True)

# empty list to be populated
counts = []
# counter variable
count = 0

for i in range(df.shape[0]):

    # increase the counter at each iteration
    count = count + 1

    # reset the counter if the condition is met
    # (this resets on values >= 30; use "< 30" to reset on prices under 30, as in the question)
    # .iloc is used because the index holds datetimes, not integer positions
    if df.value.iloc[i] >= 30:
        count = 0

    # append the counter at each iteration to the list "counts"
    counts.append(count)

# add a new column "count" using the list "counts"
df['count'] = counts

# output
df.head()

Output:

                     value  count
date
2022-01-01 00:00:00     30      0
2022-01-01 01:00:00     10      1
2022-01-01 02:00:00     40      0
2022-01-01 03:00:00     10      1
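
A counter like this can also be written without an explicit loop, using cumsum and groupby/cumcount. This is a minimal sketch with assumptions: it resets on prices under 30 (the question's condition, not the ">= 30" used above), and the sample data, `threshold`, `run_id` and the "rows_since_below" column name are made up for illustration.

```python
import pandas as pd

# Hypothetical hourly data: the question's four prices plus two more rows
df = pd.DataFrame(
    {"value": [32.21, 10.20, 42.12, 1.05, 35.00, 31.50]},
    index=pd.date_range("2022-01-01", periods=6, freq=pd.Timedelta(hours=1)),
)
threshold = 30

# Rows that reset the counter
below = df["value"] < threshold

# Running total of resets: every row gets the number of the run it belongs to
run_id = below.cumsum()

# Position of each row inside its run: 0 on the reset row, then 1, 2, ...
rows_since = df.groupby(run_id).cumcount()

# Rows before the first reset have nothing to count from, so leave them empty
df["rows_since_below"] = rows_since.where(run_id > 0)
```

For this sample the column holds NaN, 0, 1, 0, 1, 2. With exactly one row per hour, the row count equals the number of hours since the price was last under the threshold.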
