Group by in a pandas DataFrame (with conditions)

e0bqpujr  posted on 2023-05-27  in Other

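Python newbie here, please bear with me!
I am trying to create OHLC data for stocks. My source data is in 15-minute bars and I want to aggregate it up to 4-hour OHLC data. I have created a flag that marks which four-hour candle of the day each row belongs to. This is necessary because the US stock market opens at 8:30 CST, so not every 4-hour candle actually spans a full 4 hours.
My DF: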

Volume|Open|Close|High|Low|Ticker|Finaldate|Time|FLAG
5887869|130.465|131.05|131.44|130.465|AAPL|1/9/2023|8:30:00|1
4094933|131.05|132.015|132.06|130.53|AAPL|1/9/2023|8:45:00|1
3942758|132.01|132.13|132.63|131.3218|AAPL|1/9/2023|9:00:00|1
3481345|132.145|132.565|132.735|131.96|AAPL|1/9/2023|9:15:00|1
2526194|132.575|132.735|132.8375|132.4636|AAPL|1/9/2023|9:30:00|1
2756892|132.73|132.74|133.14|132.66|AAPL|1/9/2023|9:45:00|1
2679936|132.74|133.2101|133.34|132.67|AAPL|1/9/2023|10:00:00|1
2366166|133.23|133.085|133.41|132.97|AAPL|1/9/2023|10:15:00|1
2040521|133.09|132.97|133.18|132.8|AAPL|1/9/2023|10:30:00|1
1643790|132.9679|132.945|133.12|132.81|AAPL|1/9/2023|10:45:00|1
1597548|132.94|132.82|132.955|132.57|AAPL|1/9/2023|11:00:00|1
1360285|132.8101|132.745|132.8415|132.6|AAPL|1/9/2023|11:15:00|1
1337147|132.74|132.26|132.83|132.25|AAPL|1/9/2023|11:30:00|1
2046411|132.26|132.459|132.53|132.255|AAPL|1/9/2023|11:45:00|1
1378193|132.46|132.42|132.64|132.26|AAPL|1/9/2023|12:00:00|1
1207053|132.4201|132.495|132.569|132.35|AAPL|1/9/2023|12:15:00|1
2149979|132.495|131.62|132.5101|131.57|AAPL|1/9/2023|12:30:00|2
2455293|131.62|131.395|131.625|131.19|AAPL|1/9/2023|12:45:00|2
2363314|131.39|131.47|131.726|131.2411|AAPL|1/9/2023|13:00:00|2
1812985|131.465|131.495|131.66|131.3|AAPL|1/9/2023|13:15:00|2
1727193|131.49|131.09|131.5667|130.95|AAPL|1/9/2023|13:30:00|2
1609277|131.09|130.67|131.16|130.59|AAPL|1/9/2023|13:45:00|2
1751837|130.68|130.37|130.82|130.36|AAPL|1/9/2023|14:00:00|2
1773304|130.3661|130.445|130.7|130.34|AAPL|1/9/2023|14:15:00|2
2017340|130.45|130.5|130.655|130.16|AAPL|1/9/2023|14:30:00|2
4098579|130.51|130.19|130.52|129.89|AAPL|1/9/2023|14:45:00|2

What is the correct syntax to group by 'Finaldate' and 'FLAG' with the following aggregations?
'Open': 'first',
'High': 'max',
'Low': 'min',
'Close': 'last',
'Volume': 'sum',
'Time': 'min',
while keeping 'Ticker' and 'Finaldate'.

5us2dqdw  #1

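Since you are new here: normally it is your job to provide some code and a reproducible example. For your next question, please take the time to do that.
You can do this with pandas, which ships with all of the functionality you need; see https://pandas.pydata.org/
Let's first convert your example into a DataFrame: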

import pandas as pd

example="""Volume|Open|Close|High|Low|Ticker|Finaldate|Time|FLAG
5887869|130.465|131.05|131.44|130.465|AAPL|1/9/2023|8:30:00|1
4094933|131.05|132.015|132.06|130.53|AAPL|1/9/2023|8:45:00|1
3942758|132.01|132.13|132.63|131.3218|AAPL|1/9/2023|9:00:00|1
3481345|132.145|132.565|132.735|131.96|AAPL|1/9/2023|9:15:00|1
2526194|132.575|132.735|132.8375|132.4636|AAPL|1/9/2023|9:30:00|1
2756892|132.73|132.74|133.14|132.66|AAPL|1/9/2023|9:45:00|1
2679936|132.74|133.2101|133.34|132.67|AAPL|1/9/2023|10:00:00|1
2366166|133.23|133.085|133.41|132.97|AAPL|1/9/2023|10:15:00|1
2040521|133.09|132.97|133.18|132.8|AAPL|1/9/2023|10:30:00|1
1643790|132.9679|132.945|133.12|132.81|AAPL|1/9/2023|10:45:00|1
1597548|132.94|132.82|132.955|132.57|AAPL|1/9/2023|11:00:00|1
1360285|132.8101|132.745|132.8415|132.6|AAPL|1/9/2023|11:15:00|1
1337147|132.74|132.26|132.83|132.25|AAPL|1/9/2023|11:30:00|1
2046411|132.26|132.459|132.53|132.255|AAPL|1/9/2023|11:45:00|1
1378193|132.46|132.42|132.64|132.26|AAPL|1/9/2023|12:00:00|1
1207053|132.4201|132.495|132.569|132.35|AAPL|1/9/2023|12:15:00|1
2149979|132.495|131.62|132.5101|131.57|AAPL|1/9/2023|12:30:00|2
2455293|131.62|131.395|131.625|131.19|AAPL|1/9/2023|12:45:00|2
2363314|131.39|131.47|131.726|131.2411|AAPL|1/9/2023|13:00:00|2
1812985|131.465|131.495|131.66|131.3|AAPL|1/9/2023|13:15:00|2
1727193|131.49|131.09|131.5667|130.95|AAPL|1/9/2023|13:30:00|2
1609277|131.09|130.67|131.16|130.59|AAPL|1/9/2023|13:45:00|2
1751837|130.68|130.37|130.82|130.36|AAPL|1/9/2023|14:00:00|2
1773304|130.3661|130.445|130.7|130.34|AAPL|1/9/2023|14:15:00|2
2017340|130.45|130.5|130.655|130.16|AAPL|1/9/2023|14:30:00|2
4098579|130.51|130.19|130.52|129.89|AAPL|1/9/2023|14:45:00|2""".splitlines()

data = [row.split("|") for row in example]

df = pd.DataFrame(data[1:], columns=data[0])
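
As an aside, pandas can also read pipe-delimited text directly. The sketch below is only an illustration: `raw` and `df_csv` are names chosen here, and just two of the rows are included. It assumes the data is available as a string; `pd.read_csv` then infers the numeric column types, while Finaldate and Time still arrive as strings.

```python
import io
import pandas as pd

# Two rows of the question's data; the full pipe-delimited text works the same way.
raw = """Volume|Open|Close|High|Low|Ticker|Finaldate|Time|FLAG
5887869|130.465|131.05|131.44|130.465|AAPL|1/9/2023|8:30:00|1
2149979|132.495|131.62|132.5101|131.57|AAPL|1/9/2023|12:30:00|2"""

# read_csv splits each line on "|" and infers int/float dtypes for numeric columns.
df_csv = pd.read_csv(io.StringIO(raw), sep="|")

print(df_csv.dtypes)
```

With this route, only the date column would still need an explicit conversion.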

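Now let's convert the columns to the correct types. I am assuming the dates are in US format (month first), so 1/9/2023 is January 9th.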

for c in ["Volume", "FLAG"]:
    df[c] = df[c].astype(int)

for c in ["Open", "Close", "High", "Low"]:
    df[c] = df[c].astype(float)

df["Finaldate"] = pd.to_datetime(df["Finaldate"], dayfirst=False)

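Now you can group. 'Finaldate' and 'FLAG' are the group keys, so they do not appear in the aggregation dict; reset_index() turns them back into ordinary columns of the result.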

result = df.groupby(['Finaldate', 'FLAG']).agg(
    {
        'Open': 'first',
        'High': 'max',
        'Low': 'min',
        'Close': 'last',
        'Volume': 'sum',
        'Time': 'min'
    }
).reset_index()

print(result)

   Finaldate  FLAG     Open      High      Low    Close    Volume      Time
0 2023-01-09     1  130.465  133.4100  130.465  132.495  40347041  10:00:00
1 2023-01-09     2  132.495  132.5101  129.890  130.190  21759101  12:30:00
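
One thing to watch in the output above: Time for FLAG 1 comes out as 10:00:00 rather than 8:30:00. Time is still a string column, so 'min' compares text, and "10:00:00" sorts before "8:30:00". A minimal sketch of one possible fix, assuming the times always look like H:MM:SS, is to convert the column with `pd.to_timedelta` before grouping. The small frame below is a cut-down stand-in for the question's data, and `string_min` and `time_min` are names used only for this illustration.

```python
import pandas as pd

# A few rows in the same shape as the question's data (Time is still a string).
df = pd.DataFrame({
    "Finaldate": ["1/9/2023"] * 4,
    "FLAG": [1, 1, 2, 2],
    "Time": ["8:30:00", "10:00:00", "12:30:00", "14:45:00"],
})

# As strings, "10:00:00" sorts before "8:30:00" because "1" < "8".
string_min = df.groupby(["Finaldate", "FLAG"])["Time"].min()

# After conversion to timedelta, min() compares actual times of day.
df["Time"] = pd.to_timedelta(df["Time"])
time_min = df.groupby(["Finaldate", "FLAG"])["Time"].min()

print(string_min)
print(time_min)
```

If the rows are already in chronological order within each group, using 'first' instead of 'min' for Time would be another option.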

Note: you did not specify how to handle Ticker. I would probably add it to the group keys.
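
A rough sketch of what that could look like, under the assumption that the frame holds more than one ticker. The AAPL rows are taken from the question; the "XYZ" rows are made-up placeholder values, included only so that there is a second ticker to group on.

```python
import pandas as pd

df = pd.DataFrame({
    # "XYZ" is a hypothetical ticker with invented values, for illustration only.
    "Ticker": ["AAPL", "AAPL", "AAPL", "XYZ", "XYZ"],
    "Finaldate": pd.to_datetime(["2023-01-09"] * 5),
    "FLAG": [1, 1, 2, 1, 1],
    "Open": [130.465, 131.05, 132.495, 10.0, 11.0],
    "High": [131.44, 132.06, 132.5101, 12.0, 13.0],
    "Low": [130.465, 130.53, 131.57, 9.0, 10.5],
    "Close": [131.05, 132.015, 131.62, 11.0, 12.5],
    "Volume": [5887869, 4094933, 2149979, 100, 200],
})

# Adding Ticker to the group keys gives one candle per ticker, date and flag.
result = df.groupby(["Ticker", "Finaldate", "FLAG"]).agg(
    {
        "Open": "first",
        "High": "max",
        "Low": "min",
        "Close": "last",
        "Volume": "sum",
    }
).reset_index()

print(result)
```

Grouping this way keeps Ticker as a column in the result and prevents rows from different symbols from being merged into the same candle.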
