pandas: computing decile ranks

Posted by iqih9akk on 2023-08-01 in Other

Dataset:

| ticker | overnight_return |
| --- | --- |
| CLXT | 0.019556 |
| CLXT | 0.039778 |
| ETNB | -0.006186 |
| ETNB | 0.024590 |

I am testing a hypothesis about overnight returns. For each `Date`, I want to rank the returns across all unique values in the `ticker` column and then z-score those ranks. I also want to sort the stocks into deciles.

Code to get the z-scores for a single date:

```python
import scipy.stats as stats

# Rank the overnight returns for one date, then z-score the ranks.
stats.zscore(equity_daily[equity_daily.Date == "2017-07-20"].overnight_return.rank().dropna().values)
```

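To get z-scores for every day, based on the ranks of all stocks on that day, I tried to build a pivot table first and then create a new table holding the z-scores: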

```python
equity_daily.pivot(columns="ticker", values="overnight_return", index="Date")
```


But it raised the following error:

ValueError: Index contains duplicate entries, cannot reshape
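
This error usually means that at least one (`Date`, `ticker`) pair occurs more than once in the frame. The following is a minimal sketch for locating such rows, not the asker's actual data: the sample frame is hypothetical, and keeping the last row per pair is only one possible assumption about what the duplicates mean.

```python
import pandas as pd

# Hypothetical sample: the pair (2017-07-20, CLXT) appears twice.
equity_daily = pd.DataFrame({
    "Date": ["2017-07-20", "2017-07-20", "2017-07-20", "2017-07-21"],
    "ticker": ["CLXT", "CLXT", "ETNB", "ETNB"],
    "overnight_return": [0.019556, 0.039778, -0.006186, 0.024590],
})

# keep=False flags every row that belongs to a duplicated (Date, ticker) pair.
dupes = equity_daily[equity_daily.duplicated(subset=["Date", "ticker"], keep=False)]
print(dupes)

# Assumption: the last row per pair is the one to keep.
deduped = equity_daily.drop_duplicates(subset=["Date", "ticker"], keep="last")

# With unique pairs, pivot() reshapes without raising.
wide = deduped.pivot(index="Date", columns="ticker", values="overnight_return")
print(wide)
```

Inspecting `dupes` first helps decide whether the repeated rows are real data that should be aggregated or loading errors that should be dropped.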

Expected result:

| ticker | overnight_return | Decile_rank |
| --- | --- | --- |
| CLXT | 0.019556 | 0 |
| CLXT | 0.039778 | 2 |
| ETNB | -0.006186 | 9 |
| ETNB | 0.024590 | 8 |

qmelpv7a #1

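Without a larger data sample it is hard to test this myself, but...
Try `pivot_table()` instead of `pivot()`. `pivot` does no aggregation, so it fails when the same index/column pair occurs more than once, whereas `pivot_table` aggregates the duplicates (by mean, by default).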
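
To illustrate the difference, here is a small sketch on a hypothetical frame with one duplicated (`Date`, `ticker`) pair. The data is made up, and averaging the duplicates is an assumption; pass another `aggfunc` if a different rule fits the data better.

```python
import pandas as pd

# Hypothetical sample: the pair (2017-07-20, CLXT) appears twice.
equity_daily = pd.DataFrame({
    "Date": ["2017-07-20", "2017-07-20", "2017-07-20", "2017-07-21"],
    "ticker": ["CLXT", "CLXT", "ETNB", "ETNB"],
    "overnight_return": [0.019556, 0.039778, -0.006186, 0.024590],
})

# pivot() cannot decide which duplicate to keep, so it raises ValueError.
try:
    equity_daily.pivot(index="Date", columns="ticker", values="overnight_return")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print("pivot failed:", exc)

# pivot_table() aggregates duplicates; aggfunc="mean" is also the default.
wide = equity_daily.pivot_table(
    index="Date", columns="ticker", values="overnight_return", aggfunc="mean"
)
print(wide)
```

Note that aggregating silently hides the duplicates, so it is worth checking why they exist before relying on the averaged values.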

2wnc66cl #2

```python
import pandas as pd
import scipy.stats as stats

from alphalens.tears import (create_returns_tear_sheet,
                             create_information_tear_sheet,
                             create_turnover_tear_sheet,
                             create_summary_tear_sheet,
                             create_full_tear_sheet,
                             create_event_returns_tear_sheet,
                             create_event_study_tear_sheet)

from alphalens.utils import get_clean_factor_and_forward_returns

def z_score(x):
    """Helper function for normalization."""
    return stats.zscore(x)

# Rank overnight returns within each date; method="first" breaks ties by order of appearance.
equity_daily["overnight_rank"] = equity_daily.groupby("Date")["overnight_return"].rank(method="first")
# z-score the ranks within each date; transform keeps the result aligned with the original rows.
equity_daily["overnight_normalized"] = equity_daily.groupby("Date")["overnight_rank"].transform(z_score)
# Shift the factor up by one row. Note that this shifts across the whole frame, not per ticker.
equity_daily["overnight_normalized"] = equity_daily.overnight_normalized.shift(-1)
equity_daily = equity_daily.dropna()

# Build the factor indexed by (Date, ticker).
factor = equity_daily[["Date", "ticker", "overnight_normalized"]].\
                groupby([pd.Grouper(key="Date"), "ticker"]).sum()

# Wide price table: one row per date, one column per ticker.
prices = equity_daily.pivot(columns="ticker", values="Close", index="Date")

# quantiles=10 asks alphalens to bucket the factor into deciles.
factor_data = get_clean_factor_and_forward_returns(
    factor=factor,
    prices=prices,
    groupby=None,
    binning_by_group=False,
    quantiles=10,
    bins=None,
    periods=(1, 5, 10),
    filter_zscore=20,
    groupby_labels=None,
    max_loss=0.35
)
```

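
The code above leaves the decile bucketing to alphalens (`quantiles = 10`). If only a `Decile_rank` column is needed, a plain pandas sketch along the following lines may be enough. The sample data, the helper name `decile_rank`, and the convention that decile 0 holds the lowest returns are all assumptions made for illustration, and the sketch assumes each date has enough rows to fill ten bins.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 20 tickers on each of 2 dates.
rng = np.random.default_rng(0)
dates = ["2017-07-20", "2017-07-21"]
tickers = [f"T{i:02d}" for i in range(20)]
equity_daily = pd.DataFrame({
    "Date": np.repeat(dates, len(tickers)),
    "ticker": tickers * len(dates),
    "overnight_return": rng.normal(0, 0.02, size=len(dates) * len(tickers)),
})

def decile_rank(returns: pd.Series) -> pd.Series:
    """Hypothetical helper: map one date's returns to decile labels 0..9."""
    # rank(method="first") breaks ties, so qcut always gets unique bin edges.
    ranks = returns.rank(method="first")
    # labels=False returns the integer bin index instead of an interval.
    return pd.qcut(ranks, 10, labels=False)

# transform applies the helper per date and keeps the original row order.
equity_daily["Decile_rank"] = (
    equity_daily.groupby("Date")["overnight_return"].transform(decile_rank)
)
print(equity_daily.head())
```

Because the ranking happens inside `groupby("Date")`, each stock is compared only with the other stocks on the same day, which matches the per-day ranking described in the question.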
