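I want to compute the row-wise mean and standard deviation of the columns whose names start with "Sum of EBIT [CY 2". I can get the mean by adding the 10 columns together and dividing by 10, as shown below: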
pub fn industry_beta_f(raw_data: DataFrame, marginal_tax_rate: Expr) -> DataFrame {
    let df = raw_data.clone().lazy()
        .with_columns([
            // Row-wise mean: add the ten yearly EBIT columns and divide by 10.
            ((col("Sum of EBIT [CY 2011] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2012] ($USDmm, Historical rate)")
                + col("Sum of EBIT [CY 2013] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2014] ($USDmm, Historical rate)")
                + col("Sum of EBIT [CY 2015] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2016] ($USDmm, Historical rate)")
                + col("Sum of EBIT [CY 2017] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2018] ($USDmm, Historical rate)")
                + col("Sum of EBIT [CY 2019] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2020] ($USDmm, Historical rate)")) / lit(10.0))
                .alias("moments_mean"),
            // Problem: .std(1) aggregates the summed column over all rows, not across the ten columns of each row.
            (col("Sum of EBIT [CY 2011] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2012] ($USDmm, Historical rate)")
                + col("Sum of EBIT [CY 2013] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2014] ($USDmm, Historical rate)")
                + col("Sum of EBIT [CY 2015] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2016] ($USDmm, Historical rate)")
                + col("Sum of EBIT [CY 2017] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2018] ($USDmm, Historical rate)")
                + col("Sum of EBIT [CY 2019] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2020] ($USDmm, Historical rate)"))
                .std(1).alias("moments_std"),
        ])
        .with_columns([
            when(col("moments_mean").gt(lit(0.0)))
                .then(col("moments_std") / col("moments_mean"))
                .otherwise(f64::NAN)
                .alias("Standard deviation in operating income (last 10 years)")
        ])
        .select([
            col("Industry Name"),
            col("Number of firms"),
            col("Standard deviation in operating income (last 10 years)"),
        ])
        .collect()
        .unwrap();
    df
}
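I am having trouble computing the row-wise standard deviation of the columns that start with "Sum of EBIT [CY 2". With the std() expression, the standard deviation is computed over each column rather than across each row.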
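Current Output
Expected Output
There is a large difference between the two outputs: in the current output, std is computed over the column, whereas in the expected output it is computed across each row.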
1 Answer
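You can compute the mean with the DataFrame's built-in mean_horizontal method. The standard deviation is not supported out of the box, so it is a bit trickier: first compute the mean, then the sum of squared errors, then divide that sum by the number of columns minus 1 and take the square root.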
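The arithmetic described above can be sketched in plain Rust, independent of polars. This is only an illustration and not the answerer's original code. It makes several assumptions: the helper names row_mean_std and coefficient_of_variation are hypothetical, each row's yearly EBIT values are assumed to be already extracted into a slice of f64, and null values are not handled.

```rust
/// Row-wise mean and sample standard deviation (ddof = 1) of one row's values.
/// Hypothetical helper: returns None when the row has fewer than two values,
/// because the "number of columns minus 1" divisor would be zero.
fn row_mean_std(values: &[f64]) -> Option<(f64, f64)> {
    let n = values.len();
    if n < 2 {
        return None;
    }
    // Step 1: mean across the columns of this row.
    let mean = values.iter().sum::<f64>() / n as f64;
    // Step 2: sum of squared errors around that mean.
    let sse: f64 = values.iter().map(|v| (v - mean).powi(2)).sum();
    // Step 3: divide by (number of columns - 1), then take the square root.
    let std = (sse / (n as f64 - 1.0)).sqrt();
    Some((mean, std))
}

/// Ratio std / mean for one row, NaN when the mean is not positive.
/// This mirrors the when/then/otherwise step in the question's code.
fn coefficient_of_variation(values: &[f64]) -> f64 {
    match row_mean_std(values) {
        Some((mean, std)) if mean > 0.0 => std / mean,
        _ => f64::NAN,
    }
}
```

For a row with the values 2, 4 and 6, this gives a mean of 4 and a sample standard deviation of 2, so the ratio is 0.5. In a polars pipeline the same three steps would be expressed over the ten column expressions instead of a slice.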