I need to verify that the code I wrote is correct. To do that, I have to use the following formula:
(nvl(units_inflow,0)- nvl(units_inflow_can,0)-nvl(units_outflow,0)+nvl(units_outflow_can,0))*nav_value
This code is Oracle SQL, and I need to do the same thing in PySpark. So far, where the code above uses nvl, I have used fill() in PySpark to replace null values with 0.
My t3 DataFrame has 5 columns, namely:
["units_inflow","units_inflow_can","units_outflow","units_outflow_can","nav_value"]
The code I have written so far is:
t3= t3.na.fill(value=0,subset=["units_inflow","units_inflow_can","units_outflow","units_outflow_can"])
z = t3.select("units_inflow").groupby().sum().show()
y = t3.select("units_inflow_can").groupby().sum().show()
x = t3.select("units_outflow").groupby().sum().show()
w = t3.select("units_outflow_can").groupby().sum().show()
u = t3.select("nav_value").groupby().sum().collect()
print(u)
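Even after doing all this, I cannot get the expected output. I think I went wrong somewhere in converting the code. I took the sum of each column and did the arithmetic separately on a calculator.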
1 Answer
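Oracle's nvl function is equivalent to coalesce, so you can simply replace each nvl call with coalesce, or evaluate the original formula as a SQL expression.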