当我在Pyspark中多次使用withColumn()更新列的值时,收到StackOverflowException错误。
我的代码与我得到了StackOverflowException是:
df = df.withColumn("element", when(df["element"] == 1,"first").otherwise(df["element"]))
df = df.withColumn("element", when(df["element"] == 2,"second").otherwise(df["element"]))
df = df.withColumn("element", when(df["element"] == 3,"third").otherwise(df["element"]))
df = df.withColumn("element", when(df["element"] == 4,"fourth").otherwise(df["element"]))
Spark文档建议使用select()函数,所以我尝试了一下:
df = df.select("*", (when(df["element"] == 1,"first")).alias("element"))
df = df.select("*", (when(df["element"] == 2,"second")).alias("element"))
df = df.select("*", (when(df["element"] == 3,"third")).alias("element"))
df = df.select("*", (when(df["element"] == 4,"fourth")).alias("element"))
但是我收到了一个错误,因为列"element"没有更新,另一个同名的列被创建了。错误如下:
Py4JJavaError: An error occurred while calling o3723.apply.
: org.apache.spark.sql.AnalysisException: Reference 'element' is ambiguous, could be: element, element.;
at org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:259)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:121)
at org.apache.spark.sql.Dataset.resolve(Dataset.scala:229)
at org.apache.spark.sql.Dataset.col(Dataset.scala:1282)
at org.apache.spark.sql.Dataset.apply(Dataset.scala:1249)
at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
我怎么能这么做?
先谢谢你!
1条答案
按热度按时间zsohkypk1#
我想你可以多次使用
.when
,然后再使用.otherwise
.另外,你应该给新列起个不同的名字:使用
.select
:输出示例: