Environment
Spark 2.4.5 with Scala
Question
I have two tables:

Table user
  column user_id: the user's id
  column user_name: the user's name

Table condition
  column condition_id: the id of this condition
  column expr: a filter expression to be passed to user_df.filter() (the _df suffix means it is a DataFrame); it selects the users for whom the expression evaluates to true. An example value of this column: "user_id == 1 or user_id == 2"
What I want is (pseudocode):

```
-- variable for the final result, initialized as an empty list;
-- it has type `list of user` (not `list of (list of user)`)
final_users = []
for each cond in table condition:
    expr = cond.expr  -- now variable expr is a string
    -- an example (note that a string literal is passed to filter):
    -- partial_users = user_df.filter("user_id == 1 or user_id == 2")
    partial_users = user_df.filter(expr)
    partial_users = partial_users.withColumn("condition_id", cond.condition_id)  -- add a new column
    final_users.extend(partial_users)
```
How can I implement this in Spark (Scala), without using a UDF?
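For context, the loop above can be sketched directly in Scala: collect the (presumably small) condition table to the driver, filter the user DataFrame once per condition string, tag each partial result with its condition_id, and union the partial results. The table names, column names, and sample rows below are assumptions for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

object CondFilterSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("cond-filter").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data matching the two tables in the question
    val user_df = Seq((1, "alice"), (2, "bob"), (3, "carol")).toDF("user_id", "user_name")
    val cond_df = Seq(
      (10, "user_id == 1 or user_id == 2"),
      (11, "user_id == 3")
    ).toDF("condition_id", "expr")

    // One filtered DataFrame per condition, each tagged with its condition_id
    val partials: Array[DataFrame] = cond_df.collect().map { row =>
      val condId  = row.getAs[Int]("condition_id")
      val exprStr = row.getAs[String]("expr")
      user_df.filter(exprStr).withColumn("condition_id", lit(condId))
    }

    // Union of the partials corresponds to final_users.extend(...) in the pseudocode
    val final_users: DataFrame = partials.reduce(_ union _)
    final_users.show()

    spark.stop()
  }
}
```

This stays UDF-free because `DataFrame.filter` accepts a SQL expression string directly; the driver-side `collect()` is acceptable only while the condition table is small.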