I have a DataFrame that contains a sequence of rows. I want to iterate over it row by row without changing the order.
I tried the following code:
scala> val df = Seq(
| (0,"Load","employeeview", "employee.empdetails", null ),
| (1,"Query","employeecountview",null,"select count(*) from employeeview"),
| (2,"store", "employeecountview",null,null)
| ).toDF("id", "Operation","ViewName","DiectoryName","Query")
df: org.apache.spark.sql.DataFrame = [id: int, Operation: string ... 3 more fields]
scala> df.show()
+---+---------+-----------------+-------------------+--------------------+
| id|Operation| ViewName| DiectoryName| Query|
+---+---------+-----------------+-------------------+--------------------+
| 0| Load| employeeview|employee.empdetails| null|
| 1| Query|employeecountview| null|select count(*) f...|
| 2| store|employeecountview| null| null|
+---+---------+-----------------+-------------------+--------------------+
scala> val dfcount = df.count().toInt
dfcount: Int = 3
scala> for( a <- 0 to dfcount-1){
// first iteration: I want id=0, Operation="Load", ViewName="employeeview", DiectoryName="employee.empdetails", Query=null
// second iteration: I want id=1, Operation="Query", ViewName="employeecountview", DiectoryName=null, Query="select count(*) from employeeview"
// third iteration: I want id=2, Operation="store", ViewName="employeecountview", DiectoryName=null, Query=null
// ignore the sample code below
// val operation = get(Operation(a))
// if (operation == "Load") {
// based on the operation type I call the appropriate function and pass the entire row as a parameter
// } else if (operation == "Query") {
//
// } else if (operation == "store") {
// }
}
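Note: the processing order must not change (the only identifier here is id, so the rows must be processed as 0, 1, 2, and so on).
Thanks in advance.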
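A minimal sketch of what the question asks for: sort explicitly by id, then dispatch on the Operation value one row at a time. In general Spark only guarantees an order after an explicit `orderBy`, so the sort should not be left out. To stay runnable without a Spark installation, the rows are modeled as a plain Scala case class; the names `Step`, `runStep` and `runAll` are hypothetical, and `runStep` only returns a description instead of doing real work. In Spark, an equivalent ordered array would typically come from `df.orderBy("id").collect()` (an `Array[Row]`, read with `row.getAs[String]("Operation")`), or from `df.as[Step].orderBy("id").collect()` after `import spark.implicits._`, provided the case-class field names line up with the column names (for example after renaming columns with `withColumnRenamed`). Collecting to the driver is reasonable here because the table is a small list of control steps.

```scala
// Hypothetical model of one row of the question's DataFrame;
// nullable columns are modeled as Option[String].
final case class Step(
    id: Int,
    operation: String,
    viewName: String,
    directoryName: Option[String],
    query: Option[String]
)

object OrderedDispatch {
  // Hypothetical handler: chooses an action from the Operation value and
  // returns a description of it instead of performing real work.
  def runStep(step: Step): String = step.operation match {
    case "Load"  => s"load ${step.directoryName.getOrElse("<none>")} as ${step.viewName}"
    case "Query" => s"query '${step.query.getOrElse("")}' into ${step.viewName}"
    case "store" => s"store ${step.viewName}"
    case other   => s"unknown operation: $other"
  }

  // Sorting by id makes processing order independent of input order;
  // map on a sequential Seq then handles the steps one after another.
  def runAll(steps: Seq[Step]): Seq[String] =
    steps.sortBy(_.id).map(runStep)

  def main(args: Array[String]): Unit = {
    // Deliberately shuffled input mirroring the question's three rows.
    val steps = Seq(
      Step(2, "store", "employeecountview", None, None),
      Step(0, "Load", "employeeview", Some("employee.empdetails"), None),
      Step(1, "Query", "employeecountview", None, Some("select count(*) from employeeview"))
    )
    runAll(steps).foreach(println)
  }
}
```

Because the handler receives the whole `Step`, each branch has access to every column of the row, which matches the question's plan of passing the entire row to the function for each operation type.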
2 Answers
fkaflof61#
Take a look at this:
Edit 1:
ktca8awb2#
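Here is my solution using Datasets. It gives you type safety and cleaner code. Performance should still be benchmarked, but it shouldn't differ much.
For testing, I just return a String; any primitive type can be returned. This returns: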
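A related way to get compile-time checks for the question's if/else-if chain (a different technique from the Dataset code the answer describes, shown only as an illustration) is to convert the Operation string into a sealed trait once, then pattern match on that type. The names `Op`, `Load`, `RunQuery`, `Store`, `TypedOps.parse` and `TypedOps.describe` are hypothetical. Like the answer, the sketch returns a String per row for testing.

```scala
// Hypothetical ADT: one subtype per Operation value.
sealed trait Op
final case class Load(view: String, directory: String) extends Op
final case class RunQuery(view: String, sql: String) extends Op
final case class Store(view: String) extends Op

object TypedOps {
  // Convert the raw, stringly typed columns into an Op once, at the boundary.
  // Rows that cannot be interpreted become Left instead of failing later.
  def parse(operation: String, view: String,
            directory: Option[String], query: Option[String]): Either[String, Op] =
    (operation, directory, query) match {
      case ("Load", Some(dir), _)  => Right(Load(view, dir))
      case ("Query", _, Some(sql)) => Right(RunQuery(view, sql))
      case ("store", _, _)         => Right(Store(view))
      case _                       => Left(s"cannot interpret operation '$operation' for $view")
    }

  // Op is sealed, so the compiler warns if a subtype is missing from this match.
  def describe(op: Op): String = op match {
    case Load(v, d)     => s"Load $d as $v"
    case RunQuery(v, q) => s"Query $q into $v"
    case Store(v)       => s"Store $v"
  }

  def main(args: Array[String]): Unit = {
    val parsed = Seq(
      parse("Load", "employeeview", Some("employee.empdetails"), None),
      parse("Query", "employeecountview", None, Some("select count(*) from employeeview")),
      parse("store", "employeecountview", None, None),
      parse("Query", "brokenview", None, None) // no SQL text, so this becomes Left
    )
    parsed.foreach {
      case Right(op)   => println(describe(op))
      case Left(error) => println(s"skipped: $error")
    }
  }
}
```

Processed in id order, `parse` would be applied to each row first, so a misspelled operation or a Query row without SQL is reported before any step runs.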