我是scala初学者。我尝试在表的列中查找空值的计数,并在map中添加列名和count作为键值对。下面的代码无法按预期工作。请指导我如何修改此代码使其工作
def nullCheck(databaseName:String,tableName:String) ={
var map = scala.collection.mutable.Map[String, Int]()
validationColumn = Array(col1,col2)
for(i <- 0 to validationColumn.length) {
val nullVal = spark.sql(s"select count(*) from $databaseName.$tableName where validationColumn(i) is NULL")
if(nullval == 0)
map(validationColumn(i)) = nullVal
map
}
函数应该返回((col1,count),(col2,count))作为Map
1条答案
按热度按时间oalqel3c1#
这可以通过创建一个动态sql字符串,然后Map它来实现。您的方法多次读取相同的数据
这是解决办法。我使用了一个“示例”Dataframe。
我们的validationcolumns是
name
以及age
在我们计算空值的地方,我们把它们放在一个列表中scala> val validationColumns = List("name","age")
我们创建一个sql字符串来驱动整个计算scala> val sqlStr = "select " + validationColumns.map(x => "sum(" + x + "_count) AS " + x + "_sum" ).mkString(",") + " from (select " + validationColumns.map(x => "case when " + x + " = '$$' then 1 else 0 end AS " + x + "_count").mkString(",") + " from " +" (select" + validationColumns.map(x => " nvl( " + x +",'$$') as " + x).mkString(",") + " from example_table where " + validationColumns.map(x => x + " is null ").mkString("or ") + " ) layer1 ) layer2 "
解析为==>"select sum(name_count) AS name_sum,sum(age_count) AS age_sum from (select case when name = '$$' then 1 else 0 end AS name_count,case when age = '$$' then 1 else 0 end AS age_count from (select nvl( name,'$$') as name, nvl( age,'$$') as age from example_table where name is null or age is null ) layer1 ) layer2 "
现在我们创建一个Dataframe的临时视图inputDf.createOrReplaceTempView("example_table")
唯一要做的就是执行sql
创造一个Map
这是由validationColumns zip spark.sql(sqlStr).collect.map(_.toSeq).flatten.toList toMap
和结果Map(name -> 1, age -> 2)
//很明显你可以让它类型安全