scala - How to extract the db name and table name from a comma-separated string in a DataFrame column

Posted by ttvkxqim on 2021-07-12 in Spark

I have a DataFrame column table_name that holds the following string value:

tradingpartner.parent_supplier,lookup.store,lab_promo_invoice.tl_cc_mbr_prc_wkly_inv,lab_promo_invoice.mpp_club_card_promotion_funding_view,lab_promo_invoice.supplier_sale_apportionment_cc,tradingpartner.supplier,stores.rpm_zone_location_mapping,lookup.calendar

How can I extract the db name and the table name from the string above, storing the DB name in one column and the table name in another?
I expect output like the following:

qc6wkl3g1#

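One possible solution is to define two separate UDFs.
Start from this input DataFrame, called dfInput (two entries without a database prefix, sauces and plant, have been appended to the original string):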

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|table_name                                                                                                                                                                                                                                                                        |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|tradingpartner.parent_supplier,lookup.store,lab_promo_invoice.tl_cc_mbr_prc_wkly_inv,lab_promo_invoice.mpp_club_card_promotion_funding_view,lab_promo_invoice.supplier_sale_apportionment_cc,tradingpartner.supplier,stores.rpm_zone_location_mapping,lookup.calendar,sauces,plant|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
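
To reproduce the input, a minimal sketch such as the following could be used. It assumes a SparkSession is available (as it is in spark-shell); the names InputSketch, sample and buildInput are hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object InputSketch {
  // The sample value from the table above, including the two
  // entries that carry no database prefix (sauces, plant).
  val sample: String =
    "tradingpartner.parent_supplier,lookup.store," +
      "lab_promo_invoice.tl_cc_mbr_prc_wkly_inv," +
      "lab_promo_invoice.mpp_club_card_promotion_funding_view," +
      "lab_promo_invoice.supplier_sale_apportionment_cc," +
      "tradingpartner.supplier,stores.rpm_zone_location_mapping," +
      "lookup.calendar,sauces,plant"

  // Builds a one-row DataFrame with a single string column table_name.
  def buildInput(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(sample).toDF("table_name")
  }
}
```

In spark-shell this would be called as `val dfInput = InputSketch.buildInput(spark)`.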

The first UDF, called dbName, extracts all database names from the string in the input column:

import org.apache.spark.sql.functions.udf

// Returns every database name found in a comma-separated list of
// "database.table" entries, joined with ", ".
def dbNames(k: String): String = {
  k.split(",")                                // split the input string by comma
    .filter(_.contains("."))                  // keep only entries of the form database.table
    .map(s => s.substring(0, s.indexOf("."))) // take the part before the first dot
    .mkString(", ")                           // no trailing separator to strip
}
val dbName = udf[String, String](dbNames)

The second UDF, called tableName, extracts all table names from the string in the input column:

// Returns every table name found in a comma-separated list of entries,
// joined with ", ".
def tableNames(k: String): String = {
  k.split(",") // split the input string by comma
    .map { s =>
      // for an entry like database.table keep only the table part;
      // an entry without a dot is taken to be just a table name
      val indexOfPoint = s.indexOf(".")
      if (indexOfPoint >= 0) s.substring(indexOfPoint + 1) else s
    }
    .mkString(", ")
}
val tableName = udf[String, String](tableNames)
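
Both UDFs apply the same per-entry rule: split at the first dot, and treat an entry without a dot as a bare table name. A minimal plain-Scala sketch of that rule, testable without Spark, might look like this. EntryParser, parseEntry and parseAll are hypothetical names, and the trimming and null handling are additions that the UDFs above do not perform.

```scala
object EntryParser {
  // Splits one entry at its first dot.
  // "lookup.store" -> (Some("lookup"), "store"); "sauces" -> (None, "sauces")
  def parseEntry(entry: String): (Option[String], String) = {
    val trimmed = entry.trim
    trimmed.indexOf('.') match {
      case -1  => (None, trimmed)
      case idx => (Some(trimmed.substring(0, idx)), trimmed.substring(idx + 1))
    }
  }

  // Parses a whole comma-separated value; a null input yields an empty result.
  def parseAll(csv: String): Seq[(Option[String], String)] =
    Option(csv).toSeq
      .flatMap(_.split(",").toSeq)
      .filter(_.trim.nonEmpty)
      .map(parseEntry)
}

object EntryParserDemo {
  def main(args: Array[String]): Unit = {
    val pairs = EntryParser.parseAll("lookup.store,stores.rpm_zone_location_mapping,sauces")
    println(pairs.flatMap(_._1).mkString(", ")) // database names only
    println(pairs.map(_._2).mkString(", "))     // table names, prefixed or not
  }
}
```

Keeping the database as an Option makes the "no prefix" case explicit, which is why the database list can be shorter than the table list.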

Then, to get the expected output, call the UDFs as follows:

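import org.apache.spark.sql.functions.col

val dfOutput = dfInput.withColumn("DBName", dbName(col("table_name")))
                      .withColumn("Table", tableName(col("table_name")))
// truncate = false prints the full column values, as in the output below
dfOutput.show(false)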
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|table_name                                                                                                                                                                                                                                                                        |DBName                                                                                                         |Table                                                                                                                                                                             |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|tradingpartner.parent_supplier,lookup.store,lab_promo_invoice.tl_cc_mbr_prc_wkly_inv,lab_promo_invoice.mpp_club_card_promotion_funding_view,lab_promo_invoice.supplier_sale_apportionment_cc,tradingpartner.supplier,stores.rpm_zone_location_mapping,lookup.calendar,sauces,plant|tradingpartner, lookup, lab_promo_invoice, lab_promo_invoice, lab_promo_invoice, tradingpartner, stores, lookup|parent_supplier, store, tl_cc_mbr_prc_wkly_inv, mpp_club_card_promotion_funding_view, supplier_sale_apportionment_cc, supplier, rpm_zone_location_mapping, calendar, sauces, plant|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
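
If one row per database/table pair is preferable to two comma-joined strings, Spark's built-in functions can do the same split without a UDF. The following is only a sketch of that alternative, not part of the answer above. It assumes each entry contains at most one dot (with several dots, substring_index with count -1 keeps only the part after the last one, unlike the UDFs, which split at the first). ExplodeSketch and explodeEntries are hypothetical names.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode, split, substring_index, when}

object ExplodeSketch {
  // Turns each comma-separated entry of table_name into its own row
  // with a DBName column (null when the entry has no dot) and a Table column.
  def explodeEntries(df: DataFrame): DataFrame =
    df
      .select(explode(split(col("table_name"), ",")).as("entry"))
      .select(
        // part before the first dot, only when a dot is present
        when(col("entry").contains("."), substring_index(col("entry"), ".", 1)).as("DBName"),
        // part after the last dot, or the whole entry when there is no dot
        substring_index(col("entry"), ".", -1).as("Table")
      )
}
```

Calling `ExplodeSketch.explodeEntries(dfInput).show(false)` would then list one pair per row, with a null DBName for sauces and plant.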
