Removing extra "" from a JSON value using Scala

Posted by nwsw7zdq on 2021-05-29 in Spark

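I have been trying to clean a JSON object using Scala, but I can't remove the extra "" from a JSON value, for example "LAST_NM":"SMITH "LIBBY" MARY".
The extra double quotes inside the string are causing the problem.
This is the code I'm using to clean the JSON file: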

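val readjson = sparkSession.sparkContext.textFile("dev.json")
// Collapse the doubled quotes ("") around keys and values, and turn tabs into spaces
val json = readjson.map(element => element
  .replace("\"\":\"\"", "\":\"")
  .replace("\"\",\"\"", "\",\"")
  .replace("\"\":", "\":")
  .replace(",\"\"", ",\"")
  .replace("\"{\"\"", "{\"")
  .replace("\"\"}\"", "\"}")
  .replaceAll("\\u0009", " "))
// saveAsTextFile returns Unit, so call it on the RDD instead of assigning its result
json.saveAsTextFile("JSON")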

Below is the JSON string I'm trying to clean (whitespace added for readability):

{
  "SEQ_NO":597216,
  "PROV_DEMOG_SK":597216,
  "PROV_ID":"QMP000003371283",
  "FRST_NM":"",
  "LAST_NM":"SMITH "LIBBY" MARY",
  "FUL_NM":"",
  "GENDR_CD":"",
  "PROV_NPI":"",
  "PROV_STAT":"Incomplete",
  "PROV_TY":"03",
  "DT_OF_BRTH":"",
  "PROFPROFL_DESGTN":"",
  "ETL_LAST_UPDT_DT_TM":"2020-04-28 11:43:31.000000",
  "PROV_CLSFTN_CD":"A",
  "SRC_DATA_KEY":50,
  "OPRN_CD":"I",
  "REC_SET":"F"
}

What should I add to my code to remove the extra "" from the LAST_NM value in the JSON string?
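To see why the existing code leaves LAST_NM untouched, here is a minimal Spark-free sketch that copies the question's replace chain into a plain Scala function (the object name `QuestionChain` and the short sample line are hypothetical, made up for illustration). Every pattern in the chain matches a pair of quotes (""), so it collapses doubled quotes but has nothing that matches a single stray quote next to a space, as in `SMITH "LIBBY" MARY`.

```scala
// Hypothetical helper: the question's replace chain, without Spark, for one line
object QuestionChain {
  // Same .replace/.replaceAll calls as the question's RDD map, in the same order
  def clean(element: String): String =
    element
      .replace("\"\":\"\"", "\":\"")
      .replace("\"\",\"\"", "\",\"")
      .replace("\"\":", "\":")
      .replace(",\"\"", ",\"")
      .replace("\"{\"\"", "{\"")
      .replace("\"\"}\"", "\"}")
      .replaceAll("\\u0009", " ") // regex for the tab character

  def main(args: Array[String]): Unit = {
    // Doubled quotes are what the chain was written for
    println(clean("\"{\"\"A\"\":\"\"B\"\"}\"")) // prints {"A":"B"}

    // Single stray quotes inside a value match none of the patterns
    val line = """{"FRST_NM":"","LAST_NM":"SMITH "LIBBY" MARY","REC_SET":"F"}"""
    println(clean(line) == line) // prints true: the line is unchanged
  }
}
```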

Answer from lg40wkob:

Try the following code:

// df: Dataset[String] of raw JSON lines (map needs import spark.implicits._ in scope)
df.map(_.replaceAll(" \"", " ").replaceAll("\" ", " ")).show(false)

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value                                                                                                                                                                                                                                                                                                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"SEQ_NO":597216,"PROV_DEMOG_SK":597216,"PROV_ID":"QMP000003371283","FRST_NM":"","LAST_NM":"SMITH LIBBY MARY","FUL_NM":"","GENDR_CD":"","PROV_NPI":"","PROV_STAT":"Incomplete","PROV_TY":"03","DT_OF_BRTH":"","PROFPROFL_DESGTN":"","ETL_LAST_UPDT_DT_TM":"2020-04-28 11:43:31.000000","PROV_CLSFTN_CD":"A","SRC_DATA_KEY":50,"OPRN_CD":"I","REC_SET":"F"}|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
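The answer's fix can be tried outside Spark with a small sketch like the one below (the name `AnswerFix.stripSpaceAdjacentQuotes` is hypothetical). It removes any double quote that has a space directly before or after it, which is exactly what the stray quotes in `SMITH "LIBBY" MARY` have. This relies on an assumption: the real records are compact, one-line JSON (the question says the whitespace was added only for readability). If a line had a space after a colon, or a value that starts or ends with a space, the legitimate quotes would be removed too, as the second call shows.

```scala
// Hypothetical helper: the answer's two replaceAll calls, without Spark
object AnswerFix {
  // Drops a quote preceded by a space, then a quote followed by a space.
  // Heuristic: assumes compact JSON, so only quotes inside values touch spaces.
  def stripSpaceAdjacentQuotes(line: String): String =
    line.replaceAll(" \"", " ").replaceAll("\" ", " ")

  def main(args: Array[String]): Unit = {
    val line = """{"FRST_NM":"","LAST_NM":"SMITH "LIBBY" MARY","REC_SET":"F"}"""
    // prints {"FRST_NM":"","LAST_NM":"SMITH LIBBY MARY","REC_SET":"F"}
    println(stripSpaceAdjacentQuotes(line))

    // Caveat: with a space after the colon, the opening quote of the value is lost
    println(stripSpaceAdjacentQuotes("""{"A": "B"}""")) // prints {"A": B"}
  }
}
```

In the question's RDD pipeline, the same function could be applied as one more `.map(...)` step on the RDD before `saveAsTextFile`.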
