Parsing a newline-delimited JSON file in Spark produces no output

wrrgggsh, posted on 2021-05-27 in Spark

A sample of the newline-delimited JSON file is shown below.

[
    {"name": "Vishay Electronics", 
    "specifications": " feature low on-resistance and high Zener switching speed\n1/lineup from small signal products to 800V high voltage products\n3  MOSFETs are highly reliable\nstandard AEC-Q101\n package lineup flexibly meets the requirements of various in-vehicle systems.",
     "url": "https://www.mouser.in/", 
     "image": "https://www.mouser.in/", 
     "downtime": "11PT",
     "inputvolt": "8", 
     "date": "2013-04-01", 
     "upTime": "15M", 
     "description": " feature low on-resistance and high zener speed\n1/lineup from small signal products to 800V high voltage products\n3  MOSFETs are highly reliable\nstandard AEC-Q101\n package lineup flexibly meets the requirements of various in-vehicle systems."
     },

     {"name": "Vishay Electronics", 
    "specifications": " feature low on-resistance and high zener speed\n1/lineup zener from small signal products to 800V high voltage products\n3  MOSFETs are highly reliable\nstandard AEC-Q101\n package lineup flexibly meets the requirements of various in-vehicle systems.",
     "url": "https://www.mouser.in/", 
     "image": "https://www.mouser.in/", 
     "downtime": "5PT",
     "inputvolt": "8", 
     "date": "2013-04-01", 
     "upTime": "15M", 
     "description": " feature low on-resistance and high switching speed\n1/lineup from small signal products to 800V high voltage products\n3  MOSFETs are highly reliable\nstandard AEC-Q101\n package lineup flexibly meets the requirements of various in-vehicle systems."
     },

     {"name": "Vishay Electronics", 
    "specifications": " feature low on-resistance and high switching speed\n1/lineup from small signal products to 800V high voltage products\n3  MOSFETs are highly reliable\nstandard AEC-Q101\n package lineup flexibly meets the requirements of various in-vehicle systems.",
     "url": "https://www.mouser.in/", 
     "image": "https://www.mouser.in/", 
     "downtime": "2PT",
     "inputvolt": "8", 
     "date": "2013-04-01", 
     "upTime": "15M", 
     "description": " feature low on-resistance and high switching speed\n1/lineup from small signal products to 800V high voltage products\n3  MOSFETs are highly reliable\nstandard AEC-Q101\n package lineup flexibly meets the requirements of various in-vehicle systems."
     }

    ]

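When I validate the JSON online at https://jsonlint.com/ it looks fine.

When I read the file in Spark and call printSchema, it also looks fine.

Here is the problem.
When I run the code below, it produces 0 rows of output instead of the 2 records I expect.
Code: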

val df = spark.read.option("multiLine", true).json("D:/bittu/testmyjson.json")
df.printSchema()
df.filter($"specifications".contains("%zener%")).show(truncate = false)

But it does not work as expected.

How can we handle this situation? Please share your thoughts. Thanks a lot for your comments.

ecbunoof1#

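Instead of contains, use .like, or keep contains and remove the % characters. contains checks for a literal substring, and no value in the data has zener preceded or followed by a % character.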

df.filter($"specifications".like("%zener%")).show(truncate = false)

// when using contains, remove the % characters
df.filter($"specifications".contains("zener")).show(truncate = false)

/*
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+----------------------+---------+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+----------------------+
|date      |description                                                                                                                                                                                                                                          |downtime|image                 |inputvolt|name              |specifications                                                                                                                                                                                                                                         |upTime|url                   |
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+----------------------+---------+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+----------------------+
|2013-04-01| feature low on-resistance and high switching speed
1/lineup from small signal products to 800V high voltage products
3  MOSFETs are highly reliable
standard AEC-Q101
 package lineup flexibly meets the requirements of various in-vehicle systems.|5PT     |https://www.mouser.in/|8        |Vishay Electronics| feature low on-resistance and high zener speed
1/lineup zener from small signal products to 800V high voltage products
3  MOSFETs are highly reliable
standard AEC-Q101
 package lineup flexibly meets the requirements of various in-vehicle systems.|15M   |https://www.mouser.in/|
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+----------------------+---------+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+----------------------+

*/
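To make the difference concrete, here is a minimal sketch that runs the three filters against a tiny in-memory DataFrame. The `ContainsVsLike` object, the `sample` and `matchCounts` helpers, the shortened sample strings, and the local SparkSession are all assumptions made for illustration; they are not part of the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ContainsVsLike {
  // Hypothetical rows that mimic the "specifications" column of the question.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      "feature low on-resistance and high Zener switching speed",
      "feature low on-resistance and high zener speed",
      "feature low on-resistance and high switching speed"
    ).toDF("specifications")
  }

  // Returns row counts for: contains("%zener%"), contains("zener"), like("%zener%").
  def matchCounts(df: DataFrame): (Long, Long, Long) = {
    val spec = df("specifications")
    (
      // Literal substring test: the data would have to contain the % characters.
      df.filter(spec.contains("%zener%")).count(),
      // Literal substring test without wildcards.
      df.filter(spec.contains("zener")).count(),
      // SQL LIKE: % is a wildcard that matches any sequence of characters.
      df.filter(spec.like("%zener%")).count()
    )
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session, only for trying the sketch outside spark-shell.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("contains-vs-like")
      .getOrCreate()
    println(matchCounts(sample(spark))) // prints (0,1,1)
    spark.stop()
  }
}
```

Both working filters are case-sensitive, so the row with the capitalized "Zener" is not matched by either of them.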

For a case-insensitive match, apply the lower function to the specifications column and then filter with like or contains. Example:
// lower comes from org.apache.spark.sql.functions
import org.apache.spark.sql.functions.lower

df.filter(lower($"specifications").like("%zener%")).select("specifications").show(false)

df.filter(lower($"specifications").contains("zener")).select("specifications").show(false)

/*
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|specifications |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| feature low on-resistance and high Zener switching speed
1/lineup from small signal products to 800V high voltage products
3 MOSFETs are highly reliable
standard AEC-Q101
package lineup flexibly meets the requirements of various in-vehicle systems.|
| feature low on-resistance and high zener speed
1/lineup zener from small signal products to 800V high voltage products
3 MOSFETs are highly reliable
standard AEC-Q101
package lineup flexibly meets the requirements of various in-vehicle systems. |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

*/
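As a possible alternative to lower, a regular expression with the inline `(?i)` flag can do the case-insensitive match directly through `rlike`. The sketch below compares the two on a small in-memory DataFrame; the `CaseInsensitiveMatch` object, its helper names, the shortened sample strings, and the local SparkSession are assumptions for illustration only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lower

object CaseInsensitiveMatch {
  // Hypothetical rows that mimic the "specifications" column of the question.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      "feature low on-resistance and high Zener switching speed",
      "feature low on-resistance and high zener speed",
      "feature low on-resistance and high switching speed"
    ).toDF("specifications")
  }

  // Lower-case the column first, then do a literal substring test.
  def viaLower(df: DataFrame): DataFrame =
    df.filter(lower(df("specifications")).contains("zener"))

  // rlike takes a Java regular expression; (?i) turns on case-insensitive matching.
  def viaRegex(df: DataFrame): DataFrame =
    df.filter(df("specifications").rlike("(?i)zener"))

  def main(args: Array[String]): Unit = {
    // Assumed local session, only for trying the sketch outside spark-shell.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("case-insensitive-match")
      .getOrCreate()
    val df = sample(spark)
    viaLower(df).show(false)
    viaRegex(df).show(false)
    spark.stop()
  }
}
```

On this sample both helpers keep the "Zener" and "zener" rows and drop the third one, which corresponds to the 2 records the question expected.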
