Input JSON file
{
  "CarBrands": [
    {
      "carid": "100bw",
      "filter_condition": " (YEAR == \"2009\" AND FACTS BETWEEN 0001 AND 200 AND STORE==\"UK\" AND RESALE in (\"2015\")) "
    },
    {
      "carid": "25xw",
      "filter_condition": " (YEAR == \"2010\" AND FACTS NOT IN (234,435,456) AND FACTS between 220 AND 500 AND RESALE in (\"2017\")) "
    },
    {
      "carid": "masy",
      "filter_condition": " (YEAR == \"2010\" AND STORE==\"USA\" AND (FACTS BETWEEN 600 AND 700 OR FACTS BETWEEN 810 AND 920) AND RESALE in (\"2018\")) "
    },
    {
      "carid": "mxw",
      "filter_condition": " (YEAR == \"2013\" AND FACTS ==\"1541\" AND RESALE in (\"2019\")) "
    }
  ]
}
Note: we have a fact table, and the filter conditions above come from a JSON API.
This is what needs to be achieved:
Select * from Car_transactions where car_facts = (FACTS BETWEEN 0001 AND 200 ) OR (FACTS NOT IN (234,435,456) AND FACTS between 220 AND 500)
OR (FACTS BETWEEN 600 AND 700 OR FACTS BETWEEN 810 AND 920) OR FACTS =541
import sparkSession.implicits._
val tagsDF = sparkSession.read.option("multiLine", true).option("inferSchema", true).json("src/main/resources/carbrands.json");
val df = tagsDF.select(($"CarBrands") as "car_brands")
3 Answers
Answer #1 (tez616oj)
You can use regexp_extract with a regular-expression pattern that matches one or more BETWEEN clauses.
Answer #2 (q3qa4bjr)
You can try regexp_extract in Spark Scala with a pattern that fits your needs.
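Neither answer shows a pattern here, so the following is only a minimal sketch of the regex idea, not the answerers' code. It uses plain Scala (scala.util.matching.Regex) instead of Spark's regexp_extract so that it runs without a Spark session. The object name FactsFilterSketch, the helper names, and the pattern itself are hypothetical. The pattern covers only the FACTS predicate shapes that appear in the sample input (BETWEEN, IN / NOT IN, and ==) and assumes the FACTS predicates within one filter_condition sit next to each other.

```scala
import scala.util.matching.Regex

object FactsFilterSketch {
  // One FACTS predicate: BETWEEN a AND b, [NOT] IN (...), or == / = a number.
  // Assumption: only the shapes seen in the sample JSON are handled.
  private val clause =
    """FACTS\s*(?:BETWEEN\s+\d+\s+AND\s+\d+|(?:NOT\s+)?IN\s*\([^)]*\)|==?\s*"?\d+"?)"""

  // A run of adjacent FACTS predicates joined by AND / OR (case-insensitive).
  val factsChain: Regex =
    ("(?i)" + clause + """(?:\s+(?:AND|OR)\s+""" + clause + ")*").r

  // First run of FACTS predicates in a single filter_condition, if any.
  def factsPredicate(filterCondition: String): Option[String] =
    factsChain.findFirstIn(filterCondition)

  // Parenthesise each extracted run and OR the runs together.
  def combinedPredicate(filterConditions: Seq[String]): String =
    filterConditions.flatMap(factsPredicate).map(p => s"($p)").mkString(" OR ")

  def main(args: Array[String]): Unit = {
    // filter_condition values as they look after JSON unescaping.
    val conditions = Seq(
      """ (YEAR == "2009" AND FACTS BETWEEN 0001 AND 200 AND STORE=="UK" AND RESALE in ("2015")) """,
      """ (YEAR == "2010" AND FACTS NOT IN (234,435,456) AND FACTS between 220 AND 500 AND RESALE in ("2017")) """,
      """ (YEAR == "2010" AND STORE=="USA" AND (FACTS BETWEEN 600 AND 700 OR FACTS BETWEEN 810 AND 920) AND RESALE in ("2018")) """,
      """ (YEAR == "2013" AND FACTS =="1541" AND RESALE in ("2019")) """
    )
    println(combinedPredicate(conditions))
  }
}
```

The resulting string could then be passed to a DataFrame's where or filter method, both of which accept a SQL expression string. That assumes the fact table column is actually named FACTS; the target query in the question calls it car_facts, so the name may need replacing first.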
Answer #3 (eqoofvh9)
Edit: