json 在Snowflake / Snowpark中动态横向展平和旋转到列

uurv41yg  于 2024-01-09  发布在  其他
关注(0)|答案(1)|浏览(249)

我正在寻找扁平化一列名为'EVENT_PARAMS_JSON'在 snowflake 具有JSON值。

select EVENT_PARAMS_JSON from GA4_EVENT_DETAILS limit 1;

字符串
导致低于输出。

[
    {
    "key": "firebase_screen",
    "value":
        {
        "double_value": null, "float_value": null, "int_value": null, "string_value": "rewards"
        }
    },
    {
    "key": "firebase_previous_screen",
    "value":
        {
        "double_value": null, "float_value": null, "int_value": null, "string_value": "fuelpay_dashboard_offsite"
        }
    },
    {
    "key": "firebase_screen_class",
    "value":
        {
        "double_value": null, "float_value": null, "int_value": null, "string_value": "View"
        }
    },
    {
    "key": "firebase_previous_class",
    "value":
        {
        "double_value": null, "float_value": null, "int_value": null, "string_value": "View"
        }
    },
    {
    "key": "firebase_screen_id",
    "value":
        {
        "double_value": null, "float_value": null, int_value": "4511208285398238808", string_value": null
        }
    },
    {
    "key": "engaged_session_event",
    "value":
        {
        "double_value": null, "float_value": null, "int_value": 1, "string_value": null
        }
    },
    {
    "key": "theme",
    "value":
        {
        "double_value": null, "float_value": null, "int_value": null, "string_value": "lightMode"
        }
    }
]

我希望我的输出表有关键字作为列名和值作为行。请注意,每个关键字有多个值,正确的值是一个不是空白/null。

x1c 0d1x的数据
我在Snowpark中尝试了这个方法,但得到了一个错误:

100357 (P0000): Python Interpreter Error:

raise error_class(
snowflake.snowpark.exceptions.SnowparkSQLException: (1304): 01b0d218-0001-087c-0000-24952743d6f2: 001007 (22023): SQL compilation error:
invalid type [VARCHAR(16777216)] for parameter '1'
import snowflake.snowpark as snowpark

def main(session: snowpark.Session): 
    # Your code goes here, inside the "main" handler.
    tableName = 'GA4_EVENT_DETAILS'
    df = session.table(tableName).filter((col("app_info_id") == 'au.com.caltex.flagship')&   (col("event_name") == 'screen_view')& (col("event_date") == '20230602')&(col("EVENT_PARAMS_FIREBASE_SCREEN_CLASS")=='View'))

    # Print a sample of the dataframe to standard output.
    df.show()
    df = df.join_table_function("flatten", df["EVENT_PARAMS_JSON"]).drop(["SEQ", "PATH", "INDEX", "THIS"])

的字符串

1qczuiv0

1qczuiv01#

为了说明这主要是一个SQL问题,让我们得到一个CTE,原始输入字符串与你注意到的相同,然后像Mat建议的那样解析_JSON它,看看这是第一级问题:
所以这里有一个CTE,它有字符串,JSON作为VARAINT(维斯PARSE_JSON)

with data as (
    select
        '[{"key":"firebase_screen","value":{"string_value":"rewards","int_value":null,"float_value":null,"double_value":null}},{"key":"firebase_previous_screen","value":{"string_value":"fuelpay_dashboard_offsite","int_value":null,"float_value":null,"double_value":null}},{"key":"firebase_screen_class","value":{"string_value":"View","int_value":null,"float_value":null,"double_value":null}},{"key":"firebase_previous_class","value":{"string_value":"View","int_value":null,"float_value":null,"double_value":null}},{"key":"firebase_screen_id","value":{"string_value":null,"int_value":"4511208285398238808","float_value":null,"double_value":null}},{"key":"engaged_session_event","value":{"string_value":null,"int_value":1,"float_value":null,"double_value":null}},{"key":"theme","value":{"string_value":"lightMode","int_value":null,"float_value":null,"double_value":null}}]' as raw_string,
    parse_json(raw_string) as json
)

字符串
现在,如果我们尝试将原始字符串文本变平:

select j.*
from data as d,
lateral flatten(input=>d.raw_string, mode=>'array') as j;


x1c 0d1x的数据
参数“input”的类型[VARCHAR(860)]无效
与您的错误相匹配:
因此,通过PARSE_JSON使用VARAINT类型,这是FLATTEN的预期输入:

select j.*
from data as d,
lateral flatten(input=>d.json, mode=>'array') as j;


成功案例:



所以现在我们有了可以处理的数据,如果我们从值中提取键和值:

select j.value:key
    ,j.value:value
from data as d,
lateral flatten(input=>d.json, mode=>'array') as j;


我们得到:



所以现在我们要把四种值类型提取出来:

select j.value:key
    ,j.value:value:double_value
    ,j.value:value:float_value
    ,j.value:value:int_value
    ,j.value:value:string_value
from data as d,
lateral flatten(input=>d.json, mode=>'array') as j;



所以现在我们要COALESCE或NVL这些值,以保持单一的非空值:

select j.value:key
    ,COALESCE(
        j.value:value:double_value::text, 
        j.value:value:float_value::text,
        j.value:value:int_value::text, 
        j.value:value:string_value::text) as val
from data as d,
lateral flatten(input=>d.json, mode=>'array') as j;


给出:
x1c4d 1x的
所以这一切都很好,现在正如你所说,你想要行键,成为列标题,这是通过PIVOT完成的,在那里你必须知道键/列的名称,通过手动枢轴,你必须知道键的名称,或者动态地,现在你的结果可以根据数据随机变化。
我将展示手动透视方法,因为它是我喜欢的方法。

with data as (
    select
        '[{"key":"firebase_screen","value":{"string_value":"rewards","int_value":null,"float_value":null,"double_value":null}},{"key":"firebase_previous_screen","value":{"string_value":"fuelpay_dashboard_offsite","int_value":null,"float_value":null,"double_value":null}},{"key":"firebase_screen_class","value":{"string_value":"View","int_value":null,"float_value":null,"double_value":null}},{"key":"firebase_previous_class","value":{"string_value":"View","int_value":null,"float_value":null,"double_value":null}},{"key":"firebase_screen_id","value":{"string_value":null,"int_value":"4511208285398238808","float_value":null,"double_value":null}},{"key":"engaged_session_event","value":{"string_value":null,"int_value":1,"float_value":null,"double_value":null}},{"key":"theme","value":{"string_value":"lightMode","int_value":null,"float_value":null,"double_value":null}}]' as raw_string,
    parse_json(raw_string) as json
), data_as_rows as (
    select j.seq, 
        j.value:key::text as col_key
        ,COALESCE(
            j.value:value:double_value::text, 
            j.value:value:float_value::text,
            j.value:value:int_value::text, 
            j.value:value:string_value::text) as val
    from data as d,
    lateral flatten(input=>d.json, mode=>'array') as j
)
select 
    max(iff(col_key='firebase_screen', val, null)) as firebase_screen
    ,max(iff(col_key='firebase_previous_screen', val, null)) as firebase_previous_screen
    ,max(iff(col_key='firebase_screen_class', val, null)) as firebase_screen_class
    ,max(iff(col_key='firebase_previous_class', val, null)) as firebase_previous_class
    ,max(iff(col_key='firebase_screen_id', val, null)) as firebase_screen_id
    ,max(iff(col_key='engaged_session_event', val, null)) as engaged_session_event
    ,max(iff(col_key='theme', val, null)) as theme
from data_as_rows
group by seq


给出:



从FLATTEN中选择SEQ允许将相关的行缝合在一起,因此它被用于GROUP BY中。
现在,你需要将这个SQL重新构建到你的Python指令中,让snowflake将它转换回等价的SQL,这样你的结果就“可以工作”了。

相关问题