我在“qwerty”表中有一个变量列调用“request”,它在类似json的
{
"ID": "123123",
"workflowHistory": [
{
"id": "666",
"workflowType": "CCC",
"entityId": "123123",
"creator": {
"id": "503081",
"displayName": "AGENT2",
"email": "AGENT2@SOMETHING.com",
"userAvatarUrl": "XXXXXXX"
},
"createdDate": "2020-04-30T21:58:09Z",
"deletor": null,
"deletedDate": null,
"clientId": "000000000",
"value": "00000000"
},
{
"id": "555",
"workflowType": "AAA",
"entityId": "123123",
"creator": {
"id": "503080",
"displayName": "AGENT1",
"email": "AGENT1@SOMETHING.com",
"userAvatarUrl": "XXXXXXX"
},
"createdDate": "2020-04-30T21:55:09Z",
"deletor": null,
"deletedDate": null,
"clientId": "000000000",
"value": "00000000"
},
{
"id": "444",
"workflowType": "xyz",
"entityId": "123123",
"creator": {
"id": "503080",
"displayName": "AGENT1",
"email": "AGENT1@SOMETHING.com",
"userAvatarUrl": "XXXXXXX"
},
"createdDate": "2020-04-30T21:19:09Z",
"deletor": null,
"deletedDate": null,
"clientId": "000000000",
"value": "00000000"
},
{
"id": "333",
"workflowType": "BBB",
"entityId": "123123",
"creator": {
"id": "503079",
"displayName": "AGENT0",
"email": "AGENT0@SOMETHING.com",
"userAvatarUrl": "XXXXXXX"
},
"createdDate": "2020-04-30T21:10:09Z",
"deletor": null,
"deletedDate": null,
"clientId": "000000000",
"value": "00000000"
},
{
"id": "222",
"workflowType": "ZZZ",
"entityId": "123123",
"creator": {
"id": "503079",
"displayName": "AGENT0",
"email": "AGENT0@SOMETHING.com",
"userAvatarUrl": "XXXXXXX"
},
"createdDate": "2020-04-30T21:08:09Z",
"deletor": null,
"deletedDate": null,
"clientId": "000000000",
"value": "00000000"
}
]
}
另外,“qwerty”表有第一个\u日期和pk article \u id(与相同)request:workflowhistory.id),我正在尝试获取具有以下列的输出:
身份证件
代理的上次创建日期
首先为代理创建日期
由agentn-1生成的上一个createddate
由agentn+1生成的下一个createddate
我想输出如下:
输出
为此,我构建了一个查询,如下所示:
将工作流解析为(
SELECT ARTICLE_ID,
HARVEST_DATE,
value:createdDate::timestamp_tz AS create_date,
value:creator:email AS email,
value:workflowType AS workflowType,
value:value AS value
FROM 'QWERTY', lateral flatten( input => REQUEST:workflowHistory )
),
lag_Agent_timing AS
(SELECT
WorkFlow_Parsed.ARTICLE_ID AS ARTICLE_ID,WorkFlow_Parsed.email,LAG(WorkFlow_Parsed.create_date) IGNORE NULLS over (partition by WorkFlow_Parsed.email,WorkFlow_Parsed.ARTICLE_ID order by WorkFlow_Parsed.create_date) AS lag_date_value
FROM WorkFlow_Parsed),
lead_agent_timing AS
(SELECT
WorkFlow_Parsed.ARTICLE_ID AS ARTICLE_ID,WorkFlow_Parsed.email,LEAD(WorkFlow_Parsed.create_date) IGNORE NULLS over (partition by WorkFlow_Parsed.email,WorkFlow_Parsed.ARTICLE_ID order by WorkFlow_Parsed.create_date) AS lead_date_value
FROM WorkFlow_Parsed)
SELECT
DISTINCT
WorkFlow_Parsed.ARTICLE_ID AS _ARTICLE_ID,
WorkFlow_Parsed.email AS _email,
last_value(WorkFlow_Parsed.create_date) over (partition by WorkFlow_Parsed.email,WorkFlow_Parsed.ARTICLE_ID order by WorkFlow_Parsed.create_date) AS last_date_value,
first_value(WorkFlow_Parsed.create_date) over (partition by WorkFlow_Parsed.email,WorkFlow_Parsed.ARTICLE_ID order by WorkFlow_Parsed.create_date) AS first_date_value,
MAX(lag_Agent_timing.lag_date_value),
MIN(lead_agent_timing.lead_date_value)
FROM WorkFlow_Parsed
JOIN lag_Agent_timing ON WorkFlow_Parsed.ARTICLE_ID=lag_Agent_timing.ARTICLE_ID AND lag_Agent_timing.email=WorkFlow_Parsed.email
JOIN lead_agent_timing ON WorkFlow_Parsed.ARTICLE_ID=lead_agent_timing.ARTICLE_ID AND lead_agent_timing.email=WorkFlow_Parsed.email
GROUP BY _ARTICLE_ID,_email
但出现错误:“[sys\u vw.create\u date\u 1]不是有效的group by表达式”`
我怎么能修好它?
2条答案
按热度按时间ojsjcaue1#
[sys\u vw.create\u date\u 1]不是有效的group by表达式
错误来自于你使用
GROUP BY
在决赛中SELECT
查询。它指出您正在引用/使用Workflow_Parsed.create_date
在查询中作为非组列,但它不是GROUP BY _ARTICLE_ID, _email
表达式,即与[Workflow_Parsed.create_date] is not a valid group by expression
如果您稍微简化一下查询,您将收到。snowflake不允许在窗口函数表达式上进行聚合,如果您想混合
GROUP BY
使用窗口函数,尝试将查询嵌套在结构中,例如SELECT cols, aggregate(cols) FROM (SELECT cols, window(cols)) GROUP BY cols
将两者分开(即首先对所有行应用窗口函数,然后对它生成的整个结果进行分组)。我不确定窗口函数在您的示例查询中尝试了什么,因为我没有看到代理的
n ± 1
关系在其中的任何位置,但是根据您描述的需求和包含的示例输出,以下应该可以工作(它只使用标量子查询,没有窗口函数):对于问题中包含的单个json行输入,这将生成:
jq6vz3qz2#
我分享了如何使用推荐语法的代码