在bigquery上使用一个用户定义的函数从杂乱的数据集中提取电子邮件时,我面临着一个问题:temp user defined function(udf)的主体中不允许使用数组\u agg()。
CREATE TEMP FUNCTION GET_EMAIL(emails ARRAY<STRING>, index INT64) AS (
ARRAY_AGG(
DISTINCT
(SELECT * FROM
UNNEST(
SPLIT(
REPLACE(
LOWER(
ARRAY_TO_STRING(emails, ",")
)," ", ""
)
)
) AS e where e like '%@%'
) IGNORE NULLS
)[SAFE_OFFSET(index)]
);
SELECT GET_EMAIL(["bob@hotmail.com,test@gmail.com", "12345", "bon@yahoo.com"],1) as email_1
我尝试绕过数组\u agg,方法是选择from unnest with offset,然后选择offset作为索引的位置。
但是,现在有一个列限制(在标量子查询select子句中不超过一列)建议改用select as struct。
我尝试了select as结构:
CREATE TEMP FUNCTION GET_EMAIL(emails ARRAY<STRING>, index INT64) AS (
(SELECT AS STRUCT DISTINCT list.e, list.o FROM
UNNEST(
SPLIT(
REPLACE(
LOWER(
ARRAY_TO_STRING(emails, ", ")
)," ", ""
)
)
) AS list
WITH OFFSET as list.o
WHERE list.e like '%@%' AND list.o = index)
);
SELECT GET_EMAIL(["bob@hotmail.com,test@gmail.com", "12345", "bob@yahoo.com"],1) as email_1
但它不喜欢我的distinct,即使删除它,它也会抱怨解析e和o。
所以我没主意了,我可能结了个结。有人能建议如何在自定义项框架内完成这项工作吗?谢谢。
1条答案
按热度按时间3xiyfsfu1#
以下版本有效
有结果的
或以下版本(只是对原始查询的轻微更正)
显然是同样的结果