My table has a string column with values like the following:
accountNumber:123456
{"accountNumber":"123456"}
I need a dynamic way to extract only 123456 from these strings. Can you suggest a solution?
okxuctiv1#
Use REGEXP_SUBSTR(…), the built-in function that extracts a substring using a regular expression pattern. If each column value contains only one number, a digit pattern or the digit character-range syntax is sufficient:
SELECT
'accountNumber:123456' i1,
regexp_substr(i1, '[0-9]+') r1,
'{"accountNumber":"123456"}' i2,
regexp_substr(i2, '[0-9]+') r2;
+----------------------+--------+----------------------------+--------+
| I1                   | R1     | I2                         | R2     |
|----------------------+--------+----------------------------+--------|
| accountNumber:123456 | 123456 | {"accountNumber":"123456"} | 123456 |
+----------------------+--------+----------------------------+--------+
If the number is always exactly 6 digits wide, use the {n} repetition syntax:
select
'accountNumber:123456,anotherNumber:123' i1,
regexp_substr(i1, '[0-9]{6}') r1,
'{"accountNumber":"123456", "anotherNumber": 123}' i2,
regexp_substr(i2,'[0-9]{6}') r2;
+----------------------------------------+--------+--------------------------------------------------+--------+
| I1                                     | R1     | I2                                               | R2     |
|----------------------------------------+--------+--------------------------------------------------+--------|
| accountNumber:123456,anotherNumber:123 | 123456 | {"accountNumber":"123456", "anotherNumber": 123} | 123456 |
+----------------------------------------+--------+--------------------------------------------------+--------+
If the number should only be taken when it follows the literal text accountNumber, you can introduce a capture group and return just that group (the 'e' parameter together with a group number of 1 returns only the parenthesized part of the match):
select
'accountNumber:123456,anotherNumber:123,somethingElse:456789' i1,
regexp_substr(i1, 'accountNumber[:" ]+([0-9]{6})', 1, 1, 'e', 1) r1,
'{"accountNumber":"123456", "anotherNumber": 123, "somethingElse": 456789}' i2,
regexp_substr(i2, 'accountNumber[:" ]+([0-9]{6})', 1, 1, 'e', 1) r2;
+-------------------------------------------------------------+--------+---------------------------------------------------------------------------+--------+
| I1                                                          | R1     | I2                                                                        | R2     |
|-------------------------------------------------------------+--------+---------------------------------------------------------------------------+--------|
| accountNumber:123456,anotherNumber:123,somethingElse:456789 | 123456 | {"accountNumber":"123456", "anotherNumber": 123, "somethingElse": 456789} | 123456 |
+-------------------------------------------------------------+--------+---------------------------------------------------------------------------+--------+
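The three patterns above can also be checked offline against a small test set before running them on the table. The following is only a sketch in Python's re module, not Snowflake itself, so regex dialect differences may apply. The names PATTERNS, SAMPLES, extract and run_test_set are made up for illustration, and the last sample is a hypothetical reordering that does not appear in the question.

```python
import re

# Patterns copied from the SQL examples above.
PATTERNS = {
    "any_digits": r'[0-9]+',
    "six_digits": r'[0-9]{6}',
    "after_key": r'accountNumber[:" ]+([0-9]{6})',
}

SAMPLES = [
    # Values taken from the examples above.
    'accountNumber:123456',
    '{"accountNumber":"123456"}',
    'accountNumber:123456,anotherNumber:123,somethingElse:456789',
    '{"accountNumber":"123456", "anotherNumber": 123, "somethingElse": 456789}',
    # Hypothetical value (an assumption): another 6-digit number comes first.
    'somethingElse:456789,accountNumber:123456',
]


def extract(pattern, text):
    """Return the first capture group if the pattern has one, else the whole match."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return match.group(1) if match.groups() else match.group(0)


def run_test_set():
    """Map each pattern name to the list of values it extracts from SAMPLES."""
    return {name: [extract(p, s) for s in SAMPLES] for name, p in PATTERNS.items()}


if __name__ == "__main__":
    for name, results in run_test_set().items():
        print(name, results)
```

On the hypothetical last sample, the two unanchored patterns pick up 456789 because it appears first, while the pattern anchored on accountNumber still returns 123456. That is the kind of variation a test set is meant to surface.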
Building a fully correct regular expression requires knowing more about all the possible variations in your data. Try building the pattern interactively against a good test set on sites such as regex101 or regexr, which makes patterns much easier to develop.
Note: if your data is in fact always JSON, Snowflake lets you parse it into the VARIANT data type and query it more naturally:
select
parse_json('{"accountNumber":"123456", "anotherNumber": 123, "somethingElse": 456789}'):accountNumber::integer account_number;
+----------------+
| ACCOUNT_NUMBER |
|----------------|
|         123456 |
+----------------+
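Since the column in the question mixes JSON and non-JSON values, one possible way to combine the two approaches is to try JSON parsing first and fall back to the regular expression. The sketch below shows that logic in Python rather than Snowflake SQL. The function name extract_account_number is hypothetical, and the pattern uses [0-9]+ on the assumption that the width is not fixed. In Snowflake, the non-failing TRY_PARSE_JSON would presumably play the role of the try/except here, but that is not part of the answer above.

```python
import json
import re

# Same idea as the capture-group pattern above; [0-9]+ instead of {6}
# is an assumption that the number's width may vary.
KEY_PATTERN = re.compile(r'accountNumber[:" ]+([0-9]+)')


def extract_account_number(value):
    """Return the account number as a string, or None if none is found."""
    # First attempt: treat the value as JSON and read the key directly.
    try:
        parsed = json.loads(value)
    except (TypeError, ValueError):
        parsed = None
    if isinstance(parsed, dict) and "accountNumber" in parsed:
        return str(parsed["accountNumber"])

    # Fallback: plain-text values such as accountNumber:123456.
    match = KEY_PATTERN.search(value) if isinstance(value, str) else None
    return match.group(1) if match else None


if __name__ == "__main__":
    for sample in ['accountNumber:123456', '{"accountNumber":"123456"}']:
        print(sample, extract_account_number(sample))
```

Reading the key from parsed JSON avoids depending on quoting and spacing details, and the regular expression is only used for values that are not valid JSON.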