How do I extract table names from CREATE/UPDATE/INSERT statements in SQL queries?

dvtswwa3 posted on 2021-07-29 in Java

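I am trying to parse out the table that is being created, inserted into, or updated by the SQL queries below, which are stored in a table column.
Let's call that column query. Here is some sample data to show how the values can vary.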

with sample_data as (
  select 1 as id, 'CREATE TABLE tbl1 ...' as query union all
  select 2 as id, 'CREATE OR REPLACE TABLE tbl1 ...' as query union all
  select 3 as id, 'DROP TABLE IF EXISTS tbl1; CREATE TABLE tbl1 ...' as query union all
  select 4 as id, 'INSERT /*some comment*/ INTO tbl2 ...' as query union all
  select 5 as id, 'INSERT /*some comment*/ INTO tbl2 ...' as query union all
  select 6 as id, 'UPDATE tbl3 SET col1 = ...' as query union all
  select 7 as id, '/*some garbage comments*/ UPDATE tbl3 SET col1 = ...' as query union all  
  select 8 as id, 'DELETE tbl4 ...' as query
),

The queries follow these formats (we are trying to extract table_name):

#1 some optional statements like drop table CREATE some comments or optional statement like OR REPLACE TABLE table_name everything else
#2 some optional statements like drop table INSERT some comments INTO some comments table_name
#3 some optional statements like drop table UPDATE some comments table_name everything else

erhoui1w (answer #1)

Regular expression

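To build a suitable regular expression, let's start from the following relatively simple, readable version:

((CREATE( OR REPLACE)?|DROP) TABLE( IF EXISTS)?|UPDATE|DELETE|INSERT INTO) ([^\s\/*]+)

Every space above can be replaced with "at least one whitespace character", i.e. \s+ . But we also need to allow comments. For a comment that looks like /*anything*/ the regular expression is \/\*.*\*\/ (the comment characters are escaped with \ and "anything" is the .* in the middle). Since there may be several such comments, optionally separated by whitespace, we end up with (\s*\/\*.*\*\/\s*?)*\s+ . Plugging this in wherever there was a space gives:

((CREATE((\s*\/\*.*\*\/\s*?)*\s+OR(\s*\/\*.*\*\/\s*?)*\s+REPLACE)?|DROP)(\s*\/\*.*\*\/\s*?)*\s+TABLE((\s*\/\*.*\*\/\s*?)*\s+IF(\s*\/\*.*\*\/\s*?)*\s+EXISTS)?|UPDATE|DELETE|INSERT(\s*\/\*.*\*\/\s*?)*\s+INTO)(\s*\/\*.*\*\/\s*?)*\s+([^\s\/*]+)

One further refinement is needed. The parenthesized expressions have been used for alternation, e.g. (CHOICE1|CHOICE2) , but that syntax also makes each of them a capture group. We really want only one capture group, for the table name, so the others can be made non-capturing with ?: , e.g. (?:CHOICE1|CHOICE2) . This gives:

(?:(?:CREATE(?:(?:\s*\/\*.*\*\/\s*?)*\s+OR(?:\s*\/\*.*\*\/\s*?)*\s+REPLACE)?|DROP)(?:\s*\/\*.*\*\/\s*?)*\s+TABLE(?:(?:\s*\/\*.*\*\/\s*?)*\s+IF(?:\s*\/\*.*\*\/\s*?)*\s+EXISTS)?|UPDATE|DELETE|INSERT(?:\s*\/\*.*\*\/\s*?)*\s+INTO)(?:\s*\/\*.*\*\/\s*?)*\s+([^\s\/*]+)

Online regex demo

Here is a demo that uses the examples: regex101 demo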
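The construction steps above can be sketched in Python. This is only an illustration: it uses Python's `re` module rather than BigQuery's RE2 engine, and the names `SEP`, `READABLE` and `build` are made up for this example. The function is meant to mirror the two transformations described (space to comment-aware separator, then capturing to non-capturing groups).

```python
import re

# Comment-aware separator from the text: zero or more /*...*/ comments,
# followed by at least one whitespace character.
SEP = r"(\s*\/\*.*\*\/\s*?)*\s+"

# The readable starting point; every literal space marks a separator.
READABLE = r"((CREATE( OR REPLACE)?|DROP) TABLE( IF EXISTS)?|UPDATE|DELETE|INSERT INTO) ([^\s\/*]+)"


def build(non_capturing: bool) -> str:
    """Expand the readable pattern; optionally keep only the last group capturing."""
    template, sep = READABLE, SEP
    if non_capturing:
        # Everything before the final "(" gets (?:...); the table-name group stays as is.
        head, tail = template.rsplit("(", 1)
        template = head.replace("(", "(?:") + "(" + tail
        sep = sep.replace("(", "(?:")
    return template.replace(" ", sep)


capturing = re.compile(build(non_capturing=False))
single_group = re.compile(build(non_capturing=True))

query = "INSERT /*some comment*/ INTO tbl2 ..."
# With capturing groups the table name is the last of many groups.
print(capturing.groups, capturing.search(query).group(capturing.groups))
# With non-capturing groups it is the only group.
print(single_group.groups, single_group.search(query).group(1))
```

The group count is the practical reason for the `?:` step: a function that returns "the" capture group needs the pattern to contain exactly one.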

SQL statement

The Google BigQuery documentation for REGEXP_EXTRACT says that it returns the substring matched by the capture group, so I would expect something like this to work:

with sample_data as (
  select 1 as id, 'CREATE TABLE tbl1 ...' as query union all
  select 2 as id, 'CREATE OR REPLACE TABLE tbl1 ...' as query union all
  select 3 as id, 'DROP TABLE IF EXISTS tbl1; CREATE TABLE tbl1 ...' as query union all
  select 4 as id, 'INSERT /*some comment*/ INTO tbl2 ...' as query union all
  select 5 as id, 'INSERT /*some comment*/ INTO tbl2 ...' as query union all
  select 6 as id, 'UPDATE tbl3 SET col1 = ...' as query union all
  select 7 as id, '/*some garbage comments*/ UPDATE tbl3 SET col1 = ...' as query union all  
  select 8 as id, 'DELETE tbl4 ...' as query
)

SELECT
  *, REGEXP_EXTRACT(query, r"(?:(?:CREATE(?:(?:\s*\/\*.*\*\/\s*?)*\s+OR(?:\s*\/\*.*\*\/\s*?)*\s+REPLACE)?|DROP)(?:\s*\/\*.*\*\/\s*?)*\s+TABLE(?:(?:\s*\/\*.*\*\/\s*?)*\s+IF(?:\s*\/\*.*\*\/\s*?)*\s+EXISTS)?|UPDATE|DELETE|INSERT(?:\s*\/\*.*\*\/\s*?)*\s+INTO)(?:\s*\/\*.*\*\/\s*?)*\s+([^\s\/*]+)") AS table_name
FROM sample_data;

(The above is untested; please let me know in the comments if there are any problems.)
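Since the query is untested, a rough local check may help. The sketch below runs the same pattern over the sample rows with Python's `re` module. That is an assumption-laden stand-in: BigQuery uses RE2, so behavior there may differ, and `extract_table` is a hypothetical helper name.

```python
import re

# The final pattern from the query above, copied verbatim.
PATTERN = re.compile(
    r"(?:(?:CREATE(?:(?:\s*\/\*.*\*\/\s*?)*\s+OR(?:\s*\/\*.*\*\/\s*?)*\s+REPLACE)?|DROP)"
    r"(?:\s*\/\*.*\*\/\s*?)*\s+TABLE(?:(?:\s*\/\*.*\*\/\s*?)*\s+IF(?:\s*\/\*.*\*\/\s*?)*\s+EXISTS)?"
    r"|UPDATE|DELETE|INSERT(?:\s*\/\*.*\*\/\s*?)*\s+INTO)(?:\s*\/\*.*\*\/\s*?)*\s+([^\s\/*]+)"
)

# Same rows as the sample_data CTE.
SAMPLE_DATA = {
    1: "CREATE TABLE tbl1 ...",
    2: "CREATE OR REPLACE TABLE tbl1 ...",
    3: "DROP TABLE IF EXISTS tbl1; CREATE TABLE tbl1 ...",
    4: "INSERT /*some comment*/ INTO tbl2 ...",
    5: "INSERT /*some comment*/ INTO tbl2 ...",
    6: "UPDATE tbl3 SET col1 = ...",
    7: "/*some garbage comments*/ UPDATE tbl3 SET col1 = ...",
    8: "DELETE tbl4 ...",
}


def extract_table(query, pattern=PATTERN):
    """Return the first capture group of the leftmost match, or None."""
    match = pattern.search(query)
    return match.group(1) if match else None


for row_id, query in SAMPLE_DATA.items():
    print(row_id, extract_table(query))

# Possible tweak (not part of the original answer): also exclude ';'
# from the table-name character class.
NO_SEMICOLON = re.compile(PATTERN.pattern.replace(r"[^\s\/*]", r"[^\s\/*;]"))
print(extract_table(SAMPLE_DATA[3], NO_SEMICOLON))
```

Under Python's `re`, every row yields the expected name except row 3, which comes back as `tbl1;` with the trailing semicolon: the leftmost match starts at `DROP TABLE IF EXISTS`, and `;` is not excluded by the capture class. The `NO_SEMICOLON` variant is one way to trim it. Whether BigQuery behaves identically has not been verified here.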

vwoqyblh (answer #2)

I think this really depends on your data, but you may have some success with an approach like this:

with data as (
  select 1 as id, 'CREATE TABLE tbl1 ...' as query union all
  select 2 as id, 'INSERT INTO tbl2 ...' as query union all
  select 3 as id, 'UPDATE tbl3 ...' as query union all
  select 4 as id, 'DELETE tbl4 ...' as query
),
splitted as (
  select id, split(query, ' ') as query_parts from data
)
select
  id,
  case 
    when query_parts[safe_offset(0)] in('CREATE', 'INSERT') then query_parts[safe_offset(2)]
    when query_parts[safe_offset(0)] in('UPDATE', 'DELETE') then query_parts[safe_offset(1)]
    else 'Error'
  end as table_name
from splitted

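Of course, this depends on how clean and regular the syntax in your query column is. Also, if your table names are fully qualified, in the form project.dataset.table, you will need to split them further.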
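To make the dependence on clean input concrete, here is a small Python sketch of the same split-based idea. It is an approximation of the SQL above, not a translation guaranteed to match BigQuery's behavior; `table_name_by_split` and `bare_table_name` are hypothetical helper names.

```python
from typing import Optional


def table_name_by_split(query: str) -> Optional[str]:
    """Pick the table name by position after splitting on single spaces."""
    parts = query.split(" ")

    def safe_offset(i: int) -> Optional[str]:
        # Like safe_offset in the SQL: None instead of an error when out of range.
        return parts[i] if i < len(parts) else None

    first = safe_offset(0)
    if first in ("CREATE", "INSERT"):
        return safe_offset(2)
    if first in ("UPDATE", "DELETE"):
        return safe_offset(1)
    return "Error"


def bare_table_name(qualified: str) -> str:
    """Keep only the last dot-separated part of a qualified name."""
    return qualified.split(".")[-1]


for q in [
    "CREATE TABLE tbl1 ...",                  # clean input
    "DELETE tbl4 ...",                        # clean input
    "CREATE OR REPLACE TABLE tbl1 ...",       # extra keywords shift the offset
    "INSERT /*some comment*/ INTO tbl2 ...",  # a comment shifts the offset
]:
    print(repr(q), "->", table_name_by_split(q))

print(bare_table_name("project.dataset.tbl1"))
```

The first two inputs give the table name, while the last two return a keyword or a comment fragment instead. That is the kind of input for which the positional approach stops being reliable.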
