由于174个union all语句,sql server查询ssis转换超时

kqhtkvqz  于 2021-06-24  发布在  Hive
关注(0)|答案(3)|浏览(689)

我在hive和sqlserver中有一个表,其中存储的数据如下。我正在使用ssis将此数据移入sql server。查询时间太长。description列中有大约175个独立的值,这将导致174个union all语句,因此查询将在大约2小时后超时。
sql错误[08s01]:org.apache.thrift.transport.ttTransportException:java.net.sockettimeoutexception:读取超时*
有没有更好的方法来编写这个查询?
谢谢!
Hive:

ID  | Description
----+------------------------------
 1  | Desc1;Desc2;Desc3;Desc4
 2  | Desc1;Desc3;Desc4;Desc5;Desc6
 ...
230 | Desc8;Desc163;Desc9;Desc2;Desc172

sql服务器:

CaseID | GroupID | Description
-------+---------+--------------
   1   |    63   | Desc1
   1   |    44   | Desc2
   1   |    57   | Desc3
   1   |    78   | Desc4
   ...
   2   |    78   | Desc1
   2   |    57   | Desc3

查询:

select 
       case 
             when cas.description like '%Desc1%' then 63 
       end as groupid, -- maps to groupid
       cas.id as caseid, -- maps to caseid 
       current_timestamp as INSERT_DT
from 
       svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
union all 
select 
       case 
             when cas.description like '%Desc2%' then 44
       end as groupid, -- maps to groupid
       cas.id as caseid, -- maps to caseid 
       current_timestamp as INSERT_DT
from 
       svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
union all
select 
       case 
             when cas.description like '%Desc3%' then 57 
       end as groupid, -- maps to groupid
       cas.id as caseid, -- maps to caseid 
       current_timestamp as INSERT_DT
from 
       svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
union all
select 
       case 
             when cas.description like '%Desc4%' then 78 
       end as groupid, -- maps to groupid
       cas.id as caseid, -- maps to caseid 
       current_timestamp as INSERT_DT
from 
       svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
...
select 
       case 
             when cas.description like '%Desc175%' then 12 
       end as groupid, -- maps to groupid
       cas.id as caseid, -- maps to caseid 
       current_timestamp as INSERT_DT
from 
       svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
shstlldc

shstlldc1#

只运行一次查询。所以没有工会,就不谈这个案子了。使用多播并在ssis中拆分它。

w46czmvw

w46czmvw2#

您可以展开代码并使用 case 要转换为数字:

select (case when code = 'Desc1' then 63
             when code = 'Desc2' then 44
             . . .
        end) as groupid, -- maps to groupid
       cas.id as caseid, -- maps to caseid 
       current_timestamp as INSERT_DT
from svc_case cas join
     account acc
     on acc.id = cas.id lateral view
     explode(split(cas.description, ';')) codes as code
where acc.recordid = '03443FGT';

我不知道你为什么 description <> 'NULL' . 我猜你真的想 is not null --这对于横向连接是不必要的。
另外,如果您有一个引用表,每个代码一行 groupid ,则可以通过加入该函数来进一步简化代码。

iaqfqrcu

iaqfqrcu3#

这是暗箭伤人,但是有两件事可以改进这个查询。首先,让我们解决所有这些问题 UNION ALL s。如果我正确理解了您的查询,您可以取消打印数据以实现相同的目的:

SELECT V.groupid,
       cas.id AS caseid,
       current_timestamp as INSERT_DT
FROM dbo.svc_case cas
     JOIN dbo.account acc on acc.id = cas.id
     CROSS APPLY (VALUES(CASE WHEN cas.description LIKE '%Desc1%' THEN 63 END),
                        (CASE WHEN cas.description LIKE '%Desc2%' THEN 44 END),
                        (CASE WHEN cas.description LIKE '%Desc3%' THEN 57 END),
                        (CASE WHEN cas.description LIKE '%Desc4%' THEN 78 END),
                        --I assume there are 174 more of these
                        (CASE WHEN cas.description LIKE '%Desc178%' THEN 1 END))V(groupid) --The last one isn't correct, but to show how the `APPLY` ends

那你就有你的 WHERE ,因为 LENGTH . LENGTH 实际上不是一个t-sql操作符,所以我希望您实际使用的是sqlserver(如果您不是,这是一个浪费答案的问题,因为上面是特定于t-sql的)。考虑到这一点 LEN(NULL) 退货 NULL ,然后使用 <> '' . 考虑到你已经有了 <> 'NULL' 尽管你可以用 NOT IN :

WHERE cas.description NOT IN('NULL','')
  AND acc.recordid = '03443FGT'

但是,我建议不要存储文本字符串值 'NULL' 在你的专栏中,你应该修正这个问题,并实际存储 NULL ,不是 'NULL' ; 这两个是不同的值,表现非常不同。

相关问题