postgresql 相当于google bigquery中的字符串包含

twh00eeo 于 2023-02-15 发布在 PostgreSQL

关注(0)|答案(6)|浏览(202)

我有一个如下所示的表

我希望创建two new binary columns，指示主题是否具有steroids和aspirin。
我试了下面的方法，但不起作用

select subject_id
case when lower(drug) like ('%cortisol%','%cortisone%','%dexamethasone%') 
then 1 else 0 end as steroids,
case when lower(drug) like ('%peptide%','%paracetamol%') 
then 1 else 0 end as aspirin,
from db.Team01.Table_1

SELECT 
db.Team01.Table_1.drug
FROM `table_1`,
UNNEST(table_1.drug) drug
WHERE REGEXP_CONTAINS( db.Team01.Table_1.drug,r'%cortisol%','%cortisone%','%dexamethasone%')

我希望输出如下所示

postgresql

来源：https://stackoverflow.com/questions/58197814/equivalent-of-string-contains-in-google-bigquery

6条答案

按热度按时间

qij5mzcb1#

以下是BigQuery标准SQL

#standardSQL
SELECT 
  subject_id,
  SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'cortisol|cortisone|dexamethasone') THEN 1 ELSE 0 END) AS steroids,
  SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'peptide|paracetamol') THEN 1 ELSE 0 END) AS aspirin
FROM `db.Team01.Table_1`
GROUP BY subject_id

如果要应用于问题中示例数据-结果为

Row subject_id  steroids    aspirin  
1   1           3           1    
2   2           1           1

注意：我使用LIKE on steroids，而不是以冗长冗余的文本结尾的简单LIKE，它是REGEXP_CONTAINS

赞(0）回复(0）举报 2023-02-15

yws3nbqq2#

在Postgres中，我建议使用filter子句：

select subject_id,
       count(*) filter (where lower(drug) ~ 'cortisol|cortisone|dexamethasone') as steroids,
       count(*) filter (where lower(drug) ~ 'peptide|paracetamol') as aspirin,
from db.Team01.Table_1
group by subject_id;

在BigQuery中，我建议使用countif()：

select subject_id,
       countif(regexp_contains(drug, 'cortisol|cortisone|dexamethasone') as steroids,
       countif(drug ~ ' 'peptide|paracetamol') as aspirin,
from db.Team01.Table_1
group by subject_id;

您可以使用sum(case when . . . end)作为一种更通用的方法。然而，每个数据库都有一种更“本地”的方式来表达这种逻辑。顺便说一下，FILTER子句 * 是 * 标准SQL，只是没有被广泛采用。

赞(0）回复(0）举报 2023-02-15

pokxtpni3#

使用条件聚集。此解决方案适用于大多数（如果不是全部）RDBMS：

SELECT
    subject_id,
    MAX(CASE WHEN drug IN ('cortisol', 'cortisone', 'dexamethasone') THEN 1 END) steroids,
    MAX(CASE WHEN drug IN ('peptide', 'paracetamol') THEN 1 END) aspirin
FROM db.Team01.Table_1.drug
GROUP BY subject_id

NB：不清楚为什么要使用LIKE，因为看起来好像有完全匹配的内容;我把LIKE条件转化为等式。

赞(0）回复(0）举报 2023-02-15

k75qkfdt4#

您缺少group-by

select subject_id,
    sum(case when lower(drug) in ('cortisol','cortisone','dexamethasone')
       then 1 else 0 end) as steroids,
    sum(case when lower(drug) in ('peptide','paracetamol') 
       then 1 else 0 end) as aspirin
from db.Team01.Table_1
group by subject_id

使用like关键字

select subject_id,
 sum(case when lower(drug) like '%cortisol%'
        or lower(drug) like '%cortisone%'
        or lower(drug) like '%dexamethasone%'   
    then 1 else 0 end) as steroids,
    sum(case when lower(drug) like '%peptide%'
        or lower(drug) like '%paracetamol%'
    then 1 else 0 end) as aspirin
from db.Team01.Table_1
group by subject_id

赞(0）回复(0）举报 2023-02-15

cunj1qz15#

另一个可能更直观的解决方案是使用BigQuery Contains_Substr返回布尔结果。

赞(0）回复(0）举报 2023-02-15

holgip5t6#

我没有使用过BigQuery，但一直在阅读研究它的文档。我在设计阶段研究选择排序规则的影响时遇到了这个问题。
我要么错了，要么这是一个新的功能，因为上面的答案。

包含_替代品

https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#contains_substr
执行规范化的、不区分大小写的搜索，以查看表达式中是否存在作为子字符串的值。如果值存在，则返回TRUE，否则返回FALSE。
在比较值之前，使用NFKC规范化对值进行规范化和大小写折叠。不支持通配符搜索。

赞(0）回复(0）举报 2023-02-15

我来回答

postgresql 相当于google bigquery中的字符串包含

6条答案

包含_替代品

相关问题

热门标签

最新问答