GoogleBigQuery—定义子表,然后使用sql查询该表

jjhzyzn0  于 2021-08-01  发布在  Java
关注(0)|答案(2)|浏览(241)

我有一个包含许多列的表,我想计算每列的唯一值。我知道我能做到

  1. SELECT sho_01, COUNT(*) from sho GROUP BY sho_01
  2. UNION ALL
  3. SELECT sho_02, COUNT(*) from sho GROUP BY sho_02
  4. UNION ALL
  5. ....

在这里 sho 是table和table吗 sho_01 ,.... 是单独的列。顺便说一下,这是bigquery,所以他们使用 UNION ALL .
接下来,我想做同样的事情,但是对于 sho ,说 SELECT * FROM sho WHERE id in (1,2,3) . 有没有一种方法可以先创建子表,然后查询子表?像这样的

  1. SELECT * FROM (SELECT * FROM sho WHERE id IN (1,2,3)) AS t1;
  2. SELECT sho_01, COUNT(*) from t1 GROUP BY sho_01
  3. UNION ALL
  4. SELECT sho_02, COUNT(*) from t1 GROUP BY sho_02
  5. UNION ALL
  6. ....

谢谢

tkclm6bt

tkclm6bt1#

据推测,这些列都是同一类型的。如果是这样,您可以使用数组来简化此过程:

  1. select el.which, el.val, count(*)
  2. from (select t1.*,
  3. array[struct('sho_01' as which, sho_01 as val),
  4. struct('sho_2', show_02),
  5. . . .
  6. ] as ar
  7. from t
  8. ) t cross join
  9. unnest(ar) el
  10. group by el.which, el.val;

然后,您可以通过添加 where 合同前条款 group by .

xqnpmsa8

xqnpmsa82#

下面是bigquery标准sql,允许您避免手动键入列名,甚至提前知道列名

  1. # standardSQL
  2. SELECT
  3. TRIM(SPLIT(kv, ':')[OFFSET(0)], '"') column,
  4. SPLIT(kv, ':')[OFFSET(1)] value,
  5. COUNT(1) cnt
  6. FROM `project.dataset.table` t,
  7. UNNEST(SPLIT(TRIM(TO_JSON_STRING(t), '{}'))) kv
  8. GROUP BY column, value
  9. -- ORDER BY column, value

相关问题