How can I bucketize a column using another table in BigQuery SQL?

Posted by wb1gzix0 on 2021-07-26 in Java

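I have a column grams in table info that can be any positive integer. I also have a table map with two columns, price and grams, where grams takes a set of discrete values (say 50 of them) sorted in ascending order.
I want to add a column named cost to table info by fetching price from table map such that info.grams <= map.grams (taking the smallest such map.grams). In other words, I want to bucketize info.grams based on map.grams and get the price.
What do I know?
I can use CASE WHEN to bucketize info.grams as below, then join the two tables and get price. However, since the number of discrete values is not limited, I would like to find a clean way to do this without making my query a mess.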

CASE WHEN grams<=1 THEN 1
WHEN grams<=5 THEN 5
WHEN grams<=10 THEN 10
WHEN grams<=20 THEN 20
WHEN grams<=30 THEN 30
...

enxuqcxy1#

Below is for BigQuery Standard SQL.
You can use the RANGE_BUCKET function for this:


# standardSQL

SELECT i.*, 
  price_map[SAFE_OFFSET(RANGE_BUCKET(grams, grams_map))] price
FROM `project.dataset.info` i,  
(
  SELECT AS STRUCT 
    ARRAY_AGG(grams + 1 ORDER BY grams) AS grams_map,
    ARRAY_AGG(price ORDER BY grams) AS price_map
  FROM `project.dataset.map`
)

You can test and play with the above using sample data, as in the example below:


# standardSQL

WITH `project.dataset.info` AS (
  SELECT 1 AS grams UNION ALL 
  SELECT 3 UNION ALL 
  SELECT 5 UNION ALL 
  SELECT 7 UNION ALL 
  SELECT 10 UNION ALL 
  SELECT 13 UNION ALL 
  SELECT 15 
), `project.dataset.map` AS (
  SELECT 5 AS grams, 0.99 price UNION ALL
  SELECT 10, 1.99 UNION ALL
  SELECT 15, 2.99 
)
SELECT i.*, 
  price_map[SAFE_OFFSET(RANGE_BUCKET(grams, grams_map))] price
FROM `project.dataset.info` i,  
(
  SELECT AS STRUCT 
    ARRAY_AGG(grams + 1 ORDER BY grams) AS grams_map,
    ARRAY_AGG(price ORDER BY grams) AS price_map
  FROM `project.dataset.map`
)

with the result:

Row grams   price    
1   1       0.99     
2   3       0.99     
3   5       0.99     
4   7       1.99     
5   10      1.99     
6   13      2.99     
7   15      2.99
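
To see why the query shifts each boundary with grams + 1, here is a minimal Python sketch that emulates the same logic outside BigQuery. It assumes RANGE_BUCKET returns the number of array elements less than or equal to the point (which is what bisect_right computes on a sorted list) and that grams are integers, since the + 1 shift only works for whole numbers. The names range_bucket, price_for and sample_map are hypothetical helpers for illustration, not part of any BigQuery API.

```python
from bisect import bisect_right


def range_bucket(point, boundaries):
    """Emulate RANGE_BUCKET: count of sorted boundaries that are <= point."""
    return bisect_right(boundaries, point)


def price_for(grams, price_rows):
    """Return the price of the smallest map.grams >= grams, or None if none exists.

    price_rows is a list of (grams, price) tuples (assumed integer grams).
    """
    rows = sorted(price_rows)
    grams_map = [g + 1 for g, _ in rows]  # same shift as ARRAY_AGG(grams + 1 ...)
    price_map = [p for _, p in rows]
    idx = range_bucket(grams, grams_map)
    # SAFE_OFFSET yields NULL when the index is out of range; mirror that with None
    return price_map[idx] if idx < len(price_map) else None


sample_map = [(5, 0.99), (10, 1.99), (15, 2.99)]
for g in [1, 3, 5, 7, 10, 13, 15, 16]:
    print(g, price_for(g, sample_map))
```

Without the shift, a value equal to a boundary (for example 5) would land in the next bucket and be priced at 1.99 instead of 0.99. A value above the largest boundary (16 here) gets no price, which matches the NULL that SAFE_OFFSET produces.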

yqkkidmi2#

Oh, it would be nice to do this in standard SQL using lead() and join:

select i.*, m.*
from info i left join
     (select m.*, lead(grams) over (order by grams) as next_grams
      from map m
     ) m
     on i.grams >= m.grams and
        (i.grams < next_grams or next_grams is null);

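However, one limitation of BigQuery is that it does not support non-equality outer joins. So, you can convert the map table to an array and use unnest() to do what you want: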

with info as (
      select 1 as grams union all select 5 union all select 10 union all select 15
     ),
     map as (
      select 5 as grams, 'a' as bucket union all
      select 10 as grams, 'b' as bucket union all
      select 15 as grams, 'c' as bucket 
     )
select i.*,
       (select map
        from unnest(m.map) map
        where map.grams >= i.grams
        order by map.grams
        limit 1
       ) m
from info i cross join
     (select array_agg(map order by grams) as map
      from map
     ) m;
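
The correlated subquery above (filter with map.grams >= i.grams, order by map.grams, limit 1) can be sketched in Python as follows. This is only an illustration of the lookup logic under the same sample data; the function name smallest_bucket_at_or_above and the dictionary layout are assumptions made for the example.

```python
def smallest_bucket_at_or_above(grams, map_rows):
    """Return the map row with the smallest grams that is >= the given grams.

    Mirrors: where map.grams >= i.grams order by map.grams limit 1.
    Returns None when no row qualifies (the subquery would yield NULL).
    """
    candidates = [row for row in map_rows if row["grams"] >= grams]
    return min(candidates, key=lambda row: row["grams"], default=None)


map_rows = [
    {"grams": 5, "bucket": "a"},
    {"grams": 10, "bucket": "b"},
    {"grams": 15, "bucket": "c"},
]
info = [1, 5, 10, 15]

for g in info:
    print(g, smallest_bucket_at_or_above(g, map_rows))
```

With the sample data, 1 and 5 both map to bucket 'a', 10 to 'b' and 15 to 'c'.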

olqngx593#

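In addition to Gordon's and Mikhail's answers, I would like to suggest a third option: using FIRST_VALUE(), which is a built-in function in BigQuery, together with a window frame.
Starting from the principle: if we LEFT JOIN the info and map tables using grams as the key, then every grams value that is not listed in the map table gets a NULL price. We will use this joined table (with the NULLs) to price every grams value with the next available price. To achieve this, we will use FIRST_VALUE(). According to the documentation:
Returns the value of the value_expression for the first row in the current window frame.
So, for each row, we select the first non-null price between the current row and the end of the partition; for a row whose price is NULL, that is the price of the next row that has one. The syntax is as follows: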


# sample data info

WITH info AS (
  SELECT 1 AS grams UNION ALL 
  SELECT 2 UNION ALL
  SELECT 3 UNION ALL 
  SELECT 4 UNION ALL 
  SELECT 5 UNION ALL
  SELECT 6 UNION ALL
  SELECT 7 UNION ALL
  SELECT 8 UNION ALL
  SELECT 9 UNION ALL
  SELECT 10 UNION ALL 
  SELECT 11 UNION ALL
  SELECT 13 UNION ALL 
  SELECT 15 UNION ALL
  SELECT 16 UNION ALL
  SELECT 18 UNION ALL
  SELECT 19 UNION ALL
  SELECT 20 

), 

# sample data map

map AS (
  SELECT 5 AS grams, 1.99 price UNION ALL
  SELECT 10, 2.99 UNION ALL
  SELECT 15, 3.99 UNION ALL
  SELECT 20, 4.99
), 

# using left join, so there are rows with price = null

t AS (
SELECT i.grams, price
FROM info i LEFT JOIN map  USING(grams)
ORDER BY grams
)
SELECT grams, first_value(price IGNORE NULLS)OVER (ORDER BY grams ASC ROWS BETWEEN CURRENT ROW and UNBOUNDED FOLLOWING) AS price 
FROM t ORDER BY grams

And the output:

Row grams   price
1   1        1.99
2   2        1.99
3   3        1.99
4   4        1.99
5   5        1.99
6   6        2.99
7   7        2.99
8   8        2.99
9   9        2.99
10  10       2.99
11  11       3.99
12  13       3.99
13  15       3.99
14  16       4.99
15  18       4.99
16  19       4.99
17  20       4.99

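The final SELECT statement performs the operation described above. In addition, I would like to point out:
UNBOUNDED FOLLOWING: the window frame ends at the end of the partition.
CURRENT ROW: the window frame starts at the current row.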
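
As a rough illustration of what FIRST_VALUE(price IGNORE NULLS) over the frame from CURRENT ROW to UNBOUNDED FOLLOWING does, the Python sketch below emulates the LEFT JOIN and then fills each missing price with the next known one by scanning from the end. The function name backfill_first_value and the variable names are hypothetical, chosen only for this example.

```python
def backfill_first_value(prices):
    """For each position, return the first non-None value from that position onward.

    Emulates FIRST_VALUE(price IGNORE NULLS) OVER (ORDER BY grams
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) on rows already
    sorted by grams.
    """
    result = [None] * len(prices)
    next_known = None
    for i in range(len(prices) - 1, -1, -1):  # walk backwards from the last row
        if prices[i] is not None:
            next_known = prices[i]
        result[i] = next_known
    return result


info_grams = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 16, 18, 19, 20]
price_by_grams = {5: 1.99, 10: 2.99, 15: 3.99, 20: 4.99}

# LEFT JOIN ... USING(grams): rows without a match get None (NULL)
joined = [price_by_grams.get(g) for g in sorted(info_grams)]
filled = backfill_first_value(joined)

for g, p in zip(sorted(info_grams), filled):
    print(g, p)
```

One thing to keep in mind with this approach: because the LEFT JOIN starts from info, a price is only available if its exact grams value appears in info. If, say, 15 were missing from info, the rows for 11 and 13 would pick up 4.99 rather than 3.99.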
