PostgreSQL SQL: GROUP BY values across columns

nmpmafwu  posted on 2023-11-18 in PostgreSQL

I have something like a netflow table and want to group it by (src_ip, src_port, dst_ip, dst_port) in such a way that the values may be swapped between the src and dst fields.

| src_ip| src_port| dst_ip| dst_port| bytes_sent|
| --|--|--|--|--|
| 192.168.1.1 | 123 |192.168.10.5| 321 | 111 |
| 192.168.10.5 | 321 |192.168.1.1| 123 | 222 |
| 10.0.0.5 | 50 |172.0.0.5| 55 | 500 |
| 172.0.0.5 | 55 |10.0.0.5| 50 | 300 |
| 192.168.1.1 | 123 |192.168.10.5| 321 | 1000 |
| 192.168.1.1 | 123 |192.168.10.5| 20 | 999 |

I would like to get the following result from this table:

| src_ip| src_port| dst_ip| dst_port| bytes_sent| bytes_recv|
| --|--|--|--|--|--|
| 192.168.1.1 | 123 |192.168.10.5| 321 | 1111 | 222 |
| 10.0.0.5 | 50 |172.0.0.5| 55 | 500 | 300 |
| 192.168.1.1 | 123 |192.168.10.5| 20 | 999 | 0 |
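
Basically, I am trying to capture bidirectional traffic in a single row: something like grouping by (src_ip, src_port) and (dst_ip, dst_port), where those values can be reversed. What is the best way to achieve this?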

anhgbhbe1#

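To decide which IP, port, and direction to report, you need a rule for who counts as the sender and who as the receiver in the aggregated result. Let's treat the smaller IP as the source and the larger IP as the destination. Then it is the same CASE expression, repeated, that decides which original column goes into which result column. Once that is done, aggregate the data.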

with 
  data as
  (
    select
      case when src_ip < dst_ip then  src_ip      else  dst_ip      end as source_ip,
      case when src_ip < dst_ip then  dst_ip      else  src_ip      end as dest_ip,
      case when src_ip < dst_ip then  src_port    else  dst_port    end as source_port,
      case when src_ip < dst_ip then  dst_port    else  src_port    end as dest_port,
      case when src_ip < dst_ip then  bytes_sent  else  0           end as sent,
      case when src_ip < dst_ip then  0           else  bytes_sent  end as received
    from mytable
  )
select
  source_ip, source_port, dest_ip, dest_port,
  sum(sent) as bytes_sent,
  sum(received) as bytes_received
from data
group by source_ip, source_port, dest_ip, dest_port
order by source_ip, source_port, dest_ip, dest_port;

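
One edge case in the query above: when src_ip equals dst_ip (for example, traffic between two ports on the same host), the test src_ip < dst_ip is false in both directions, so the two directions land in different groups. A minimal sketch of one way around this, assuming the same mytable and columns as above, is to compare (IP, port) pairs with a row comparison so that the port breaks ties. The is_forward column name is made up for this illustration.

```sql
-- Sketch only: mytable and its columns are taken from the query above;
-- is_forward is a hypothetical helper column.
with data as (
  select
    t.*,
    -- true when the row already runs from the "smaller" endpoint to the
    -- "larger" one; the port decides when both IPs are equal
    (src_ip, src_port) <= (dst_ip, dst_port) as is_forward
  from mytable t
)
select
  case when is_forward then src_ip   else dst_ip   end as source_ip,
  case when is_forward then src_port else dst_port end as source_port,
  case when is_forward then dst_ip   else src_ip   end as dest_ip,
  case when is_forward then dst_port else src_port end as dest_port,
  sum(case when is_forward then bytes_sent else 0 end) as bytes_sent,
  sum(case when is_forward then 0 else bytes_sent end) as bytes_received
from data
group by 1, 2, 3, 4   -- the four endpoint columns of the select list
order by 1, 2, 3, 4;
```

Computing the direction flag once also avoids repeating the comparison in every CASE expression.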

2izufjch2#

You can get the desired output by combining GROUP BY with CASE expressions and the SUM aggregate function.
The query can be written as follows:

SELECT
    CASE WHEN src_ip < dst_ip THEN src_ip ELSE dst_ip END AS src_ip,
    CASE WHEN src_ip < dst_ip THEN src_port ELSE dst_port END AS src_port,
    CASE WHEN src_ip < dst_ip THEN dst_ip ELSE src_ip END AS dst_ip,
    CASE WHEN src_ip < dst_ip THEN dst_port ELSE src_port END AS dst_port,
    SUM(CASE WHEN src_ip < dst_ip THEN bytes_sent ELSE 0 END) AS bytes_sent,
    SUM(CASE WHEN src_ip < dst_ip THEN 0 ELSE bytes_sent END) AS bytes_recv
FROM your_table
GROUP BY
    CASE WHEN src_ip < dst_ip THEN src_ip ELSE dst_ip END,
    CASE WHEN src_ip < dst_ip THEN src_port ELSE dst_port END,
    CASE WHEN src_ip < dst_ip THEN dst_ip ELSE src_ip END,
    CASE WHEN src_ip < dst_ip THEN dst_port ELSE src_port END;

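The CASE expressions above order each row's src and dst values according to the lexical order of the two IPs, which keeps the grouping consistent in both directions. SUM combined with CASE then aggregates the bytes_sent values separately for each direction (forward and reverse).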
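
To try the query above, the sample rows from the question can be loaded as follows. This is a sketch under assumptions: the question does not state the column types, so inet for the IPs, integer for the ports, and bigint for the byte counts are guesses, and your_table is the placeholder name used in the query.

```sql
-- Assumed column types; the question does not specify them.
create table your_table (
  src_ip     inet,
  src_port   integer,
  dst_ip     inet,
  dst_port   integer,
  bytes_sent bigint
);

-- The six sample rows from the question.
insert into your_table (src_ip, src_port, dst_ip, dst_port, bytes_sent) values
  ('192.168.1.1',  123, '192.168.10.5', 321,  111),
  ('192.168.10.5', 321, '192.168.1.1',  123,  222),
  ('10.0.0.5',      50, '172.0.0.5',     55,  500),
  ('172.0.0.5',     55, '10.0.0.5',      50,  300),
  ('192.168.1.1',  123, '192.168.10.5', 321, 1000),
  ('192.168.1.1',  123, '192.168.10.5',  20,  999);

-- Text and inet do not order addresses the same way:
-- as_text is false, as_inet is true.
select text '9.0.0.1' < text '10.0.0.5' as as_text,
       inet '9.0.0.1' < inet '10.0.0.5' as as_inet;
```

With inet columns, the query above should return the three rows shown in the question, in no guaranteed order. If the IPs are stored as text, the comparison is lexical instead of numeric. The grouping still works, because any consistent ordering does, but which endpoint is reported as the source may differ.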

6ju8rftf3#

You can use a self-join:

with cte as (
   -- one row per directed flow, with its total bytes
   select n.src_ip, n.src_port, n.dst_ip, n.dst_port, sum(n.bytes_sent) bytes_sent
   from netflow n
   group by n.src_ip, n.src_port, n.dst_ip, n.dst_port
)
select n.src_ip, n.src_port, n.dst_ip, n.dst_port, n.bytes_sent,
   coalesce(n1.bytes_sent, 0) bytes_received
from cte n
-- n1 is the reverse flow: all four endpoint columns must match, swapped
left join cte n1
   on  n1.src_ip = n.dst_ip and n1.src_port = n.dst_port
   and n1.dst_ip = n.src_ip and n1.dst_port = n.src_port
-- keep one row per pair: the "smaller" direction, or a flow with no reverse
where (n.src_ip, n.src_port) <= (n.dst_ip, n.dst_port) or n1.src_ip is null

See fiddle

lfapxunr4#

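Assume the smallest IP is the source IP and the largest is the destination IP.
You can use the LEAST and GREATEST functions to ensure that one entry is selected for each combination of smallest and largest IP address: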

with cte as (
  select least(src_ip, dst_ip) as smallestIP, greatest(src_ip, dst_ip) as largestIP
  from mytable src
  group by least(src_ip, dst_ip), greatest(src_ip, dst_ip)
),
routes as (
  select distinct src_ip, src_port, dst_ip, dst_port 
  from (
    select src_ip, src_port, dst_ip, dst_port 
    from mytable t
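    -- match both endpoints so that a row joins only its own IP pair
    inner join cte c on t.src_ip = c.smallestIP and t.dst_ip = c.largestIP
    union all
    select dst_ip as src_ip, dst_port as src_port, src_ip as dst_ip, src_port as dst_port
    from mytable t
    inner join cte c on t.dst_ip = c.smallestIP and t.src_ip = c.largestIP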
  ) as s
)
select r.src_ip, r.src_port, r.dst_ip, r.dst_port,
       sum(case when r.src_ip = t.src_ip and r.src_port = t.src_port
                     and r.dst_ip = t.dst_ip and r.dst_port = t.dst_port
                then bytes_sent else 0 end ) as bytes_sent,
       sum(case when r.src_ip = t.dst_ip and r.src_port = t.dst_port
                     and r.dst_ip = t.src_ip and r.dst_port = t.src_port
                then bytes_sent else 0 end ) as bytes_recv
from routes r
inner join mytable t on (
                        r.src_ip = t.src_ip and r.src_port = t.src_port
                        and r.dst_ip = t.dst_ip and r.dst_port = t.dst_port)
                      or (
                        r.src_ip = t.dst_ip and r.src_port = t.dst_port
                        and r.dst_ip = t.src_ip and r.dst_port = t.src_port
                      )
group by r.src_ip, r.src_port, r.dst_ip, r.dst_port

Demo here
