sql—匹配两个jsonb列的百分比,

htzpubme  于 2021-07-26  发布在  Java
关注(0)|答案(2)|浏览(295)

我试图比较一个表中的两个jsonb列,目前它是在应用程序中完成的,但是这不允许在不加载整个数据集的情况下进行适当的搜索、筛选和排序。如果我们能在数据库里做这个比较就更好了。
以下是数据和计算的示例。

employer = {
  "autism": "1",
  "social": "1",
  "dementia": "0",
  "domestic": "1",
}

employers_keys = ["autism","social","domestic"]

candidate = {
  "autism": "0",
  "social": "1",
  "dementia": "0",
  "domestic": "1",
}

candidate_keys = ["social","domestic"]

remainder_keys = employer_key - candidate_key = ["autism"]

1-(remainder_keys.length/employer_keys.length) = 1-(1/3) = 2/3 = 66%

这个过程在ruby中非常简单,jsonb->array->select->calculation
但是,我希望在sql或db级别的函数中执行此操作,例如
函数compare\u json(employer,candidate)返回小数点。
更具体地说

Select candidates.id,
       st_distance_sphere(st_makepoint(employer.long, employer.lat), st_makepoint(candidates.long, candidates.lat)) /
       1000 / 8 * 5 as distance
from (select * from users where id = 8117) employer,
     (select * from users where role_id = 5) candidates
where st_distance_sphere(st_makepoint(employer.long, employer.lat), st_makepoint(candidates.long, candidates.lat)) /
      1000 / 8 * 5 < 25
order by distance

上面的sql计算单个雇主和多个候选人之间的距离,内联查询employer.skills(1行),candidate.skills(n行)。
所以输出应该是。。
候选人id、距离、技能匹配(雇主.技能、候选人.技能)
与编辑之前一样,欢迎提供任何指导。

xkrw2x1b

xkrw2x1b1#

下面是一种纯sql方法:它将employer对象转换为记录集,然后执行条件聚合:

select 1 - avg( ((d.candidate ->> e.k)::int is distinct from 1)::int ) res
from (values(
    '{ "autism": "1", "social": "1", "dementia": "0", "domestic": "1" }'::jsonb,
    '{ "autism": "0", "social": "1", "dementia": "0", "domestic": "1" }'::jsonb
)) d(employer, candidate)
cross join lateral jsonb_each_text(d.employer) e(k, v)
where e.v::int = 1

通过替换 values() 带参数的行构造函数。
db小提琴演示:

|                    res |
| ---------------------: |
| 0.66666666666666666667 |
ws51t4hk

ws51t4hk2#

好吧,这就是我要做的。

CREATE OR REPLACE FUNCTION JSON_COMPARE(employer_json jsonb, candidate_json jsonb, OUT _result numeric)
AS
$$
BEGIN
    select 1 - avg(((d.candidate ->> e.k)::int is distinct from 1)::int)
    into _result
    from (values (employer_json, candidate_json)) d(employer, candidate)
             cross join lateral jsonb_each_text(d.employer) e(k, v)
    where e.v::int = 1;
    RETURN;
END;
$$
    LANGUAGE PLPGSQL;

这是gmb超快速答案的微小变化。通过使用少量索引并正确限制候选列表的大小,我们可以获得合理的性能。
我是新手,所以我对gmb的支持票没有显示出来,但再次感谢。

相关问题