例如,我有两个rdd:firstmaprdd-(0-14,list(0,4,19,19079,42697,4444748))
secondmaprdd-(0-14,列表(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,78,79,80,81,82,83,84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94))
我要找到十字路口。我试过,var interresult=firstmaprdd.intersection(secondmaprdd),它在输出文件中没有显示结果。
我也尝试过,基于键的cogroup,maprdd.cogroup(secondmaprdd.filter(x=>),但是我不知道如何找到两个值之间的交集,是x=>x.。\u1.intersect(x.。\u2),有人能帮我学语法吗?
即使这样,也会抛出编译时错误maprdd.cogroup(secondmaprdd.filter(x=>x.。\u1.intersect(x.。\u2))
var mapRDD = sc.parallelize(map.toList)
var secondMapRDD = sc.parallelize(secondMap.toList)
var interResult = mapRDD.intersection(secondMapRDD)
这可能是因为arraybuffer[list[]]值,因为该值交叉点不起作用。有什么黑客可以移除它吗?
我试过这么做
var interResult = mapRDD.cogroup(secondMapRDD).filter{case (_, (l,r)) => l.nonEmpty && r.nonEmpty }. map{case (k,(l,r)) => (k, l.toList.intersect(r.toList))}
仍然得到一个空名单!
1条答案
按热度按时间2vuwiymt1#
既然你在找
intersect on values
,你需要join
两个RDD,获取所有匹配的值,然后对值进行交集。示例代码:
输出将是