Calculate sums of even/odd pairs on Hadoop?

Posted by pdsfdshx on 2021-06-03 in Hadoop


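I want to create a parallel scanLeft function (which computes prefix sums for an associative operator) for Hadoop (Scalding in particular; see below for how this is done).
Given a sequence of numbers in an HDFS file (one per line), I want to compute a new sequence containing the sums of consecutive even/odd pairs. For example:
Input sequence:
0,1,2,3,4,5,6,7,8,9,10
Output sequence:
0+1, 2+3, 4+5, 6+7, 8+9, 10
i.e.
1,5,9,13,17,10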
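For reference, the pairing step on its own can be sketched on an in-memory collection. This is only an illustration of the desired output, not a Hadoop job; the name `pairSums` is hypothetical.

```scala
// Hypothetical helper: sum consecutive (even index, odd index) pairs.
// A trailing unpaired element is kept as-is, matching the example above.
def pairSums(xs: Seq[Int]): Vector[Int] =
  xs.grouped(2)      // split into chunks of 2; the last chunk may hold 1 element
    .map(_.sum)      // sum each chunk
    .toVector

val result = pairSums(0 to 10)
println(result.mkString(","))
```

With the input 0 to 10 this yields 1,5,9,13,17,10.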
I suppose that to do this I need to write InputFormat and InputSplit classes for Hadoop, but I don't know how to do that.
See Section 3.3. Below is an example algorithm in Scala:

// For simplicity, assume the input length is a power of 2.
// Computes the inclusive prefix sum: output(i) = input(0) + ... + input(i).
def scanadd(input: IndexedSeq[Int]): IndexedSeq[Int] =
  if (input.length == 1)
    input
  else {
    // Collapse the sequence by summing consecutive even/odd pairs.
    val collapsed = IndexedSeq.tabulate(input.length / 2)(i => input(2 * i) + input(2 * i + 1))

    // Recursively scan the collapsed values.
    val scancollapse = scanadd(collapsed)

    // Use the scan of the collapsed sequence to build the full result.
    val output = IndexedSeq.tabulate(input.length) { i =>
      // Odd index: a whole number of pairs ends here, so read the collapsed scan directly.
      if (i % 2 == 1) scancollapse(i / 2)
      // Index 0: nothing precedes it.
      else if (i == 0) input(0)
      // Other even index: take the scan of the pairs before it and add the current value.
      else scancollapse(i / 2 - 1) + input(i)
    }

    output
  }

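I understand that this may need a fair amount of optimization to work well with Hadoop. I think translating this code directly would lead to inefficient Hadoop code. For example, you obviously can't use an IndexedSeq in Hadoop. I would appreciate it if you could point out any specific problems you see. I think it can probably be made to work well, though.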
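One way to think about the pairing step without random access is as a map and a reduce keyed by pair number. The sketch below simulates that on in-memory data and assumes each record already carries its position in the sequence; that is an assumption, since Hadoop's default TextInputFormat supplies byte offsets rather than line indices. The name `pairSumsByKey` is hypothetical.

```scala
// Hypothetical helper: simulate map / shuffle / reduce on (index, value) records.
// Assumes every record already knows its 0-based position in the sequence.
def pairSumsByKey(indexed: Seq[(Long, Int)]): Seq[(Long, Int)] =
  indexed
    .map { case (idx, value) => (idx / 2, value) }        // "map": key each value by its pair number
    .groupBy(_._1)                                         // "shuffle": bring equal keys together
    .map { case (pair, vs) => (pair, vs.map(_._2).sum) }   // "reduce": sum the values of each pair
    .toSeq
    .sortBy(_._1)                                          // restore sequence order

val records = (0 to 10).map(i => (i.toLong, i))
println(pairSumsByKey(records).map(_._2).mkString(","))
```

Because the key is derived from the index alone, no indexed collection is needed; the open problem in a real job is obtaining the index in the first place.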

hadoop scala functional-programming cascading scalding

Source: https://stackoverflow.com/questions/14164723/calculate-sums-of-even-odd-pairs-on-hadoop

2 Answers


vxf3dgd41#

This is the best tutorial I found for writing an InputFormat and RecordReader. I ended up reading the whole split as a single Writable record.
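If each split is available as one record, a prefix sum can be computed split by split instead of pair by pair. The following is a minimal in-memory sketch of that idea, not the answerer's implementation: each inner Vector stands in for the numbers of one split, in order, and the name `chunkedScan` is hypothetical.

```scala
// Hypothetical helper: inclusive prefix sum computed chunk by chunk.
// Each inner Vector stands in for the numbers of one input split, in order.
def chunkedScan(splits: Vector[Vector[Int]]): Vector[Int] = {
  // Pass 1: total of each split (independent per split).
  val totals = splits.map(_.sum)
  // Exclusive scan of the totals: offsets(k) = sum of all splits before split k.
  val offsets = totals.scanLeft(0)(_ + _).init
  // Pass 2: local inclusive scan inside each split, started from its offset.
  splits.zip(offsets).flatMap { case (chunk, offset) =>
    chunk.scanLeft(offset)(_ + _).tail
  }
}

println(chunkedScan(Vector(Vector(1, 2), Vector(3, 4))).mkString(","))
```

Only the small vector of per-split totals has to be scanned sequentially; both passes over the data are independent per split, which relies on addition being associative.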

2021-06-04

vmjh9lq92#

Redundant. Do you mean this code?

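// Group the numbers into consecutive pairs, then accumulate alternating pair sums.
// Note: on Scala 2.13+, .par requires the separate scala-parallel-collections module.
val vv = (0 to 1000000).grouped(2).toVector
// Accumulator: (sum of pairs 0, 2, 4, ...; sum of pairs 1, 3, 5, ...; toggle flag)
vv.par.foldLeft((0L, 0L, false))((a, v) =>
    if (a._3) (a._1, a._2 + v.sum, !a._3) else (a._1 + v.sum, a._2, !a._3))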

2021-06-03
