Can we run an existing program on Hadoop as-is, or does it have to be rewritten in the MapReduce style?

xu3bshqb  posted on 2021-05-29 in Hadoop

I wrote the Python program below to process data from an Excel file. Is it possible to run this same program with Hadoop MapReduce, and how does a conventional program differ from a MapReduce program?

import xlrd

with xlrd.open_workbook('interference.xlsx') as book:
    # 0 corresponds to the 1st worksheet, usually named 'Book1'
    sheet = book.sheet_by_index(0)

    # gets col B values (column index 1)
    B = sheet.col_values(1)

    # gets col D values (column index 3)
    D = sheet.col_values(3)

# combines B and D elements into tuples, and the tuples into a list
# ex. [ ('Incoming', 18), ('Outgoing', 99), ... ]
data = list(zip(B, D))

# gets total no. of GET request attempts for each UID (UIDs 1..44)
for x in range(1, 45):
    attempts = sum(tup[1] for tup in data if tup[0] == x)
    print("Total attempts for UID", x, attempts)

Answer #1 (0qx6xfy6):

It is not possible to run that same program unchanged as a MapReduce job in Hadoop.
MapReduce is a programming paradigm based on splitting a computation into two phases: the first phase (map) divides the problem into many sub-problems and solves each of them; the second phase (reduce) combines the results of all the sub-problems to produce the final solution.
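To make the two phases concrete, here is the same shape in plain Python, with no Hadoop involved; this is only an illustration of map / shuffle / reduce, and the sample records are invented:

```python
from itertools import groupby
from operator import itemgetter

# Map phase: turn each input record into a (key, value) pair.
records = [("u1", 3), ("u2", 5), ("u1", 2), ("u2", 1)]
mapped = [(uid, count) for uid, count in records]

# Shuffle: bring all values that share the same key together.
mapped.sort(key=itemgetter(0))
grouped = {k: [v for _, v in g] for k, g in groupby(mapped, key=itemgetter(0))}

# Reduce phase: combine each key's values into one result.
reduced = {k: sum(vs) for k, vs in grouped.items()}
print(reduced)  # {'u1': 5, 'u2': 6}
```

In a real cluster, the map and reduce steps run on many machines in parallel and the shuffle happens over the network, but the logical structure is exactly this.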
I suggest you look at the WordCount program, which is the Hadoop equivalent of Hello World: http://wiki.apache.org/hadoop/wordcount
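As a rough sketch of what that restructuring could look like for your per-UID totals: with Hadoop Streaming, the mapper and reducer are ordinary scripts that read lines on stdin and write `key<TAB>value` lines on stdout. This assumes the spreadsheet has first been exported to tab-separated text (Hadoop jobs normally read plain text from HDFS, not .xlsx), and the script name `uid_totals.py` and the input column layout are my own assumptions:

```python
#!/usr/bin/env python3
# uid_totals.py (hypothetical name) -- run with "map" or "reduce" as the argument.
# Input lines are assumed to look like: uid<TAB>attempts
import sys

def mapper(lines):
    # Emit one "uid<TAB>attempts" pair per input record.
    for line in lines:
        uid, attempts = line.rstrip("\n").split("\t")
        yield f"{uid}\t{attempts}"

def reducer(lines):
    # Hadoop delivers mapper output sorted by key, so equal UIDs are adjacent.
    current_uid, total = None, 0
    for line in lines:
        uid, attempts = line.rstrip("\n").split("\t")
        if uid != current_uid:
            if current_uid is not None:
                yield f"{current_uid}\t{total}"
            current_uid, total = uid, 0
        total += int(attempts)
    if current_uid is not None:
        yield f"{current_uid}\t{total}"

if __name__ == "__main__" and len(sys.argv) > 1:
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

The job would then be launched with the streaming jar that ships with Hadoop, something along the lines of `hadoop jar hadoop-streaming-*.jar -input <hdfs path> -output <hdfs path> -mapper "uid_totals.py map" -reducer "uid_totals.py reduce"` (the exact jar path depends on your installation). Note that the logic itself barely changes; what changes is that it is split into the two phases and driven by the framework instead of a single script.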
