I wrote the Python program below to process data from an Excel file. Is it possible to run the same program with Hadoop MapReduce, and how does a traditional program differ from a MapReduce program?
import xlrd  # note: xlrd 2.x dropped .xlsx support; use xlrd 1.2 or openpyxl

with xlrd.open_workbook('interference.xlsx') as book:
    # index 0 corresponds to the first worksheet, usually named 'Book1'
    sheet = book.sheet_by_index(0)
    # gets column B (index 1) values: the UIDs
    B = sheet.col_values(1)
    # gets column D (index 3) values: the attempt counts
    D = sheet.col_values(3)
    # pairs each UID with its attempt count in a list of tuples,
    # e.g. [('Incoming', 18), ('Outgoing', 99), ...]
    data = list(zip(B, D))
    # gets total no. of GET request attempts for each UID (1..44)
    for x in range(1, 45):
        attempts = sum(tup[1] for tup in data if tup[0] == x)
        print("Total attempts for UID", x, attempts)
1 Answer
You cannot run the same program unchanged as a MapReduce job in Hadoop.
MapReduce is a programming paradigm based on splitting a computation into two phases: the first phase (map) divides the problem into sub-problems and solves each one; the second phase (reduce) combines the results of all the sub-problems to produce the final solution.
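To make the two phases concrete, here is a minimal local simulation of the map/shuffle/reduce pipeline in plain Python, applied to hypothetical (UID, attempts) pairs like those in the question. The sample data is made up for illustration; a real MapReduce job distributes these steps across many machines.

```python
from collections import defaultdict

# hypothetical sample records: (uid, attempts)
records = [(1, 18), (2, 99), (1, 5), (2, 1)]

# Map phase: emit one (key, value) pair per record
mapped = [(uid, attempts) for uid, attempts in records]

# Shuffle: group all values that share the same key
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: combine each key's values into a final result
totals = {key: sum(values) for key, values in groups.items()}
print(totals)  # → {1: 23, 2: 100}
```

The framework performs the shuffle step for you; your job only supplies the map and reduce functions.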
I suggest you look at the WordCount program, Hadoop's equivalent of Hello World: http://wiki.apache.org/hadoop/wordcount
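For a Python program like the one in the question, Hadoop Streaming is the usual route: you supply a mapper and a reducer script that read lines from stdin and write lines to stdout. A sketch follows, under the assumption that the spreadsheet has first been exported to tab-separated text (`uid<TAB>attempts` per line), since Hadoop Streaming works on line-oriented text, not .xlsx files. The file names and HDFS paths in the comment are hypothetical.

```python
import sys

def mapper(lines):
    # Map phase: emit "uid<TAB>attempts" for each input line.
    for line in lines:
        uid, attempts = line.strip().split('\t')
        yield f"{uid}\t{attempts}"

def reducer(lines):
    # Reduce phase: Hadoop delivers input sorted by key, so we can
    # accumulate a running total and emit it when the key changes.
    current_uid, total = None, 0.0
    for line in lines:
        uid, attempts = line.strip().split('\t')
        if uid != current_uid:
            if current_uid is not None:
                yield f"{current_uid}\t{total}"
            current_uid, total = uid, 0.0
        total += float(attempts)
    if current_uid is not None:
        yield f"{current_uid}\t{total}"

if __name__ == "__main__":
    # Run as either script: `python mapper.py` / `python reducer.py`.
    step = mapper if sys.argv[0].endswith("mapper.py") else reducer
    for out in step(sys.stdin):
        print(out)

# Example streaming invocation (paths are hypothetical):
#   hadoop jar hadoop-streaming.jar \
#     -input /data/attempts.tsv -output /data/totals \
#     -mapper mapper.py -reducer reducer.py
```

The key difference from the original program: no single process ever holds the whole dataset. The mapper sees one line at a time, Hadoop sorts and groups by UID between the phases, and the reducer sees each UID's values contiguously.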