I have a matrix like this:
1,1,2
2,3,4
6,4,6
1,2,4
3,6,3
4,6,2
4,5,8
3,4,4
and a vector:
1,3
4,5
5,4
6,2
They are stored in two separate files. I need to multiply the matrix by the vector column-wise. The matrix records have the form (i,j,v), where i is the row index, j is the column index, and v is the value. The vector records have the form (j,v).
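To make the intended operation concrete, here is a plain-Python sketch (outside Hadoop, using the sample data above) of the column-wise multiplication. Entries whose column j has no vector value are left unchanged, matching the mapper's behavior below:

```python
# matrix records (i, j, v) and vector records {j: v} from the sample data above
matrix = [(1, 1, 2), (2, 3, 4), (6, 4, 6), (1, 2, 4),
          (3, 6, 3), (4, 6, 2), (4, 5, 8), (3, 4, 4)]
vector = {1: 3, 4: 5, 5: 4, 6: 2}

# multiply each matrix value by the vector value for its column j;
# vector.get(j, 1) leaves entries without a matching column untouched
products = [(i, j, v * vector.get(j, 1)) for (i, j, v) in matrix]
print(products)
```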
I wrote a mapper:
#!/usr/bin/env python
import sys

# class to store matrix records
class MatrixRecord(object):
    def __init__(self):
        self.i = None
        self.j = None
        self.v = None

# class to store vector records
class VectorRecord(object):
    def __init__(self):
        self.j = None
        self.v = None

# lists to store objects
listOfMatrixRecords = []
listOfVectorRecords = []

# input comes from STDIN (standard input)
for line in sys.stdin:
    # remove leading and trailing whitespace and split
    splittedLine = line.strip().split(",")
    # a matrix record has three fields, e.g. 1,3,6
    if len(splittedLine) == 3:
        x = MatrixRecord()
        x.i = splittedLine[0]
        x.j = splittedLine[1]
        x.v = splittedLine[2]
        listOfMatrixRecords.append(x)  # add it to the matrix records list
    # a vector record has two fields, e.g. 2,4
    else:
        y = VectorRecord()
        y.j = splittedLine[0]
        y.v = splittedLine[1]
        listOfVectorRecords.append(y)  # add it to the vector records list

# take the matrix records and multiply them by the vector values
vectorPosition = {record.j for record in listOfVectorRecords}  # j values seen in the vector
matrixPosition = {record.j for record in listOfMatrixRecords}  # j values seen in the matrix
for duplicate in vectorPosition & matrixPosition:  # columns present in both
    for x in listOfMatrixRecords:
        if x.j == duplicate:  # this matrix record has a matching vector value
            for y in listOfVectorRecords:
                if y.j == x.j:
                    x.v = int(x.v) * int(y.v)

# write the result to stdout; the reducer will take it as input
for x in listOfMatrixRecords:
    print('%s\t%s' % (x.i, x.v))
But it only works when everything is stored in a single input file, because each input file gets its own mapper, so
listOfMatrixRecords = []
listOfVectorRecords = []
never contain all of the matrix/vector records at once.
Is there a way to write a custom shuffle step for Hadoop Streaming?
I launch Hadoop like this:
hadoop jar "D:\hadoop-2.7.1\share\hadoop\tools\lib\hadoop-streaming-2.7.1.jar" -mapper "python D:\map.py" -reducer "python D:\reducer.py" -input /input/* -output /output
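One common approach avoids a custom shuffle entirely: if the mapper emits every record keyed by the column index j, Hadoop's built-in shuffle already groups the matrix and vector records for each column across all input files, and the reducer can do the multiplication. Below is a minimal local sketch of that map → shuffle → reduce flow, simulating the shuffle with a plain Python sort; the "M"/"V" tags are my own convention for distinguishing record types, not part of any Hadoop API:

```python
import itertools

def mapper(line):
    """Emit records keyed by column index j, so the framework's shuffle
    groups matrix and vector records for the same column together."""
    parts = line.strip().split(",")
    if len(parts) == 3:           # matrix record: i,j,v
        i, j, v = parts
        yield (j, "M", i, v)      # "M" tags a matrix record
    else:                         # vector record: j,v
        j, v = parts
        yield (j, "V", "", v)     # "V" tags a vector record

def reducer(j, records):
    """For one column j, multiply every matrix value by the vector value."""
    records = list(records)
    vec = [int(r[3]) for r in records if r[1] == "V"]
    for _, tag, i, v in records:
        if tag == "M":
            # leave the value unchanged when the column has no vector entry
            yield (i, int(v) * vec[0] if vec else int(v))

# simulate the streaming pipeline: map -> sort (shuffle) -> reduce
matrix_lines = ["1,1,2", "2,3,4", "6,4,6", "1,2,4",
                "3,6,3", "4,6,2", "4,5,8", "3,4,4"]
vector_lines = ["1,3", "4,5", "5,4", "6,2"]

mapped = sorted(kv for line in matrix_lines + vector_lines
                for kv in mapper(line))
result = {}
for j, group in itertools.groupby(mapped, key=lambda kv: kv[0]):
    for i, v in reducer(j, group):
        result[(i, j)] = v
print(result)
```

In an actual Hadoop Streaming job, `mapper` and `reducer` would each be separate scripts reading lines from `sys.stdin` and printing tab-separated key/value pairs; the sort-and-group step here stands in for what the framework does between them.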