Hadoop Streaming: how to multiply a matrix by a vector when they are stored in many files

Posted by yks3o0rb on 2021-05-29 in Hadoop


I have a matrix like this:

1,1,2
2,3,4
6,4,6
1,2,4
3,6,3
4,6,2
4,5,8
3,4,4

and a vector:

1,3
4,5
5,4
6,2

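They are stored in two different files, and I need to multiply them column by column. Each matrix record has the form (i, j, v), where i is the row number, j is the column number, and v is the value. Each vector record has the form (j, v), so a matrix value is multiplied by the vector value that shares its j.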
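To make the intended result concrete, here is a small single-process sketch of the column-wise product on the sample data above. It assumes that a matrix entry whose column j has no vector entry is left unchanged, which matches what the mapper below does; the function name `multiply_by_column` is only illustrative.

```python
def multiply_by_column(matrix_records, vector_records):
    """Multiply each matrix value by the vector value that shares its column j.

    Assumption: entries whose column has no vector value pass through unchanged.
    """
    vector = {j: v for j, v in vector_records}
    return [(i, v * vector.get(j, 1)) for i, j, v in matrix_records]


# Sample data from the question: matrix as (i, j, v), vector as (j, v).
matrix = [(1, 1, 2), (2, 3, 4), (6, 4, 6), (1, 2, 4),
          (3, 6, 3), (4, 6, 2), (4, 5, 8), (3, 4, 4)]
vector = [(1, 3), (4, 5), (5, 4), (6, 2)]

# Print one "row<TAB>product" line per matrix entry, like the mapper does.
for i, value in multiply_by_column(matrix, vector):
    print('%s\t%s' % (i, value))
```

This only works because both inputs are in memory at once, which is exactly what stops holding when the data is split across files.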
I wrote a mapper:


#!/usr/bin/env python

import sys

# class to store matrix records

class MatrixRecord(object):
    def __init__( self ):
        self.i= None
        self.j= None
        self.v= None

# class to store vector records

class VectorRecord(object):
    def __init__( self ):
        self.j= None
        self.v= None

# lists to store objects

listOfMatrixRecords = []
listOfVectorRecords = []

# input comes from STDIN (standard input)

for line in sys.stdin:
    # remove leading and trailing whitespace and split
    splittedLine = line.strip().split(",")  

    # if it's matrix element - body looks like
    # 1,3,6
    if(len(splittedLine) == 3):
        x = MatrixRecord();
        x.i = splittedLine[0]
        x.j = splittedLine[1]
        x.v = splittedLine[2]
        listOfMatrixRecords.append(x) #add it to matrix records list
    #if it's vector element - body looks like
    # 2,4
    elif len(splittedLine) == 2:  # skip blank or malformed lines instead of crashing
        y = VectorRecord();
        y.j = splittedLine[0]
        y.v = splittedLine[1]
        listOfVectorRecords.append(y) #add it to vector records list

# get matrix records and multiply them by vector values

vectorPosition = {record.j for record in listOfVectorRecords} #gets j properties of objects from vector
matrixPosition = {record.j for record in listOfMatrixRecords} #gets j properties of objects from matrix

for duplicate in vectorPosition & matrixPosition: #checks for duplicates between matrix and vector
    for x in listOfMatrixRecords:
        if x.j == duplicate:    # if there's a duplicate, it means that we must multiply
            for y in listOfVectorRecords:
                if y.j == x.j:
                    x.v = int(x.v) * int(y.v);

# return result to stdout, reducer will take it as input

for x in listOfMatrixRecords:
    print ('%s\t%s' % (x.i,x.v))

But this only works when everything is stored in a single input file rather than several, because a new mapper is created for each file, so

listOfMatrixRecords = []
listOfVectorRecords = []

never contain all of the matrix and vector records.
Is there a way to write a custom shuffle method for Hadoop Streaming?
I launch Hadoop like this:

hadoop jar "D:\hadoop-2.7.1\share\hadoop\tools\lib\hadoop-streaming-2.7.1.jar" -mapper "python D:\map.py" -reducer "python D:\reducer.py" -input /input/* -output /output
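For reference, one commonly used alternative to customizing the shuffle is a reduce-side join: the mapper does no multiplication and simply keys every record by its column j, so the default shuffle delivers a column's matrix and vector entries to the same reducer regardless of which file they came from. The sketch below is only an illustration under assumptions: the names `map_line`, `reduce_lines` and `simulate`, and the `M`/`V` tags, are hypothetical, and entries with no matching vector value are passed through unchanged as in the original mapper. `simulate` stands in for the shuffle locally so the logic can be tried without a cluster.

```python
from itertools import groupby


def map_line(line):
    """Key a record by its column j and tag it as matrix (M) or vector (V)."""
    fields = line.strip().split(",")
    if len(fields) == 3:            # matrix record: i,j,v
        i, j, v = fields
        return "%s\tM,%s,%s" % (j, i, v)
    if len(fields) == 2:            # vector record: j,v
        j, v = fields
        return "%s\tV,%s" % (j, v)
    return None                     # blank or malformed line


def reduce_lines(sorted_lines):
    """Consume mapper output grouped by key j and yield 'i<TAB>product' lines."""
    pairs = (line.rstrip("\n").split("\t", 1)
             for line in sorted_lines if "\t" in line)
    for j, group in groupby(pairs, key=lambda kv: kv[0]):
        factor = None
        entries = []                # matrix entries buffered for this column
        for _, value in group:
            parts = value.split(",")
            if parts[0] == "V":
                factor = int(parts[1])
            else:
                entries.append((parts[1], int(parts[2])))
        for i, v in entries:
            # Assumption: a column with no vector entry leaves the value unchanged.
            yield "%s\t%s" % (i, v if factor is None else v * factor)


def simulate(lines):
    """Local stand-in for the shuffle: map, sort by key, then reduce."""
    mapped = [out for out in map(map_line, lines) if out is not None]
    mapped.sort(key=lambda s: s.split("\t", 1)[0])
    return list(reduce_lines(mapped))
```

In an actual streaming job, the mapper script would print `map_line(line)` for each line of `sys.stdin` and the reducer script would print each result of `reduce_lines(sys.stdin)`; the `-input /input/*` argument could stay as it is. If the per-row sums are the final goal, the products emitted here would presumably need a second pass that groups by i.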

hadoop mapreduce python hadoop-streaming matrix-multiplication

Source: https://stackoverflow.com/questions/37430180/hadoop-streaming-how-to-multiply-matrix-by-vector-when-theyre-stored-in-many-fi
