Hadoop map function key when streaming

Posted by 6tdlim6h on 2021-05-29 in Hadoop

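I'm new to Hadoop and have run into something I couldn't find on Google.
In the Java version of the Hadoop "hello world" program, i.e. word count, the mapper function takes a key/value pair, which matches my understanding of how MapReduce works. As far as I can tell, in the word count example the key is the line number and the value is the line of text itself: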

public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    //Tokenize the line and print out token,1 for each
}
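To make the shape of those pairs concrete, here is a minimal Python sketch of the records a Java mapper would receive. One detail worth flagging: with Hadoop's default `TextInputFormat`, the `LongWritable` key is the byte offset at which the line starts, not a running line number. The function name `text_input_records` is hypothetical and exists only for this illustration; it mimics the record layout in memory and is not Hadoop code.

```python
def text_input_records(data):
    """Yield (byte_offset, line_text) pairs, one per line of `data` (bytes).

    This imitates the (key, value) records described above: the key is the
    byte offset where the line starts, the value is the line without its
    terminator.
    """
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)  # the terminator counts toward the next offset


if __name__ == "__main__":
    sample = b"hello world\nfoo bar baz\n"
    for key, value in text_input_records(sample):
        print(key, repr(value))
```

For the two-line sample the keys are 0 and 12, because the first line plus its newline occupies 12 bytes.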

However, in the Python streaming example of the same program, the Python mapper does not seem to read a key:


#!/usr/bin/env python
import sys

# input comes from STDIN (standard input)
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()
    for word in words:
        # emit "word<TAB>1"; the parenthesized form runs on Python 2 and 3
        print('%s\t%s' % (word, 1))

The Python mapper seems to read only the value part from stdin. How can I also get the key (the line number) in the Python mapper?
Thanks in advance!!
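One possible direction, sketched under assumptions rather than offered as a confirmed answer: if the streaming job is configured so that each stdin line arrives as `key<TAB>value`, the mapper can split off the key itself. To my knowledge the job property `stream.map.input.ignoreKey=false` makes Hadoop Streaming pass the `TextInputFormat` key through, but treat that as something to verify against your Hadoop version. The helper names `parse_record` and `map_words` are hypothetical.

```python
#!/usr/bin/env python
import sys


def parse_record(line):
    """Split 'key<TAB>value' into (key, value).

    Returns (None, line) when no numeric key prefix is present, so the
    mapper still works if the key is not passed through. Caveat: a text
    line that itself starts with digits and a tab would be misread.
    """
    line = line.rstrip("\n")
    key, sep, value = line.partition("\t")
    if sep and key.isdigit():
        return int(key), value
    return None, line


def map_words(lines):
    """Yield (key, word, 1) for every word of every input line."""
    for line in lines:
        key, value = parse_record(line)
        for word in value.split():
            yield key, word, 1


if __name__ == "__main__":
    for key, word, count in map_words(sys.stdin):
        # `key` is available here if the job needs it; word count does not.
        print("%s\t%s" % (word, count))
```

The fallback to `None` is a design choice for robustness: the same script produces ordinary word-count output whether or not the key is present on each line.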

hadoop hadoop-streaming

Source: https://stackoverflow.com/questions/33222143/hadoop-map-function-key-when-streaming