Using generators to read files

Posted by k7fdbhmy on 2021-06-03 in Hadoop


I am trying to understand how to write Hadoop programs in Python by following this tutorial: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
This is mapper.py:


#!/usr/bin/env python

"""A more advanced Mapper, using Python iterators and generators."""

import sys

def read_input(file):
    for line in file:
        # split the line into words
        yield line.split()

def main(separator='\t'):
    # input comes from STDIN (standard input)
    data = read_input(sys.stdin)
    for words in data:
        # write the results to STDOUT (standard output);
        # what we output here will be the input for the
        # Reduce step, i.e. the input for reducer.py
        #
        # tab-delimited; the trivial word count is 1
        for word in words:
            print('%s%s%d' % (word, separator, 1))

if __name__ == "__main__":
    main()

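I don't understand the use of yield here. read_input produces one line at a time. However, main only calls read_input once, which would correspond to the first line of the file. How do the remaining lines get read?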

hadoop python Generator yield

Source: https://stackoverflow.com/questions/18275948/hadoop-program-with-python-use-of-generators-to-read-files

1 Answer


prdp8dxp1#

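Actually, main does call read_input only once, but that single call does not read anything by itself: it returns a generator, and that generator is then advanced several times, once per line of the file.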

data = read_input(sys.stdin)

# Causes a generator to be assigned to data.

for words in data:

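On each iteration of the for loop, data, which is the generator returned by read_input, is advanced: the loop calls next() on it, which runs the body of read_input up to its next yield. The yielded value is assigned to words.
Basically, for words in data is shorthand for "get the next value from data and assign it to words, then execute the loop block", repeated until the generator is exhausted.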
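To make that concrete, here is a rough sketch of what the for loop does behind the scenes. It is only an illustration, not the tutorial's code: io.StringIO stands in for sys.stdin, and the names fake_stdin, iterator and collected are made up for the demo.

```python
import io

def read_input(file):
    # Generator function: the body does not run until the generator is advanced.
    for line in file:
        yield line.split()

# Assumption for the demo: an in-memory text stream replaces sys.stdin.
fake_stdin = io.StringIO("foo bar\nbaz\n")

data = read_input(fake_stdin)   # one call; returns a generator, reads nothing yet

# Roughly what `for words in data:` expands to.
collected = []
iterator = iter(data)           # a generator is its own iterator
while True:
    try:
        words = next(iterator)  # resume read_input until its next yield
    except StopIteration:       # raised once the input is exhausted
        break
    collected.append(words)

print(collected)  # [['foo', 'bar'], ['baz']]
```

So read_input is entered once, but it is paused and resumed at the yield for every line, which is how the rest of the file gets read.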
