Hadoop MapReduce: TextInputFormat and processing lines?

Posted by kuuvgm7e on 2021-06-03 in Hadoop


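I'm not sure I understand how TextInputFormat works. The documentation says:
An InputFormat for plain text files. Files are broken into lines.
So I assumed that when I simply convert the value passed to my map function to a String, I get the string representation of one line of my file.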

public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

    String line = value.toString(); // one line of my input file?
    ...

}

However, when I process the line further, it turns out not to be a line of my file. My file city.dat looks like this:

Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51
Canillo|ad|Canillo|3292|42.57|1.6
...

Can anyone tell me how to process the lines of this file in my map function?

hadoop mapreduce text Line textinput

Source: https://stackoverflow.com/questions/13208445/hadoop-mapreduce-textinputformat-and-processing-lines

1 Answer


fcipmucu1#

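TextInputFormat is an InputFormat for plain text files. Files are broken into lines, and either a line feed or a carriage return signals the end of a line. The key is the position (byte offset) of the line in the file, and the value is the text of the line. If your lines end with anything other than a line feed or carriage return, you have to write your own InputFormat.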
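To make the key/value pairing concrete, here is a minimal plain-Java sketch of what a mapper would receive for the sample file. It does not use Hadoop and is not Hadoop's actual reader; the class `LineRecordSketch` and its `Record` type are hypothetical names introduced only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustration only: mimics the (offset, line) pairs described above
// without depending on any Hadoop class.
class LineRecordSketch {

    // One (key, value) pair: the byte offset where the line starts, and its text.
    static final class Record {
        final long offset;
        final String line;

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    // Splits raw bytes into lines terminated by \n, \r, or \r\n,
    // remembering the byte offset at which each line starts.
    static List<Record> split(byte[] data) {
        List<Record> records = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < data.length) {
            byte b = data[i];
            if (b == '\n' || b == '\r') {
                records.add(new Record(start,
                        new String(data, start, i - start, StandardCharsets.UTF_8)));
                // Treat \r\n as a single line terminator.
                if (b == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                    i++;
                }
                i++;
                start = i;
            } else {
                i++;
            }
        }
        // A final line without a trailing terminator is still a record.
        if (start < data.length) {
            records.add(new Record(start,
                    new String(data, start, data.length - start, StandardCharsets.UTF_8)));
        }
        return records;
    }

    public static void main(String[] args) {
        String sample = "Andorra la Vella|ad|Andorra la Vella|20430|42.51|1.51\n"
                + "Canillo|ad|Canillo|3292|42.57|1.6\n";
        for (Record r : split(sample.getBytes(StandardCharsets.UTF_8))) {
            System.out.println(r.offset + "\t" + r.line);
        }
    }
}
```

If the lines printed here match what you expect but the values in your mapper do not, the line terminators in the real file are a likely place to look.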
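Look at the third point in this blog post; it confirms that the input is broken at the end of each line. http://blog.cloudera.com/blog/2011/01/lessons-learned-from-clouderas-hadoop-developer-training-course/
I suggest checking your file by opening it in a text editor such as UltraEdit and inspecting the newline characters.
See if this helps.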
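The question does not say what the further processing does, so the following is only a guess at one common pitfall: in Java, `String.split("|")` treats `|` as a regular-expression alternation rather than a literal pipe, which breaks a line into single characters. A minimal sketch of splitting one line of city.dat on the literal delimiter follows; the class `CityLineParser` is a hypothetical helper, and the file's column meanings are not stated in the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper: splits one pipe-delimited line into its fields.
class CityLineParser {

    // "|" is a regex metacharacter, so quote it to match a literal pipe.
    private static final Pattern PIPE = Pattern.compile(Pattern.quote("|"));

    // The limit of -1 keeps trailing empty fields instead of dropping them.
    static String[] parse(String line) {
        return PIPE.split(line, -1);
    }

    public static void main(String[] args) {
        String[] fields = parse("Canillo|ad|Canillo|3292|42.57|1.6");
        // Print each field with its index; what each column means is not
        // specified in the question, so no names are assigned here.
        for (int i = 0; i < fields.length; i++) {
            System.out.println(i + ": " + fields[i]);
        }
    }
}
```

Inside the map function, the same call would take `value.toString()` as its argument.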

Posted 2021-06-03
