How to read a CSV file from HDFS?

Posted by avkwfej4 on 2021-06-04 in Hadoop


My data is stored in a CSV file, and I want to read that CSV file from HDFS.
Can anyone help me with the code?
I am new to Hadoop. Thanks in advance.

hadoop hdfs csv mahout

Source: https://stackoverflow.com/questions/17145463/how-to-read-a-csv-file-from-hdfs

2 Answers


kgqe7b3p1#

The classes you need for this are FileSystem, FSDataInputStream, and Path. The client should look something like this:

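import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadClient { // illustrative class name
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Load the cluster configuration so FileSystem.get() resolves to HDFS.
        conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
        conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(conf);
        // try-with-resources closes the stream once reading is done.
        try (FSDataInputStream inputStream = fs.open(new Path("/path/to/input/file"))) {
            // readChar() consumes two bytes and returns them as a single char.
            System.out.println(inputStream.readChar());
        }
    }
}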

FSDataInputStream has several read methods. Choose the one that suits your needs.
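For CSV text, one option is to skip the low-level read methods and wrap the stream in a BufferedReader, since FSDataInputStream is an ordinary java.io.InputStream. Below is a minimal sketch under some assumptions: the file is UTF-8, fields contain no quoted commas, and `CsvStreamReader` / `readCsv` are hypothetical names made up for illustration (they are not part of Hadoop). A ByteArrayInputStream stands in for the HDFS stream so the snippet runs without a cluster.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop.
class CsvStreamReader {

    // Reads every line from the stream and splits it on commas.
    // Assumes UTF-8 text and no quoted fields that contain commas.
    static List<String[]> readCsv(InputStream in) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // A limit of -1 keeps trailing empty fields such as "a,b,".
                rows.add(line.split(",", -1));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for fs.open(new Path("/path/to/input/file")).
        InputStream in = new ByteArrayInputStream(
                "id,name\n1,alice\n2,bob\n".getBytes(StandardCharsets.UTF_8));
        for (String[] row : readCsv(in)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Against a real cluster you would presumably pass the result of `fs.open(...)` from the client above to `readCsv` instead of the in-memory stream.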
If you are using MapReduce, it is even simpler:

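import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Nested inside your job class. Your_Wish is a placeholder for whatever
// output key and value types (and values) you choose.
public static class YourMapper extends
        Mapper<LongWritable, Text, Your_Wish, Your_Wish> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        // The framework does the reading for you.
        String line = value.toString(); // one line of your CSV file
        // do your processing here
        // ...
        context.write(Your_Wish, Your_Wish);
    }
}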

Posted 2021-06-04

8hhllhi22#

If you want to use MapReduce, you can use TextInputFormat to read the file line by line and parse each line in the mapper's map function.
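As a rough idea of what the per-line parsing might look like, here is a small sketch of a parser you could call on `value.toString()` inside map(). `CsvLineParser` and `parseLine` are hypothetical names, not Hadoop classes. It assumes comma-separated fields, optional double quotes around a field, doubled quotes as the escape, and records that never span multiple lines (TextInputFormat hands you one physical line at a time, so multi-line quoted fields would need a different approach).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop.
class CsvLineParser {

    // Splits one CSV line into fields. Handles double-quoted fields and
    // doubled quotes ("") inside them; assumes a record never spans lines.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    inQuotes = false; // closing quote
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field (may be empty)
        return fields;
    }

    public static void main(String[] args) {
        // In a mapper this string would come from value.toString().
        String line = "1,\"Smith, John\",\"said \"\"hi\"\"\",";
        for (String field : parseLine(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

If your data has no quoted fields, a plain `line.split(",", -1)` is probably enough and this extra logic is unnecessary.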
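Another option is to develop (or find an already developed) CSV input format that reads the data from the file.
Here is an old tutorial, http://hadoop.apache.org/docs/r0.18.3/mapred_tutorial.html , but the logic is the same in newer versions.
If you read the file from a single process, it is the same as reading a file from any other file system. There is a good example here: https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-read-a-file-from-hdfs
HTH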

Posted 2021-06-04
