使用hadoop将xml转换为csv

bkhjykvo  于 2021-06-03  发布在  Hadoop
关注(0)|答案(1)|浏览(352)

伙计们,我正在尝试使用hadoop将xml文件转换为csv,所以我在mapper类中使用以下代码

protected void map(LongWritable key, Text value,
                 @SuppressWarnings("rawtypes") Mapper.Context context)
  throws
  IOException, InterruptedException {
String document = value.toString();
System.out.println("‘" + document + "‘");
    try {
  XMLStreamReader reader =
      XMLInputFactory.newInstance().createXMLStreamReader(new
          ByteArrayInputStream(document.getBytes()));
  String propertyName = "";
  String propertyValue = "";
  String currentElement = "";
  while (reader.hasNext()) {
    int code = reader.next();
    switch (code) {
      case XMLStreamConstants.START_ELEMENT: //START_ELEMENT:
        currentElement = reader.getLocalName();
        break;
      case XMLStreamConstants.CHARACTERS:  //CHARACTERS:
        if (currentElement.equalsIgnoreCase("author")) {
          propertyName += reader.getText();
         } else if (currentElement.equalsIgnoreCase("price"))
        {
            String name=reader.getText();
            name.trim();
          propertyName += name;
          propertyName.trim();
         }
 }
        console.write(null,new Text(propertyName));
 }
 }

但我得到的结果是这样的

Gambardella, Matthew
      XML Developer's Guide
      44.95
      2000-10-01

Ralls, Kim
      Midnight Rain
      5.95
      2000-12-16

你能帮我吗

toe95027

toe950271#

程序的输出取决于如何从mapper收集/写入数据。
在这种情况下,您应该使用textoutputformat&keyout将是空可写的,valueout将是文本。值out应该是从csv提取的值的串联。
从代码中看,您似乎是在从xml中读取每个值之后编写输出。

相关问题