Hi folks, I'm trying to convert an XML file to CSV with Hadoop, so I'm using the following code in my mapper class:
protected void map(LongWritable key, Text value,
        @SuppressWarnings("rawtypes") Mapper.Context context)
        throws IOException, InterruptedException {
    String document = value.toString();
    System.out.println("'" + document + "'");
    try {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(document.getBytes()));
        String propertyName = "";
        String propertyValue = "";
        String currentElement = "";
        while (reader.hasNext()) {
            int code = reader.next();
            switch (code) {
            case XMLStreamConstants.START_ELEMENT:
                currentElement = reader.getLocalName();
                break;
            case XMLStreamConstants.CHARACTERS:
                if (currentElement.equalsIgnoreCase("author")) {
                    propertyName += reader.getText();
                } else if (currentElement.equalsIgnoreCase("price")) {
                    String name = reader.getText();
                    name.trim();
                    propertyName += name;
                    propertyName.trim();
                }
            }
            context.write(null, new Text(propertyName));
        }
    } catch (XMLStreamException e) {
        throw new IOException(e);
    }
}
But the result I get looks like this:
Gambardella, Matthew
XML Developer's Guide
44.95
2000-10-01
Ralls, Kim
Midnight Rain
5.95
2000-12-16
Can you help me?
1 Answer
toe950271
The program's output depends on how you collect and write the data from the mapper.
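In this case you should use TextOutputFormat, with NullWritable as KEYOUT and Text as VALUEOUT. The output value should be the concatenation of the values extracted from the XML, joined into a single CSV line.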
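From your code, it looks like you write the output after reading each value from the XML, so every value ends up on its own line. Collect all the values for a record first and write them once per record.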
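To make that concrete, here is a minimal sketch of the "collect, then write once" idea using only the JDK's StAX parser, so it can be tried outside Hadoop. The class name XmlRecordToCsv, the record element name book, and the field names title and publish_date are assumptions (guessed from the sample output in the question, not confirmed by it); only author and price appear in the original code. It also assumes each call receives the full text of one or more complete records.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: turns each <book> record into one CSV line.
final class XmlRecordToCsv {
    // Assumed column order; element names are guesses based on the sample output.
    static final List<String> FIELDS =
            Arrays.asList("author", "title", "price", "publish_date");
    // Assumed name of the element that wraps one record.
    static final String RECORD = "book";

    // Quotes a value if it contains a comma, a double quote, or a line break.
    static String csvEscape(String v) {
        if (v.contains(",") || v.contains("\"") || v.contains("\n") || v.contains("\r")) {
            return "\"" + v.replace("\"", "\"\"") + "\"";
        }
        return v;
    }

    // Returns one CSV line per record element found in the XML text.
    static List<String> toCsvLines(String xml) {
        List<String> lines = new ArrayList<>();
        Map<String, StringBuilder> row = new LinkedHashMap<>();
        String current = "";
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        current = reader.getLocalName();
                        if (RECORD.equals(current)) {
                            row.clear(); // a new record starts
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        // Text may arrive in several chunks, so append.
                        if (FIELDS.contains(current)) {
                            row.computeIfAbsent(current, k -> new StringBuilder())
                               .append(reader.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        current = ""; // ignore whitespace between elements
                        if (RECORD.equals(reader.getLocalName())) {
                            // Emit exactly once, when the record is complete.
                            List<String> cells = new ArrayList<>();
                            for (String f : FIELDS) {
                                StringBuilder sb = row.get(f);
                                cells.add(csvEscape(sb == null ? "" : sb.toString().trim()));
                            }
                            lines.add(String.join(",", cells));
                        }
                        break;
                    default:
                        break;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String xml = "<catalog>"
                + "<book><author>Gambardella, Matthew</author>"
                + "<title>XML Developer's Guide</title>"
                + "<price>44.95</price><publish_date>2000-10-01</publish_date></book>"
                + "<book><author>Ralls, Kim</author><title>Midnight Rain</title>"
                + "<price>5.95</price><publish_date>2000-12-16</publish_date></book>"
                + "</catalog>";
        // Prints:
        // "Gambardella, Matthew",XML Developer's Guide,44.95,2000-10-01
        // "Ralls, Kim",Midnight Rain,5.95,2000-12-16
        for (String line : toCsvLines(xml)) {
            System.out.println(line);
        }
    }
}
```

Inside the mapper you would then call context.write(NullWritable.get(), new Text(line)) once for each returned line, rather than once per parser event. The author values are quoted because they contain a comma themselves; without quoting, "Gambardella, Matthew" would be split into two CSV columns.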