How to read a Hadoop SequenceFile of PCAPs and write an individual PCAP from it to HDFS?

Posted by ccgok5k5 on 2021-07-13 in Hadoop

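My task is to read a SequenceFile full of PCAPs from HDFS and then write one specific PCAP back to HDFS (from where it will be downloaded to the user's browser).
I'm new to Spark/Scala, but from what I know so far, I believe I need something along the lines of this sketch: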

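// Read whole PCAP archive.
// Assumption: keys are Text and values are BytesWritable; an IntWritable value
// cannot hold PCAP bytes, so adjust both classes to the archive's real types.
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{BytesWritable, Text}
val result = sc.sequenceFile("hdfs://path/to/pcap/archive", classOf[Text], classOf[BytesWritable])
  .map { case (k, v) => (k.toString, v.copyBytes()) } // copy, because Hadoop reuses Writable instances
// Pick out PCAP from it ("wanted-key" is a hypothetical key; first() throws if nothing matches):
val pcap_to_write: Array[Byte] = result.filter(_._1 == "wanted-key").values.first()
// Write that PCAP back to HDFS:
val outPath = new Path("hdfs://output/path/for/pcap")
val fs = outPath.getFileSystem(sc.hadoopConfiguration)
val out = fs.create(outPath)
try out.write(pcap_to_write) finally out.close()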

I assume there are errors in this, or that my concept is wrong; any hints or suggestions are much appreciated.
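
Since only one record is needed, one possible alternative is to skip the RDD and scan the archive directly with Hadoop's `SequenceFile.Reader`, stopping at the first matching key. The following is a minimal sketch, not a confirmed solution. It assumes the archive stores `Text` keys and `BytesWritable` values (the question does not state the real types), and the names `ExtractPcap`, `findPcap`, `writeBytes` and `wantedKey` are hypothetical.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{BytesWritable, SequenceFile, Text}

object ExtractPcap {
  /** Scans the SequenceFile at `archive` and returns the bytes of the first
    * record whose key equals `wantedKey`, or None if no key matches.
    * Assumption: keys are Text and values are BytesWritable. */
  def findPcap(conf: Configuration, archive: Path, wantedKey: String): Option[Array[Byte]] = {
    val reader = new SequenceFile.Reader(conf, SequenceFile.Reader.file(archive))
    try {
      val key = new Text()
      val value = new BytesWritable()
      var found: Option[Array[Byte]] = None
      // next() refills the same key/value objects on every call,
      // so the payload is copied out before moving on.
      while (found.isEmpty && reader.next(key, value)) {
        if (key.toString == wantedKey) found = Some(value.copyBytes())
      }
      found
    } finally reader.close()
  }

  /** Writes raw bytes to `target`, overwriting any existing file. */
  def writeBytes(conf: Configuration, target: Path, bytes: Array[Byte]): Unit = {
    val fs = target.getFileSystem(conf)
    val out = fs.create(target, true)
    try out.write(bytes) finally out.close()
  }
}
```

From a Spark driver this could be called as `ExtractPcap.findPcap(sc.hadoopConfiguration, new Path("hdfs://path/to/pcap/archive"), "wanted-key").foreach(ExtractPcap.writeBytes(sc.hadoopConfiguration, new Path("hdfs://output/path/for/pcap"), _))`, where `"wanted-key"` is a placeholder. The scan is sequential on one machine, so for very large archives the Spark `filter` route may still be the better fit.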

hadoop hdfs scala apache-spark pyspark

Source: https://stackoverflow.com/questions/67270831/how-to-read-hadoop-sequencefile-of-pcaps-and-write-an-individual-pcap-from-it-to
