1. Before adding new files, fetch the last modified time (hadoop fs -ls /your-path). Let's call it mTime.
2. Next, upload the new files into the HDFS path.
3. Now filter the files whose modification time is greater than mTime. These are the files to be processed; make your program process only these files.
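The three steps can be sketched in plain Java as below. This is a minimal sketch, not the answerer's code: to stay self-contained it does not call Hadoop. FileEntry is a hypothetical stand-in for org.apache.hadoop.fs.FileStatus, and the two lists stand in for directory listings taken before and after the upload. It also assumes mTime is taken as the newest modification time in the directory rather than parsed from the `hadoop fs -ls` text. In a real program the listing would come from FileSystem.listStatus(new Path("/your-path")), and each time from FileStatus.getModificationTime(), which returns milliseconds since the epoch.

```java
import java.util.ArrayList;
import java.util.List;

public class NewFileFilter {

    // Hypothetical stand-in for org.apache.hadoop.fs.FileStatus: only the path and
    // the modification time (milliseconds since the epoch) are needed here.
    record FileEntry(String path, long modificationTime) {}

    // Step 1: the newest modification time among the files already present.
    // An empty directory yields Long.MIN_VALUE, so every later file counts as new.
    static long latestModificationTime(List<FileEntry> files) {
        long mTime = Long.MIN_VALUE;
        for (FileEntry f : files) {
            mTime = Math.max(mTime, f.modificationTime());
        }
        return mTime;
    }

    // Step 3: keep only the files modified strictly after mTime.
    static List<FileEntry> newerThan(List<FileEntry> files, long mTime) {
        List<FileEntry> result = new ArrayList<>();
        for (FileEntry f : files) {
            if (f.modificationTime() > mTime) {
                result.add(f);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Listing taken before the upload (step 1).
        List<FileEntry> before = List.of(
                new FileEntry("/your-path/a.txt", 1_427_241_600_000L),
                new FileEntry("/your-path/b.txt", 1_427_245_200_000L));
        long mTime = latestModificationTime(before);

        // Listing taken after new files were uploaded (step 2, simulated).
        List<FileEntry> after = new ArrayList<>(before);
        after.add(new FileEntry("/your-path/c.txt", 1_427_250_000_000L));
        after.add(new FileEntry("/your-path/d.txt", 1_427_260_000_000L));

        // Only these files would be handed to the job.
        for (FileEntry f : newerThan(after, mTime)) {
            System.out.println(f.path());
        }
    }
}
```

Running main prints /your-path/c.txt and /your-path/d.txt. The comparison is strict, so a file whose modification time exactly equals mTime is treated as already processed.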
FileOutputFormat.setOutputPath(job, new Path(hdfsFilePath
        + timestamp_start)); // start at 12 midnight, for example 1427241600 (GMT); you can write logic to compute the epoch time
2 answers
brjng4g31#
To do this, you need to write some Java code. The steps listed above may help.
This is just a hint for developing the code. :)
m1m5dgzv2#
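If it is a MapReduce job, you can create a new output directory each day with a timestamp appended to its name, as in the setOutputPath snippet above.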
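A minimal sketch of computing that daily timestamp, assuming "12 midnight (GMT)" means 00:00 UTC of the current day. It uses java.time to get the epoch seconds and appends them to a base path; for 2015-03-25 it yields 1427241600, the value in the comment above. The class and method names and the base path "/user/hadoop/output_" are hypothetical placeholders for hdfsFilePath, and the Hadoop call is left as a comment so the class compiles without Hadoop on the classpath.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

public class DailyOutputPath {

    // Epoch seconds at 00:00 UTC (GMT) of the given day, e.g. 2015-03-25 -> 1427241600.
    static long midnightEpochSeconds(LocalDate day) {
        return day.atStartOfDay(ZoneOffset.UTC).toEpochSecond();
    }

    // Appends the day's midnight timestamp to the base output path
    // (the base path stands in for hdfsFilePath).
    static String dailyOutputPath(String hdfsFilePath, LocalDate day) {
        return hdfsFilePath + midnightEpochSeconds(day);
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.now(ZoneOffset.UTC);
        String outputDir = dailyOutputPath("/user/hadoop/output_", today);
        // In the job driver this string would be passed to
        // FileOutputFormat.setOutputPath(job, new Path(outputDir));
        System.out.println(outputDir);
    }
}
```

Because FileOutputFormat refuses to write into an output directory that already exists (it fails with FileAlreadyExistsException), one directory per day also means a second run on the same day needs a different suffix or the earlier directory removed first.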