public interface InProgressFileWriter<IN, BucketID> extends PartFileInfo<BucketID> {

    /**
     * Write an element to the part file.
     *
     * @param element the element to be written.
     * @param currentTime the writing time.
     * @throws IOException Thrown if writing the element fails.
     */
    void write(final IN element, final long currentTime) throws IOException;

    /**
     * @return The state of the current part file.
     * @throws IOException Thrown if persisting the part file fails.
     */
    InProgressFileRecoverable persist() throws IOException;

    /**
     * @return The state of the pending part file. {@link Bucket} uses this to commit the pending
     *     file.
     * @throws IOException Thrown if an I/O error occurs.
     */
    PendingFileRecoverable closeForCommit() throws IOException;

    /** Dispose the part file. */
    void dispose();

    // ------------------------------------------------------------------------

    /** A handle that can be used to recover an in-progress file. */
    interface InProgressFileRecoverable extends PendingFileRecoverable {}

    /** A handle that can be used to recover a pending file. */
    interface PendingFileRecoverable {}
}
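For orientation, here is a minimal sketch of the call order a caller drives through this interface: write while records arrive, persist at a checkpoint, closeForCommit when the part file is rolled. The driver class below is hypothetical (in Flink the internal Bucket class plays this role), and the package path is assumed to match recent Flink versions:

import java.io.IOException;
import org.apache.flink.streaming.api.functions.sink.filesystem.InProgressFileWriter;

// Hypothetical driver, only to show the call order; not the actual Bucket implementation.
class WriterLifecycleSketch {

    void runLifecycle(InProgressFileWriter<String, String> writer) throws IOException {
        // While records arrive, the sink appends each one to the in-progress part file.
        writer.write("some-record", System.currentTimeMillis());

        // At a checkpoint, the current write position is persisted so the file can be recovered.
        InProgressFileWriter.InProgressFileRecoverable inProgress = writer.persist();

        // When the rolling policy fires, the file is closed and becomes pending;
        // the pending handle is committed (the file is renamed) once a checkpoint completes.
        InProgressFileWriter.PendingFileRecoverable pending = writer.closeForCommit();

        // On failure, writer.dispose() is called instead of committing.
    }
}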
3 Answers

lymnna711#

Maybe you can take a look at the implementations of the write function. Several of them really do write the data out to HDFS files. StreamingFileSink itself does not hold the records in a buffer of its own, but there is some buffering inside the FileOutputStream.
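To make that concrete, here is a rough sketch of what a row-wise write implementation looks like: the element is encoded straight onto the output stream, so any remaining buffering lives in that stream rather than in the sink. The class and field names are invented for illustration; the real implementations live in Flink's filesystem sink package.

import java.io.IOException;
import org.apache.flink.api.common.serialization.Encoder;
import org.apache.flink.core.fs.RecoverableFsDataOutputStream;

// Illustrative only: roughly how a row-wise InProgressFileWriter implementation forwards writes.
class RowWiseWriterSketch<IN> {

    private final Encoder<IN> encoder;                         // user-provided encoder (e.g. SimpleStringEncoder)
    private final RecoverableFsDataOutputStream currentStream; // stream onto HDFS/S3/...

    RowWiseWriterSketch(Encoder<IN> encoder, RecoverableFsDataOutputStream stream) {
        this.encoder = encoder;
        this.currentStream = stream;
    }

    public void write(IN element, long currentTime) throws IOException {
        // No sink-level buffering: each element is encoded directly onto the output stream.
        // Any remaining buffering happens inside the stream itself (e.g. the file/HDFS client buffer).
        encoder.encode(element, currentStream);
    }
}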
7gcisfzg2#
StreamingFileSink really does write the data to HDFS; it is not retained in a buffer.
To support exactly-once semantics, the in-progress files are only renamed to finished files during a checkpoint; after that you can run downstream analysis on them.
StreamingFileSink does not support real-time querying of the data. You can shorten the checkpoint interval to make data visible sooner, but if the checkpoint interval is too small you can easily run into a small-files problem.
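As a minimal sketch of that trade-off, the snippet below enables checkpointing with an example interval and wires up a row-format StreamingFileSink; the interval, path, and data are placeholders, not recommendations:

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class CheckpointIntervalSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Pending files are committed (renamed) only when a checkpoint completes, so this interval
        // bounds how quickly written data becomes visible as finished files.
        // A smaller interval means fresher data but more frequent commits and more small files.
        env.enableCheckpointing(60_000); // 60 s, example value only

        DataStream<String> stream = env.fromElements("a", "b", "c");

        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("hdfs:///tmp/output"), new SimpleStringEncoder<String>("UTF-8"))
                .build();

        stream.addSink(sink);
        env.execute("checkpoint-interval-sketch");
    }
}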
i2byvkas3#
Have you considered enabling the file compaction option? https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/filesystem/#file-compaction
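The linked option belongs to the filesystem table/SQL connector. Below is a minimal sketch of what enabling it might look like; the table name, path, format, and sizes are placeholders, and the exact options supported depend on your Flink version, so check the linked page:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class FileCompactionSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Filesystem connector with automatic compaction of the small files produced per checkpoint.
        tEnv.executeSql(
                "CREATE TABLE sink_table (" +
                "  id BIGINT," +
                "  msg STRING" +
                ") WITH (" +
                "  'connector' = 'filesystem'," +
                "  'path' = 'hdfs:///tmp/output'," +
                "  'format' = 'parquet'," +
                "  'auto-compaction' = 'true'," +       // merge small files when a checkpoint completes
                "  'compaction.file-size' = '128MB'" +  // target size of the compacted files
                ")");
    }
}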