This article collects a number of code examples for the Java class org.apache.hadoop.hbase.io.hfile.HFile and shows how the HFile class is used in practice. The examples are drawn mainly from GitHub, Stack Overflow, Maven, and similar sources, extracted from selected projects, so they should serve as useful references. Details of the HFile class are as follows:
Package path: org.apache.hadoop.hbase.io.hfile.HFile
Class name: HFile
File format for HBase. A file of sorted key/value pairs. Both keys and values are byte arrays.
The memory footprint of an HFile includes the following (taken from the TFile documentation, but it also applies to HFile):
* Some constant overhead for reading or writing a compressed block.
* Each compressed block requires one compression/decompression codec for I/O.
* Temporary space to buffer the key.
* Temporary space to buffer the value.
* The HFile index, which is proportional to the total number of data blocks. The total amount of memory needed to hold the index can be estimated as (56 + AvgKeySize) * NumBlocks.
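As a rough illustration of that estimate, the sketch below plugs made-up numbers into (56 + AvgKeySize) * NumBlocks; the file size, block size, and average key size are assumptions chosen for the example, not values from the text:

public class HFileIndexEstimate {
  public static void main(String[] args) {
    // Illustrative assumptions, not measurements.
    long fileSizeBytes = 10L * 1024 * 1024 * 1024; // a 10GB HFile
    long blockSizeBytes = 64 * 1024;               // 64KB data blocks
    long avgKeySizeBytes = 40;                     // average key length in bytes

    long numBlocks = fileSizeBytes / blockSizeBytes;       // 163,840 data blocks
    long indexBytes = (56 + avgKeySizeBytes) * numBlocks;  // (56 + AvgKeySize) * NumBlocks
    System.out.printf("blocks=%d, estimated index memory=%.1f MB%n",
        numBlocks, indexBytes / (1024.0 * 1024.0));        // about 15 MB for these numbers
  }
}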
Suggestions on performance optimization:
* Minimum block size. We recommend setting the minimum block size between 8KB and 1MB for general usage (a configuration sketch follows this list). Larger block sizes are preferred if files are primarily for sequential access. However, they lead to inefficient random access (because there is more data to decompress). Smaller blocks are good for random access, but require more memory to hold the block index and may be slower to create (because we must flush the compressor stream at the conclusion of each data block, which leads to an FS I/O flush). Further, due to the internal caching in the compression codec, the smallest possible block size is around 20KB-30KB.
* The current implementation does not offer true multi-threading for reading. The implementation uses FSDataInputStream seek()+read(), which is shown to be much faster than a positioned-read call in single-thread mode. However, it also means that if multiple threads attempt to access the same HFile (using multiple scanners) simultaneously, the actual I/O is carried out sequentially even if they access different DFS blocks (Reexamine! pread seems to be 10% faster than seek+read in my testing -- stack).
* Compression codec. Use "none" if the data is not very compressible (by compressible, I mean a compression ratio of at least 2:1). Generally, use "lzo" as the starting point for experimenting. "gz" offers a slightly better compression ratio than "lzo" but requires 4x the CPU to compress and 2x the CPU to decompress, compared to "lzo".
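Both the block size and the compression codec discussed above are configured through an HFileContext handed to the writer factory, as several of the examples later in this article also show. Below is a minimal sketch assuming the HBase 2.x client libraries are on the classpath; the output path and the chosen values (64KB blocks, GZ compression) are illustrative assumptions, not recommendations:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileContext;
import org.apache.hadoop.hbase.io.hfile.HFileContextBuilder;

public class HFileTuningSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/tmp/hfile-tuning-sketch"); // placeholder output path

    // Block size and compression codec travel with the HFileContext.
    HFileContext context = new HFileContextBuilder()
        .withBlockSize(64 * 1024)                  // 64KB data blocks
        .withCompression(Compression.Algorithm.GZ) // or NONE / LZO, per the guidance above
        .build();

    HFile.Writer writer = HFile.getWriterFactoryNoCache(conf)
        .withPath(fs, path)
        .withFileContext(context)
        .create();
    writer.close(); // no cells appended; this only demonstrates the configuration
  }
}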
For more on the background behind HFile, see HBASE-61.
The file is made of data blocks followed by meta data blocks (if any), a fileinfo block, the data block index, the meta data block index, and a fixed-size trailer which records the offsets at which the file changes content type.
<data blocks><meta blocks><fileinfo><data index><meta index><trailer>
Each block has a bit of magic at its start. Blocks are comprised of key/values. In data blocks, they are both byte arrays. Metadata blocks are a String key and a byte array value. An empty file looks like this: <fileinfo><trailer>. That is, there are no data or meta blocks present.
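The offsets at which these sections begin are recorded in that fixed-size trailer at the end of the file. As an illustration, the sketch below opens an existing HFile and prints its trailer; this is a minimal sketch assuming HBase 2.x APIs, and the path is a made-up placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFile;

public class HFileTrailerSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/tmp/some-existing-hfile"); // placeholder path, assumed to hold an HFile

    // Open a reader and print the trailer, which records section offsets,
    // the entry count, the comparator, and the format version.
    HFile.Reader reader = HFile.createReader(fs, path, CacheConfig.DISABLED, true, conf);
    try {
      System.out.println(reader.getTrailer());
    } finally {
      reader.close();
    }
  }
}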
TODO: Do scanners need to be able to take a start and end row? TODO: Should BlockIndex know the name of its file? Should it have a Path that points at its file say for the case where an index lives apart from an HFile instance?
Code example source: origin: apache/hbase
public StoreFileReader(FileSystem fs, Path path, CacheConfig cacheConf,
    boolean primaryReplicaStoreFile, AtomicInteger refCount, boolean shared, Configuration conf)
    throws IOException {
  this(HFile.createReader(fs, path, cacheConf, primaryReplicaStoreFile, conf), refCount, shared);
}
Code example source: origin: apache/hbase
/**
 * Returns the factory to be used to create {@link HFile} writers.
 * Disables block cache access for all writers created through the
 * returned factory.
 */
public static final WriterFactory getWriterFactoryNoCache(Configuration conf) {
  return HFile.getWriterFactory(conf, CacheConfig.DISABLED);
}
Code example source: origin: apache/hbase
/**
 * Check the configured hfile format version before doing compaction.
 * @throws IOException if the configured HFile format version is too low for MOB
 */
private void checkHFileFormatVersionForMob() throws IOException {
  if (HFile.getFormatVersion(master.getConfiguration()) < HFile.MIN_FORMAT_VERSION_WITH_TAGS) {
    LOG.error("A minimum HFile version of " + HFile.MIN_FORMAT_VERSION_WITH_TAGS
        + " is required for MOB compaction. Compaction will not run.");
    throw new IOException("A minimum HFile version of " + HFile.MIN_FORMAT_VERSION_WITH_TAGS
        + " is required for MOB feature. Consider setting " + HFile.FORMAT_VERSION_KEY
        + " accordingly.");
  }
}
Code example source: origin: apache/hbase
public static void doSmokeTest(FileSystem fs, Path path, String codec)
    throws Exception {
  Configuration conf = HBaseConfiguration.create();
  HFileContext context = new HFileContextBuilder()
      .withCompression(HFileWriterImpl.compressionByName(codec)).build();
  HFile.Writer writer = HFile.getWriterFactoryNoCache(conf)
      .withPath(fs, path)
      .withFileContext(context)
      .create();
  // Write any-old Cell...
  final byte [] rowKey = Bytes.toBytes("compressiontestkey");
  Cell c = CellUtil.createCell(rowKey, Bytes.toBytes("compressiontestval"));
  writer.append(c);
  writer.appendFileInfo(Bytes.toBytes("compressioninfokey"), Bytes.toBytes("compressioninfoval"));
  writer.close();
  Cell cc = null;
  HFile.Reader reader = HFile.createReader(fs, path, CacheConfig.DISABLED, true, conf);
  try {
    reader.loadFileInfo();
    HFileScanner scanner = reader.getScanner(false, true);
    scanner.seekTo(); // position to the start of file
    // Scanner does not do Cells yet. Do below for now till fixed.
    cc = scanner.getCell();
    if (CellComparator.getInstance().compareRows(c, cc) != 0) {
      throw new Exception("Read back incorrect result: " + c.toString() + " vs " + cc.toString());
    }
  } finally {
    reader.close();
  }
}
Code example source: origin: apache/hbase
/**
 * Create a truncated hfile and verify that an exception is thrown.
 */
@Test
public void testCorruptTruncatedHFile() throws IOException {
  Path f = new Path(ROOT_DIR, testName.getMethodName());
  HFileContext context = new HFileContextBuilder().build();
  Writer w = HFile.getWriterFactory(conf, cacheConf).withPath(this.fs, f)
      .withFileContext(context).create();
  writeSomeRecords(w, 0, 100, false);
  w.close();
  Path trunc = new Path(f.getParent(), "trucated");
  truncateFile(fs, w.getPath(), trunc);
  try {
    Reader r = HFile.createReader(fs, trunc, cacheConf, true, conf);
  } catch (CorruptHFileException che) {
    // Expected failure
    return;
  }
  fail("Should have thrown exception");
}
Code example source: origin: apache/hbase
fs.mkdirs(hfilePath);
Path path = new Path(pathStr);
HFile.WriterFactory wf = HFile.getWriterFactoryNoCache(TEST_UTIL.getConfiguration());
Assert.assertNotNull(wf);
HFileContext context = new HFileContext();
Code example source: origin: apache/hbase
FixedFileTrailer(int majorVersion, int minorVersion) {
  this.majorVersion = majorVersion;
  this.minorVersion = minorVersion;
  HFile.checkFormatVersion(majorVersion);
}
Code example source: origin: apache/hbase
private void metablocks(final String compress) throws Exception {
  Path mFile = new Path(ROOT_DIR, "meta.hfile");
  FSDataOutputStream fout = createFSOutput(mFile);
  HFileContext meta = new HFileContextBuilder()
      .withCompression(HFileWriterImpl.compressionByName(compress))
      .withBlockSize(minBlockSize).build();
  Writer writer = HFile.getWriterFactory(conf, cacheConf)
      .withOutputStream(fout)
      .withFileContext(meta)
      .create();
  someTestingWithMetaBlock(writer);
  writer.close();
  fout.close();
  FSDataInputStream fin = fs.open(mFile);
  Reader reader = HFile.createReaderFromStream(mFile, fs.open(mFile),
      this.fs.getFileStatus(mFile).getLen(), cacheConf, conf);
  reader.loadFileInfo();
  // No data -- this should return false.
  assertFalse(reader.getScanner(false, false).seekTo());
  someReadingWithMetaBlock(reader);
  fs.delete(mFile, true);
  reader.close();
  fin.close();
}
Code example source: origin: apache/hbase
/**
 * Returns true if the specified file has a valid HFile Trailer.
 * @param fs filesystem
 * @param path Path to file to verify
 * @return true if the file has a valid HFile Trailer, otherwise false
 * @throws IOException if failed to read from the underlying stream
 */
public static boolean isHFileFormat(final FileSystem fs, final Path path) throws IOException {
  return isHFileFormat(fs, fs.getFileStatus(path));
}
Code example source: origin: apache/hbase
if (verbose)
out.println("region dir -> " + regionDir);
List<Path> regionFiles = HFile.getStoreFiles(FileSystem.get(getConf()),
regionDir);
if (verbose)
Code example source: origin: apache/hbase
@Test
public void testNullMetaBlocks() throws Exception {
  for (Compression.Algorithm compressAlgo :
      HBaseCommonTestingUtility.COMPRESSION_ALGORITHMS) {
    Path mFile = new Path(ROOT_DIR, "nometa_" + compressAlgo + ".hfile");
    FSDataOutputStream fout = createFSOutput(mFile);
    HFileContext meta = new HFileContextBuilder().withCompression(compressAlgo)
        .withBlockSize(minBlockSize).build();
    Writer writer = HFile.getWriterFactory(conf, cacheConf)
        .withOutputStream(fout)
        .withFileContext(meta)
        .create();
    KeyValue kv = new KeyValue("foo".getBytes(), "f1".getBytes(), null, "value".getBytes());
    writer.append(kv);
    writer.close();
    fout.close();
    Reader reader = HFile.createReader(fs, mFile, cacheConf, true, conf);
    reader.loadFileInfo();
    assertNull(reader.getMetaBlock("non-existant", false));
  }
}
Code example source: origin: co.cask.hbase/hbase
public static void doSmokeTest(FileSystem fs, Path path, String codec)
    throws Exception {
  Configuration conf = HBaseConfiguration.create();
  HFile.Writer writer = HFile.getWriterFactoryNoCache(conf)
      .withPath(fs, path)
      .withCompression(codec)
      .create();
  writer.append(Bytes.toBytes("testkey"), Bytes.toBytes("testval"));
  writer.appendFileInfo(Bytes.toBytes("infokey"), Bytes.toBytes("infoval"));
  writer.close();
  HFile.Reader reader = HFile.createReader(fs, path, new CacheConfig(conf));
  reader.loadFileInfo();
  byte[] key = reader.getFirstKey();
  boolean rc = Bytes.toString(key).equals("testkey");
  reader.close();
  if (!rc) {
    throw new Exception("Read back incorrect result: " +
        Bytes.toStringBinary(key));
  }
}
Code example source: origin: apache/hbase
private String createHFileForFamilies(byte[] family) throws IOException {
  HFile.WriterFactory hFileFactory = HFile.getWriterFactoryNoCache(conf);
  // TODO We need a way to do this without creating files
  File hFileLocation = testFolder.newFile();
  FSDataOutputStream out = new FSDataOutputStream(new FileOutputStream(hFileLocation), null);
  try {
    hFileFactory.withOutputStream(out);
    hFileFactory.withFileContext(new HFileContext());
    HFile.Writer writer = hFileFactory.create();
    try {
      writer.append(new KeyValue(CellUtil.createCell(randomBytes,
          family,
          randomBytes,
          0L,
          KeyValue.Type.Put.getCode(),
          randomBytes)));
    } finally {
      writer.close();
    }
  } finally {
    out.close();
  }
  return hFileLocation.getAbsoluteFile().getAbsolutePath();
}
Code example source: origin: apache/hbase
public static int getFormatVersion(Configuration conf) {
  int version = conf.getInt(FORMAT_VERSION_KEY, MAX_FORMAT_VERSION);
  checkFormatVersion(version);
  return version;
}
Code example source: origin: apache/hbase
.withCompression(HFileWriterImpl.compressionByName(codec))
.build();
Writer writer = HFile.getWriterFactory(conf, cacheConf)
.withOutputStream(fout)
.withFileContext(meta)
fout.close();
FSDataInputStream fin = fs.open(ncHFile);
Reader reader = HFile.createReaderFromStream(ncHFile, fs.open(ncHFile),
fs.getFileStatus(ncHFile).getLen(), cacheConf, conf);
System.out.println(cacheConf.toString());
Code example source: origin: apache/hbase
private List<Path> getFilesRecursively(String fileBackupDir)
    throws IllegalArgumentException, IOException {
  FileSystem fs = FileSystem.get((new Path(fileBackupDir)).toUri(), new Configuration());
  List<Path> list = new ArrayList<>();
  RemoteIterator<LocatedFileStatus> it = fs.listFiles(new Path(fileBackupDir), true);
  while (it.hasNext()) {
    Path p = it.next().getPath();
    if (HFile.isHFileFormat(fs, p)) {
      list.add(p);
    }
  }
  return list;
}
Code example source: origin: co.cask.hbase/hbase
if (verbose)
System.out.println("region dir -> " + regionDir);
List<Path> regionFiles = HFile.getStoreFiles(FileSystem.get(conf),
regionDir);
if (verbose)
Code example source: origin: apache/hbase
/**
 * Test empty HFile.
 * Test all features work reasonably when hfile is empty of entries.
 * @throws IOException
 */
@Test
public void testEmptyHFile() throws IOException {
  Path f = new Path(ROOT_DIR, testName.getMethodName());
  HFileContext context = new HFileContextBuilder().withIncludesTags(false).build();
  Writer w =
      HFile.getWriterFactory(conf, cacheConf).withPath(fs, f).withFileContext(context).create();
  w.close();
  Reader r = HFile.createReader(fs, f, cacheConf, true, conf);
  r.loadFileInfo();
  assertFalse(r.getFirstKey().isPresent());
  assertFalse(r.getLastKey().isPresent());
}
Code example source: origin: apache/hbase
public StoreFileReader(FileSystem fs, Path path, FSDataInputStreamWrapper in, long size,
    CacheConfig cacheConf, boolean primaryReplicaStoreFile, AtomicInteger refCount,
    boolean shared, Configuration conf) throws IOException {
  this(HFile.createReader(fs, path, in, size, cacheConf, primaryReplicaStoreFile, conf), refCount,
      shared);
}
Code example source: origin: harbby/presto-connectors
public static void doSmokeTest(FileSystem fs, Path path, String codec)
    throws Exception {
  Configuration conf = HBaseConfiguration.create();
  HFileContext context = new HFileContextBuilder()
      .withCompression(AbstractHFileWriter.compressionByName(codec)).build();
  HFile.Writer writer = HFile.getWriterFactoryNoCache(conf)
      .withPath(fs, path)
      .withFileContext(context)
      .create();
  // Write any-old Cell...
  final byte [] rowKey = Bytes.toBytes("compressiontestkey");
  Cell c = CellUtil.createCell(rowKey, Bytes.toBytes("compressiontestval"));
  writer.append(c);
  writer.appendFileInfo(Bytes.toBytes("compressioninfokey"), Bytes.toBytes("compressioninfoval"));
  writer.close();
  Cell cc = null;
  HFile.Reader reader = HFile.createReader(fs, path, new CacheConfig(conf), conf);
  try {
    reader.loadFileInfo();
    HFileScanner scanner = reader.getScanner(false, true);
    scanner.seekTo(); // position to the start of file
    // Scanner does not do Cells yet. Do below for now till fixed.
    cc = scanner.getKeyValue();
    if (CellComparator.compareRows(c, cc) != 0) {
      throw new Exception("Read back incorrect result: " + c.toString() + " vs " + cc.toString());
    }
  } finally {
    reader.close();
  }
}