Suppose I have Parquet files on a file system. How can I read the Parquet schema and convert it to an Avro schema?
Use the Hadoop `ParquetFileReader` to read the Parquet schema from the file footer, then pass it to `AvroSchemaConverter` to convert it to an Avro schema. Scala code example:
    import org.apache.avro.Schema
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.parquet.avro.AvroSchemaConverter
    import org.apache.parquet.hadoop.ParquetFileReader
    import org.apache.parquet.hadoop.util.HadoopInputFile

    object ParquetToAvroSchemaConverter {

      def main(args: Array[String]): Unit = {
        val path = new Path("###PATH_TO_PARQUET_FILE###")
        val avroSchema = convert(path)
      }

      def convert(parquetPath: Path): Schema = {
        val cfg = new Configuration
        // Create parquet reader
        val rdr = ParquetFileReader.open(HadoopInputFile.fromPath(parquetPath, cfg))
        try {
          // Get parquet schema
          val schema = rdr.getFooter.getFileMetaData.getSchema
          println("Parquet schema: ")
          println("#############################################################")
          print(schema.toString)
          println("#############################################################")
          println

          // Convert to Avro
          val avroSchema = new AvroSchemaConverter(cfg).convert(schema)
          println("Avro schema: ")
          println("#############################################################")
          println(avroSchema.toString(true))
          println("#############################################################")
          avroSchema
        } finally {
          rdr.close()
        }
      }
    }
Your SBT project needs the following dependencies:
    libraryDependencies ++= Seq(
      "org.apache.parquet" % "parquet-avro" % "1.10.0",
      "org.apache.parquet" % "parquet-hadoop" % "1.10.0",
      "org.apache.hadoop" % "hadoop-client" % "2.7.3"
    )
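To see what the conversion step does on its own, without a Parquet file on disk, here is a minimal sketch that parses a Parquet schema from text and converts it. It assumes the same dependencies as above. The object name `InlineSchemaDemo`, the helper `toAvro`, and the sample schema are made up for illustration and are not part of the original answer.

```scala
import org.apache.avro.Schema
import org.apache.parquet.avro.AvroSchemaConverter
import org.apache.parquet.schema.MessageTypeParser

// Hypothetical demo object: converts a textual Parquet schema to Avro.
object InlineSchemaDemo {

  // Parse the Parquet message type from its string form, then convert it.
  def toAvro(parquetSchemaText: String): Schema = {
    val parquetSchema = MessageTypeParser.parseMessageType(parquetSchemaText)
    new AvroSchemaConverter().convert(parquetSchema)
  }

  def main(args: Array[String]): Unit = {
    // Illustrative schema only; a real one would come from a file footer.
    val text =
      """message example {
        |  required int32 id;
        |  optional binary name (UTF8);
        |}""".stripMargin
    // Pretty-print the resulting Avro schema as JSON.
    println(toAvro(text).toString(true))
  }
}
```

This can be handy for checking how a particular Parquet type maps to Avro before running the converter against real files.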