hdfs: Create a view on a Hive table by defining a schema for a column containing JSON

t5zmwmid  posted on 2021-06-26  in  Hive

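I store the raw JSON strings from a Kafka stream in HDFS as Parquet.
I created an external table in Hive over the HDFS folder.
Now I want to create a view over the raw data stored in that Hive table.

Kafka stream to HDFS: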

  import org.apache.spark.SparkContext;
  import org.apache.spark.sql.Dataset;
  import org.apache.spark.sql.Row;
  import org.apache.spark.sql.SQLContext;
  import org.apache.spark.sql.SparkSession;
  import org.apache.spark.sql.streaming.Trigger;
  import org.apache.spark.sql.types.DataTypes;

  public class EventKafkaToParquet {
    public static void main(String[] args) throws Exception {
      String brokers = "quickstart:9092";
      String topics = "simple_topic_6";
      String master = "local[*]";

      SparkSession sparkSession = SparkSession
          .builder().appName(EventKafkaToParquet.class.getName())
          .master(master).getOrCreate();
      SQLContext sqlContext = sparkSession.sqlContext();
      SparkContext context = sparkSession.sparkContext();
      context.setLogLevel("ERROR");

      // Read the Kafka topic as a streaming Dataset.
      Dataset<Row> rawDataSet = sparkSession.readStream()
          .format("kafka")
          .option("kafka.bootstrap.servers", brokers)
          .option("subscribe", topics).load();
      rawDataSet.printSchema();

      // The Kafka "value" column is binary; cast it to a string column named "employee".
      rawDataSet = rawDataSet.withColumn("employee", rawDataSet.col("value").cast(DataTypes.StringType));
      rawDataSet.createOrReplaceTempView("basicView");
      Dataset<Row> writeDataset = sqlContext.sql("select employee from basicView");

      // Write the raw JSON strings to HDFS as Parquet every 5 seconds.
      writeDataset
          .repartition(1)
          .writeStream()
          .option("path", "/user/cloudera/employee/")
          .option("checkpointLocation", "/user/cloudera/employee.checkpoint/")
          .format("parquet")
          .trigger(Trigger.ProcessingTime(5000))
          .start()
          .awaitTermination();
    }
  }

External table in Hive:

  CREATE EXTERNAL TABLE employee_raw ( employee STRING )
  STORED AS PARQUET
  LOCATION '/user/cloudera/employee' ;

Now I want to create a Hive view on top of the employee_raw table that produces the following columns:

  firstName, lastName, street, city, state, zip

The output of the employee_raw table is:

  hive> select * from employee_raw;
  OK
  {"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
  {"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
  {"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
  {"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
  {"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
  Time taken: 0.123 seconds, Fetched: 5 row(s)

Thanks for your input.

wwwo4jvm  #1

From your description, I think what you mainly want is to "extract values from a JSON string in Hive", so you can find the answer in the linked thread.
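
For reference, extracting values from a JSON string column can be sketched with Hive's built-in get_json_object function. This is only a minimal sketch, assuming every row has the JSON layout shown in the question; the view name employee_view is hypothetical and not from the original post.

```sql
-- Hypothetical view name; the table and column come from the question.
-- get_json_object(json_string, path) returns the value at the given
-- JSONPath-style path as a STRING, or NULL if the path does not match.
CREATE VIEW employee_view AS
SELECT
  get_json_object(employee, '$.employee.firstName')      AS firstName,
  get_json_object(employee, '$.employee.lastName')       AS lastName,
  get_json_object(employee, '$.employee.address.street') AS street,
  get_json_object(employee, '$.employee.address.city')   AS city,
  get_json_object(employee, '$.employee.address.state')  AS state,
  get_json_object(employee, '$.employee.address.zip')    AS zip
FROM employee_raw;
```

All extracted values come back as strings, which keeps the leading zero in a zip such as "09800". Note that Hive stores column names in lowercase, so the view's columns will appear as firstname, lastname, and so on.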
