How to load an HDFS Parquet file into an HDFS table

Posted by qacovj5a on 2021-06-24 in Hive

I am trying to load a Parquet file into a table backed by files on HDFS. My steps are below. I start by creating the table.

from pyspark.sql import SparkSession
# from pyspark.sql import SQLContext

spark = (
    SparkSession.builder
    .appName("Test")
    .getOrCreate()
)

spark.sql("""
    create external table if not exists table1 (
        _c0 string, _c1 string, _c2 string, _c3 string,
        _c4 string, _c5 string, _c6 string
    )
    STORED AS parquet
    location 'hdfs://my_data/hive/db1/table1'
""")

# table created successfully
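
A small aside on the LOCATION string: in a URI of the form hdfs://authority/path, the part right after the double slash is the authority (a NameNode host or nameservice), not a directory. The sketch below only uses Python's standard urllib.parse to show how the location above splits into components. Whether my_data is meant as a nameservice or as a top-level directory cannot be determined from the question, so treat this as something to check rather than a diagnosis.

```python
from urllib.parse import urlsplit

# Location string copied from the CREATE TABLE statement above.
location = "hdfs://my_data/hive/db1/table1"
parts = urlsplit(location)

print(parts.scheme)  # hdfs
print(parts.netloc)  # my_data
print(parts.path)    # /hive/db1/table1
```

If my_data was intended as a directory, the file-system path actually used would be /hive/db1/table1, not /my_data/hive/db1/table1.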

Then I load my Parquet file and check that its schema matches the table I created:

# printSchema() returns None, so keep the DataFrame in dp and print the schema separately
dp = spark.read.load("/user/path/test.parquet", format="parquet")
dp.printSchema()
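
To compare the two schemas programmatically rather than by eye, here is a minimal sketch. schema_mismatches is a hypothetical helper, not part of PySpark. It takes lists of (name, type) pairs, which is the shape returned by DataFrame.dtypes, and compares names case-insensitively because Hive stores column names in lowercase.

```python
def schema_mismatches(expected, actual):
    """Compare two lists of (column_name, type_name) pairs, position by position.

    Column names are compared case-insensitively because Hive lowercases them.
    Returns a list of human-readable differences; empty means the schemas agree.
    """
    problems = []
    if len(expected) != len(actual):
        problems.append(f"column count differs: {len(expected)} vs {len(actual)}")
    for i, ((e_name, e_type), (a_name, a_type)) in enumerate(zip(expected, actual)):
        if e_name.lower() != a_name.lower():
            problems.append(f"column {i}: name {e_name!r} vs {a_name!r}")
        if e_type != a_type:
            problems.append(f"column {i}: type {e_type!r} vs {a_type!r}")
    return problems


# Usage with PySpark (DataFrame.dtypes is a list of (name, type) tuples):
#   table_schema = spark.table("table1").dtypes
#   file_schema = spark.read.parquet("/user/path/test.parquet").dtypes
#   for line in schema_mismatches(table_schema, file_schema):
#       print(line)
```

A name or type difference reported here would be one possible reason for the table not showing the expected values, though it is not the only one.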

Here is the content of the Parquet file:

Then I write it to the path of the table I created above:

dp.write.save('hdfs://my_data/hive/db1/table1', format="parquet")
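
One variation worth trying, offered as a sketch rather than a confirmed fix, is to write through the table name instead of the raw path. DataFrameWriter.insertInto adds rows to an existing table and matches columns by position, not by name, so the helper below first checks the column count. append_to_table is a hypothetical name introduced here for illustration.

```python
def append_to_table(df, table_name, table_columns):
    """Append df to an existing table using DataFrameWriter.insertInto.

    insertInto resolves columns by position rather than by name, so only
    the column count is checked here.
    """
    if len(df.columns) != len(table_columns):
        raise ValueError(
            f"{table_name} has {len(table_columns)} columns, "
            f"DataFrame has {len(df.columns)}"
        )
    df.write.mode("append").insertInto(table_name)


# Usage with PySpark, assuming `spark` and `dp` from the question:
#   append_to_table(dp, "table1", spark.table("table1").columns)
```

Going through the table name lets Spark resolve the storage location from the metastore, which avoids any mismatch between the path written to and the path the table reads from.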

This runs successfully, but when I run select * from table1, no values are shown.

Does anyone know why no values are inserted, or why nothing is displayed? And yes, the Parquet file does contain data.
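
A few checks can help narrow down where the rows go missing. The sketch below assumes the SparkSession and the table name from the question. location_from_describe and diagnose are hypothetical helpers: the first picks the Location row out of DESCRIBE FORMATTED output, and the second compares the row count seen through the table with the row count read directly from the files at that location.

```python
def location_from_describe(rows):
    """Return the value of the 'Location' row from DESCRIBE FORMATTED output.

    rows is an iterable of (col_name, data_type) pairs; returns None if absent.
    """
    for col_name, data_type in rows:
        if col_name is not None and col_name.strip().rstrip(":") == "Location":
            return data_type.strip() if data_type is not None else None
    return None


def diagnose(spark, table_name="table1"):
    """Print row counts seen through the table and directly from its files."""
    described = spark.sql(f"DESCRIBE FORMATTED {table_name}").collect()
    location = location_from_describe(
        (r["col_name"], r["data_type"]) for r in described
    )
    print("table location:", location)

    # Drop cached metadata in case files were added after the table was first read.
    spark.catalog.refreshTable(table_name)
    print("rows via table:", spark.table(table_name).count())
    print("rows via files:", spark.read.parquet(location).count())
```

If the file count is non-zero while the table count is zero, the data landed where the table points but is not being picked up. If both are zero, the write likely went to a different path than the one the table reads.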

Hive hdfs apache-spark pyspark

Source: https://stackoverflow.com/questions/57420775/how-to-load-a-hdfs-parquet-file-to-a-hdfs-table
