I am trying to read a Hive table in PySpark, but the result includes the header row, which I do not want.
file.csv:
Id,Name
1,A
2,B
3,C
4,D
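Hive table: I created the Hive table with tblproperties("skip.header.line.count"="1"). When I query it in Hive the data comes back correctly, so Hive itself is not the problem. The header row only shows up when I read the same table from PySpark.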
1 Answer
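This is a known issue: it was reported in Jira as SPARK-11374 and closed as Won't Fix. Possible workarounds:
1. Read the HDFS file directly and let the CSV reader consume the header line:
spark.read.option("header","true").option("delimiter",",").csv("<hdfs_path>").show()
2. Query the Hive table and filter out the header row. The literal has to match the header text of that column exactly, since the comparison is case-sensitive; for the sample file above that is 'Id' (assuming the column is a string):
spark.sql("select * from <table_name> where <col_name1> != 'Id'").show()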