Scala script in spark-shell creating DFs and temp tables - problem

v09wglhw  posted on 2021-05-29 in Hadoop
I have loaded multiple parquet files to create multiple DFs, but when I am using for loop, I am getting errors. 

Here is my code:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val url_1 = "s3://file_path/folder1.parquet/*"
val url_2 = "s3://file_path/folder2.parquet/*"
val url_3 = "s3://file_path/folder3.parquet/*"
for (url <- Array(url_1 ,url_2 ,url_3)) var parqfile=sqlContext.read.load(url)
for (item <- Array("tb1","tb2","tb3")) parqfile.registerTempTable(item)

But I can't do this, because it fails with: 1: error: illegal start of simple expression
Please help... Thanks!


vvppvyoh1#

The correct approach below is for Spark 2.x rather than 1.6, but the same principle applies. It is simpler to use DFs as the source. Note the {}.

val tb1 = spark.sparkContext.parallelize(Seq(
    ("A", "X", "done"),
    ("A", "Y", "done"),
    ("C", "Y", "done"),
    ("B", "Y", "done")
  )).toDF("Company", "Type", "Status")
val tb2 = spark.sparkContext.parallelize(Seq(
    ("A", "X", "done"),
    ("B", "Y", "done")
  )).toDF("Company", "Type", "Status")
val tb3 = spark.sparkContext.parallelize(Seq(
    ("A", "X", "done")
  )).toDF("Company", "Type", "Status")

for ((tb, name) <- Array(tb1, tb2, tb3).zip(Array("tb1", "tb2", "tb3"))) {
     tb.createOrReplaceTempView(name)
}

tb2.show // etc.
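Applying the same pattern back to the original question: zip the parquet paths with the table names and use a braced loop body. A sketch, assuming the s3 paths and table names from the question; `spark.read.parquet` is the Spark 2.x counterpart of the `sqlContext.read.load` call used there:

```scala
// Sketch: load each parquet folder and register it under its own
// temp-view name (paths and names are the ones from the question).
val urls  = Array("s3://file_path/folder1.parquet/*",
                  "s3://file_path/folder2.parquet/*",
                  "s3://file_path/folder3.parquet/*")
val names = Array("tb1", "tb2", "tb3")

for ((url, name) <- urls.zip(names)) {
  val parqfile = spark.read.parquet(url)  // Spark 1.6: sqlContext.read.load(url)
  parqfile.createOrReplaceTempView(name)  // Spark 1.6: parqfile.registerTempTable(name)
}

// The views can then be queried with SQL:
spark.sql("select * from tb1").show
```

The braces are the key point: `for (...) var parqfile = ...` is not legal Scala, because a loop body must be an expression or a block, not a bare `var` declaration; that is what triggers the `illegal start of simple expression` error in the question.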
