Reading Avro from S3 with Spark on EMR fails

mepcadol  posted on 2021-06-02  in Hadoop
Follows (0) | Answers (2) | Views (477)

When running a Spark job on AWS EMR, I get the error below while trying to read Avro files from an S3 bucket. Versions:
emr-5.5.0
emr-5.9.0
The code:

val files  = 0 until numOfDaysToFetch map { i =>
  s"s3n://bravos/clicks/${fromDate.minusDays(i)}/*"
}
spark.read.format("com.databricks.spark.avro").load(files: _*)

The exception:

java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: 1037330823653531755-2017-10-16T03:06:00.avro
    at org.apache.hadoop.fs.Path.initialize(Path.java:205)
    at org.apache.hadoop.fs.Path.<init>(Path.java:171)
    at org.apache.hadoop.fs.Path.<init>(Path.java:93)
    at org.apache.hadoop.fs.Globber.glob(Globber.java:241)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1732)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1713)
    at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.globStatus(EmrFileSystem.java:362)
    at org.apache.spark.deploy.SparkHadoopUtil.globPath(SparkHadoopUtil.scala:237)
    at org.apache.spark.deploy.SparkHadoopUtil.globPathIfNecessary(SparkHadoopUtil.scala:243)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:374)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.immutable.List.flatMap(List.scala:344)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)


afdcj2ne  #1

I removed the trailing * from the /* at the end of each path, and it worked.
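As a rough sketch of that change, the path list from the question can be built without the trailing wildcard. This assumes fromDate is a java.time.LocalDate (the question does not say which date type it uses) and wraps the logic in a hypothetical DayPaths object so it can be run on its own. Presumably this works because a path with no glob characters no longer goes through the Globber call shown in the stack trace, but that is an inference, not something the answer states.

```scala
import java.time.LocalDate

// Hypothetical helper object, not part of the original question.
object DayPaths {
  // Builds one S3 prefix per day, newest first, with no trailing wildcard.
  // Assumes fromDate is a java.time.LocalDate; its toString yields yyyy-MM-dd.
  def dayPaths(fromDate: LocalDate, numOfDaysToFetch: Int): Seq[String] =
    (0 until numOfDaysToFetch).map { i =>
      s"s3n://bravos/clicks/${fromDate.minusDays(i)}/"
    }

  def main(args: Array[String]): Unit = {
    val files = dayPaths(LocalDate.of(2017, 10, 16), 3)
    files.foreach(println)
    // In the Spark job the list would be passed on as in the question:
    // spark.read.format("com.databricks.spark.avro").load(files: _*)
  }
}
```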

1aaf6o9v  #2

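Path does not support colons. It interprets 1037330823653531755-2017-10-16T03: as the URI scheme and then complains that the remainder is not an absolute path starting with "/". Even if it got past that, it would fail because there is no filesystem for the scheme 1037330823653531755-2017-10-16T03.
Fix: do not use ":" in file names.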
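The failure can be reproduced without Hadoop or Spark. The sketch below is a simplified imitation of the scheme detection described above, not the actual org.apache.hadoop.fs.Path code: splitScheme and uriError are hypothetical names, and the only real API used is the five-argument java.net.URI constructor. The second file name, with "-" in place of ":", is just one possible renaming.

```scala
import java.net.{URI, URISyntaxException}

// Hypothetical demo object; a simplified stand-in for the behaviour described
// in the answer, not Hadoop's implementation.
object ColonInFileName {
  // A colon that appears before the first slash is treated as the end of a
  // URI scheme; everything after it is treated as the path.
  def splitScheme(name: String): (Option[String], String) = {
    val colon = name.indexOf(':')
    val slash = name.indexOf('/')
    if (colon != -1 && (slash == -1 || colon < slash))
      (Some(name.substring(0, colon)), name.substring(colon + 1))
    else
      (None, name)
  }

  // Returns the URISyntaxException reason, or None if the URI can be built.
  def uriError(name: String): Option[String] = {
    val (scheme, path) = splitScheme(name)
    try {
      new URI(scheme.orNull, null, path, null, null)
      None
    } catch {
      case e: URISyntaxException => Some(e.getReason)
    }
  }

  def main(args: Array[String]): Unit = {
    // File name from the stack trace: the first colon splits off a bogus scheme.
    println(uriError("1037330823653531755-2017-10-16T03:06:00.avro"))
    // prints Some(Relative path in absolute URI)

    // Same name with the colons replaced: no scheme is detected.
    println(uriError("1037330823653531755-2017-10-16T03-06-00.avro"))
    // prints None
  }
}
```

With a colon in the name, the text after the bogus scheme ("06:00.avro") does not start with "/", which is what java.net.URI rejects as a relative path in an absolute URI.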
