NativeAzureFileSystem not recognizing other containers

Posted by fcwjkofz on 2021-06-02 in Hadoop

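My goal is to access, from the spark-shell of an HDInsight instance, a blob stored in a container of the same storage account the cluster was created on.
I took the following steps: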
Created an HDInsight cluster on the container https://mystorage.blob.core.windows.net:443/maincontainer.
Created another container in the same storage account: https://mystorage.blob.core.windows.net:443/extracontainer.
Created a file named person.json in extracontainer: https://mystorage.blob.core.windows.net:443/extracontainer/data/person.json
Opened a spark-shell session.
Then I ran the following code:

scala> import org.apache.hadoop.fs._

scala> val conf = sc.hadoopConfiguration
conf: org.apache.hadoop.conf.Configuration = Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml

scala> val fs: FileSystem = FileSystem.newInstance(conf)
fs: org.apache.hadoop.fs.FileSystem = org.apache.hadoop.fs.azure.NativeAzureFileSystem@417e5282

scala> val files = fs.listFiles(new Path("wasbs://extracontainer@mystorage.blob.core.windows.net/data"), true)
java.io.FileNotFoundException: File wasbs://extracontainer@mystorage.blob.core.windows.net/data does not exist.

Then I created the same folder and file in the main container, https://mystorage.blob.core.windows.net:443/maincontainer/data/person.json, and got the following result:

scala> val files = fs.listFiles(new Path("wasbs://extracontainer@mystorage.blob.core.windows.net/data"), true)
scala> while( files.hasNext() ) { println(files.next().getPath) }
wasb://maincontainer@mystorage.blob.core.windows.net/data/person.json

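It lists the file in the main container, not the one in the extra container.
Does anyone know what is going on?
I also tried new Configuration() and got the same behavior.
I get the correct behavior when using the hadoop fs command line: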

> hadoop fs -ls wasbs://extracontainer@mystorage.blob.core.windows.net/data/
Found 1 item
-rwxrwxrwx   1        977 2017-02-27 08:46 wasbs://extracontainer@mystorage.blob.core.windows.net/data/person.json

hadoop apache-spark Azure azure-hdinsight

Source: https://stackoverflow.com/questions/42483815/nativeazurefilesystem-not-recognizing-other-containers

1 answer

jchrr9hc1#

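From your description, as I understand it, you want to read data from Azure Blob storage with Spark, but the fs.defaultFS property of your Hadoop configuration was set to maincontainer when you created the HDInsight instance.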
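A quick way to confirm this diagnosis is to print the default filesystem URI that FileSystem.newInstance(conf) binds to. The sketch below is a minimal, hypothetical helper (the object name DefaultFsCheck is not from the original) and assumes it runs on a cluster node where core-site.xml is on the classpath, so that new Configuration() picks up the cluster's fs.defaultFS:

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

object DefaultFsCheck {
  // URI that a FileSystem built only from `conf` (e.g. FileSystem.newInstance(conf))
  // is bound to: the value of fs.defaultFS, or file:/// if it is unset.
  def defaultFsUri(conf: Configuration): URI = FileSystem.getDefaultUri(conf)

  def main(args: Array[String]): Unit = {
    // Assumption: on an HDInsight node this loads the cluster's core-site.xml.
    val conf = new Configuration()
    println(s"fs.defaultFS = ${conf.get("fs.defaultFS")}")
    println(s"default FileSystem URI = ${defaultFsUri(conf)}")
  }
}
```

In spark-shell the same check is DefaultFsCheck.defaultFsUri(sc.hadoopConfiguration); if the authority is maincontainer@mystorage.blob.core.windows.net, the FileSystem instance from the question is bound to the main container.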
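There are two ways to achieve what you need.
Use the Configuration class's method addResource(new Path("wasbs://extracontainer@mystorage.blob.core.windows.net/data")) or set("fs.defaultFS", "wasbs://extracontainer@mystorage.blob.core.windows.net/data") to override the fs.defaultFS value and switch the resource reference, provided that the fs.defaultFS property in core-site.xml is not marked <final>true</final>. So first, you need to go to /etc/hadoop/conf (where core-site.xml lives) and change it.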
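The set("fs.defaultFS", ...) route can be sketched as follows. This is a minimal illustration under stated assumptions: the helper withDefaultFs and the object name are hypothetical, the URI points at the container root rather than /data (fs.defaultFS only needs scheme and authority), and the override is applied to a copy so the shared configuration is left unchanged:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object SwitchDefaultContainer {
  // Hypothetical helper: returns a copy of `base` whose fs.defaultFS points at fsUri.
  def withDefaultFs(base: Configuration, fsUri: String): Configuration = {
    val copy = new Configuration(base) // copy constructor; `base` is not modified
    copy.set("fs.defaultFS", fsUri)
    copy
  }

  def main(args: Array[String]): Unit = {
    // Assumption: container and account names are the placeholders from the question.
    val conf = withDefaultFs(new Configuration(),
      "wasbs://extracontainer@mystorage.blob.core.windows.net")
    val fs = FileSystem.newInstance(conf) // now bound to extracontainer
    try {
      // A scheme-less path is resolved against the new default container.
      val files = fs.listFiles(new Path("/data"), true)
      while (files.hasNext) println(files.next().getPath)
    } finally fs.close()
  }
}
```

In spark-shell, pass sc.hadoopConfiguration as the base configuration.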
Referring to a similar SO thread, "Reading data from Azure Blob with Spark", you can try the code below to read the data.

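// Map the wasbs:// scheme to the secure (HTTPS) variant of the WASB driver
conf.set("fs.wasbs.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem$Secure")
// Storage account access key: replace both placeholders with your own values
conf.set("fs.azure.account.key.<youraccount>.blob.core.windows.net", "<yourkey>")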
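As an alternative that the answer does not mention, the FileSystem can be resolved from the path itself instead of from fs.defaultFS, which is effectively what the hadoop fs command line does when it succeeds in the question. The sketch below is a hedged example: the object name is hypothetical and the path is the placeholder from the question. Path.getFileSystem(conf) picks the filesystem from the path's scheme and authority (container@account), so no default needs to be changed:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object ListOtherContainer {
  // Placeholder path from the question; adjust container and account names.
  val dataDir = new Path("wasbs://extracontainer@mystorage.blob.core.windows.net/data")

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Resolves the FileSystem for wasbs://extracontainer@mystorage..., not for
    // fs.defaultFS. The instance may come from Hadoop's shared cache, so it is
    // not closed here.
    val fs: FileSystem = dataDir.getFileSystem(conf)
    val files = fs.listFiles(dataDir, true)
    while (files.hasNext) println(files.next().getPath)
  }
}
```

In spark-shell the equivalent is dataDir.getFileSystem(sc.hadoopConfiguration).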

Hope it helps.

2021-06-02