I want to execute some data transformations in Hive using Azure Data Factory (v1) running an Azure HDInsight on-demand cluster (3.6).
Since the HDInsight on-demand cluster is destroyed after some idle time, and I want/need to keep the metadata about the Hive tables (e.g. partitions), I also configured an external Hive metastore backed by an Azure SQL Server database.
Now I want to store all production data on a separate storage account, rather than on the "default" account in which Data Factory and HDInsight also create containers for logging and other runtime data.
So I have the following resources:
- Data Factory with HDInsight on-demand (as a linked service)
- SQL Server and database for the Hive metastore (configured in the HDInsight on-demand linked service)
- Default storage account used by Data Factory and the HDInsight on-demand cluster (Blob storage, general purpose v1)
- Additional storage account for data ingress and Hive tables (Blob storage, general purpose v1)
Except for the Data Factory (North Europe), all resources are in the same location (West Europe), which should be fine (the HDInsight cluster has to be in the same location as any storage account it uses). All Data Factory related deployments are done with the DataFactoryManagementClient API.
An example Hive script (deployed as a Hive activity in the Data Factory) looks like this:
CREATE TABLE IF NOT EXISTS example_table (
    deviceId string,
    createdAt timestamp,
    batteryVoltage double,
    hardwareVersion string,
    softwareVersion string
)
PARTITIONED BY (year string, month string) -- year and month from createdAt
CLUSTERED BY (deviceId) INTO 256 BUCKETS
STORED AS ORC
LOCATION 'wasb://container@additionalstorage.blob.core.windows.net/example_table'
TBLPROPERTIES ('transactional'='true');

-- partition column values (year, month) come last in the VALUES list
INSERT INTO TABLE example_table PARTITION (year, month)
VALUES ("device1", timestamp "2018-01-22 08:57:00", 2.7, "hw1.32.2", "sw0.12.3", "2018", "01");
Following the documentation here and here, this should be fairly straightforward: just add the new storage account as an additional linked service (via the additionalLinkedServiceNames property).
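For context, the relevant part of the ADF v1 on-demand linked service definition would look roughly like the sketch below. The linked service names are hypothetical placeholders, and the exact set of typeProperties (including hcatalogLinkedServiceName, which is how v1 wires in the external metastore) should be verified against the current documentation:

```json
{
  "name": "HDInsightOnDemandLinkedService",
  "properties": {
    "type": "HDInsightOnDemand",
    "typeProperties": {
      "version": "3.6",
      "clusterSize": 4,
      "timeToLive": "00:30:00",
      "linkedServiceName": "DefaultStorageLinkedService",
      "additionalLinkedServiceNames": [ "AdditionalStorageLinkedService" ],
      "hcatalogLinkedServiceName": "HiveMetastoreLinkedService"
    }
  }
}
```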
However, when the Hive script tries to access a table stored on this account, it fails with the following exception:
IllegalStateException Error getting FileSystem for wasb : org.apache.hadoop.fs.azure.AzureException: org.apache.hadoop.fs.azure.KeyProviderException: ExitCodeException exitCode=2: Error reading S/MIME message
139827842123416:error:0D06B08E:asn1 encoding routines:ASN1_D2I_READ_BIO:not enough data:a_d2i_fp.c:247:
139827842123416:error:0D0D106E:asn1 encoding routines:B64_READ_ASN1:decode error:asn_mime.c:192:
139827842123416:error:0D0D40CB:asn1 encoding routines:SMIME_read_ASN1:asn1 parse error:asn_mime.c:517:
Some googling told me that this happens when the key provider is configured incorrectly (i.e. the exception is thrown because the driver tries to decrypt the key even though it is not encrypted). After manually setting fs.azure.account.keyprovider.&lt;storage_name&gt;.blob.core.windows.net to org.apache.hadoop.fs.azure.SimpleKeyProvider, reading and "plain" writing of data to the table seem to work, but it fails again as soon as the metastore is involved (creating tables, adding new partitions, ...):
ERROR exec.DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: org.apache.hadoop.fs.azure.AzureException com.microsoft.azure.storage.StorageException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:783)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4434)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:316)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
[...]
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: MetaException(message:Got exception: org.apache.hadoop.fs.azure.AzureException com.microsoft.azure.storage.StorageException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:38593)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:38561)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result.read(ThriftHiveMetastore.java:38487)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:1103)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:1089)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2203)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:99)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:736)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:724)
[...]
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:178)
at com.sun.proxy.$Proxy5.createTable(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:777)
... 24 more
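For reference, the manual workaround described above amounts to something like the following line at the top of the Hive script (a sketch; &lt;storage_name&gt; is a placeholder for the additional storage account, and whether SET is allowed for this property may depend on the cluster's Hive security settings):

```sql
-- Tell the WASB driver to use the configured key as-is
-- instead of attempting to decrypt it (workaround, not a proper fix):
SET fs.azure.account.keyprovider.<storage_name>.blob.core.windows.net=org.apache.hadoop.fs.azure.SimpleKeyProvider;
```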
I tried googling again, but found nothing useful. I suspect it may be related to the fact that the metastore service runs separately from Hive and for some reason has no access to the configured storage account keys... But honestly, I think all of this should work without manually patching the Hadoop/Hive configuration.
So, my question: what am I doing wrong, and how is this supposed to be done?
1 Answer
You need to make sure you also add hadoop-azure.jar and azure-storage-5.4.0.jar to your Hadoop classpath export in hadoop-env.sh:
export HADOOP_CLASSPATH=/usr/lib/hadoop-client/hadoop-azure.jar:/usr/lib/hadoop-client/lib/azure-storage-5.4.0.jar:$HADOOP_CLASSPATH
You also need to add the storage key via the following parameter in core-site.xml: fs.azure.account.key.{storageaccount}.blob.core.windows.net
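As a core-site.xml fragment, that would look something like this ({storageaccount} and the key value are placeholders you have to fill in):

```xml
<!-- core-site.xml: register the access key for the additional storage account -->
<property>
  <name>fs.azure.account.key.{storageaccount}.blob.core.windows.net</name>
  <value>YOUR_STORAGE_ACCOUNT_KEY</value>
</property>
```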
When creating the database and the tables, you need to specify the location using the storage account, e.g. CREATE TABLE {tablename} ... LOCATION 'wasbs://{container}@{storageaccount}.blob.core.windows.net/{filepath}'
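Put together, a minimal sketch of that DDL (placeholder names throughout; note wasbs:// uses the HTTPS endpoint of the storage account):

```sql
-- Both the database and the table point at the additional storage account
CREATE DATABASE IF NOT EXISTS example_db
LOCATION 'wasbs://{container}@{storageaccount}.blob.core.windows.net/example_db';

CREATE TABLE example_db.example_table (
    deviceId string
)
LOCATION 'wasbs://{container}@{storageaccount}.blob.core.windows.net/example_db/example_table';
```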
If you still have problems after trying the above, check whether the storage account is v1 or v2. We ran into an issue where a v2 storage account was not compatible with our HDP version.