Kerberos: Spark UGI credentials are not getting passed down to Hive

Posted by eimct9ow on 2021-05-31 in Hadoop

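I'm using Spark 2.4 on a Kerberos-enabled cluster, and I'm trying to run a query through the spark-sql shell.
The simplified setup looks like this: the spark-sql shell runs on one host in the YARN cluster -> an external Hive metastore runs on one host -> S3 stores the table data.
When I launch the spark-sql shell with DEBUG logging enabled, this is what I see in the logs: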

> bin/spark-sql --proxy-user proxy_user

...
DEBUG HiveDelegationTokenProvider: Getting Hive delegation token for proxy_user against hive/_HOST@REALM.COM at thrift://hive-metastore:9083
DEBUG UserGroupInformation: PrivilegedAction as:spark/spark_host@REALM.COM (auth:KERBEROS) from:org.apache.spark.deploy.security.HiveDelegationTokenProvider.doAsRealUser(HiveDelegationTokenProvider.scala:130)

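This means Spark made a call to fetch a delegation token from the Hive metastore and then added it to the UGI's list of credentials. This is done by a piece of code in Spark (HiveDelegationTokenProvider, per the log above). I also verified in the metastore logs that get_delegation_token() is being called.
Now when I run a simple query such as create table test_table (id int) location "s3://some/prefix"; I get hit with an error. I modified the Hive metastore code and added this just before the file system is initialized in Hadoop (org/apache/hadoop/hive/metastore/Warehouse.java):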

public static FileSystem getFs(Path f, Configuration conf) throws MetaException {
...
    try {
      // Get the current user
      UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
      LOG.info("UGI information: " + ugi);
      Collection<Token<? extends TokenIdentifier>> tokens = ugi.getCredentials().getAllTokens();
      // Log every token attached to this UGI
      for (Token<? extends TokenIdentifier> token : tokens) {
        LOG.info("Token: " + token);
      }
    } catch (IOException e) {
      LOG.error("Could not read the current UGI", e);
    }
...
}

In the metastore logs, this prints the correct UGI information:

UGI information: proxy_user (auth:PROXY) via hive/hive-metastore@REALM.COM (auth:KERBEROS)

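But there are no tokens in the UGI. It looks like the Spark code adds the token under the alias hive.server2.delegation.token, but I don't see it in the UGI. This makes me suspect that the UGI scope is somehow isolated and is not shared between spark-sql and the Hive metastore. How do I go about solving this?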

hadoop apache-spark kerberos hive-metastore kerberos-delegation

Source: https://stackoverflow.com/questions/61355997/kerberos-spark-ugi-credentials-are-not-getting-passed-down-to-hive

1 Answer

58wvjzkj1#

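Spark doesn't pick up your Kerberos identity. Instead, it asks each filesystem to issue a "delegation token", which lets the caller interact with that service and that service alone. This is more restricted and therefore more secure.
The problem here is that Spark collects delegation tokens from every filesystem that can issue them, and since your S3 connector isn't issuing any, nothing is coming down.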
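To make the collection step concrete, here is a toy model in plain Java. It is not Hadoop's or Spark's API: TokenIssuer, issuing, nonIssuing and collect are hypothetical names invented for this sketch, and the token strings are placeholders. It only illustrates the point above: a service that issues no token contributes nothing to the collected credentials.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Toy model only: these types are hypothetical and are NOT Hadoop or Spark classes.
final class TokenCollectionSketch {

    /** Hypothetical stand-in for a service that may or may not issue delegation tokens. */
    interface TokenIssuer {
        String service();
        Optional<String> issueToken(String user);
    }

    /** A service that hands out a (placeholder) token scoped to itself. */
    static TokenIssuer issuing(final String service) {
        return new TokenIssuer() {
            public String service() { return service; }
            public Optional<String> issueToken(String user) {
                return Optional.of("token-for-" + user + "@" + service);
            }
        };
    }

    /** A service whose connector issues no delegation tokens at all. */
    static TokenIssuer nonIssuing(final String service) {
        return new TokenIssuer() {
            public String service() { return service; }
            public Optional<String> issueToken(String user) {
                return Optional.empty();
            }
        };
    }

    /** Ask every service for a token and keep only the ones that were actually issued. */
    static Map<String, String> collect(String user, List<TokenIssuer> issuers) {
        Map<String, String> credentials = new LinkedHashMap<>();
        for (TokenIssuer issuer : issuers) {
            issuer.issueToken(user)
                  .ifPresent(token -> credentials.put(issuer.service(), token));
        }
        return credentials;
    }

    public static void main(String[] args) {
        Map<String, String> credentials = collect("proxy_user", Arrays.asList(
                issuing("thrift://hive-metastore:9083"),
                nonIssuing("s3://some/prefix")));
        // Only services that issued a token appear in the result.
        System.out.println(credentials.keySet());
    }
}
```

In this model the S3 entry is simply absent from the result, which mirrors why nothing usable for S3 reaches the other side.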
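Now, Apache Hadoop 3.3.0's S3A connector can be set to issue your AWS credentials inside a delegation token or, for bonus security, to ask AWS for session credentials and send only those over. But (a) you need a Spark build with those dependencies, and (b) Hive needs to be using those credentials to talk to S3.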
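As a rough sketch of what enabling this could look like, the helper below builds the --conf arguments for spark-sql. The property fs.s3a.delegation.token.binding and the two binding class names are taken from the Hadoop 3.3.0 S3A delegation token documentation, and the spark.hadoop. prefix is Spark's way of forwarding a property to the Hadoop configuration. The class and method names are made up for illustration, and the settings are untested against this particular cluster. Note that the URL scheme would also have to be one that the S3A connector serves.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper; the class and method names are hypothetical.
final class S3aDelegationTokenConf {

    // Names as documented for the S3A connector in Hadoop 3.3.0; verify against your version.
    static final String BINDING_KEY = "fs.s3a.delegation.token.binding";
    static final String SESSION_BINDING =
            "org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenBinding";
    static final String FULL_CREDENTIALS_BINDING =
            "org.apache.hadoop.fs.s3a.auth.delegation.FullCredentialsTokenBinding";

    /** Hadoop properties: session credentials only, or the full AWS credentials. */
    static Map<String, String> hadoopProps(boolean sessionOnly) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put(BINDING_KEY, sessionOnly ? SESSION_BINDING : FULL_CREDENTIALS_BINDING);
        return props;
    }

    /** Render Hadoop properties as spark-sql/spark-submit "--conf spark.hadoop.key=value" arguments. */
    static List<String> toSparkConfArgs(Map<String, String> props) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> entry : props.entrySet()) {
            args.add("--conf");
            args.add("spark.hadoop." + entry.getKey() + "=" + entry.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        // Prints the extra arguments to append to the spark-sql command line.
        System.out.println(String.join(" ", toSparkConfArgs(hadoopProps(true))));
    }
}
```

This only covers the Spark side, i.e. point (a). Point (b) still applies: the Hive metastore would need a matching Hadoop build and configuration so that it actually uses the token it receives.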

2021-06-01