如何在amazon emr上向presto添加连接器

8qgya5xd 于 2021-06-26 发布在 Hive

关注(0)|答案(3)|浏览(683)

我已经建立了一个安装了hive/presto的小型emr集群，我想查询s3上的文件并将它们导入rds上的postgres。
要在s3上运行查询并将结果保存在postgres的表中，我已完成以下操作：
从aws控制台启动了一个3节点的emr集群。
手动ssh到主节点，在hive中创建一个外部表，查看一个s3 bucket。
手动ssh到3个节点中的每个节点，并添加一个新的目录文件：

/etc/presto/conf.dist/catalog/postgres.properties

包括以下内容

connector.name=postgresql
connection-url=jdbc:postgresql://ip-to-postgres:5432/database
connection-user=<user>
connection-password=<pass>

编辑了这个文件

/etc/presto/conf.dist/config.properties

添加

datasources=postgresql,hive

通过在所有3个节点上手动运行以下命令来重新启动presto

sudo restart presto-server

这个设置似乎很有效。
在我的应用程序中，动态创建了多个数据库。似乎需要对每个数据库进行配置/目录更改，并且需要重新启动服务器才能看到新的配置更改。
我的应用程序（使用boto或其他方法）是否有合适的方法通过
为每个新数据库在所有节点/etc/presto/conf.dist/catalog/中添加新的目录文件
在/etc/presto/conf.dist/config.properties中的所有节点中添加新条目
在整个集群中优雅地重新启动presto（理想情况下是当它空闲时，但这不是一个主要问题）。

Hive presto amazon-web-services emr

来源：https://stackoverflow.com/questions/36068935/how-to-add-connectors-to-presto-on-amazon-emr

3条答案

按热度按时间

pxiryf3j1#

我相信你可以运行一个简单的bash脚本来实现你想要的。除了创建一个具有 --configurations 提供所需配置的参数。您可以从主节点运行下面的脚本。


# !/bin/sh
# "cluster_nodes.txt" with private IP address of each node.
aws emr list-instances --cluster-id <cluster-id> --instance-states RUNNING | grep PrivateIpAddress | sed 's/"PrivateIpAddress"://' | sed 's/\"//g' | awk '{gsub(/^[ \t]+|[ \t]+$/,""); print;}' > cluster_nodes.txt
# For each IP connect with ssh and configure.
while IFS='' read -r line || [[ -n "$line" ]]; do
    echo "Connecting $line"
    scp -i <PEM file> postgres.properties hadoop@$line:/tmp;
    ssh -i <PEM file> hadoop@$line "sudo mv /tmp/postgres.properties /etc/presto/conf/catalog;sudo chown presto:presto /etc/presto/conf/catalog/postgres.properties;sudo chmod 644 /etc/presto/conf/catalog/postgres.properties; sudo restart presto-server";
done < cluster_nodes.txt

赞(0）回复(0）举报 2021-06-26

ia2d9nvy2#

您可以通过管理控制台提供如下配置：

或者你可以用 awscli 通过以下配置：


# !/bin/bash
JSON=`cat <<JSON
[
  { "Classification": "presto-connector-postgresql",
    "Properties": {
      "connection-url": "jdbc:postgresql://ip-to-postgres:5432/database",
      "connection-user": "<user>",
      "connection-password": "<password>"
    },
    "Configurations": []
  }
]
JSON`
aws emr create-cluster --configurations "$JSON" # ... reset of params