elasticsearch pyspark连接处于不安全模式

zi8p0yeb  于 2021-05-27  发布在  Spark
关注(0)|答案(2)|浏览(744)

我的最终目标是将数据从hdfs插入elasticsearch,但我面临的问题是连接性
我可以使用下面的curl命令连接到我的elasticsearch节点 curl -u username -X GET https://xx..xx.:9200/_cat/indices?v' --insecure 但当谈到与Spark的联系,我不能这样做。我插入数据的命令是 df.write.mode("append").format('org.elasticsearch.spark.sql').option("es.net.http.auth.user", "username").option("es.net.http.auth.pass", "password").option("es.index.auto.create","true").option('es.nodes', 'https://xx..xx.').option('es.port','9200').save('my-index/my-doctype') 我得到的错误是

org.elastisearch.hadoop.EsHadoopIllegalArgumentException:Cannot detect ES version - typical this happens if then network/Elasticsearch cluster is not accessible or when targetting a Wan/Cloud instance without the proper setting 'es.nodes.wan.only'
....
....
Caused by: org.elasticseach.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proy settings)- all nodes failed; tried [[xx.xxx.xx.xxx:9200]]
....
...

在这里,什么是pyspark的curl等价物——不安全的
谢谢

mtb9vblg

mtb9vblg1#

经过多次尝试和不同的配置选项。我找到了一种方法来连接运行在https上的elastisearch

dfToEs.write.mode("append").format('org.elasticsearch.spark.sql') \
        .option("es.net.http.auth.user", username) \
        .option("es.net.http.auth.pass", password) \
        .option("es.net.ssl", "true") \
        .option("es.net.ssl.cert.allow.self.signed", "true") \
        .option("mergeSchema", "true") \
        .option('es.index.auto.create', 'true') \
        .option('es.nodes', 'https://{}'.format(es_ip)) \
        .option('es.port', '9200') \
        .option('es.batch.write.retry.wait', '100s') \
        .save('{index}/_doc'.format(index=index))

(es.net.ssl, true)

我们还必须提供如下自签名证书

(es.net.ssl.cert.allow.self.signed, true)
yruzcnhs

yruzcnhs2#

你能试试下面的sparkconfs吗,

val sparkConf = new SparkConf()
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.set("spark.es.index.auto.create", "true")
.set("spark.es.nodes", "yourESaddress")
.set("spark.es.port", "9200")
.set("spark.es.net.http.auth.user","")
.set("spark.es.net.http.auth.pass", "")
.set("spark.es.resource", indexName)
.set("spark.es.nodes.wan.only", "true")

你还是要面对问题, es.net.ssl = true 看看吧。
如果仍然出现错误,请尝试添加以下配置,

'es.resource' = 'ctrl_rater_resumen_lla/hb',
'es.nodes' = 'localhost',
'es.port' = '9200',
'es.index.auto.create' = 'true',
'es.index.read.missing.as.empty' = 'true',
'es.nodes.discovery'='true',
'es.net.ssl'='false'
'es.nodes.client.only'='false',
'es.nodes.wan.only' = 'true'
'es.net.http.auth.user'='xxxxx',
'es.net.http.auth.pass' = 'xxxxx'
'es.nodes.discovery' = 'false'

相关问题