Hadoop and Spark not working with JanusGraph

t40tm48m · published 2021-07-15 in Hadoop

I installed JanusGraph (0.4.0) and configured it with Apache Hadoop (3.3.0) and Apache Spark (3.0.1). When I try to execute a query, it does not work. Earlier it ran with native Spark, but returning a single record took a very long time (on the order of minutes). How can I reduce the run time and get it working?

gremlin> hdfs
==>storage[DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-568319645_1, ugi=fusionops (auth:SIMPLE)]]]
gremlin> graph = GraphFactory.open('conf/hadoop-graph/read-cql-standalone-cluster.properties')
==>hadoopgraph[cqlinputformat->nulloutputformat]
gremlin> g=graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[cqlinputformat->nulloutputformat], sparkgraphcomputer]
gremlin> g.V().limit(1).valueMap()
13:52:00 WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  - class org.apache.hadoop.mapreduce.lib.output.NullOutputFormat does not implement PersistResultGraphAware and thus, persistence options are unknown -- assuming all options are possible
13:52:00 WARN  org.apache.spark.SparkContext  - Another SparkContext is being constructed (or threw an exception in its constructor).  This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:52)
org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:60)
org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:313)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
13:53:00 ERROR org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend  - Application has been killed. Reason: All masters are unresponsive! Giving up.
13:53:00 WARN  org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend  - Application ID is not initialized yet.
13:53:00 WARN  org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint  - Drop UnregisterApplication(null) because has not yet connected to master
13:53:00 ERROR org.apache.spark.SparkContext  - Error initializing SparkContext.
java.lang.NullPointerException
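The "All masters are unresponsive" error means the driver never managed to register with the standalone master configured as `spark.master` in the properties file. A first sanity check (a minimal sketch, not part of the original post; host and port are taken from `spark.master=spark://127.0.0.1:7077`) is to confirm the master's RPC port is even reachable. Note that an open port is not sufficient: if the cluster runs a different Spark major version than the Spark jars on the JanusGraph classpath, registration can fail in exactly this way even though the port accepts connections.

```python
import socket

def master_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the Spark master RPC port succeeds.

    This only verifies network reachability -- a Spark version mismatch
    between the cluster and the jars on JanusGraph's classpath can still
    make application registration fail even when the port is open.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Values from the properties file: spark.master=spark://127.0.0.1:7077
print(master_reachable("127.0.0.1", 7077))
```

If this prints `False`, the standalone master is not listening where the properties file points; if it prints `True`, the next thing to compare is the Spark version of the cluster against the Spark version bundled with the JanusGraph distribution.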

Contents of read-cql-standalone-cluster.properties:


# Copyright 2020 JanusGraph Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Hadoop Graph Configuration
#

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat

gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true

#
# JanusGraph Cassandra InputFormat configuration
#
# These properties define the connection settings that were used when
# writing the data to JanusGraph.
janusgraphmr.ioformat.conf.storage.backend=cql

# Hostname & port of the Cassandra data store.
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
janusgraphmr.ioformat.conf.storage.port=9042

# Keyspace where the data is stored.
janusgraphmr.ioformat.conf.storage.cql.keyspace=janusgraph

# Indexing backend configuration used while writing data to JanusGraph.
janusgraphmr.ioformat.conf.index.search.backend=elasticsearch
janusgraphmr.ioformat.conf.index.search.hostname=127.0.0.1

# Use the appropriate properties for the backend when using a different
# storage backend (HBase) or indexing backend (Solr).

#
# Apache Cassandra InputFormat configuration
#
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.widerows=true

#
# SparkGraphComputer Configuration
#

spark.master=spark://127.0.0.1:7077
spark.executor.memory=1g
spark.executor.extraClassPath=/opt/lib/janusgraph/*
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
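Since the question notes that the query did run with native Spark (just slowly), a useful isolation step (a sketch, not a verified fix) is to first point `spark.master` at a local executor instead of the standalone cluster. If the OLAP traversal then completes, the problem lies in cluster connectivity or a Spark version mismatch rather than in the JanusGraph/Cassandra configuration:

```properties
# Run SparkGraphComputer inside the Gremlin Console JVM instead of the
# standalone cluster -- useful to isolate registration/connectivity issues.
spark.master=local[*]
# Keep the same serializer settings so behaviour matches the cluster run.
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
```

Even in local mode a full-scan OLAP query will stay slow on a single machine; `limit(1)` traversals over `CqlInputFormat` still read the whole keyspace, so minutes-long runtimes for one record are expected for this access pattern.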
