Gremlin traversal queries on a Spark graph

Posted by nsc4cvqm on 2021-05-27 in Spark

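I built a property graph (60 million nodes, 40 million edges) from S3 using the Apache Spark GraphX framework. I want to run traversal queries on this graph.
My queries look like this: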

g.V().has("name","xyz").out('parent').out().has('name','abc')

g.V().has('proc_name','serv.exe').out('file_create').
has('file_path',containing('Tsk04.txt')).in().in('parent').values('proc_name')

g.V().has('md5','935ca12348040410e0b2a8215180474e').values('files')

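Most of my queries have the form g.V().out().out().out(). Queries like these are easy to run on graph databases such as Neo4j, Titan, or AWS Neptune, because they support Gremlin.
Can we traverse a Spark graph in the same way? I tried the Spark Pregel API, but it is somewhat complicated compared to Gremlin.
I am looking at Spark graphs because the cloud offerings of the graph databases above are expensive.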

apache-spark spark-graphx gremlin graph-databases

Source: https://stackoverflow.com/questions/62086093/gremlin-traversal-queries-on-spark-graph

1 Answer

6ju8rftf1#

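The Spark GraphFrames library should be the most convenient option for you. It provides a traversal description similar to Neo4j's Cypher and uses the Spark DataFrame API for filtering:
https://graphframes.github.io/graphframes/docs/_site/user-guide.html#motif-
Here is an example: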

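import org.apache.spark.sql.DataFrame
import org.graphframes.GraphFrame
val g: GraphFrame = GraphFrame.fromGraphX(gx) // gx is the GraphX graph; you can also start with just V and E DataFrames here
val motifs: DataFrame = g.find("(a)-[e]->(b); (b)-[e2]->(c)") // find returns a DataFrame of two-hop matches, not a GraphFrame
motifs.filter("a.name = 'xyz' and e.label = 'parent' and c.name = 'abc'").show()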
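The same motif approach can probably be stretched to the second query in the question (the `file_create` one). The following is only a sketch under stated assumptions: the toy vertices and edges are made up, the property columns (`proc_name`, `file_path`, `label`) are assumed to be top-level DataFrame columns, and the variable names `p`, `f`, `x`, `y` are arbitrary. If the graph comes from `GraphFrame.fromGraphX`, the attributes may sit under a nested column instead, so the filter expressions would need adjusting. It assumes a session with Spark and the GraphFrames package on the classpath (for example `spark-shell --packages ...`).

```scala
import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

// Local session for the sketch; in spark-shell this returns the existing session.
val spark = SparkSession.builder().master("local[*]").appName("motif-sketch").getOrCreate()
import spark.implicits._

// Toy vertices (hypothetical data). GraphFrames requires an "id" column;
// proc_name and file_path are assumed property columns.
val vertices = Seq[(String, String, String)](
  ("p0", "explorer.exe", null),
  ("p1", "serv.exe", null),
  ("p2", "writer.exe", null),
  ("f1", null, "C:/tmp/Tsk04.txt")
).toDF("id", "proc_name", "file_path")

// Toy edges (hypothetical data). "src" and "dst" are required;
// "label" is an assumed column holding the edge type.
val edges = Seq(
  ("p0", "p2", "parent"),
  ("p1", "f1", "file_create"),
  ("p2", "f1", "file_write")
).toDF("src", "dst", "label")

val g = GraphFrame(vertices, edges)

// Rough equivalent of:
//   g.V().has('proc_name','serv.exe').out('file_create').
//     has('file_path',containing('Tsk04.txt')).in().in('parent').values('proc_name')
// p -[file_create]-> f, then any x with an edge into f, then y -[parent]-> x.
val result = g
  .find("(p)-[e1]->(f); (x)-[e2]->(f); (y)-[e3]->(x)")
  .filter("p.proc_name = 'serv.exe' AND e1.label = 'file_create'")
  .filter("f.file_path LIKE '%Tsk04.txt%' AND e3.label = 'parent'")
  .select("y.proc_name")

result.show()
```

On this toy data the only match is the path through `writer.exe`, whose parent is `explorer.exe`. Each Gremlin `in()` step becomes a motif edge written with the arrow pointing into the previously bound vertex, and the `has(...)` steps become ordinary DataFrame filters. A query with no hops, such as the `md5` lookup in the question, would not need a motif at all; a plain filter on `g.vertices` should be enough.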

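TinkerPop itself has Spark support, so you can issue Spark OLAP queries from the Gremlin console: https://tinkerpop.apache.org/docs/current/reference/#sparkgraphcomputer
Alternatively, there are some closed-source solutions. The DataStax Enterprise database has good Gremlin support for Spark: https://www.datastax.com/blog/2017/05/introducing-dse-graph-frames (I am a former author of it.)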

2021-05-27
