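I am trying to learn GraphFrames by working through this example: https://docs.databricks.com/spark/latest/graph-analysis/graphframes/user-guide-python.html
However, when I change some of the criteria, the results are not what I expect. Please see the steps below.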
from functools import reduce
from pyspark.sql.functions import col, lit, when
from graphframes import *
vertices = sqlContext.createDataFrame([
("a", "Alice", 34),
("b", "Bob", 36),
("c", "Charlie", 30),
("d", "David", 29),
("e", "Esther", 32),
("f", "Fanny", 36),
("g", "Gabby", 60)], ["id", "name", "age"])
edges = sqlContext.createDataFrame([
("a", "b", "follow"),
("b", "c", "follow"),
("c", "b", "follow"),
("f", "c", "follow"),
("e", "f", "follow"),
("e", "d", "follow"),
("d", "a", "follow"),
("a", "e", "follow")
], ["src", "dst", "relationship"])
g = GraphFrame(vertices, edges)
I made one change to the "relationship" column: every value is now "follow" instead of "friend".
The query below works as expected:
g.bfs(fromExpr ="name = 'Alice'",toExpr = "age < 32", edgeFilter ="relationship != 'friend'" , maxPathLength = 10).show()
+--------------+--------------+---------------+--------------+----------------+
| from| e0| v1| e1| to|
+--------------+--------------+---------------+--------------+----------------+
|[a, Alice, 34]|[a, e, follow]|[e, Esther, 32]|[e, d, follow]| [d, David, 29]|
|[a, Alice, 34]|[a, b, follow]| [b, Bob, 36]|[b, c, follow]|[c, Charlie, 30]|
+--------------+--------------+---------------+--------------+----------------+
But if I change the filter condition from 32 to 35, I get the wrong result:
>>> g.bfs(fromExpr ="name = 'Alice'",toExpr = "age < 35", edgeFilter ="relationship != 'friend'" , maxPathLength = 10).show()
+--------------+--------------+
| from| to|
+--------------+--------------+
|[a, Alice, 34]|[a, Alice, 34]|
+--------------+--------------+
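Ideally it should return results similar to the first query, since the rows from that query still satisfy the filter condition.
Is there an explanation for this?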
1 Answer
k5hmc34c #1
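bfs() returns the shortest paths from the vertices matching fromExpr to the vertices matching toExpr, so it stops at the first vertex that satisfies the predicate. Alice is 34, which already satisfies
toExpr = "age < 35"
so you get a zero-length path that starts and ends at Alice. Change toExpr to something more specific. For example, toExpr = "name = 'David' or name = 'Charlie'"
should give exactly the same result as the first query.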
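
To see why the start vertex can end the search immediately, here is a minimal pure-Python sketch of a shortest-path BFS over the same graph. It is not the GraphFrames implementation; the function name shortest_paths and the dictionary layout are made up for illustration, and the edge filter is left out because every edge here is "follow" and passes relationship != 'friend'.

```python
# Same vertices and edges as in the question, as plain Python data.
vertices = {
    "a": ("Alice", 34), "b": ("Bob", 36), "c": ("Charlie", 30),
    "d": ("David", 29), "e": ("Esther", 32), "f": ("Fanny", 36),
    "g": ("Gabby", 60),
}
edges = [("a", "b"), ("b", "c"), ("c", "b"), ("f", "c"),
         ("e", "f"), ("e", "d"), ("d", "a"), ("a", "e")]


def shortest_paths(start, is_target):
    """Hypothetical helper: return every shortest path (list of vertex ids)
    from `start` to a vertex for which `is_target` is true."""
    frontier = [[start]]   # paths of the current length, starting at length 0
    visited = {start}
    while frontier:
        # Check the current level first: a match at length 0 ends the search.
        hits = [path for path in frontier if is_target(path[-1])]
        if hits:
            return hits
        next_frontier, newly_seen = [], set()
        for path in frontier:
            for src, dst in edges:
                if src == path[-1] and dst not in visited:
                    next_frontier.append(path + [dst])
                    newly_seen.add(dst)
        visited |= newly_seen
        frontier = next_frontier
    return []


# age < 32: Alice (34) does not match, so the search expands two hops.
print(shortest_paths("a", lambda v: vertices[v][1] < 32))
# [['a', 'b', 'c'], ['a', 'e', 'd']]

# age < 35: Alice herself matches, so the shortest path has length zero.
print(shortest_paths("a", lambda v: vertices[v][1] < 35))
# [['a']]

# Naming the targets explicitly reproduces the first result.
print(shortest_paths("a", lambda v: vertices[v][0] in ("David", "Charlie")))
# [['a', 'b', 'c'], ['a', 'e', 'd']]
```

Note that merely excluding Alice from an "age < 35" target would not reproduce the first result either: Esther (32) is one hop away and would become the shortest match, which is why the answer names David and Charlie explicitly.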