scala 发现实体之间的关系

mspsb9vt  于 2022-11-09  发布在  Scala
关注(0)|答案(4)|浏览(126)

我有一个如下数据集-
List((X,Set(" 1", " 7")), (Z,Set(" 5")), (D,Set(" 2")), (E,Set(" 8")), ("F ",Set(" 5", " 9", " 108")), (G,Set(" 2", " 11")), (A,Set(" 7", " 5")), (M,Set(108)))
这里X与A相关,因为7在它们之间很常见
Z与A相关,因为5在它们之间很常见
F与A有关,因为5在它们之间很常见
M与F有关,因为108在它们之间很常见
因此,X、Z、A、F和M是相关的
D和G是相关的,因为2是它们之间的共同点
E与任何人都没有血缘关系
因此,输出将是((X,Z,A,F,M),(D,G),(E))
在这里,秩序并不重要。
我在这里使用了Scala,但Scala/Python或伪代码的解决方案对我来说是有效的。

enxuqcxy

enxuqcxy1#

构建一个无向图,其中每个标签都连接到相应集合中的每个数字(即(A, { 1, 2 })将给出两条边:A <-> 1A <-> 2)
计算连通分量(例如,使用深度优先搜索)。
仅从连接的组件中过滤出标签。

import util.{Left, Right, Either}
import collection.mutable

def connectedComponentsOfAsc[F, V](faces: List[(F, Set[V])]): List[List[F]] = {
  type Node = Either[F, V]
  val graphBuilder = mutable.HashMap.empty[Node, mutable.HashSet[Node]]

  def addEdge(a: Node, b: Node): Unit =
    graphBuilder.getOrElseUpdate(a, mutable.HashSet.empty[Node]) += b

  for
    (faceLabel, vertices) <- faces
    vertex <- vertices
  do
    val faceNode = Left(faceLabel)
    val vertexNode = Right(vertex)
    addEdge(faceNode, vertexNode)
    addEdge(vertexNode, faceNode)

  val graph = graphBuilder.view.mapValues(_.toSet).toMap
  val ccs = connectedComponents(graph)
  ccs.map(_.collect { case Left(faceLabel) => faceLabel }.toList)
}

def connectedComponents[V](undirectedGraph: Map[V, Set[V]]): List[Set[V]] = {
  val visited = mutable.HashSet.empty[V]
  var connectedComponent = mutable.HashSet.empty[V]
  val components = mutable.ListBuffer.empty[Set[V]]

  def dfs(curr: V): Unit = {
    if !visited(curr) then
      visited += curr
      connectedComponent += curr
      undirectedGraph(curr).foreach(dfs)
  }

  for v <- undirectedGraph.keys do
    if !visited(v) then
      connectedComponent = mutable.HashSet.empty[V]
      dfs(v)
      components += connectedComponent.toSet

  components.toList
}

可按如下方式使用:

@main def main(): Unit = {
  println(connectedComponentsOfAsc(
    List(
      ("X",Set("1", "7")),
      ("Z",Set("5")),
      ("D",Set("2")),
      ("E",Set("8")),
      ("F",Set("5", "9", "108")),
      ("G",Set("2", "11")),
      ("A",Set("7", "5")),
      ("M",Set("108"))
    )
  ).map(_.sorted).sortBy(_.toString))
}

产生:

List(List(A, F, M, X, Z), List(D, G), List(E))

所有步长都是O(n)(随输入大小线性调整)。
这个答案是自成一体的,但在这里使用某种图形库显然是有利的。

pwuypxnk

pwuypxnk2#

最终使用的是一种更简单的解决方案,如下所示:

data=[
["X",{"1", "7"}],
["Z",{"5",}],
["D",{"2",}],
["E",{"8",}],
["F",{"5", "9", "108"}],
["G",{"2", "11"}],
["A",{"7", "5"}],
["M",{"108"}]
]

for i in range(len(data)):
    for j in range(len(data)):
        if(data[i][1].intersection(data[j][1])):
            if(data[i][0]!=data[j][0] ):
                data[i][1] = data[j][1] = (data[i][1]).union(data[j][1])
for k, g in groupby(sorted([[sorted(tuple(d[1])),d[0]] for d in data]), key=lambda x: x[0]):
    print(list(l[1] for l in g))

以下列方式获取输出:

['A', 'F', 'M', 'X', 'Z']
['D', 'G']
['E']

对更多的几个数据集进行了测试,似乎运行良好。

hgtggwj0

hgtggwj03#

//我将一些值放在引号中,这样我们就有了一致的字符串输入

val initialData :List[(String, Set[String])] = List(
    ("X",Set(" 1", " 7")),
    ("Z",Set(" 5")),
    ("D",Set(" 2")),
    ("E",Set(" 8")),
    ("F ",Set(" 5", " 9", " 108")),
    ("G",Set(" 2", " 11")),
    ("A",Set(" 7", " 5")),
    ("M",Set("108"))
)

//通过将集合中的字符串数据转换为Int来清理集合

val cleanedData = initialData.map(elem => (elem._1, elem._2.map(_.trim.toInt)))

> cleanedData: List[(String, scala.collection.immutable.Set[Int])] = List((X,Set(1, 7)), (Z,Set(5)), (D,Set(2)), (E,Set(8)), ("F ",Set(5, 9, 108)), (G,Set(2, 11)), (A,Set(7, 5)), (M,Set(108)))

//将集合分解为简单Map列表。X->1,X->7。

val explodedList = cleanedData.flatMap(x => x._2.map(v => (x._1, v)))

> explodedList: List[(String, Int)] = List((X,1), (X,7), (Z,5), (D,2), (E,8), ("F ",5), ("F ",9), ("F ",108), (G,2), (G,11), (A,7), (A,5), (M,108))

将它们按新密钥组合在一起

val mappings = explodedList.groupBy(_._2)

> mappings: scala.collection.immutable.Map[Int,List[(String, Int)]] = Map(5 -> List((Z,5), ("F ",5), (A,5)), 1 -> List((X,1)), 9 -> List(("F ",9)), 2 -> List((D,2), (G,2)), 7 -> List((X,7), (A,7)), 108 -> List(("F ",108), (M,108)), 11 -> List((G,11)), 8 -> List((E,8)))

打印输出

mappings.foreach { case (key, items) =>
    println(s"${items.map(_._1).mkString(",")} are all related because of $key")
}

> Z,F ,A are all related because of 5
> X are all related because of 1
> F  are all related because of 9
> D,G are all related because of 2
> X,A are all related because of 7
> F ,M are all related because of 108
> G are all related because of 11
> E are all related because of 8
t0ybt7op

t0ybt7op4#

  • 读取输入,创建成对的矢量

例如:

X 1
X 7
Z 5
...
  • 按对的第二个成员的顺序对向量进行排序

E.g

X 1
D 2
G 2
...
  • 迭代排序后的向量,添加到“pass1组”,只要第二个成员不变。如果它确实发生了变化,则创建一个新的Pass1组。

例如:

X 
D G
Z F A
X A
E
F
G
  • 将pass1组与普通成员合并,以提供输出组。

下面是实现这一点的C++代码


# include <string>

# include <iostream>

# include <vector>

# include <algorithm>

bool merge(
    std::vector<char> &res,
    std::vector<char> &vg)
{
    bool ret = false;
    for (char r : res)
    {
        for (char c : vg)
        {
            if (c == r)
                ret = true;
        }
    }
    if (!ret)
        return false;

    for (char c : vg)
    {
        if (std::find(res.begin(), res.end(), c) == res.end())
            res.push_back(c);
    }
    return true;
}

void add(
    std::vector<std::vector<char>> &result,
    std::vector<char> &vg)
{
    std::vector<char> row;
    for (char c : vg)
        row.push_back(c);
    result.push_back(row);
}

main()
{
    std::string input = "List((X,Set(\" 1\", \" 7\")), (Z,Set(\" 5\")), (D,Set(\" 2\")), (E,Set(\" 8\")), (F,Set(\" 5\", \" 9\", \" 108\")), (G,Set(\" 2\", \" 11\")), (A,Set(\" 7\", \" 5\")), (M,Set(\"108\")))";

    input = "List((A,Set(\"0\", \"1\")),(B,Set(\"1\", \"2\")),(C,Set(\"2\", \"3\")),(D,Set(\"3\", \"4\")))";

    std::vector<std::pair<char, int>> vinp;

    int p = input.find("Set");
    int q = input.find("Set", p + 1);
    while (p != -1)
    {
        char c = input[p - 2];
        int s = input.find_first_of("0123456789", p);
        if( s == -1 )
            break;
        while (s < q)
        {
            vinp.push_back(std::make_pair(
                c,
                atoi(input.substr(s).c_str())));
            s = input.find_first_of("0123456789", s + 3);
            if( s == -1 )
                break;
        }
        p = q;
        q = input.find("Set", p + 1);
        if( q == -1 )
            q = input.length();
    }

    std::sort(vinp.begin(), vinp.end(),
              [](std::pair<char, int> a, std::pair<char, int> b)
              {
                  return a.second < b.second;
              });

    std::cout << "sorted\n";
    for (auto &p : vinp)
        std::cout << p.first << " " << p.second << "\n";

    std::vector<std::vector<char>> vpass1;
    std::vector<char> row;
    int sec = -1;
    for (auto &p : vinp)
    {
        if (p.second != sec)
        {
            // new group
            if (row.size())
                vpass1.push_back(row);
            sec = p.second;
            row.clear();
        }
        row.push_back(p.first);
    }

    std::cout << "\npass1\n";
    for (auto &row : vpass1)
    {
        for (char c : row)
            std::cout << c << " ";
        std::cout << "\n";
    }

    std::vector<std::vector<char>> result;
    std::vector<char> pass2group;
    bool fmerge2 = true;
    while (fmerge2)
    {
        fmerge2 = false;
        for (auto &vg : vpass1)
        {
            if (!result.size())
                add(result, vg);
            else
            {
                bool fmerge1 = false;
                for (auto &res : result)
                {
                    if (merge(res, vg))
                    {
                        fmerge1 = true;
                        fmerge2 = true;
                        break;
                    }
                }
                if (!fmerge1)
                    add(result, vg);
            }
        }
        if (fmerge2)
        {
            vpass1 = result;
            result.clear();
        }
    }

    std::cout << "\n(";
    for (auto &res : result)
    {
        if (res.size())
        {
            std::cout << "(";
            for (char c : res)
                std::cout << c << " ";
            std::cout << ")";
        }
    }
    std::cout << ")\n";

    return 0;
}

它会产生正确的结果

((X A Z F M )(D G )(E ))

相关问题