Erlang -字符串到集合

mklgxw1f  于 2022-12-08  发布在  Erlang
关注(0)|答案(3)|浏览(185)

我目前正在尝试理解Erlang集合的行为,当我计算字符串变位词上的集合时。在我的理解中,两个变位词应该产生两个相同的字符串集合。

Set1 = sets:from_list("orchestra"). 
Set2 = sets:from_list("carthorse"). 
Set1 =:= Set2. %% true

然而,使用sets:intersection,我们得到了一个不同的集合,它不等于前两个集合。

Intersection = sets:intersection(Set1, Set2). 
Intersection =:= Set1. %% false
Intersection =:= Set2. %% false

基于Erlang中集合交集的计算方式,这种行为是否有特殊的原因?

1cklez4t

1cklez4t1#

sets模块的实现并不保证两个集合可以与=:=进行比较,即使它们包含相同的元素。内部数据结构可以不同。您可以使用is_subset/2subtract/2之类的运算(相对效率较低),或者可以使用to_list/1,然后使用lists:sort/1来获得两个可以直接比较的列表。(字符列表)无论如何,你最好马上使用ordsets。这些是有序的列表,你可以把它们作为集合来操作,并且可以直接进行比较。对于小的集合,它们通常比sets更有效。

> Set1 = ordsets:from_list("orchestra").
"acehorst"
> Set2 = ordsets:from_list("carthorse").
"acehorst"
> Set1 =:= Set2.
true
> Intersection = ordsets:intersection(Set1, Set2).
"acehorst"
> Intersection =:= Set1.
true
> Intersection =:= Set2.
true
xxhby3vn

xxhby3vn2#

=:=运算符比较Set1Intersection的表示,但不能保证表示是什么,也不能保证同一个集合只有一个表示。
sets的文档只在描述如何比较集合的元素时讨论了=:=,而不是集合本身。
对于集合相等,可以定义

set_eq(S1, S2) ->
    sets:is_subset(S1, S2) andalso sets:is_subset(S2, S1).
ars1skjm

ars1skjm3#

Let's take the code and look at what happens along the way.
We are defining 2 sets

Set1 = sets:from_list("orchestra"). 
{set,8,16,16,8,80,48,
     {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
     {{[],"sc",[],[],[],"o","r","e","h",[],[],"a","t",[],[],[]}}}

and

Set2 = sets:from_list("carthorse"). 
{set,8,16,16,8,80,48,
     {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
     {{[],"sc",[],[],[],"o","r","e","h",[],[],"a","t",[],[],[]}}}

where the second lines represent the values created with a representation. So based on the above definition, we infer that the representation chosen for storing sets is of a tuple.
We can refer to the Term Comparison section of Erlang Reference manual which states that

  • ...Lists are compared element by element. Tuples are ordered by size, two tuples with the same size are compared element by element...*

Having that as an invariant, now lets look at the set which comes about after an intersection

Intersection = sets:intersection(Set1, Set2).
{set,8,16,16,8,80,48,
     {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
     {{[],"cs",[],[],[],"o","r","e","h",[],[],"a","t",[],[],[]}}}

Again, this being a set gets the same underlying representation of the tuple, but if we look at the last element, there is a string (essentially a list of characters in Erlang) "cs" , which is different in value in the same place when we defined Set1 and Set2 . And hence the inequality as per the Term Comparison .
Now, let's attempt to change the sub-value "cs" inside Intersection to "sc" and then as per the Term Comparison rules, two sets must satisfy equality.

Improvised_Intersection = setelement(9, Intersection, {setelement(2, element(1, element(9, Intersection)), "sc")}).
{set,8,16,16,8,80,48,
     {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
     {{[],"sc",[],[],[],"o","r","e","h",[],[],"a","t",[],[],[]}}}

And now let's compare Improvised_Intersection with Set1 and Set2 , which gives

Improvised_Intersection =:= Set1.                                                                                  
true

Improvised_Intersection =:= Set2.                                                                                  
true

Intersection =:= Improvised_Intersection.
false

It's just an attempt to provide some insight into what already @Richard and others have pointed out.

相关问题