Why does evaluating this list comprehension take so long on Erlang/OTP 20?

Posted by plupiseo on 2022-12-08 in Erlang

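Find any 5 numbers whose sum is 100. This can be done in a loop, but I was demonstrating list comprehensions to a friend and realized that the following takes more than 30 minutes on my MacBook Pro (Core i7, 2.2GHz):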
[[A,B,C,D,E] || A <- lists:seq(1,100),B <- lists:seq(1,100),C <- lists:seq(1,100),D <- lists:seq(1,100),E <- lists:seq(1,100),(A + B + C + D + E) == 100]
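If the problem is changed so that the 5 numbers must be consecutive, the resulting list comprehension takes even longer. If I want to solve this problem with a list comprehension, am I doing it right? If so, why does it take so long? Please suggest a solution that may be faster, perhaps one using a loop.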

koaltpgm #1

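Multiple generators behave like nested loops over the lists, and each call to lists:seq() is fully evaluated every time it is reached. This takes a very long time, and most of that time is spent allocating list cells and garbage collecting them again. But since all the calls evaluate to the same constant list, you can rewrite the expression as:
L = lists:seq(1,100), [[A,B,C,D,E] || A <- L, B <- L, C <- L, D <- L, E <- L, (A + B + C + D + E) == 100]
Also, running this in the shell is a lot slower than running it in a compiled module. On my MacBook, the compiled code finished in about 2 min 30 s, and that is using just a single core. Compiling with [native] makes it run in 60 seconds.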
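The rewrite above can be tried as a compiled module. The following is a minimal sketch, not the answerer's own code: the module name `sum_lc`, the functions `solve/1` and `timed/1`, and the parameter `N` (used both as the range and as the target sum, so that small cases can be checked quickly) are assumptions made for illustration.

```erlang
%% Hypothetical module name, chosen for this sketch only.
-module(sum_lc).
-export([solve/1, timed/1]).

%% All [A,B,C,D,E] with every element in 1..N and A+B+C+D+E =:= N.
%% lists:seq/2 is evaluated once and the same list feeds all five generators.
solve(N) ->
    L = lists:seq(1, N),
    [[A,B,C,D,E] || A <- L, B <- L, C <- L, D <- L, E <- L,
                    A + B + C + D + E =:= N].

%% Returns {Microseconds, NumberOfSolutions} for one run of solve/1.
timed(N) ->
    {Micros, Solutions} = timer:tc(fun() -> solve(N) end),
    {Micros, length(Solutions)}.
```

After `c(sum_lc).` in the shell, `sum_lc:timed(30).` gives a quick measurement; the work still grows as N^5, so N = 100 remains slow with this approach.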

7fyelxc5 #2

Because it "creates" every one of the 100^5 candidate lists of 5 elements before the filter is applied to it, which represents 50,000,000,000 elements in total.
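To put that figure in perspective, here is a small counting sketch. The module and function names are assumptions for illustration. It relies on the standard result that the number of ways to write N as an ordered sum of K positive integers is the binomial coefficient C(N-1, K-1), so for N = 100 and K = 5 only 3,764,376 of the 10,000,000,000 candidates pass the filter.

```erlang
%% Hypothetical module name, chosen for this sketch only.
-module(lc_count).
-export([candidates/2, solutions/2]).

%% Number of tuples the unrestricted comprehension visits: N^K.
candidates(N, K) -> pow(N, K).

%% Number of K-tuples of positive integers that sum to N: C(N-1, K-1).
solutions(N, K) -> choose(N - 1, K - 1).

pow(_, 0) -> 1;
pow(B, E) when E > 0 -> B * pow(B, E - 1).

%% C(N, K) = C(N-1, K-1) * N / K; the division is always exact.
choose(_, 0) -> 1;
choose(N, K) when K > 0, N >= K -> choose(N - 1, K - 1) * N div K;
choose(_, _) -> 0.
```

So fewer than 1 candidate in 2,600 is kept, which is why pruning the generators pays off far more than any constant-factor tuning.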

[edit] I reviewed the answers from RichardC and Alexey Romanov and decided to run some tests:

-module (testlc).

-export ([test/1]).

test(N) ->
    F1 = fun() -> [{W,X,Y,Z}|| W <- lists:seq(1,N),X <- lists:seq(1,N),Y <- lists:seq(1,N),Z <- lists:seq(1,N), W+X+Y+Z == N] end,
    F2 = fun() ->L = lists:seq(1,N),  [{W,X,Y,Z}|| W <- L,X <- L,Y <- L,Z <- L, W+X+Y+Z == N] end,
    F3 = fun() -> [{W,X,Y,Z}|| W <- lists:seq(1,N-3), X <- lists:seq(1,N-2-W),Y <- lists:seq(1,N-1-W-X),Z <- lists:seq(1,N-W-X-Y), W+X+Y+Z == N] end,
    F4 = fun() -> [{W,X,Y,N-W-X-Y}|| W <- lists:seq(1,N-3),X <- lists:seq(1,N-2-W),Y <- lists:seq(1,N-1-W-X)] end,
    F5 = fun() -> L = lists:seq(1,N), [{W,X,Y,N-W-X-Y}|| W <- L, 
                                                         XM <- [N-2-W],      X <- L, X =< XM, 
                                                         YM <- [N-1-W-X],    Y <- L, Y =< YM] end,
    {T1,L1} = timer:tc(F1),
    {T2,L2} = timer:tc(F2),
    {T3,L3} = timer:tc(F3),
    {T4,L4} = timer:tc(F4),
    {T5,L5} = timer:tc(F5),
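    %% _L is bound by the first match; the following matches assert that
    %% every variant returns the same solutions (badmatch otherwise).
    _L = lists:sort(L1),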
    _L = lists:sort(L2),
    _L = lists:sort(L3),
    _L = lists:sort(L4),
    _L = lists:sort(L5),
    {test_for,N,{t1,T1},{t2,T2},{t3,T3},{t4,T4},{t5,T5}}.

And the result (times in microseconds, as returned by timer:tc):

1> c(testlc).      
{ok,testlc}
2> testlc:test(50).
{test_for,50,
          {t1,452999},
          {t2,92999},
          {t3,32000},
          {t4,0},
          {t5,0}}
3> testlc:test(100).
{test_for,100,
          {t1,4124992},
          {t2,1452997},
          {t3,203000},
          {t4,16000},
          {t5,15000}}
4> testlc:test(150).
{test_for,150,
          {t1,20312959},
          {t2,7483985},
          {t3,890998},
          {t4,93000},
          {t5,110000}}
5> testlc:test(200).
{test_for,200,
          {t1,63874875},
          {t2,24952951},
          {t3,2921995},
          {t4,218999},
          {t5,265000}}

Preparing the list outside of the list comprehension has a big impact, but it is more efficient still to drastically limit the number of useless intermediate lists generated before the filter runs. So it is a balance to evaluate. In this example, the two enhancements can be used together (thanks to Alexey), but that does not make a big difference.

wmvff8tz #3

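Erlang is strong when we use concurrency, so you can also spawn 100 processes, one for each element of the list [1,...,100]. That makes the computation easier on your laptop. For example: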

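%% The module name is illustrative. func/6 has to be exported because
%% spawn/3 calls it by module and function name.
-module(sum100).
-export([do/0, func/6]).

do() ->
    L100 = lists:seq(1,100),
    %% One worker process per value of A.
    [spawn(?MODULE, func, [self(), [A], L100, L100, L100, L100]) ||
        A <- L100],
    loop(100, []).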
loop(0, Acc) -> Acc;
loop(N, Acc) ->
    receive
        {ok, Result} ->
            loop(N - 1, Acc ++ Result)
    end.

func(Pid, LA, LB, LC, LD, LE) ->
    Result = [[A,B,C,D,E] ||
             A <- LA,B <- LB,C <- LC,D <- LD,E <- LE,(A + B + C + D + E) == 100],
    Pid ! {ok, Result}.

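With the solution above, my laptop with an i3 at 2.1GHz can easily finish the computation in 1 minute. You can also spawn more processes to shorten the computation. Erlang processes are lightweight, so they are cheap to start and cheap to stop.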
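A variant of the same fan-out idea is sketched below. It is not the code from this answer: it swaps in `spawn_link/1` with a closure, so no worker function needs to be exported, and it tags each reply with a reference from `make_ref/0` so that unrelated messages in the mailbox are not mistaken for results. The module name `par_sum` and the parameter `N` are assumptions for illustration.

```erlang
%% Hypothetical module name, chosen for this sketch only.
-module(par_sum).
-export([solve/1]).

%% One linked worker per value of A. Each worker sends back its share of
%% the solutions, tagged with a unique reference.
solve(N) ->
    Parent = self(),
    L = lists:seq(1, N),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, worker(A, L, N)} end),
                Ref
            end || A <- L],
    %% Collect in spawn order so the result order is deterministic.
    lists:append([receive {Ref, Result} -> Result end || Ref <- Refs]).

%% All solutions whose first element is the given A.
worker(A, L, N) ->
    [[A,B,C,D,E] || B <- L, C <- L, D <- L, E <- L,
                    A + B + C + D + E =:= N].
```

Because the workers are linked, a crash in any of them also takes down the caller instead of leaving it blocked in `receive`. Collecting by reference in spawn order also avoids the repeated `Acc ++ Result` appends.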

os8fio9y #4

One option is

[[A,B,C,D,100-A-B-C-D] || A <- lists:seq(1,100), B <- lists:seq(1,100-A), C <- lists:seq(1,100-A-B), D <- lists:seq(1,100-A-B-C), 100-A-B-C-D > 0]

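Simply not enumerating every possible E, of which at most one can succeed, should be 100 times faster (or more, because less garbage is produced).
But there is some code duplication here. Unfortunately, Erlang does not allow "local" variables in list comprehensions, but you can emulate them with single-element generators: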

[[A,B,C,D,E] || A <- lists:seq(1,100), 
    BMax <- [100-A], B <- lists:seq(1,BMax), 
    CMax <- [BMax-B], C <- lists:seq(1,CMax), 
    DMax <- [CMax-C], D <- lists:seq(1,DMax), 
    E <- [100-A-B-C-D], E > 0]

Or, to avoid the repeated lists:seq calls, as @RichardC pointed out:

L = lists:seq(1, 100),
[[A,B,C,D,E] || A <- L, 
    BMax <- [100-A], B <- L, B =< BMax,
    CMax <- [BMax-B], C <- L, C =< CMax,
    DMax <- [CMax-C], D <- L, D =< DMax, 
    E <- [100-A-B-C-D], E > 0]
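Since the question also asks for a loop-style solution, the same pruning can be written as a recursive function that works for any number of terms. This is a minimal sketch under assumed names (`sum_rec`, `sum_to/2`), not code from any of the answers; `sum_rec:sum_to(5, 100)` should produce the same lists as the comprehensions above.

```erlang
%% Hypothetical module name, chosen for this sketch only.
-module(sum_rec).
-export([sum_to/2]).

%% All lists of K positive integers that add up to Sum.
sum_to(1, Sum) when Sum >= 1 ->
    [[Sum]];
sum_to(K, Sum) when K > 1, Sum >= K ->
    %% The first element can be at most Sum - (K - 1), which leaves at
    %% least 1 for each of the remaining K - 1 elements.
    [[A | Rest] || A <- lists:seq(1, Sum - K + 1),
                   Rest <- sum_to(K - 1, Sum - A)];
sum_to(_, _) ->
    [].
```

Every recursive call only visits prefixes that can still be completed, so no candidate is generated and then thrown away by a filter.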
