Erlang: Elixir processes and no shared heap memory

93ze6v8z  posted on 2022-12-08  in  Erlang

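**Elixir processes have their own heap.** If a process wants to share a data structure with another process, how is that possible? One answer that comes to mind is that the process sends a message containing the data structure to the other process. Does that mean the whole data structure is copied from one heap to the other? If so, isn't that inefficient?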

ffvjumwh1#

TL;DR:

Yes, it is inefficient. But you'll almost never notice this in practice. Welcome to the world of enormously safer programming. Most of the stuff you'll probably use an Erlang-based language for will be network related, and the network is by far the greater constraint (and sometimes disk or port IO).
Also, the alternative is a freaking nightmare. If you do massively concurrent programming, anyway.

Discussion

There are two very different contexts to consider when contemplating "efficiency":

  • Is it efficient for the machine to perform the task in terms of time, space, and locked resources? Are there obvious shortcuts that do not introduce leaky abstractions?
  • Is it efficient for humans to write, understand and maintain?

When you consider these two aspects of efficiency you must eventually bring the question down to time and money -- because that's where things are going to actually matter in terms of usefully employing the tool.

The Human Context

This efficiency argument is very similar to the argument that "Python is way less efficient than assembler". I used to argue the same thing -- until I took charge of several large development efforts. I still think JavaScript, XML and a few other demonstrably bad languages and data representations are the devil, but in the general case (defined as "cases where you don't have precise knowledge and control over your interrupt timing as it relates to bus reads/writes and CPU cycles") the greater the basic abstraction provided by the language (and the smaller that language), the better.
Erlang wins by every measure in the context of modern, massively concurrent systems, crushing even most other Erlang VM (EVM) languages in terms of simplicity and syntactic limitation (except for LFE -- Robert Virding got that right, imo).
Consider the syntactic complexity of Elixir, for example. It is not a bad language by any means (quite the contrary). But while it is easier for many newcomers in terms of familiarity, it is several times more complex in real terms, and that stings a lot longer than any initial learning curve. "Easy" is not at all the same thing as "simple"; "ease" is an issue of familiarity, not utility value.

The Machine Context

Whether or not a paradigm is efficient in execution depends almost entirely on how the underlying implementation balances reference passing ("by pointer") against message passing ("by value").
How large are the things that are being passed? Is a hybrid approach employed that does not break the abstraction of passing messages by value?
In Erlang (and by extension Elixir and LFE) most messages being passed between processes are quite small. Really, really tiny, in fact. Large, immutable messages are nearly always Erlang binaries -- and these actually are passed by reference (more on that later).
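
To make the by-value semantics concrete, here is a minimal Elixir sketch. The names `CopyDemo`, `echo_once/0` and `run/0` are made up for illustration and do not come from any library. A term sent with `send/2` ends up in the receiver's own heap; since terms are immutable, all the receiver can observe is that it got an equal value.

```elixir
defmodule CopyDemo do
  # Hypothetical helper: spawns a process that waits for one message
  # and reports back to the caller what it received.
  def echo_once do
    parent = self()

    spawn(fn ->
      receive do
        {:data, payload} -> send(parent, {:got, self(), payload})
      end
    end)
  end

  def run do
    # Built at runtime, so the term lives on the sender's heap.
    payload = %{id: 1, values: Enum.to_list(1..1_000)}

    pid = echo_once()
    # The message is copied for the receiver; `payload` here is untouched.
    send(pid, {:data, payload})

    receive do
      {:got, ^pid, received} -> received == payload
    after
      1_000 -> false
    end
  end
end
```

Calling `CopyDemo.run()` should return `true`: the round trip copies the term twice (there and back), and the caller still cannot tell the difference from the original.
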
Large messages are a bit more rare, but considering the way that copying is implemented, even this is not such a huge problem. To allow processes to crash on their own and allow each process to have its own garbage collection schedule (as opposed to the nightmare scenario of unpredictable "stop the world" garbage collection) every Erlang process has its own heap.
That is an overall optimization in two ways:

  • This allows each process to crash and not affect anything.
  • It also allows each process to be written so that every assignment is, generally speaking, an immutable label declaration rather than a mutable assignment (let alone a shared data object declaration that is insanely dangerous and/or insanely complex to manage and schedule).
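
The first point can be sketched roughly like this (`IsolationDemo` is a hypothetical name, and the exit reason `:boom` is arbitrary). The child exits abnormally, and the parent only learns about it through a monitor message; nothing on the parent's heap is touched.

```elixir
defmodule IsolationDemo do
  def run do
    # spawn_monitor/1 starts the child and monitors it in one step.
    {pid, ref} = spawn_monitor(fn -> exit(:boom) end)

    # The parent keeps running and is simply told that the child went down.
    receive do
      {:DOWN, ^ref, :process, ^pid, reason} -> {:survived, reason}
    after
      1_000 -> :timeout
    end
  end
end
```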

All of that is what enables segregated garbage collection per process, and this single difference makes Erlang feel like it has incremental garbage collection while actually implementing a boringly ordinary GC model underneath (just splitting it up per process).
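
A rough way to see the per-process heaps, assuming the standard `Process.info/2` and `:erlang.garbage_collect/1` calls (the module name `HeapDemo` and the list size are arbitrary choices for illustration): one process holds a big list, another sits idle, and each reports its own heap size in machine words.

```elixir
defmodule HeapDemo do
  def run do
    parent = self()

    # This process builds a large list and keeps it alive until told to stop.
    busy =
      spawn(fn ->
        list = Enum.to_list(1..100_000)
        send(parent, {:ready, self()})

        receive do
          :stop -> length(list)
        end
      end)

    # This process allocates nothing beyond its initial heap.
    idle = spawn(fn -> receive do :stop -> :ok end end)

    # Wait until the busy process has finished allocating.
    receive do
      {:ready, ^busy} -> :ok
    end

    {:total_heap_size, busy_words} = Process.info(busy, :total_heap_size)
    {:total_heap_size, idle_words} = Process.info(idle, :total_heap_size)

    # Collect garbage in one process only; the other is not paused for it.
    gc_result = :erlang.garbage_collect(idle)

    send(busy, :stop)
    send(idle, :stop)
    {busy_words, idle_words, gc_result}
  end
end
```

The exact word counts depend on the VM version and settings, so treat them as indicative only; the point is that the big list shows up in one process's numbers and not in the other's.
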
But then there are a few places where we really do want to have some pass-by-reference at the expense of underlying complexity (and the corresponding cognitive overhead for the programmer).
"Large" binaries are the classic example case. Any binary larger than 64 bytes is a shared object by default, passed by reference (pointer) instead of passed by value (copying). They are still immutable, of course, and that is the only reason this is safe to do. The problem is that without using binary:copy/1,2 any reference to a sub-section of a larger binary becomes a reference to the whole binary, so you can wind up with a surprising amount of memory tied up in the global heap because of binary references to tiny fragments of larger overall binary objects in memory. This is problematic (there are occasionally situations where you'll have to carefully map out what is being referenced where to prevent memory leaks), but that's the price of implementing a performance hack like shared memory objects in the context of safe concurrency.

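A small sketch of the sub-binary issue described above, written in Elixir and calling the Erlang `binary` module. `BinaryDemo` and the sizes are assumptions picked for illustration; `:binary.referenced_byte_size/1` reports the size of the underlying binary that a term refers to.

```elixir
defmodule BinaryDemo do
  def run do
    # Build a 1 MB binary at runtime; far above 64 bytes, so it is
    # stored off-heap and reference counted.
    big = :binary.copy(<<0>>, 1_000_000)

    # A 1000-byte slice: a sub-binary that points into `big`.
    slice = binary_part(big, 0, 1_000)
    before_copy = :binary.referenced_byte_size(slice)

    # :binary.copy/1 creates a fresh binary holding only the slice's bytes,
    # so the large parent can be reclaimed once nothing else refers to it.
    compact = :binary.copy(slice)
    after_copy = :binary.referenced_byte_size(compact)

    {byte_size(slice), before_copy, after_copy}
  end
end
```

Here the slice is only 1000 bytes long, yet before the copy it keeps the whole parent binary referenced; after `:binary.copy/1` the referenced size matches the slice itself. Whether the copy is worth it depends on how long the small piece outlives the large one.
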
Conclusion (some unquantifiable anecdotally based guidance...)

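I have personally never had copy-by-value become a bottleneck. Not once. And I have written a lot of Erlang programs.
Your real bottlenecks are nearly always going to be shared access to external resources like disk/storage/network (conceptually these are the same thing). It is much cheaper to pay for extra cores or an extra VM/instance anyway than it is to pay programmers to track down cases where binary:copy/1,2 should have been used -- and memory and CPU time are only getting faster and cheaper, so whatever you think is a "performance hit" today will be a trivial gripe by next year compared to the real cost of making your *expensive* programmers track down stupid speed hacks in your code in the future.
(And if your programmers aren't so much more expensive than your computing resources, *why did you hire such crappy programmers?!?! ZOMG!*)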
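
A note about the future...

The future will only become increasingly multi-core and, in most cases, more parallel *and* more concurrent. Now that AMD is delivering on its vision of bringing 1000+ core systems to the desktop, I predict the next big area of competition will be massive improvements in bus speed, channeling and cache management, along with dramatic increases in core memory size. That is the only way all those cores will ever be used.
The only languages that will be able to take advantage of this are languages like Erlang that make message passing by value the primary method. In that setting hygienic paradigms will become more important, and language *simplicity* will be the element that saves us from the complexity explosion that comes with so much parallelism and concurrency.
Consider the movement toward "microservice architecture" and even Docker -- people are unwittingly encountering and solving many of the problems Erlang was originally designed to address, just in an ad hoc manner.
In a massively multi-core, massively concurrent environment, pass-by-value and a heap per process look like an overall *optimization*, considering that good programmers cost an awful lot more than cores, spindles, storage and memory. (As an aside, I think more of the lasting software of the future will be written in concurrent languages by far fewer programmers, while monkey-army methods will continue to produce essentially ephemeral codebases.)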
