+++
title = "How much memory is needed to run 1M Erlang processes?"
description = "How to not write benchmarks"
+++
Recently a [benchmark of concurrency implementations in different
languages][benchmark] made the rounds. In that article [Piotr Kołaczkowski][]
used ChatGPT to generate the examples in the different languages and
benchmarked them. That was a poor choice, as I found the article and read the
Elixir example:

[benchmark]: https://pkolaczk.github.io/memory-consumption-of-async/ "How Much Memory Do You Need to Run 1 Million Concurrent Tasks?"
[Piotr Kołaczkowski]: https://github.com/pkolaczk
```elixir
tasks =
  for _ <- 1..num_tasks do
    Task.async(fn -> :timer.sleep(10000) end)
  end

Task.await_many(tasks, :infinity)
```
And, well, it's a pretty poor example of BEAM's process memory usage, and I am
not talking about the fact that it uses 4 spaces for indentation.

For 1 million processes this code reported 3.94 GiB of memory used by the
process in Piotr's benchmark, but with a little work I managed to reduce that
about 4-fold, to around 0.93 GiB of RAM usage. In this article I will describe:
- why the original code was consuming so much memory
- why in the real world you probably should not optimise like I did here
- why using ChatGPT to write benchmarking code sucks (TL;DR: because it will
  nerd-snipe people like me)
## What are Erlang processes?

Erlang is ~~well~~ known for being a language whose support for concurrency is
superb, and Erlang processes are the main reason for that. But what are they?

In Erlang, *process* is the common name for what other languages call *virtual
threads* or *green threads*, but in Erlang these have a small neat twist - each
process is isolated from the rest, and processes can communicate only via
message passing. That gives Erlang processes 2 features that are rarely spotted
in other implementations:
- Failure isolation - a bug, unhandled case, or other issue in a single process
  will not directly affect any other process in the system. The VM can send
  messages on process shutdown, and other processes may be killed because of
  that, but by itself shutting down a single process will not cause problems in
  any process not related to it (see the sketch after this list).
- Location transparency - a process can be spawned locally or on a different
  machine, but from the viewpoint of the programmer, there is no difference.
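A minimal sketch of failure isolation, assuming the two processes are not
linked: the spawned process crashes, and the calling process carries on
undisturbed.

```elixir
pid = spawn(fn -> raise "boom" end)

# Give the spawned process a moment to crash, then check on both sides.
:timer.sleep(100)

Process.alive?(pid)    #=> false - the spawned process died
Process.alive?(self()) #=> true - the crash did not propagate to us
```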
The above features and requirements result in some design choices, but for our
purposes only one is truly needed today - each process has its own stack and
heap, separate and (almost) independent from any other process.
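That per-process memory is what this whole article measures, and you can
inspect it for a single process with `Process.info/2` (the number below is
illustrative; it varies between VM versions and flags):

```elixir
pid = spawn(fn -> :timer.sleep(10000) end)

# Total size, in bytes, of this one process (stack, heap, and internal data).
Process.info(pid, :memory) #=> {:memory, 2688}
```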
### Process dictionary

Each process in the Erlang VM has a dedicated *mutable* memory space for its
internal uses. Most people do not use it for anything, because in general it
should not be used unless you know exactly what you are doing (in my case, a
bad carpenter could count, on one hand, the times I have needed it). In general
it's a *here be dragons* area.
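For reference, the API itself is tiny; a minimal sketch of direct usage (with a
made-up `:my_key` key), which, again, you should rarely reach for:

```elixir
# Store and read a value in the current process's dictionary. The value is
# private to this process and dies with it.
Process.put(:my_key, 42)
Process.get(:my_key) #=> 42
```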
How is it relevant to us?
Well, OTP internally uses the process dictionary (`pdict` for short) to store
metadata about the given process that can later be used for debugging purposes.
Some of the data it stores:

- The initial function that was run by the given process
- PIDs of all ancestors of the given process
Different process abstractions (like `gen_server`/`GenServer`, Elixir's
`Task`, etc.) can store even more metadata there: `logger` stores process
metadata in the process dictionary, and `rand` stores the state of its PRNGs
there. It's used quite extensively by some OTP features.
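You can see this metadata for yourself by dumping the process dictionary of a
freshly started `Task`; a quick sketch (the exact keys and values will differ
between OTP and Elixir versions):

```elixir
Task.async(fn -> IO.inspect(Process.get()) end) |> Task.await()
# Prints something along the lines of:
# ["$callers": [#PID<0.107.0>],
#  "$ancestors": [#PID<0.107.0>],
#  "$initial_call": {:erlang, :apply, 2}]
```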
### "Well behaved" OTP process

In addition to the above metadata, if the process is meant to be a "well
behaved" process in an OTP system, i.e. a process that can be observed and
debugged using OTP facilities, it must respond to some additional messages
defined by the [`sys`][] module. Without that, features like [`observer`][]
would not be able to "see" the contents of the process state.

[`sys`]: https://erlang.org/doc/man/sys.html
[`observer`]: https://erlang.org/doc/man/observer.html
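For example, any `GenServer`-based process (an `Agent` here, for brevity)
answers these `sys` messages, so you can peek at its state from the shell; a
minimal sketch:

```elixir
{:ok, pid} = Agent.start_link(fn -> %{count: 0} end)

# `:sys.get_state/1` works because Agent (via GenServer) implements the
# debugging messages defined by the `sys` module.
:sys.get_state(pid) #=> %{count: 0}
```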
## Process memory usage

As we have seen above, the `Task.async/1` function from Elixir **must** do much
more than just a simple "start a process and let it live". That was one of the
most important problems with the original code: it was using a system that
allocates quite substantial memory alongside the process itself, just to
operate that process. In general, that is the desirable approach (as you
**really, really want the debugging facilities**), but in a synthetic benchmark
it skews the very numbers the benchmark claims to measure.

If we want to avoid that additional memory overhead in our spawned processes we
need to go back to more primitive functions in Erlang, namely `erlang:spawn/1`
(`Kernel.spawn/1` in Elixir). But that means we cannot use `Task.await_many/2`
anymore, so we need to work around it with a custom implementation:
```elixir
defmodule Bench do
  def await(pid) when is_pid(pid) do
    # Monitor is a built-in feature of Erlang that will inform you (by sending
    # a message) when the process you monitor dies. The returned value is a
    # type called "reference", which is simply a unique value created by the
    # VM. If the process is already dead, the message is delivered immediately.
    ref = Process.monitor(pid)

    receive do
      {:DOWN, ^ref, :process, _, _} -> :ok
    end
  end

  def await_many(pids) do
    Enum.each(pids, &await/1)
  end
end

tasks =
  for _ <- 1..num_tasks do
    # `Kernel` module is imported by default, so no need for the `Kernel.` prefix
    spawn(fn -> :timer.sleep(10000) end)
  end

Bench.await_many(tasks)
```
We have already removed one problem (well, two in fact, but we will go into the
details in the next section).
## All your lists belongs to us now

Erlang, like most functional programming languages, has 2 built-in composite
data types:

- Tuples - a non-growable product type, so you can access any field quite
  fast, but adding more values is a performance no-no
- (Singly) linked lists - a growable type (in most cases it will hold values
  of a single type, but in Erlang that is not always the case), which is fast
  to prepend to or pop from at the beginning, but do not try to do anything
  else with it if you care about performance

In this case we will focus on the 2nd one, as tuples aren't important here at
all.
A singly linked list is a simple data structure. It's either the special value
`[]` (an empty list) or something called a "cons cell". Cons cells are also
simple structures - a 2-ary tuple (a tuple with 2 elements) where the first
value is the head (the value in that list cell) and the second is the "tail" of
the list (aka the rest of the list). In Elixir a cons cell is written as
`[head | tail]`.
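In other words, a literal list is just nested cons cells; a one-liner to
convince yourself:

```elixir
[1 | [2 | [3 | []]]] == [1, 2, 3] #=> true
```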
A super simple structure, as you can see, and perfect for functional
programming: you can add new values at the front of a list without modifying
the existing cells, so you can be immutable and fast. However, if you need to
construct a sequence of a lot of values (like our list of all tasks), then we
have a problem, because Elixir promises that the list returned from `for` will
be **in order** of the values passed to it. That means we either need to
process our data with body recursion:
```elixir
def map([], _), do: []

def map([head | tail], func) do
  [func.(head) | map(tail, func)]
end
```
Where we build up the call stack (as we cannot have tail call optimisation
there, of course sans compiler optimisations). Or we need to build our list in
reverse order, and then reverse it before returning (so we can have TCO):
```elixir
def map(list, func), do: do_map(list, func, [])

defp do_map([], _func, agg), do: :lists.reverse(agg)

defp do_map([head | tail], func, agg) do
  do_map(tail, func, [func.(head) | agg])
end
```
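Both versions behave the same way; a quick sanity check, assuming the functions
live in a hypothetical `MyList` module:

```elixir
MyList.map([1, 2, 3], &(&1 * 2)) #=> [2, 4, 6]
```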
Which one of these approaches is more performant is irrelevant[^erlang-perf];
what is relevant is that we need to either build up the call stack or construct
our list *twice* to be able to conform to Elixir's promises (even though in
this case we do not care about the order of the list returned by the `for`).
[^erlang-perf]: Sometimes body recursion will be faster, sometimes TCO will be
faster. It's impossible to tell without more benchmarking. For more info check
out the [superb article by Fred Hebert](https://ferd.ca/erlang-s-tail-recursion-is-not-a-silver-bullet.html).
Of course we could mitigate our problem by using the `Enum.reduce/3` function
(or writing our own) and end up with code like:
```elixir
defmodule Bench do
  def await(pid) when is_pid(pid) do
    ref = Process.monitor(pid)

    receive do
      {:DOWN, ^ref, :process, _, _} -> :ok
    end
  end

  def await_many(pids) do
    Enum.each(pids, &await/1)
  end
end

tasks =
  Enum.reduce(1..num_tasks, [], fn _, agg ->
    # `Kernel` module is imported by default, so no need for the `Kernel.` prefix
    [spawn(fn -> :timer.sleep(10000) end) | agg]
  end)

Bench.await_many(tasks)
```
Even then we build a list of all the PIDs.
Here I can also go back to the "second problem" I mentioned above:
`Task.await_many/1` *also constructs a list* - the list of return values from
all the processes passed to it. So not only did we construct a list of the
tasks' PIDs, we also constructed a list of return values (which will be `:ok`
for all processes, as that is what `:timer.sleep/1` returns), and immediately
discarded all of that.
How can we do better? Notice that **all** we care about is that all `num_tasks`
processes have gone down. We do not care about any of the return values; all we
want is to know that every process we started went down. For that we can just
send messages from the spawned processes and count the received messages:
```elixir
defmodule Bench do
  def worker(parent) do
    :timer.sleep(10000)
    send(parent, :done)
  end

  def start(0), do: :ok

  def start(n) when n > 0 do
    this = self()
    spawn(fn -> worker(this) end)
    start(n - 1)
  end

  def await(0), do: :ok

  def await(n) when n > 0 do
    receive do
      :done -> await(n - 1)
    end
  end
end

Bench.start(num_tasks)
Bench.await(num_tasks)
```
Now we do not have any lists involved, and we still do what the original task
was meant to do - spawn `num_tasks` processes and wait until all of them go
down.
## Arguments copying

One more thing that we can account for here is the lambda context and data
passing.
You see, we need to pass `this` (which is the PID of the parent) to each newly
spawned process. That is suboptimal, as we are looking for ways to reduce the
amount of memory used (while ignoring all other metrics at the same time). As
Erlang processes are meant to be "share nothing" processes, there is a
problem - we need to copy that PID into every process. It's just 1 word (which
means 8 bytes on 64-bit architectures, 4 bytes on 32-bit), but hey, we are
microbenchmarking, so we cut whatever we can (with 1M processes, this adds up
to about 8 MiB).
Hey, we can avoid that by using yet another feature of Erlang, called the
*process registry*. This is another simple feature: it allows us to assign the
PID of a process to an atom, which then lets us send messages to that process
using just the name we have given it. An atom is also 1 word, so it would not
make sense to pass it around either; instead we can do what any reasonable
microbenchmarker would do - *hardcode stuff*:
```elixir
defmodule Bench do
  def worker do
    :timer.sleep(10000)
    send(:parent, :done)
  end

  def start(0), do: :ok

  def start(n) when n > 0 do
    spawn(fn -> worker() end)
    start(n - 1)
  end

  def await(0), do: :ok

  def await(n) when n > 0 do
    receive do
      :done -> await(n - 1)
    end
  end
end

Process.register(self(), :parent)

Bench.start(num_tasks)
Bench.await(num_tasks)
```
Now we do not pass any arguments, and instead rely on the registry to dispatch
our messages to the respective processes.
As you may have already noticed, we are passing a lambda to `spawn/1`. That is
also quite suboptimal, because of the [difference between remote and local
calls][remote-vs-local]. It means that we are paying a slight memory cost for
these processes to keep the old version of the module in memory. Instead we can
use either a fully qualified function capture or the `spawn/3` function that
accepts an MFA (module, function name, arguments list). We end up with:

[remote-vs-local]: https://www.erlang.org/doc/reference_manual/code_loading.html#code-replacement
```elixir
defmodule Bench do
  def worker do
    :timer.sleep(10000)
    send(:parent, :done)
  end

  def start(0), do: :ok

  def start(n) when n > 0 do
    spawn(&__MODULE__.worker/0)
    start(n - 1)
  end

  def await(0), do: :ok

  def await(n) when n > 0 do
    receive do
      :done -> await(n - 1)
    end
  end
end

Process.register(self(), :parent)

Bench.start(num_tasks)
Bench.await(num_tasks)
```
Tested with the following Erlang and Elixir builds:

```
Erlang/OTP 25 [erts-13.2.2.1] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1]

Elixir 1.14.5 (compiled with Erlang/OTP 25)
```
> Note: no JIT, as Nix on macOS currently[^currently] disables it and I didn't
> bother to enable it in the derivation (it was disabled because there were
> some issues, but IIRC these are resolved now).

[^currently]: Nixpkgs rev `bc3ec5ea`
The results are as follows (peak memory footprint in bytes, as reported by
`/usr/bin/time` on macOS):
| Implementation | 1k | 100k | 1M |
| -------------- | -------: | --------: | ---------: |
| Original | 45047808 | 452837376 | 4227715072 |
| Spawn | 43728896 | 318230528 | 2869723136 |
| Reduce | 43552768 | 314798080 | 2849304576 |
| Count | 43732992 | 313507840 | 2780540928 |
| Registry | 44453888 | 311988224 | 2787237888 |
| RemoteCall | 43597824 | 310595584 | 2771525632 |
As we can see, we reduced memory use by about 30% just by changing from
`Task.async/1` to `spawn/1`. Further optimisations reduced memory usage
slightly, but with no such drastic changes.
Can we do better? Well, with some VM flag tinkering - of course.

You see, by default the Erlang VM creates not only the data required for
handling the process itself, but also a preallocated heap[^word]:
[^word]: Again, a word here means 8 bytes on 64-bit and 4 bytes on 32-bit
architectures.
> | Data Type      | Memory Size                                             |
> | -------------- | ------------------------------------------------------- |
> | Erlang process | 338 words when spawned, including a heap of 233 words.  |
>
> -- <https://erlang.org/doc/efficiency_guide/advanced.html#Advanced>
As we can see, there are 105 words that are always required and 233 words that
are used for the preallocated heap. But this is microbenchmarking, so as we do
not need that much memory (because our processes basically do nothing), we can
reduce it. We do not care about time performance anyway. For that we can use
the `+hms` flag and set it to some small value, for example `1`.
In addition to the heap size, Erlang by default loads some additional data from
the BEAM files. That data is used for debugging and error reporting, but again,
we are microbenchmarking, and who needs debugging support anyway (answer:
everyone, so **do not** do this in production). Luckily for us, the VM has yet
another flag for that purpose: `+L`.
Erlang also uses some [ETS][] (Erlang Term Storage) tables by default (for
example to support the process registry we mentioned above). ETS tables can be
compressed, but by default they are not, as compression can slow down some
kinds of operations on such tables. Fortunately there is another flag, `+ec`,
which is documented as:

> Forces option compressed on all ETS tables. Only intended for test and
> evaluation.
[ETS]: https://erlang.org/doc/man/ets.html

Sounds good enough for me.
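Putting the three flags together, they can be passed through to the VM like
this (assuming the benchmark lives in a hypothetical `bench.exs` script):

```sh
# +hms 1 - shrink the default minimal heap size of processes to 1 word
# +L     - do not load source filename/line-number tables from BEAM files
# +ec    - force the `compressed` option on all ETS tables
elixir --erl "+hms 1 +L +ec" bench.exs
```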
With all these flags enabled we get a peak memory footprint of 996257792 bytes.
Let's compare that in more human-readable units:
| | Peak Memory Footprint for 1M processes |
| ------------------------ | -------------------------------------- |
| Original code | 3.94 GiB |
| Improved code | 2.58 GiB |
| Improved code with flags | 0.93 GiB |
Result - about a 76% reduction in peak memory usage. Not bad.

> Please, do not use ChatGPT for writing code for microbenchmarks.
The thing about *micro*benchmarking is that we write code that does as little
as possible, to show off (mostly) meaningless features of the given technology
in an abstract environment. ChatGPT cannot do that, not out of malice or
incompetence, but because it taught itself on (mostly) *good* and idiomatic
code, and microbenchmarks are rarely something that people would consider to
have these qualities. It also cannot consider other factors that [wetware][]
can take into account (like our "we do not need lists there" realisation).
[wetware]: https://en.wikipedia.org/wiki/Wetware_(brain)