title = "Who Watches Watchmen? - Part 1"
date = 2022-01-17T21:22:18+01:00

A lot of applications use systems like Kubernetes for their deployment. In my
humble opinion this is often overkill, as a system that offers most of what such
tools provide is already present in your OS. In this article I will try to
present how to utilise the most popular system supervisor from Elixir.

I gave a talk about this topic at CODE Beam V Americas, but I wasn't really
satisfied with it. In this post I will try to describe what my presentation was
about.

If you are wondering about the presentation, [the slides are on SpeakerDeck][slides].

[slides]: https://speakerdeck.com/hauleth/who-supervises-supervisors

Most operating systems are multi-process and multi-user systems. This has a lot
of positive aspects, like being able to do more than one thing at a time on our
devices, but it also introduces a lot of complexity that in most cases is hidden
from users and developers. These things still need to be handled one way or
another. The most basic problems are:

- some processes need to be started before the user can interact with the OS
  in a meaningful (for them) way (for example mounting filesystems, logging, etc.)
- some processes require strict startup ordering, for example you may need
  logging to be started before starting an HTTP server
- the system operator somehow needs to know when a process is ready to do its
  work, which is often some time after the process starts
- the system operator should be able to check process state when debugging
  is needed, most commonly via logs
- shutdown of processes should be handled in a way that allows other
  processes to be shut down cleanly (for example an application that uses a DB
  should be down before the DB itself)

## Why do we need a system supervisor?

A system supervisor is a process started early in the OS boot that handles
starting and managing all other processes that will run on our system. It is
often the init process (the first process started by the OS, running with PID
1\) or the first (and sometimes only) process started by the init process.
Popular examples of such supervisors (often integrated with init systems) are:

- SysV init, which is the "traditional" implementation that originates from UNIX
  System V
- BSD init, which with some variations is used in BSD-based OSes (NetBSD,
  FreeBSD); it shares some similarities with SysV init and services are
  described by shell scripts
- OpenRC, which also uses shell-based scripts for service description, used by
  Linux distributions like Gentoo or Alpine
- `launchd`, which is used on Darwin (macOS, iPadOS, iOS, watchOS) systems and uses
  XML-based `plists` for service description
- `runit`, which is a small but quite capable init and supervisor, used for
  example by Void Linux
- Upstart, created by Canonical Ltd. as a replacement for the SysV-like init system
  in Ubuntu (no longer in use in Ubuntu), still used in some distributions like
  ChromeOS or Synology NAS
- `systemd` (this is the name, not "SystemD"), which was created by a Red Hat
  employee, the (in)famous Lennart Poettering, and was later adopted by almost all
  major Linux distributions, which spawned some heated discussions about it

In this article I will focus on systemd and its approach to "new-style" system
services.

Each of the solutions mentioned above has its strong and weak points. I do not
want to start another flame war about whether systemd is good or not. It has some
good in it and some bad in it, but we can say that it has "won" over the most used
distributions, and whether we love or hate it, we need to learn how to work with
it.

`systemd` became a thing because the SysV approach to ordering service startup was
mildly irritating and non-parallelizable. In short, SysV starts processes
exactly in the lexicographical order of the files in a given directory. This meant
that even if your service didn't need the DB at all, but it somehow ended up further
down the directory listing, you still ended up waiting for the DB to start. Additionally,
SysV wasn't really monitoring services; it just assumed that when a process forked
itself to the background, it was "done" with startup and we can
continue. This is obviously not true in many cases. For example, if your
previous shutdown wasn't clean because of a power outage or another issue, then
your DB probably needs a bit of time to rebuild its state from the journal. This causes
even more slowdown for processes further down the list. This is highly
undesirable in modern, cloud-based environments, where you often start
machines on demand during autoscaling actions. When there is a spike in
traffic that requires autoscaling, the sooner a new machine is in a usable state,
the sooner it can take load off the other machines.

Different tools take different approaches to solving that issue. `systemd`
takes an approach derived from `launchd`: do not do work that is not
needed. It achieves that by integrating D-Bus into `systemd` itself, then
making all services behave like D-Bus daemons (which are started on request), and
additionally providing a bunch of triggers for those daemons. We can trigger on
actions of other services (obviously), but also on things like socket activity,
path creation/modification, mounts, connection or disconnection of a device,
and more.

This is exactly the reason for `systemd`'s infamous "feature creep": it
"digested" services like Cron or `udev`. It is not that these are
"tightly" intertwined into `systemd`; you can still replace them with their
older counterparts, you will just lose the features the integrated versions bring.

Such a lazy approach sometimes requires changes to the service itself. For
example, to let the supervisor know that you are ready (not just started), you need
some way to communicate with the supervisor. In `systemd` you can do so via the UNIX
socket pointed to by the `NOTIFY_SOCKET` environment variable passed to your
application. With the same socket you can implement another useful feature
\- a watchdog/heartbeat process. This means that if for any reason your process
becomes unresponsive (but refuses to die), then the supervisor will
forcefully bring the process down and restart it, assuming that the error was
transient.

Speaking of restarting, we can define the behaviour of the service after its main
process dies. It can be restarted regardless of the exit code, restarted only on
abnormal exit, left stopped, etc. Does this ring a bell? It works similarly to
OTP supervisors, but "one level above". If your service utilizes the system
supervisor right, you can make your application almost ultimately self-healing.

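For illustration, this is what such restart policies look like as unit-file
directives; the values below are just an example, not part of our final service
definition:

```
[Service]
# Restart on non-zero exit codes and signals, but not after a clean exit
Restart=on-failure
# Wait a bit between restarts to avoid tight crash loops
RestartSec=5s
```
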
Now that we know a little about how and why `systemd` works the way it works, we
can go into the details of how to utilize that with services written in Elixir.

As a base we will implement a super simple Plug application:

```elixir
# hello/application.ex
defmodule Hello.Application do
  use Application

  def start(_type, _opts) do
    children = [
      {Plug.Cowboy, [scheme: :http, plug: Hello.Router] ++ cowboy_opts()},
      {Plug.Cowboy.Drainer, refs: :all}
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end

  defp cowboy_opts do
    [port: String.to_integer(System.get_env("PORT", "4000"))]
  end
end
```

```elixir
# hello/router.ex
defmodule Hello.Router do
  use Plug.Router

  plug :match
  plug :dispatch

  get "/" do
    send_resp(conn, 200, "Hello World!")
  end
end
```

I will also assume that we are using a [Mix release][mix-release] named `hello`
that we later copy to `/opt/hello`.

[mix-release]: https://hexdocs.pm/mix/Mix.Tasks.Release.html

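For completeness, a minimal sketch of how such a release could be declared in
`mix.exs`; the surrounding project settings are assumptions, only the `releases:`
entry matters here:

```elixir
# mix.exs (fragment)
def project do
  [
    app: :hello,
    version: "0.1.0",
    elixir: "~> 1.12",
    # declares a release named `hello`, built with `MIX_ENV=prod mix release hello`
    releases: [hello: []]
  ]
end
```

The built release ends up in `_build/prod/rel/hello`, which is what gets copied
to `/opt/hello`.
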
### systemd unit file

We have only one thing left: we need to define our [`hello.service`][systemd.service]:

```
[Unit]
Description=Hello World service

[Service]
Environment=PORT=80
ExecStart=/opt/hello/bin/hello start
```

Now you can create a file with that content at
`/usr/local/lib/systemd/system/hello.service` and then start it with:

```
# systemctl start hello.service
```

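If systemd was already running when you created the file, it may need to re-read
its unit files first; checking the result afterwards does not hurt either:

```
# systemctl daemon-reload
# systemctl status hello.service
```
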
This is the simplest service imaginable; however, from the start we have a few
problems with it:

- It will run the service as the user running the supervisor, so if it is run by the
  global supervisor, it will run as `root`. You do not want to run anything as
  `root` unless it is absolutely necessary.
- On error it will produce a (BEAM) crash dump, which may contain sensitive data.
- It can read (and, due to being run as `root`, write) everything in the system,
  like private data of other processes.

[systemd.service]: https://www.freedesktop.org/software/systemd/man/systemd.service.html#

## Service readiness

The Erlang VM isn't really the best tool out there with respect to startup times. In
addition to that, our application may need some preparation steps before it can
be marked as "ready". This is a problem that I sometimes encounter with Docker,
where some containers do not really have any health check, and then I need to
add a check loop to some of the containers that depend on another one. This
"workaround" is frustrating, error prone, and can cause nasty Heisenbugs when
the timing is wrong.

Two possible solutions for this problem are:

- Readiness probe - another program that is run after the main process is
  started and checks whether our application is ready to work.
- Notification system - our application uses some common protocol to inform
  the supervisor that it finished its setup and is ready for work.

systemd supports the second approach via [`sd_notify`][sd_notify]. The idea
is simple: we have the `NOTIFY_SOCKET` environment variable that contains the path
to a Unix datagram socket, which we can use to send information about the state of
our application. This socket accepts a set of different messages, but right now,
for our purposes, we will focus only on a few of them:

- `READY=1` - marks our service as ready, i.e. it is ready to do its work (for
  example accept incoming HTTP connections in our case). It needs to be sent
  within a given timespan after the start of the VM, otherwise the process will be
  killed and possibly restarted.
- `STATUS=name` - sets the status of our application that can be checked via
  `systemctl status hello.service`; this allows us to have better insight into
  the high-level state without manually traversing the logs.
- `RELOADING=1` - marks that our application is reloading, which in general may
  mean a lot of things, but here it will be used to mark `:init.restart/0`-like
  behaviour (due to [erlang/otp#4698][] there is a wrapper for that function in
  the `systemd` library). The process then needs to send `READY=1` within a given
  timespan, or it will be marked as malfunctioning and will be
  forcefully killed and possibly restarted.
- `STOPPING=1` - marks that our application began its shutdown process and
  will be closing soon. If the process does not close within a given timespan, it
  will be forcefully killed.

These messages give us enough power not only to mark the service as ready,
but also to provide additional information about the system state, so even an
operator who knows little about Erlang or our application runtime will be able to
understand what is going on.

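Under the hood the protocol is just datagrams containing `KEY=value` lines sent to
that socket. The `systemd` library handles this for us, but a rough, hand-rolled
sketch (assuming OTP's `:gen_udp` support for local sockets, and ignoring details
like abstract socket addresses) could look like this:

```elixir
defmodule Notify do
  @moduledoc "Illustrative sd_notify client - in real code use the `systemd` library."

  def notify(message) do
    case System.get_env("NOTIFY_SOCKET") do
      # Not running under systemd, so there is nobody to notify
      nil ->
        :ok

      path ->
        {:ok, socket} = :gen_udp.open(0, [:local])
        :gen_udp.send(socket, {:local, path}, 0, message)
        :gen_udp.close(socket)
    end
  end
end

# Notify.notify("READY=1")
# Notify.notify("STATUS=Waiting for connections")
```
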
The main point is that systemd will delay activation of the dependants of
our service, and the `systemctl start` and `systemctl restart` commands
will block, until our service declares that it is ready.

Using this feature is quite simple:

```
[Unit]
Description=Hello World service

[Service]
# Define `Type=` as `notify`
Type=notify
Environment=PORT=80
ExecStart=/opt/hello/bin/hello start
```

Then in our supervision tree we need to add `:systemd.ready()` **after** the last
process needed for the proper functioning of our application; in our simple example
it goes after `Plug.Cowboy`:

```elixir
# hello/application.ex
defmodule Hello.Application do
  use Application

  def start(_type, _opts) do
    children = [
      {Plug.Cowboy, [scheme: :http, plug: Hello.Router] ++ cowboy_opts()},
      :systemd.ready(), # <- it is a function call, as it returns a proper child spec
      {Plug.Cowboy.Drainer, refs: :all}
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end

  defp cowboy_opts do
    [port: String.to_integer(System.get_env("PORT", "4000"))]
  end
end
```

Now restarting our service will not return immediately, but will wait until the
service declares that it is ready:

```
# systemctl restart hello.service
```

As for `STOPPING=1`, the nice thing is that the `systemd` library takes care of
it for you. As soon as the system shutdown is scheduled, this message will be
sent automatically, and the operator will be notified about that fact.

We can also provide more information about the state of our application. As you may
have already noticed, we have [`Plug.Cowboy.Drainer`][] in the tree. It is a process that
will delay the shutdown of our application while there are still open connections.
This can take some time, so it would be handy if the operator could see that the
draining is in progress. We can easily achieve that by again changing our
supervision tree to:

```elixir
# hello/application.ex
defmodule Hello.Application do
  use Application

  def start(_type, _opts) do
    children = [
      {Plug.Cowboy, [scheme: :http, plug: Hello.Router] ++ cowboy_opts()},
      :systemd.ready(),
      # `down:` statuses are reported when the given child terminates during shutdown
      :systemd.set_status(down: [status: "drained"]),
      {Plug.Cowboy.Drainer, refs: :all, shutdown: 10_000},
      :systemd.set_status(down: [status: "draining"])
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end

  defp cowboy_opts do
    [port: String.to_integer(System.get_env("PORT", "4000"))]
  end
end
```

Now when we shut down our application with:

```
# systemctl stop hello.service
```

and there are some open connections to our service (you can simulate that with
`wrk`), then running `systemctl status hello.service` in a separate terminal
(the previous one will be blocked until our service shuts down) will show
something like:

```
● hello.service - Example Plug application
     Loaded: loaded (/usr/local/lib/systemd/system/hello.service; static; vendor preset: enabled)
     Active: deactivating (stop-sigterm) since Sat 2022-01-15 17:46:30 CET;
   Main PID: 1327 (beam.smp)
     Status: "draining"
      Tasks: 19 (limit: 1136)
```

You can notice that the `Status` is set to `"draining"`. As soon as all
connections are drained it will change to `"drained"`, and then the
application will shut down and the service will be marked as `inactive`.

[sd_notify]: https://www.freedesktop.org/software/systemd/man/sd_notify.html
[erlang/otp#4698]: https://github.com/erlang/otp/issues/4698
[`Plug.Cowboy.Drainer`]: https://hexdocs.pm/plug_cowboy/2.5.2/Plug.Cowboy.Drainer.html

The watchdog allows us to monitor our application for responsiveness (as mentioned
above). It is a simple feature that requires our application to ping systemd
within a specified interval, otherwise the application will be forcibly shut down
as malfunctioning. Fortunately for us, the `systemd` library that provides our
integration has that feature out of the box, so all we need to do to achieve the
expected result is to set the `WatchdogSec=` option in our `systemd.service` file:

```
[Unit]
Description=Hello World service

[Service]
Type=notify
WatchdogSec=1min
Environment=PORT=80
ExecStart=/opt/hello/bin/hello start
```

This configuration says that if the VM does not send a healthy message within each
1-minute interval, then the service will be marked as malfunctioning. From the
application side we can manage the state of the watchdog in several ways:

- By setting the `systemd.watchdog_check` configuration option we can configure a
  function that will be called on each check; if that function returns `true`,
  the application is considered healthy and systemd will be notified
  with a ping, if it returns `false` or fails, the ping will be omitted (see the
  sketch after this list).
- By manually sending the trigger message when a problem is detected, via
  `:systemd.watchdog(:trigger)`; it will immediately mark the service as
  malfunctioning and will trigger the action defined in the service unit file (by
  default it will restart the application).
- By disabling the built-in watchdog process via `:systemd.watchdog(:disable)` and then
  manually sending `:systemd.watchdog(:ping)` within the expected intervals.

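A minimal sketch of the first option; the exact shape of the `watchdog_check`
value (here a zero-arity function returning a boolean) and the `Hello.HealthCheck`
module are my assumptions, so consult the `systemd` library documentation for the
authoritative form:

```elixir
# config/runtime.exs
import Config

config :systemd,
  # Assumed format: called before each watchdog ping; returning `true` means
  # "healthy, send the ping", anything else means "skip the ping".
  watchdog_check: fn -> Hello.HealthCheck.healthy?() end
```
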
We should start by changing the default user and group assigned to our
process. We can do so in 2 different ways (sketched after this list):

1. Use some existing user and group by defining the `User=` and `Group=` directives
   in our service definition; or
2. Create an ephemeral user on demand before our service starts, by using the
   `DynamicUser=true` directive in the service definition.

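In unit-file terms the two options look like this (pick one; the account names are
placeholders):

```
# Option 1: run as a pre-existing account
User=hello
Group=hello

# Option 2: let systemd create an ephemeral account at service start
DynamicUser=true
```
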
I prefer the second option, as it additionally enables a lot of other security
related options, like creating a private `/tmp` directory, making the system
read-only, etc. It also has some disadvantages, like removing all of the service's
data on shutdown, however there are options (such as `StateDirectory=`) to keep
some data between runs.

In addition to that we can add `PrivateDevices=true`, which will hide all
physical devices from `/dev`, leaving only pseudo devices like `/dev/null` or
`/dev/urandom` (so you will still be able to use the `:crypto` and `:ssl` modules
without problems).

The next thing we can do is to [disable crash dumps generated by the BEAM][crash].
While not strictly needed in this case, it is worth remembering that it isn't
hard to achieve - it is just a matter of adding `Environment=ERL_CRASH_DUMP_SECONDS=0`.

Our new, more secure, `hello.service` will look like:

```
[Unit]
Description=Hello World service
Requires=network.target

[Service]
Type=notify
Environment=PORT=80
ExecStart=/opt/hello/bin/hello start
# Run as an ephemeral, unprivileged user
DynamicUser=true
# We need to add the capability to be able to bind on port 80
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
PrivateDevices=true
Environment=ERL_CRASH_DUMP_SECONDS=0
```

The problem with that configuration is that our service is now capable of
binding **any** port under 1024, so, for example, if there is some security
issue, then a malicious party can open any of the restricted ports and
serve whatever data they want there. This can be quite problematic, and the
solution for that problem will be covered in Part 2, where we will look at socket
passing and socket activation for our service.

With that we achieved a level of isolation quite close to what Docker (or another
container runtime) provides, but it does not require `overlayfs` or anything
more than what you already have on your machine. That also means that updates done
by your system package manager will be applied to all running services, so
you do not need to rebuild all your containers when a security patch is
issued for any of your dependencies.

Of course this only scratches the surface of what is possible with systemd when it
comes to hardening services. More information can be found in the [Red Hat
article][rh-systemd-hardening] and in the output of the [`systemd-analyze security`
command][systemd-analyze-security]. Possible features include (see the sketch after
this list):

- creation of private networks for your services
- disallowing creation of socket connections outside of the specified
  address families
- making only some paths readable
- hiding some paths from the process

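These map roughly to directives like the ones below; treat this as an illustrative,
non-exhaustive sketch (the path is a placeholder) rather than a recommended
configuration:

```
# Give the service its own, loopback-only network namespace
PrivateNetwork=true
# Allow sockets only in the listed address families
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
# Mount most of the filesystem read-only for this service
ProtectSystem=strict
# Hide selected paths from the process entirely
InaccessiblePaths=/srv/secrets
```
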
Covering just that topic is a little bit out of scope for this blog post, so
I encourage you to read the documentation of [`systemd.exec`][systemd.exec] and
the articles mentioned above for more details.

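A quick way to see where our unit stands is to ask systemd for its exposure report:

```
$ systemd-analyze security hello.service
```

It lists the sandboxing options the unit does (and does not) use and computes an
overall exposure score.
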
[crash]: https://erlef.github.io/security-wg/secure_coding_and_deployment_hardening/crash_dumps
[rh-systemd-hardening]: https://www.redhat.com/sysadmin/mastering-systemd
[systemd-analyze-security]: https://www.freedesktop.org/software/systemd/man/systemd-analyze.html#systemd-analyze%20security%20%5BUNIT...%5D
[systemd.exec]: https://www.freedesktop.org/software/systemd/man/systemd.exec.html

This blog post is already quite lengthy, so I have split the material into separate
parts. There will probably be 3 of them:

- [Part 1 - Basics, security, and FD passing (this one)](?1)
- Part 2 - Socket activation