content/post/log-all-the-things.md at f55d4d605d168495e3a989e7292a03f98f97f027 · hauleth.dev/blog

hauleth.dev / blog
this repo has no description
blog / content / post / log-all-the-things.md
at f55d4d605d168495e3a989e7292a03f98f97f027 18 kB view raw view rendered
  1+++
  2title = "Log all the things"
  3date = 2022-01-02
  4
  5[taxonomies]
  6tags = [
  7  "beam",
  8  "observability"
  9]
 10
 11[[extra.thanks]]
 12name = "Kai Wern Choong"
 13+++
 14
 15In Elixir 1.11 landed set of new features that allows for more powerful logging
 16by utilising Erlang's [`logger`][erl-log] features. Here I will try to describe
 17new possibilities and how You can use them to improve your logs.
 18
 19<!-- more -->
 20
 21## New log levels {#levels}
 22
 23Elixir gained 4 new log levels to total 8 (from most verbose to least verbose):
 24
 25- debug
 26- info
 27- **notice** *
 28- warning (renamed from warn)
 29- error
 30- **critical** *
 31- **alert** *
 32- **emergency** *
 33
 34<small>* new levels</small>
 35
 36This allow to provide finer graded verbosity control, due to compatibility
 37reasons, in Elixir backends we need to translate these levels back to "old" set
 38of 4. The current table looks like:
 39
 40| Call level            | What Elixir backend will see |
 41| --                    | --                           |
 42| `debug`               | `debug`                      |
 43| `info`                | `info`                       |
 44| `notice`              | **`info`** *                 |
 45| `warning` (or `warn`) | `warn`                       |
 46| `error`               | `error`                      |
 47| `critical`            | **`error`** *                |
 48| `alert`               | **`error`** *                |
 49| `emergency`           | **`error`** *                |
 50
 51<small>* "translated" messages</small>
 52
 53We can set verbosity to all levels. This may be confusing during the transition
 54period, but we cannot change the behaviour until Elixir 2 (which is not
 55happening any time soon).
 56
 57Usage of the new levels is "obvious":
 58
 59```elixir
 60Logger.notice("Hello")
 61```
 62
 63Will produce message with `notice` level of verbosity.
 64
 65Additionally the `logger.level` option in configuration supports 2 additional
 66verbosity levels that you can use in your config:
 67
 68- `:all` - all messages will be logged, logically exactly the same as `:debug`
 69- `:none` - no messages will be logged
 70
 71## Per module log level {#per-module-level}
 72
 73This is change that can be quite handy during debugging sessions. With this
 74change we have 4 new functions in `Logger` module:
 75
 76- [`get_module_level/1`](https://hexdocs.pm/logger/Logger.html#get_module_level/1)
 77- [`put_module_level/2`](https://hexdocs.pm/logger/Logger.html#put_module_level/2)
 78- [`delete_module_level/1`](https://hexdocs.pm/logger/Logger.html#delete_module_level/1)
 79- [`delete_all_module_level/0`](https://hexdocs.pm/logger/Logger.html#delete_all_module_level/0)
 80
 81These allow us to manipulate verbosity level on per-module basis. What is
 82non-obvious and is super handy is that it allows both lowering **and raising**
 83verbosity for given module. This mean that:
 84
 85```elixir
 86require Logger
 87
 88Logger.configure(level: :error)
 89
 90defmodule Foo do
 91  def run do
 92    Logger.debug("I am still there")
 93  end
 94end
 95
 96Foo.run() # Does not log anything
 97
 98# Set `debug` level for `Foo` module only
 99Logger.put_module_level(Foo, :debug)
100Foo.run()
101# `I am still there` is logged
102Logger.debug("I will not be printed")
103# Nothing got logged as top-level verbositi is still set to `:error`
104```
105
106Of course it will not work if you decide to use [compile time purging][logger-purge]
107
108## Logger handlers {#handlers}
109
110---
111
112**Warning!** This is not fully implemented in both - Erlang and Elixir. Writing
113your own handlers without additional knowledge can cause overload problems.
114
115---
116
117Erlang together with their logging implementation needed to provide a way to
118ingest these logs somehow. This is done via Erlang logger handlers (in this
119article called *handlers* in contrast to Elixir backends called *backends*
120there).
121
122Handlers are modules that export at least 1 function `log/2` that takes 2
123arguments:
124
125- `log_event` which is a map with 3 fields:
126  - `:level` - verbosity level
127  - `:msg` - tuple describing message:
128    - `{:io.format(), [term()]}` - format string and list of terms that should
129      be passed to `:io_lib.format/2` function
130    - `{:report, map() | keyword()}` - report that can be formatted into string
131      by `report_cb/{1,2}` set in metadata map (see below)
132    - `{:string, :unicode.chardata()}` - raw string that should be printed as
133      a message
134  - `:meta` - map containing all metadata for given event. All keys should be
135    atoms and values can be anything. Some keys have special meaning, and some
136    of them will be populated automatically by the `Logger` macros and functions.
137    These are:
138    - `:pid` - PID of the process that fired log event
139    - `:gl` - group leader of the process that fired log event
140    - `:mfa` - tuple in form of `{module(), name :: atom(), arity :: non_neg_integer()}`
141      that describe function that fired log event
142    - `:file` - filename of file that defines the code that fired log event
143    - `:line` - line in the given file where the log event was fired
144    - `:domain` - list of atoms that can be used to describe log events
145      hierarchy which then can be used for filtering. All events fired using
146      `Logger` macros and functions will have `:elixir` prepended to their
147      domain list.
148    - `:report_cb` - function that will be used to format `{:report, map() |
149      keyword()}` messages. This can be either 1-ary function, that takes report
150      and returns `{:io.format(), [term()]}` leaving truncation and further
151      formatting up to the main formatter, or 2-ary function that takes report
152      and configuration map `%{depth: pos_integer() | :unlimited, chars_limit:
153      pos_integer() | :unlimited, single_line: boolean()}` and returns already
154      formatted `:unicode.chardata()`. More about it can be found in [separate
155      section](#structured-logging).
156
157Return value of this function is ignored. If there will be any exception raised
158when calling this function, then it will be captured and failing handler will be
159removed. This is important, as if such handler is the only one, then you can be
160left without any logging handler and miss logs.
161
162The important thing about Erlang handlers and Elixir backends is that Erlang
163handlers functions are called **within caller process** while Elixir backends
164are called in separate process. This mean that wrongly written Erlang handler
165can cause quite substantial load on application.
166
167To read on other, optional, callbacks that can be defined by Erlang handler, that
168will not be covered there, I suggest looking into [Erlang documentation][formatter_cb].
169
170## Structured logging {#structured-logging}
171
172One of the biggest new features in the Elixir 1.11 is support for structured
173logging. This mean that the log message do not need to be free-form string, but
174instead we can pass structure, that can provide more machine-readable data for
175processing in log aggregators. In Elixir 1.11 is simple as passing map as a
176first argument to the `Logger` macros:
177
178```elixir
179Logger.info(%{
180  status: :completed,
181  response: :ok
182})
183```
184
185This will produce message that looks like:
186
187```log
18814:08:46.849 [info]  [response: :ok, status: :completed]
189```
190
191As we can see, the map (called *report*) is formatted as a keyword list. This is
192default way to present the report data. Unfortunately we cannot access the
193metadata from the Elixir backends, but we have 2 ways to make these messages
194more readable for the human operator:
195
1961. Utilise [`Logger`'s translators](https://hexdocs.pm/logger/Logger.Translator.html)
1971. Using `:report_cb` field in metadata
198
1991st option is described quite good in Elixir documentation and is available
200since Elixir 1.0 as it was used to translate `error_logger` messages in old
201Erlang versions. Here I will describe the 2nd option which provide way for
202**caller** to define how report should be formatted into human-readable string.
203
204`:report_cb` accepts 2 kind of functions as an argument:
205
206- 1-ary function, that takes report as an argument and should return tuple
207  in form of `{:io.format(), [term()]}` that will be later formatted
208  respectively by the formatters.
209- 2-ary function that takes report and configuration map as an arguments and
210  should return formatted string.
211
2121st option is much easier for most use cases, as it do not force you to worry
213about handling width, depth, and multiline logs, as it will all be handled for
214you.
215
216For example, instead of doing:
217
218```elixir
219Logger.info("Started HTTP server on http://localhost:8080")
220```
221
222We can do:
223
224```elixir
225Logger.info(
226  %{
227    protocol: :http,
228    port: 8080,
229    address: "localhost",
230    endpoint: MyEndpoint,
231    handler: Plug.Cowboy
232  },
233  report_cb: &__MODULE__.report_cb/1
234)
235
236# …
237
238def report_cb(%{protocol: protocol, port: port, address: address}) do
239  {"Started ~s server on ~s://~s:~B", [protocol, protocol, address, port]}
240end
241```
242
243While the second entry seems much more verbose, with proper handler, it can
244provide much more detailed output. Just imagine that we would have handler that
245output JSON data and what information we could contain in such message:
246
247```json
248{
249  "msg": "Started HTTP server on http://localhost:8080",
250  "metadata": {
251    "mfa": "MyMod.start/2",
252    "file": "foo.ex",
253    "line": 42
254  }
255}
256```
257
258Now our log aggregation service need to parse `msg` field to extract all
259information that is contained there, like port, address, and protocol. With
260structured logging we can have that message available already there while
261presenting the "human readable" form as well:
262
263```json
264{
265  "text": "Started HTTP server on http://localhost:8080",
266  "msg": {
267    "address": "localhost",
268    "port": 8080,
269    "protocol": "http",
270    "endpoint": "MyEndpoint",
271    "handler": "Plug.Cowboy"
272  },
273  "metadata": {
274    "mfa": "MyMod.start/2",
275    "file": "foo.ex",
276    "line": 42
277  }
278}
279```
280
281You can see there that we can have more information available in the structured
282log that would otherwise needed to be crammed somewhere into the text message,
283even if it is not important in "regular" Ops observability.
284
285This can raise a question - why not use metadata for such functionality, like it
286is available in [`LoggerJSON`][] or [`Ink`][]? The reason is that their reason
287existence is different. Metadata meant for "meta" stuff like location, tracing
288ID, but not for the information about the message itself. It is best shown on
289example. For this use Elixir's implementation of `GenServer` wrapper that
290produces error log entry on unknown message handled by default `handle_info/2`:
291
292```elixir
293Logger.error(
294  # Report
295  %{
296    label: {GenServer, :no_handle_info},
297    report: %{
298      module: __MODULE__,
299      message: msg,
300      name: proc
301    }
302  },
303  # Metadata
304  %{
305    error_logger: %{tag: :error_msg},
306    report_cb: &GenServer.format_report/1
307  }
308)
309```
310
311As we can see there, the report contains information like:
312
313- `:label` - that describes type of the event
314- `:report` - content of the "main" event
315  - `:module` - module that created the event, it is important to notice, that
316    it is also present in metadata (as part of `:mfa` key), but their meaning is
317    different. Module name here is meant for the operator to know the name of
318    the implementor that failed to handle message, while `:mfa` is meant to
319    describe the location of the code that fired the event.
320  - `:message` - the message itself that hasn't been handled. Notice, that it is
321    not stringified in any way there, it is simply passed "as is" to the
322    report. It is meant to be stringified later by the `:report_cb` function.
323  - `:name` - name of the process. Remember, similarly to `:module`, the PID of
324    the current process is part of the metadata, so in theory we could use value
325    from there, but their meaning is different (additionally this one may be an
326    atom in case if the process is locally registered with name).
327
328Metadata on the other hand contains information that will be useful for
329filtering or formatting of the event.
330
331The rule of thumb you can follow is:
332
333> If it is thing that you will want to filter on, then it probably should be
334> part of the metadata. If you want to aggregate information or just display
335> them, it should be part of the message report.
336
337## Log filtering
338
339Finally we come to first feature that is not directly accessible from the Elixir
340`Logger` API (yet). Erlang's `logger` have powerful functionality for filtering
341log messages which allows us to dynamically decide which message should, or
342should not be logged. These even can alter messages on the fly.
343
344Currently that functionality is available only via `:logger` module. It can be
345used like:
346
347```elixir
348defmodule MyFilter do
349  def filter(log_event, opts) do
350    # …
351  end
352end
353
354:logger.add_primary_filter(:my_filter, {&MyFilter.filter/2, opts})
355# Or
356:logger.add_handler_filter(handler_id, :my_filter, {&MyFilter.filter/2, opts})
357```
358
359Few important things that need to be remembered when writing such filters:
360
361- It is best practice to make such functions public and define filters using
362  remote function capture, like `&__MODULE__.process_disabled/2` (so not
363  anonymous functions either). It will make such filter much easier for VM to
364  handle (it is bigger topic why it is that, I may to cover it in another post).
365- Filters are ran **within the same process that fired log event**, so it is
366  important to make such filters as fast as possible, and do not do any heavy
367  work there.
368
369Filters can be used for 2 different things:
370
371- preventing some messages from being logged
372- modifying a message
373
374While the former is much more common, I will try to describe both use cases
375there, as the latter is also quite useful.
376
377Filters are defined as 2-ary functions where 1st argument is log event, and
378second argument is any term that can be used as a configuration for filter.
379Filter should return one of these 3 values:
380
381- `:stop` - which will immediately discard message and do not run any additional
382  filters.
383- `:ignore` - which mean that given filter didn't recognise the given message
384  and leaves it up to other filters to decide on the action. If all filters
385  return `:ignore` then `:filter_default` option for the handler will be taken.
386  By default it is `:log`, which mean that message will be logged, but default
387  handler has it set to `:stop` by default, which mean, that non-matching
388  messages will be discarded.
389- Just log event (possibly modified) that will cause next filter to be called
390  with altered message. The message returned by the last filter (or in case of
391  `:ignore` return, previous filters) will be the message passed to handler.
392
393### Preventing some messages from being logged
394
395Most common use-case for filters will probably be rejecting messages that aren't
396important for us. [Erlang even prepared some useful filters][logger_filters]:
397
398- `domain` - allow filtering by metadata `:domain` field (remember as I said
399  that metadata is for filtering?). It supports multiple possible relations
400  between the log domain and defined domain.
401- `level` - allow filtering (in or out) messages depending on their level, in
402  both directions. It will allow you to filter messages with higher level for
403  some handlers. Just remember, that it will not receive messages that will not
404  pass primary/module level.
405- `progress` - filters all reports from `supervisor` and
406  `application_controller`. Simply, reduces startup/process shutdown chatter
407  that often is meaningless for most time.
408- `remote_gl` - filters messages coming from group leader on another node.
409  Useful when you want to discard/log messages coming from other nodes in
410  cluster.
411
412### Modifying a message
413
414Sometimes there is need to alter messages in the system. For example we may need
415to prevent sensitive information from being logged. When using "old" Elixir
416approach you could abuse translators, but that was error prone, as first
417successful translator was breaking pipeline, so you couldn't just smash one on
418top and then keep rest working as is. With "new" approach and structured logging
419you can just traverse the report and replace all occurrences of the unsafe data
420with anonymised data. For example:
421
422```elixir
423def filter_out_password(%{msg: {:report, report}} = event, _opts) do
424  %{event | msg: {:report, replace(report)}}
425end
426
427@filtered "[FILTERED]"
428
429defp replace(%{password: _} = map) do
430  for {k, v} <- %{map | password: @filtered}, into: %{} do
431    {k, replace(v)}
432  end
433end
434
435defp replace(%{"password" => _} = map) do
436  for {k, v} <- %{map | "password" => @filtered}, into: %{} do
437    {k, replace(v)}
438  end
439end
440
441defp replace(list) when is_list(list) do
442  for elem <- list do
443    case elem do
444      {:password, _} -> {:password, @filtered}
445      {"password", _} -> {"password", @filtered}
446      {k, v} -> {k, replace(v)}
447      other -> replace(other)
448    end
449  end
450end
451
452defp replace(other), do: other
453```
454
455This snippet will replace all occurrences of `:password` or `"password"` with
456filtered out value.
457
458The disadvantage of such approach - it will make all messages with such fields
459allowed in case if your filter has `:filter_default` set to `:stop`. That mean,
460that if you want to make some of them rejected anyway, then you will need to
461manually add additional step to reject messages that do not fit into your
462patterns. Alternatively you can use `filter_default: :log` and then use opt-out
463logging. There currently is no way to alter the message and make other filters
464decide whether log it or not (as of OTP 24).
465
466## Summary
467
468New features and possibilities with relation to logging in Elixir 1.11 can be
469overwhelming. Fortunately all of the new features are optional and provided in
470addition to "good 'ol `Logger.info("logging")`". But for the people who works on
471the observability in BEAM (EEF Observability WG, Sentry, Logflare, etc.) it
472brings a lot of new powerful capabilities.
473
474I am thrilled to see what will people create using all that power.
475
476[erl-log]: https://erlang.org/doc/man/logger.html
477[syslog]: https://en.wikipedia.org/wiki/Syslog#Severity_level
478[`LoggerJSON`]: https://github.com/Nebo15/logger_json
479[`Ink`]: https://hex.pm/packages/ink
480[logger_filters]: https://erlang.org/doc/man/logger_filters.html
481[logger-purge]: https://hexdocs.pm/logger/Logger.html#module-application-configuration
482[formatter_cb]: https://erlang.org/doc/man/logger.html#formatter-callback-functions