content/post/log-all-the-things.md at b107a783b0937d6d00d325f8f5a09615f4ceb904 · hauleth.dev/blog

hauleth.dev / blog
this repo has no description
blog / content / post / log-all-the-things.md
at b107a783b0937d6d00d325f8f5a09615f4ceb904 18 kB view raw view rendered
  1+++
  2title = "Log all the things"
  3date = 2021-10-13
  4draft = true
  5
  6[taxonomies]
  7tags = [
  8  "elixir",
  9  "programming",
 10  "observability"
 11]
 12+++
 13
 14In Elixir 1.11 landed set of new features that allows for more powerful logging
 15by utilising Erlang's [`logger`][erl-log] features. Here I will try to descibe
 16new possibilities and how You can use them to improve your logs.
 17
 18<!-- more -->
 19
 20## New log levels {#levels}
 21
 22Elixir gained 4 new log levels to total 8 (from most verbose to least verbose):
 23
 24- debug
 25- info
 26- **notice** *
 27- warning (renamed from warn)
 28- error
 29- **critical** *
 30- **alert** *
 31- **emergency** *
 32
 33<small>* new levels</small>
 34
 35This allow to provide finer graded verbosity control, however due to
 36compatibility reasons, in Elixir backends we need to translate these levels back
 37to "old" set of 4. The current table looks like:
 38
 39| Call level            | What Elixir backend will see |
 40| --                    | --                           |
 41| `debug`               | `debug`                      |
 42| `info`                | `info`                       |
 43| `notice`              | **`info`** *                 |
 44| `warning` (or `warn`) | `warn`                       |
 45| `error`               | `error`                      |
 46| `critical`            | **`error`** *                |
 47| `alert`               | **`error`** *                |
 48| `emergency`           | **`error`** *                |
 49
 50<small>* "translated" messages</small>
 51
 52However we can set verbosity to all levels. This may be confusing during the
 53transition period, but we cannot change the behaviour until Elixir 2 (which is
 54not happening any time soon).
 55
 56Usage of the new levels is "obvious":
 57
 58```elixir
 59Logger.notice("Hello")
 60```
 61
 62Will produce message with `notice` level of verbosity.
 63
 64Additionally the `logger.level` option in configuration supports 2 additional
 65verbosity levels that you can use in your config:
 66
 67- `:all` - all messages will be logged, logically exactly the same as `:debug`
 68- `:none` - no messages will be logged
 69
 70## Per module log level {#per-module-level}
 71
 72This is change that can be quite handy during debugging sessions. With this
 73change we have 4 new functions in `Logger` module:
 74
 75- [`get_module_level/1`](https://hexdocs.pm/logger/Logger.html#get_module_level/1)
 76- [`put_module_level/2`](https://hexdocs.pm/logger/Logger.html#put_module_level/2)
 77- [`delete_module_level/1`](https://hexdocs.pm/logger/Logger.html#delete_module_level/1)
 78- [`delete_all_module_level/0`](https://hexdocs.pm/logger/Logger.html#delete_all_module_level/0)
 79
 80These allow us to manipulate verbosity level on per-module basis. What is
 81non-obvious and is super handy is that it allows both lowering **and raising**
 82verbosity for given module. This mean that:
 83
 84```elixir
 85require Logger
 86
 87Logger.configure(level: :error)
 88
 89defmodule Foo do
 90  def run do
 91    Logger.debug("I am still there")
 92  end
 93end
 94
 95Foo.run() # Does not log anything
 96
 97# Set `debug` level for `Foo` module only
 98Logger.put_module_level(Foo, :debug)
 99Foo.run()
100# `I am still there` is logged
101Logger.debug("I will not be printed")
102# Nothing got logged as top-level verbositi is still set to `:error`
103```
104
105Of course it will not work if you decide to use [compile time purging](https://hexdocs.pm/logger/Logger.html#module-application-configuration).
106
107## Logger handlers {#handlers}
108
109---
110
111**Warning!** This is not fully implemented in both - Erlang and Elixir. Writing
112your own handlers without additional knowledge can cause overload problems.
113
114---
115
116Erlang together with their logging implementation needed to provide a way to
117ingest these logs somehow. This is done via Erlang logger handlers (in this
118article called *handlers* in contrast to Elixir backends called *backends*
119there).
120
121Handlers are modules that export at least 1 function `log/2` that takes 2
122arguments:
123
124- `log_event` which is a map with 3 fields:
125  - `:level` - verbosity level
126  - `:msg` - tuple describing message:
127    - `{:io.format(), [term()]}` - format string and list of terms that should
128      be passed to `:io_lib.format/2` function
129    - `{:report, map() | keyword()}` - report that can be formatted into string
130      by `report_cb/{1,2}` set in metadata map (see below)
131    - `{:string, :unicode.chardata()}` - raw string that should be printed as
132      a message
133  - `:meta` - map containing all metadata for given event. All keys should be
134    atoms and values can be anything. Some keys have special meaning, and some
135    of them will be populated automatically by the `Logger` macros and functions.
136    These are:
137    - `:pid` - PID of the process that fired log event
138    - `:gl` - group leader of the process that fired log event
139    - `:mfa` - tuple in form of `{module(), name :: atom(), arity :: non_neg_integer()}`
140      that describe function that fired log event
141    - `:file` - filename of file that defines the code that fired log event
142    - `:line` - line in the given file where the log event was fired
143    - `:domain` - list of atoms that can be used to describe log events
144      hierarchy which then can be used for filtering. All events fired using
145      `Logger` macros and functions will have `:elixir` prepended to their
146      domain list.
147    - `:report_cb` - function that will be used to format `{:report, map() | keyword()}`
148      messages. This can be either 1-ary function, that takes report and returns
149      `{:io.format(), [term()]}` leaving truncation and further formatting up to
150      the main formatter, or 2-ary function that takes report and configuration
151      map `%{depth: pos_integer() | :unlimited, chars_limit: pos_integer() |
152      :unlimited, single_line: boolean()}` and returns already formatted
153      `:unicode.chardata()`. More about it can be found in [separate section](#structured-logging).
154
155Return value of this function is ignored. If there will be any exception raised
156when calling this function, then it will be captured and failing handler will be
157removed. This is important, as if such handler is the only one, then you can be
158left without any logging handler and miss logs.
159
160The important thing about Erlang handlers and Elixir backends is that Erlang
161handlers functions are called **within caller process** while Elixir backends
162are called in separate process. This mean that wrongly written Erlang handler
163can cause quite substantial load on application.
164
165To read on other, optional, callbacks that can be defined by Erlang handler, that
166will not be covered there, I suggest looking into [Erlang documentation](https://erlang.org/doc/man/logger.html#formatter-callback-functions).
167
168## Structured logging {#structured-logging}
169
170One of the biggest new features in the Elixir 1.11 is support for structured
171logging. This mean that the log message do not need to be free-form string, but
172instead we can pass structure, that can provide more machine-readable data for
173processing in log aggregators. In Elixir 1.11 is is simple as passing map as a
174first argument to the `Logger` macros:
175
176```elixir
177Logger.info(%{
178  status: :completed,
179  response: :ok
180})
181```
182
183This will produce message that looks like:
184
185```log
18614:08:46.849 [info]  [response: :ok, status: :completed]
187```
188
189As we can see, the map (called *report*) is formatted as a keyword list. This is
190default way to present the report data. Unfortunately we cannot access the
191metadata from the Elixir backends, but we have 2 ways to make these messages
192more readable for the human operator:
193
1941. Utilise [`Logger`'s translators](https://hexdocs.pm/logger/Logger.Translator.html)
1951. Using `:report_cb` field in metadata
196
1971st option is described quite good in Elixir documentation and is available
198since Elixir 1.0 as it was used to translate `error_logger` messages in old
199Erlang versions. Here I will describe the 2nd option which provide way for
200**caller** to define how report should be formatted into human-readable string.
201
202`:report_cb` accepts 2 kind of functions as an argument:
203
204- 1-ary function, that takes report as an argument and should return tuple
205  in form of `{:io.format(), [term()]}` that will be later formatted
206  respectively by the formatters.
207- 2-ary function that takes report and configuration map as an arguments and
208  should return formatted string.
209
2101st option is much easier for most use cases, as it do not force you to worry
211about handling width, depth, and multiline logs, as it will all be handled for
212you.
213
214For example, instead of doing:
215
216```elixir
217Logger.info("Started HTTP server on http://localhost:8080")
218```
219
220We can do:
221
222```elixir
223Logger.info(
224  %{
225    protocol: :http,
226    port: 8080,
227    address: "localhost",
228    endpoint: MyEndpoint,
229    handler: Plug.Cowboy
230  },
231  report_cb: &__MODULE__.report_cb/1
232)
233
234# …
235
236def report_cb(%{protocol: protocol, port: port, address: address}) do
237  {"Started ~s server on ~s://~s:~B", [protocol, protocol, address, port]}
238end
239```
240
241While the second entry seems much more verbose, with proper handler, it can
242provide much more detailed output. Just imagine that we would have handler that
243output JSON data and what information we could contain in such message:
244
245```json
246{
247  "msg": "Started HTTP server on http://localhost:8080",
248  "metadata": {
249    "mfa": "MyMod.start/2",
250    "file": "foo.ex",
251    "line": 42
252  }
253}
254```
255
256Now our log aggregation service need to parse `msg` field to extract all
257information that is contained there, like port, address, and protocol. With
258structured logging we can have that message available already there while
259presenting the "human readable" form as well:
260
261```json
262{
263  "text": "Started HTTP server on http://localhost:8080",
264  "msg": {
265    "address": "localhost",
266    "port": 8080,
267    "protocol": "http",
268    "endpoint": "MyEndpoint",
269    "handler": "Plug.Cowboy"
270  },
271  "metadata": {
272    "mfa": "MyMod.start/2",
273    "file": "foo.ex",
274    "line": 42
275  }
276}
277```
278
279Additionally you can see there that we can have more information available in
280the structured log that would otherwise needed to be crammed somewhere into the
281text message, even if it is not important in "regular" Ops observability.
282
283This can raise a question - why not use metadata for such functionality, like it
284is available in [`LoggerJSON`][] or [`Ink`][]? The reason is that their reason
285existence is different. Metadata meant for "meta" stuff like location, tracing
286ID, but not for the information about the message itself. It is best shown on
287example. For this use Elixir's implementation of `GenServer` wrapper that
288produces error log entry on unknown message handled by default `handle_info/2`:
289
290```elixir
291Logger.error(
292  # Report
293  %{
294    label: {GenServer, :no_handle_info},
295    report: %{
296      module: __MODULE__,
297      message: msg,
298      name: proc
299    }
300  },
301  # Metadata
302  %{
303    domain: [:otp, :elixir],
304    error_logger: %{tag: :error_msg},
305    report_cb: &GenServer.format_report/1
306  }
307)
308```
309
310As we can see there, the report contains informations like:
311
312- `:label` - that describes type of the event
313- `:report` - content of the "main" event
314  - `:module` - module that created the event, it is important to notice, that
315    it is also present in metadata (as part of `:mfa` key), but their meaning is
316    different. Module name here is meant for the operator to know the name of
317    the implementor that failed to handle message, while `:mfa` is meant to
318    describe the location of the code that fired the event.
319  - `:message` - the message itself that hasn't been handled. Notice, that it is
320    not stringified in any way there, it is simply passed "as is" to the
321    report. It is meant to be stringified later by the `:report_cb` function.
322  - `:name` - name of the process. Remember, similarly to `:module`, the PID of
323    the current process is part of the metadata, so in theory we could use value
324    from there, but their meaning is different (additionally this one may be an
325    atom in case if the process is locally registered with name).
326
327Metadata on the other hand contains information that will be useful for
328filtering or formatting of the event.
329
330The rule of thumb you can follow is:
331
332> If it is thing that you will want to filter on, then it propably should be
333> part of the metadata. If you want to aggregate information or just display
334> them, it should be part of the message report.
335
336## Log filtering
337
338Finally we come to first feature that is not directly accessible from the Elixir
339`Logger` API (yet). Erlang's `logger` have powerful functionality for filtering
340log messages which allows us to dynamically decide which message should, or
341should not be logged. These even can alter messages on the fly.
342
343Currently that functionality is available only via `:logger` module. It can be
344used like:
345
346```elixir
347defmodule MyFilter do
348  def filter(log_event, opts) do
349    # …
350  end
351end
352
353:logger.add_primary_filter(:my_filter, {&MyFilter.filter/2, opts})
354# Or
355:logger.add_handler_filter(handler_id, :my_filter, {&MyFilter.filter/2, opts})
356```
357
358Few important things that need to be remembered when writing such filters:
359
360- It is best practice to make such functions public and define filters using
361  remote function capture, like `&__MODULE__.process_disabled/2` (so not
362  anonymous functions either). It will make such filter much easier for VM to
363  handle (it is bigger topic why it is that, I may to cover it in another post).
364- Filters are ran **within the same process that fired log event**, so it is
365  important to make such filters as fast as possible, and do not do any heavy
366  work there.
367
368Filters can be used for 2 different things:
369
370- preventing some messages from being logged
371- modifying a message
372
373While the former is much more common, I will try to describe both use cases
374there, as the latter is also quite useful.
375
376Filters are defined as 2-ary functions where 1st argument is log event, and
377second argument is any term that can be used as a configuration for filter.
378Filter should return one of these 3 values:
379
380- `:stop` - which will immediately discard message and do not run any additional
381  filters.
382- `:ignore` - which mean that given filter didn't recognise the given message
383  and leaves it up to other filters to decide on the action. If all filters
384  return `:ignore` then `:filter_default` option for the handler will be taken.
385  By default it is `:log`, which mean that message will be logged, but default
386  handler has it set to `:stop` by default, which mean, that non-matching
387  messages will be discarded.
388- Just log event (possibly modified) that will cause next filter to be called
389  with altered message. The message returned by the last filter (or in case of
390  `:ignore` return, previous filters) will be the message passed to handler.
391
392### Preventing some messages from being logged
393
394Most common use-case for filters will probably be rejecting messages that aren't
395important for us. [Erlang even prepared some useful filters][logger_filters]:
396
397- `domain` - allow filtering by metadata `:domain` field (remember as I said
398  that metadata is for filtering?). It supports multiple possible relations
399  between the log domain and defined domain.
400- `level` - allow filtering (in or out) messages depending on their level, in
401  both directions. So it will allow you to filter messages with higher level for
402  some handlers. Just remember, that it will not receive messages that will not
403  pass primary/module level.
404- `progress` - filters all reports from `supervisor` and
405  `application_controller`. Simply, reduces startup/process shutdown chatter
406  that often is meaningless for most time.
407- `remote_gl` - filters messages coming from group leader on another node.
408  Useful when you want to discard/log messages coming from other nodes in
409  cluster.
410
411### Modifying a message
412
413Sometimes (hopefully rarely) there is need to alter messages in the system. For
414example we may need to prevent sensitive information from being logged. When
415using "old" Elixir approach you could abuse translators, but that was error
416prone, as first successful translator was breaking pipeline, so you couldn't
417just smash one on top and then keep rest working as is. With "new" approach and
418structured logging you can just traverse the report and replace all occurences
419of the unsafe data with anonymised data. For example:
420
421```elixir
422def filter_out_password(%{msg: {:report, report}} = event, _opts) do
423  %{event | msg: {:report, replace(report)}}
424end
425
426@filtered "[FILTERED]"
427
428defp replace(%{password: _} = map) do
429  for {k, v} <- %{map | password: @filtered}, into: %{} do
430    {k, replace(v)}
431  end
432end
433
434defp replace(%{"password" => _} = map) do
435  for {k, v} <- %{map | "password" => @filtered}, into: %{} do
436    {k, replace(v)}
437  end
438end
439
440defp replace(list) when is_list(list) do
441  for elem <- list do
442    case elem do
443      {:password, _} -> {:password, @filtered}
444      {"password", _} -> {"password", @filtered}
445      {k, v} -> {k, replace(v)}
446      other -> replace(other)
447    end
448  end
449end
450
451defp replace(other), do: other
452```
453
454This snippet will replace all occurences of `:password` or `"password"` with
455filtered out value.
456
457There is disadvantage of such approach though - it will make all messages with
458such fields allowed in case if your filter has `:filter_default` set to `:stop`.
459That mean, that if you want to make some of them rejected anyway, then you will
460need to manually add additional step to reject messages that do not fit into
461your patterns. Alternatively you can use `filter_default: :log` and then use
462opt-out logging. There currently is no way to alter the message and make other
463filters decide whether log it or not (as of OTP 24).
464
465## Summary
466
467New features and possibilities with relation to logging in Elixir 1.11 can be
468overwhelming. Fortunately all of the new features are optional and provided in
469addition to "good 'ol `Logger.info("logging")`". But for the people who works on
470the observability in BEAM (EEF Observability WG, Sentry, Logflare, etc.) it
471brings a lot of new powerful capabilities.
472
473I am thrilled to see what will people create using all that power.
474
475[erl-log]: https://erlang.org/doc/man/logger.html
476[syslog]: https://en.wikipedia.org/wiki/Syslog#Severity_level
477[`LoggerJSON`]: https://github.com/Nebo15/logger_json
478[`Ink`]: https://hex.pm/packages/ink
479[logger_filters]: https://erlang.org/doc/man/logger_filters.html