content/post/log-all-the-things.md at 48ea4db49c3eeb55665b88479ec593da63276d6b · hauleth.dev/blog

hauleth.dev / blog
this repo has no description
blog / content / post / log-all-the-things.md
at 48ea4db49c3eeb55665b88479ec593da63276d6b 18 kB view raw view rendered
  1+++
  2title = "Log all the things"
  3date = 2021-10-13
  4draft = true
  5
  6[taxonomies]
  7tags = [
  8  "elixir",
  9  "programming",
 10  "observability"
 11]
 12
 13[[extra.thanks]]
 14name = "Kai Wern Choong"
 15+++
 16
 17In Elixir 1.11 landed set of new features that allows for more powerful logging
 18by utilising Erlang's [`logger`][erl-log] features. Here I will try to describe
 19new possibilities and how You can use them to improve your logs.
 20
 21<!-- more -->
 22
 23## New log levels {#levels}
 24
 25Elixir gained 4 new log levels to total 8 (from most verbose to least verbose):
 26
 27- debug
 28- info
 29- **notice** *
 30- warning (renamed from warn)
 31- error
 32- **critical** *
 33- **alert** *
 34- **emergency** *
 35
 36<small>* new levels</small>
 37
 38This allow to provide finer graded verbosity control, due to compatibility
 39reasons, in Elixir backends we need to translate these levels back to "old" set
 40of 4. The current table looks like:
 41
 42| Call level            | What Elixir backend will see |
 43| --                    | --                           |
 44| `debug`               | `debug`                      |
 45| `info`                | `info`                       |
 46| `notice`              | **`info`** *                 |
 47| `warning` (or `warn`) | `warn`                       |
 48| `error`               | `error`                      |
 49| `critical`            | **`error`** *                |
 50| `alert`               | **`error`** *                |
 51| `emergency`           | **`error`** *                |
 52
 53<small>* "translated" messages</small>
 54
 55We can set verbosity to all levels. This may be confusing during the transition
 56period, but we cannot change the behaviour until Elixir 2 (which is not
 57happening any time soon).
 58
 59Usage of the new levels is "obvious":
 60
 61```elixir
 62Logger.notice("Hello")
 63```
 64
 65Will produce message with `notice` level of verbosity.
 66
 67Additionally the `logger.level` option in configuration supports 2 additional
 68verbosity levels that you can use in your config:
 69
 70- `:all` - all messages will be logged, logically exactly the same as `:debug`
 71- `:none` - no messages will be logged
 72
 73## Per module log level {#per-module-level}
 74
 75This is change that can be quite handy during debugging sessions. With this
 76change we have 4 new functions in `Logger` module:
 77
 78- [`get_module_level/1`](https://hexdocs.pm/logger/Logger.html#get_module_level/1)
 79- [`put_module_level/2`](https://hexdocs.pm/logger/Logger.html#put_module_level/2)
 80- [`delete_module_level/1`](https://hexdocs.pm/logger/Logger.html#delete_module_level/1)
 81- [`delete_all_module_level/0`](https://hexdocs.pm/logger/Logger.html#delete_all_module_level/0)
 82
 83These allow us to manipulate verbosity level on per-module basis. What is
 84non-obvious and is super handy is that it allows both lowering **and raising**
 85verbosity for given module. This mean that:
 86
 87```elixir
 88require Logger
 89
 90Logger.configure(level: :error)
 91
 92defmodule Foo do
 93  def run do
 94    Logger.debug("I am still there")
 95  end
 96end
 97
 98Foo.run() # Does not log anything
 99
100# Set `debug` level for `Foo` module only
101Logger.put_module_level(Foo, :debug)
102Foo.run()
103# `I am still there` is logged
104Logger.debug("I will not be printed")
105# Nothing got logged as top-level verbositi is still set to `:error`
106```
107
108Of course it will not work if you decide to use [compile time purging][logger-purge]
109
110## Logger handlers {#handlers}
111
112---
113
114**Warning!** This is not fully implemented in both - Erlang and Elixir. Writing
115your own handlers without additional knowledge can cause overload problems.
116
117---
118
119Erlang together with their logging implementation needed to provide a way to
120ingest these logs somehow. This is done via Erlang logger handlers (in this
121article called *handlers* in contrast to Elixir backends called *backends*
122there).
123
124Handlers are modules that export at least 1 function `log/2` that takes 2
125arguments:
126
127- `log_event` which is a map with 3 fields:
128  - `:level` - verbosity level
129  - `:msg` - tuple describing message:
130    - `{:io.format(), [term()]}` - format string and list of terms that should
131      be passed to `:io_lib.format/2` function
132    - `{:report, map() | keyword()}` - report that can be formatted into string
133      by `report_cb/{1,2}` set in metadata map (see below)
134    - `{:string, :unicode.chardata()}` - raw string that should be printed as
135      a message
136  - `:meta` - map containing all metadata for given event. All keys should be
137    atoms and values can be anything. Some keys have special meaning, and some
138    of them will be populated automatically by the `Logger` macros and functions.
139    These are:
140    - `:pid` - PID of the process that fired log event
141    - `:gl` - group leader of the process that fired log event
142    - `:mfa` - tuple in form of `{module(), name :: atom(), arity :: non_neg_integer()}`
143      that describe function that fired log event
144    - `:file` - filename of file that defines the code that fired log event
145    - `:line` - line in the given file where the log event was fired
146    - `:domain` - list of atoms that can be used to describe log events
147      hierarchy which then can be used for filtering. All events fired using
148      `Logger` macros and functions will have `:elixir` prepended to their
149      domain list.
150    - `:report_cb` - function that will be used to format `{:report, map() |
151      keyword()}` messages. This can be either 1-ary function, that takes report
152      and returns `{:io.format(), [term()]}` leaving truncation and further
153      formatting up to the main formatter, or 2-ary function that takes report
154      and configuration map `%{depth: pos_integer() | :unlimited, chars_limit:
155      pos_integer() | :unlimited, single_line: boolean()}` and returns already
156      formatted `:unicode.chardata()`. More about it can be found in [separate
157      section](#structured-logging).
158
159Return value of this function is ignored. If there will be any exception raised
160when calling this function, then it will be captured and failing handler will be
161removed. This is important, as if such handler is the only one, then you can be
162left without any logging handler and miss logs.
163
164The important thing about Erlang handlers and Elixir backends is that Erlang
165handlers functions are called **within caller process** while Elixir backends
166are called in separate process. This mean that wrongly written Erlang handler
167can cause quite substantial load on application.
168
169To read on other, optional, callbacks that can be defined by Erlang handler, that
170will not be covered there, I suggest looking into [Erlang documentation][formatter_cb].
171
172## Structured logging {#structured-logging}
173
174One of the biggest new features in the Elixir 1.11 is support for structured
175logging. This mean that the log message do not need to be free-form string, but
176instead we can pass structure, that can provide more machine-readable data for
177processing in log aggregators. In Elixir 1.11 is simple as passing map as a
178first argument to the `Logger` macros:
179
180```elixir
181Logger.info(%{
182  status: :completed,
183  response: :ok
184})
185```
186
187This will produce message that looks like:
188
189```log
19014:08:46.849 [info]  [response: :ok, status: :completed]
191```
192
193As we can see, the map (called *report*) is formatted as a keyword list. This is
194default way to present the report data. Unfortunately we cannot access the
195metadata from the Elixir backends, but we have 2 ways to make these messages
196more readable for the human operator:
197
1981. Utilise [`Logger`'s translators](https://hexdocs.pm/logger/Logger.Translator.html)
1991. Using `:report_cb` field in metadata
200
2011st option is described quite good in Elixir documentation and is available
202since Elixir 1.0 as it was used to translate `error_logger` messages in old
203Erlang versions. Here I will describe the 2nd option which provide way for
204**caller** to define how report should be formatted into human-readable string.
205
206`:report_cb` accepts 2 kind of functions as an argument:
207
208- 1-ary function, that takes report as an argument and should return tuple
209  in form of `{:io.format(), [term()]}` that will be later formatted
210  respectively by the formatters.
211- 2-ary function that takes report and configuration map as an arguments and
212  should return formatted string.
213
2141st option is much easier for most use cases, as it do not force you to worry
215about handling width, depth, and multiline logs, as it will all be handled for
216you.
217
218For example, instead of doing:
219
220```elixir
221Logger.info("Started HTTP server on http://localhost:8080")
222```
223
224We can do:
225
226```elixir
227Logger.info(
228  %{
229    protocol: :http,
230    port: 8080,
231    address: "localhost",
232    endpoint: MyEndpoint,
233    handler: Plug.Cowboy
234  },
235  report_cb: &__MODULE__.report_cb/1
236)
237
238# …
239
240def report_cb(%{protocol: protocol, port: port, address: address}) do
241  {"Started ~s server on ~s://~s:~B", [protocol, protocol, address, port]}
242end
243```
244
245While the second entry seems much more verbose, with proper handler, it can
246provide much more detailed output. Just imagine that we would have handler that
247output JSON data and what information we could contain in such message:
248
249```json
250{
251  "msg": "Started HTTP server on http://localhost:8080",
252  "metadata": {
253    "mfa": "MyMod.start/2",
254    "file": "foo.ex",
255    "line": 42
256  }
257}
258```
259
260Now our log aggregation service need to parse `msg` field to extract all
261information that is contained there, like port, address, and protocol. With
262structured logging we can have that message available already there while
263presenting the "human readable" form as well:
264
265```json
266{
267  "text": "Started HTTP server on http://localhost:8080",
268  "msg": {
269    "address": "localhost",
270    "port": 8080,
271    "protocol": "http",
272    "endpoint": "MyEndpoint",
273    "handler": "Plug.Cowboy"
274  },
275  "metadata": {
276    "mfa": "MyMod.start/2",
277    "file": "foo.ex",
278    "line": 42
279  }
280}
281```
282
283You can see there that we can have more information available in the structured
284log that would otherwise needed to be crammed somewhere into the text message,
285even if it is not important in "regular" Ops observability.
286
287This can raise a question - why not use metadata for such functionality, like it
288is available in [`LoggerJSON`][] or [`Ink`][]? The reason is that their reason
289existence is different. Metadata meant for "meta" stuff like location, tracing
290ID, but not for the information about the message itself. It is best shown on
291example. For this use Elixir's implementation of `GenServer` wrapper that
292produces error log entry on unknown message handled by default `handle_info/2`:
293
294```elixir
295Logger.error(
296  # Report
297  %{
298    label: {GenServer, :no_handle_info},
299    report: %{
300      module: __MODULE__,
301      message: msg,
302      name: proc
303    }
304  },
305  # Metadata
306  %{
307    error_logger: %{tag: :error_msg},
308    report_cb: &GenServer.format_report/1
309  }
310)
311```
312
313As we can see there, the report contains informations like:
314
315- `:label` - that describes type of the event
316- `:report` - content of the "main" event
317  - `:module` - module that created the event, it is important to notice, that
318    it is also present in metadata (as part of `:mfa` key), but their meaning is
319    different. Module name here is meant for the operator to know the name of
320    the implementor that failed to handle message, while `:mfa` is meant to
321    describe the location of the code that fired the event.
322  - `:message` - the message itself that hasn't been handled. Notice, that it is
323    not stringified in any way there, it is simply passed "as is" to the
324    report. It is meant to be stringified later by the `:report_cb` function.
325  - `:name` - name of the process. Remember, similarly to `:module`, the PID of
326    the current process is part of the metadata, so in theory we could use value
327    from there, but their meaning is different (additionally this one may be an
328    atom in case if the process is locally registered with name).
329
330Metadata on the other hand contains information that will be useful for
331filtering or formatting of the event.
332
333The rule of thumb you can follow is:
334
335> If it is thing that you will want to filter on, then it probably should be
336> part of the metadata. If you want to aggregate information or just display
337> them, it should be part of the message report.
338
339## Log filtering
340
341Finally we come to first feature that is not directly accessible from the Elixir
342`Logger` API (yet). Erlang's `logger` have powerful functionality for filtering
343log messages which allows us to dynamically decide which message should, or
344should not be logged. These even can alter messages on the fly.
345
346Currently that functionality is available only via `:logger` module. It can be
347used like:
348
349```elixir
350defmodule MyFilter do
351  def filter(log_event, opts) do
352    # …
353  end
354end
355
356:logger.add_primary_filter(:my_filter, {&MyFilter.filter/2, opts})
357# Or
358:logger.add_handler_filter(handler_id, :my_filter, {&MyFilter.filter/2, opts})
359```
360
361Few important things that need to be remembered when writing such filters:
362
363- It is best practice to make such functions public and define filters using
364  remote function capture, like `&__MODULE__.process_disabled/2` (so not
365  anonymous functions either). It will make such filter much easier for VM to
366  handle (it is bigger topic why it is that, I may to cover it in another post).
367- Filters are ran **within the same process that fired log event**, so it is
368  important to make such filters as fast as possible, and do not do any heavy
369  work there.
370
371Filters can be used for 2 different things:
372
373- preventing some messages from being logged
374- modifying a message
375
376While the former is much more common, I will try to describe both use cases
377there, as the latter is also quite useful.
378
379Filters are defined as 2-ary functions where 1st argument is log event, and
380second argument is any term that can be used as a configuration for filter.
381Filter should return one of these 3 values:
382
383- `:stop` - which will immediately discard message and do not run any additional
384  filters.
385- `:ignore` - which mean that given filter didn't recognise the given message
386  and leaves it up to other filters to decide on the action. If all filters
387  return `:ignore` then `:filter_default` option for the handler will be taken.
388  By default it is `:log`, which mean that message will be logged, but default
389  handler has it set to `:stop` by default, which mean, that non-matching
390  messages will be discarded.
391- Just log event (possibly modified) that will cause next filter to be called
392  with altered message. The message returned by the last filter (or in case of
393  `:ignore` return, previous filters) will be the message passed to handler.
394
395### Preventing some messages from being logged
396
397Most common use-case for filters will probably be rejecting messages that aren't
398important for us. [Erlang even prepared some useful filters][logger_filters]:
399
400- `domain` - allow filtering by metadata `:domain` field (remember as I said
401  that metadata is for filtering?). It supports multiple possible relations
402  between the log domain and defined domain.
403- `level` - allow filtering (in or out) messages depending on their level, in
404  both directions. It will allow you to filter messages with higher level for
405  some handlers. Just remember, that it will not receive messages that will not
406  pass primary/module level.
407- `progress` - filters all reports from `supervisor` and
408  `application_controller`. Simply, reduces startup/process shutdown chatter
409  that often is meaningless for most time.
410- `remote_gl` - filters messages coming from group leader on another node.
411  Useful when you want to discard/log messages coming from other nodes in
412  cluster.
413
414### Modifying a message
415
416Sometimes there is need to alter messages in the system. For example we may need
417to prevent sensitive information from being logged. When using "old" Elixir
418approach you could abuse translators, but that was error prone, as first
419successful translator was breaking pipeline, so you couldn't just smash one on
420top and then keep rest working as is. With "new" approach and structured logging
421you can just traverse the report and replace all occurrences of the unsafe data
422with anonymised data. For example:
423
424```elixir
425def filter_out_password(%{msg: {:report, report}} = event, _opts) do
426  %{event | msg: {:report, replace(report)}}
427end
428
429@filtered "[FILTERED]"
430
431defp replace(%{password: _} = map) do
432  for {k, v} <- %{map | password: @filtered}, into: %{} do
433    {k, replace(v)}
434  end
435end
436
437defp replace(%{"password" => _} = map) do
438  for {k, v} <- %{map | "password" => @filtered}, into: %{} do
439    {k, replace(v)}
440  end
441end
442
443defp replace(list) when is_list(list) do
444  for elem <- list do
445    case elem do
446      {:password, _} -> {:password, @filtered}
447      {"password", _} -> {"password", @filtered}
448      {k, v} -> {k, replace(v)}
449      other -> replace(other)
450    end
451  end
452end
453
454defp replace(other), do: other
455```
456
457This snippet will replace all occurrences of `:password` or `"password"` with
458filtered out value.
459
460The disadvantage of such approach - it will make all messages with such fields
461allowed in case if your filter has `:filter_default` set to `:stop`. That mean,
462that if you want to make some of them rejected anyway, then you will need to
463manually add additional step to reject messages that do not fit into your
464patterns. Alternatively you can use `filter_default: :log` and then use opt-out
465logging. There currently is no way to alter the message and make other filters
466decide whether log it or not (as of OTP 24).
467
468## Summary
469
470New features and possibilities with relation to logging in Elixir 1.11 can be
471overwhelming. Fortunately all of the new features are optional and provided in
472addition to "good 'ol `Logger.info("logging")`". But for the people who works on
473the observability in BEAM (EEF Observability WG, Sentry, Logflare, etc.) it
474brings a lot of new powerful capabilities.
475
476I am thrilled to see what will people create using all that power.
477
478[erl-log]: https://erlang.org/doc/man/logger.html
479[syslog]: https://en.wikipedia.org/wiki/Syslog#Severity_level
480[`LoggerJSON`]: https://github.com/Nebo15/logger_json
481[`Ink`]: https://hex.pm/packages/ink
482[logger_filters]: https://erlang.org/doc/man/logger_filters.html
483[logger-purge]: https://hexdocs.pm/logger/Logger.html#module-application-configuration
484[formatter_cb]: https://erlang.org/doc/man/logger.html#formatter-callback-functions