content/post/log-all-the-things.md at dfe202005f8b566d1ed849a3cde371aa074a6fa4 · hauleth.dev/blog

hauleth.dev / blog
this repo has no description
blog / content / post / log-all-the-things.md
at dfe202005f8b566d1ed849a3cde371aa074a6fa4 18 kB view raw view rendered
  1+++
  2title = "Log all the things"
  3date = 2022-01-02
  4
  5[taxonomies]
  6tags = [
  7  "elixir",
  8  "programming",
  9  "observability"
 10]
 11
 12[[extra.thanks]]
 13name = "Kai Wern Choong"
 14+++
 15
 16In Elixir 1.11 landed set of new features that allows for more powerful logging
 17by utilising Erlang's [`logger`][erl-log] features. Here I will try to describe
 18new possibilities and how You can use them to improve your logs.
 19
 20<!-- more -->
 21
 22## New log levels {#levels}
 23
 24Elixir gained 4 new log levels to total 8 (from most verbose to least verbose):
 25
 26- debug
 27- info
 28- **notice** *
 29- warning (renamed from warn)
 30- error
 31- **critical** *
 32- **alert** *
 33- **emergency** *
 34
 35<small>* new levels</small>
 36
 37This allow to provide finer graded verbosity control, due to compatibility
 38reasons, in Elixir backends we need to translate these levels back to "old" set
 39of 4. The current table looks like:
 40
 41| Call level            | What Elixir backend will see |
 42| --                    | --                           |
 43| `debug`               | `debug`                      |
 44| `info`                | `info`                       |
 45| `notice`              | **`info`** *                 |
 46| `warning` (or `warn`) | `warn`                       |
 47| `error`               | `error`                      |
 48| `critical`            | **`error`** *                |
 49| `alert`               | **`error`** *                |
 50| `emergency`           | **`error`** *                |
 51
 52<small>* "translated" messages</small>
 53
 54We can set verbosity to all levels. This may be confusing during the transition
 55period, but we cannot change the behaviour until Elixir 2 (which is not
 56happening any time soon).
 57
 58Usage of the new levels is "obvious":
 59
 60```elixir
 61Logger.notice("Hello")
 62```
 63
 64Will produce message with `notice` level of verbosity.
 65
 66Additionally the `logger.level` option in configuration supports 2 additional
 67verbosity levels that you can use in your config:
 68
 69- `:all` - all messages will be logged, logically exactly the same as `:debug`
 70- `:none` - no messages will be logged
 71
 72## Per module log level {#per-module-level}
 73
 74This is change that can be quite handy during debugging sessions. With this
 75change we have 4 new functions in `Logger` module:
 76
 77- [`get_module_level/1`](https://hexdocs.pm/logger/Logger.html#get_module_level/1)
 78- [`put_module_level/2`](https://hexdocs.pm/logger/Logger.html#put_module_level/2)
 79- [`delete_module_level/1`](https://hexdocs.pm/logger/Logger.html#delete_module_level/1)
 80- [`delete_all_module_level/0`](https://hexdocs.pm/logger/Logger.html#delete_all_module_level/0)
 81
 82These allow us to manipulate verbosity level on per-module basis. What is
 83non-obvious and is super handy is that it allows both lowering **and raising**
 84verbosity for given module. This mean that:
 85
 86```elixir
 87require Logger
 88
 89Logger.configure(level: :error)
 90
 91defmodule Foo do
 92  def run do
 93    Logger.debug("I am still there")
 94  end
 95end
 96
 97Foo.run() # Does not log anything
 98
 99# Set `debug` level for `Foo` module only
100Logger.put_module_level(Foo, :debug)
101Foo.run()
102# `I am still there` is logged
103Logger.debug("I will not be printed")
104# Nothing got logged as top-level verbositi is still set to `:error`
105```
106
107Of course it will not work if you decide to use [compile time purging][logger-purge]
108
109## Logger handlers {#handlers}
110
111---
112
113**Warning!** This is not fully implemented in both - Erlang and Elixir. Writing
114your own handlers without additional knowledge can cause overload problems.
115
116---
117
118Erlang together with their logging implementation needed to provide a way to
119ingest these logs somehow. This is done via Erlang logger handlers (in this
120article called *handlers* in contrast to Elixir backends called *backends*
121there).
122
123Handlers are modules that export at least 1 function `log/2` that takes 2
124arguments:
125
126- `log_event` which is a map with 3 fields:
127  - `:level` - verbosity level
128  - `:msg` - tuple describing message:
129    - `{:io.format(), [term()]}` - format string and list of terms that should
130      be passed to `:io_lib.format/2` function
131    - `{:report, map() | keyword()}` - report that can be formatted into string
132      by `report_cb/{1,2}` set in metadata map (see below)
133    - `{:string, :unicode.chardata()}` - raw string that should be printed as
134      a message
135  - `:meta` - map containing all metadata for given event. All keys should be
136    atoms and values can be anything. Some keys have special meaning, and some
137    of them will be populated automatically by the `Logger` macros and functions.
138    These are:
139    - `:pid` - PID of the process that fired log event
140    - `:gl` - group leader of the process that fired log event
141    - `:mfa` - tuple in form of `{module(), name :: atom(), arity :: non_neg_integer()}`
142      that describe function that fired log event
143    - `:file` - filename of file that defines the code that fired log event
144    - `:line` - line in the given file where the log event was fired
145    - `:domain` - list of atoms that can be used to describe log events
146      hierarchy which then can be used for filtering. All events fired using
147      `Logger` macros and functions will have `:elixir` prepended to their
148      domain list.
149    - `:report_cb` - function that will be used to format `{:report, map() |
150      keyword()}` messages. This can be either 1-ary function, that takes report
151      and returns `{:io.format(), [term()]}` leaving truncation and further
152      formatting up to the main formatter, or 2-ary function that takes report
153      and configuration map `%{depth: pos_integer() | :unlimited, chars_limit:
154      pos_integer() | :unlimited, single_line: boolean()}` and returns already
155      formatted `:unicode.chardata()`. More about it can be found in [separate
156      section](#structured-logging).
157
158Return value of this function is ignored. If there will be any exception raised
159when calling this function, then it will be captured and failing handler will be
160removed. This is important, as if such handler is the only one, then you can be
161left without any logging handler and miss logs.
162
163The important thing about Erlang handlers and Elixir backends is that Erlang
164handlers functions are called **within caller process** while Elixir backends
165are called in separate process. This mean that wrongly written Erlang handler
166can cause quite substantial load on application.
167
168To read on other, optional, callbacks that can be defined by Erlang handler, that
169will not be covered there, I suggest looking into [Erlang documentation][formatter_cb].
170
171## Structured logging {#structured-logging}
172
173One of the biggest new features in the Elixir 1.11 is support for structured
174logging. This mean that the log message do not need to be free-form string, but
175instead we can pass structure, that can provide more machine-readable data for
176processing in log aggregators. In Elixir 1.11 is simple as passing map as a
177first argument to the `Logger` macros:
178
179```elixir
180Logger.info(%{
181  status: :completed,
182  response: :ok
183})
184```
185
186This will produce message that looks like:
187
188```log
18914:08:46.849 [info]  [response: :ok, status: :completed]
190```
191
192As we can see, the map (called *report*) is formatted as a keyword list. This is
193default way to present the report data. Unfortunately we cannot access the
194metadata from the Elixir backends, but we have 2 ways to make these messages
195more readable for the human operator:
196
1971. Utilise [`Logger`'s translators](https://hexdocs.pm/logger/Logger.Translator.html)
1981. Using `:report_cb` field in metadata
199
2001st option is described quite good in Elixir documentation and is available
201since Elixir 1.0 as it was used to translate `error_logger` messages in old
202Erlang versions. Here I will describe the 2nd option which provide way for
203**caller** to define how report should be formatted into human-readable string.
204
205`:report_cb` accepts 2 kind of functions as an argument:
206
207- 1-ary function, that takes report as an argument and should return tuple
208  in form of `{:io.format(), [term()]}` that will be later formatted
209  respectively by the formatters.
210- 2-ary function that takes report and configuration map as an arguments and
211  should return formatted string.
212
2131st option is much easier for most use cases, as it do not force you to worry
214about handling width, depth, and multiline logs, as it will all be handled for
215you.
216
217For example, instead of doing:
218
219```elixir
220Logger.info("Started HTTP server on http://localhost:8080")
221```
222
223We can do:
224
225```elixir
226Logger.info(
227  %{
228    protocol: :http,
229    port: 8080,
230    address: "localhost",
231    endpoint: MyEndpoint,
232    handler: Plug.Cowboy
233  },
234  report_cb: &__MODULE__.report_cb/1
235)
236
237# …
238
239def report_cb(%{protocol: protocol, port: port, address: address}) do
240  {"Started ~s server on ~s://~s:~B", [protocol, protocol, address, port]}
241end
242```
243
244While the second entry seems much more verbose, with proper handler, it can
245provide much more detailed output. Just imagine that we would have handler that
246output JSON data and what information we could contain in such message:
247
248```json
249{
250  "msg": "Started HTTP server on http://localhost:8080",
251  "metadata": {
252    "mfa": "MyMod.start/2",
253    "file": "foo.ex",
254    "line": 42
255  }
256}
257```
258
259Now our log aggregation service need to parse `msg` field to extract all
260information that is contained there, like port, address, and protocol. With
261structured logging we can have that message available already there while
262presenting the "human readable" form as well:
263
264```json
265{
266  "text": "Started HTTP server on http://localhost:8080",
267  "msg": {
268    "address": "localhost",
269    "port": 8080,
270    "protocol": "http",
271    "endpoint": "MyEndpoint",
272    "handler": "Plug.Cowboy"
273  },
274  "metadata": {
275    "mfa": "MyMod.start/2",
276    "file": "foo.ex",
277    "line": 42
278  }
279}
280```
281
282You can see there that we can have more information available in the structured
283log that would otherwise needed to be crammed somewhere into the text message,
284even if it is not important in "regular" Ops observability.
285
286This can raise a question - why not use metadata for such functionality, like it
287is available in [`LoggerJSON`][] or [`Ink`][]? The reason is that their reason
288existence is different. Metadata meant for "meta" stuff like location, tracing
289ID, but not for the information about the message itself. It is best shown on
290example. For this use Elixir's implementation of `GenServer` wrapper that
291produces error log entry on unknown message handled by default `handle_info/2`:
292
293```elixir
294Logger.error(
295  # Report
296  %{
297    label: {GenServer, :no_handle_info},
298    report: %{
299      module: __MODULE__,
300      message: msg,
301      name: proc
302    }
303  },
304  # Metadata
305  %{
306    error_logger: %{tag: :error_msg},
307    report_cb: &GenServer.format_report/1
308  }
309)
310```
311
312As we can see there, the report contains informations like:
313
314- `:label` - that describes type of the event
315- `:report` - content of the "main" event
316  - `:module` - module that created the event, it is important to notice, that
317    it is also present in metadata (as part of `:mfa` key), but their meaning is
318    different. Module name here is meant for the operator to know the name of
319    the implementor that failed to handle message, while `:mfa` is meant to
320    describe the location of the code that fired the event.
321  - `:message` - the message itself that hasn't been handled. Notice, that it is
322    not stringified in any way there, it is simply passed "as is" to the
323    report. It is meant to be stringified later by the `:report_cb` function.
324  - `:name` - name of the process. Remember, similarly to `:module`, the PID of
325    the current process is part of the metadata, so in theory we could use value
326    from there, but their meaning is different (additionally this one may be an
327    atom in case if the process is locally registered with name).
328
329Metadata on the other hand contains information that will be useful for
330filtering or formatting of the event.
331
332The rule of thumb you can follow is:
333
334> If it is thing that you will want to filter on, then it probably should be
335> part of the metadata. If you want to aggregate information or just display
336> them, it should be part of the message report.
337
338## Log filtering
339
340Finally we come to first feature that is not directly accessible from the Elixir
341`Logger` API (yet). Erlang's `logger` have powerful functionality for filtering
342log messages which allows us to dynamically decide which message should, or
343should not be logged. These even can alter messages on the fly.
344
345Currently that functionality is available only via `:logger` module. It can be
346used like:
347
348```elixir
349defmodule MyFilter do
350  def filter(log_event, opts) do
351    # …
352  end
353end
354
355:logger.add_primary_filter(:my_filter, {&MyFilter.filter/2, opts})
356# Or
357:logger.add_handler_filter(handler_id, :my_filter, {&MyFilter.filter/2, opts})
358```
359
360Few important things that need to be remembered when writing such filters:
361
362- It is best practice to make such functions public and define filters using
363  remote function capture, like `&__MODULE__.process_disabled/2` (so not
364  anonymous functions either). It will make such filter much easier for VM to
365  handle (it is bigger topic why it is that, I may to cover it in another post).
366- Filters are ran **within the same process that fired log event**, so it is
367  important to make such filters as fast as possible, and do not do any heavy
368  work there.
369
370Filters can be used for 2 different things:
371
372- preventing some messages from being logged
373- modifying a message
374
375While the former is much more common, I will try to describe both use cases
376there, as the latter is also quite useful.
377
378Filters are defined as 2-ary functions where 1st argument is log event, and
379second argument is any term that can be used as a configuration for filter.
380Filter should return one of these 3 values:
381
382- `:stop` - which will immediately discard message and do not run any additional
383  filters.
384- `:ignore` - which mean that given filter didn't recognise the given message
385  and leaves it up to other filters to decide on the action. If all filters
386  return `:ignore` then `:filter_default` option for the handler will be taken.
387  By default it is `:log`, which mean that message will be logged, but default
388  handler has it set to `:stop` by default, which mean, that non-matching
389  messages will be discarded.
390- Just log event (possibly modified) that will cause next filter to be called
391  with altered message. The message returned by the last filter (or in case of
392  `:ignore` return, previous filters) will be the message passed to handler.
393
394### Preventing some messages from being logged
395
396Most common use-case for filters will probably be rejecting messages that aren't
397important for us. [Erlang even prepared some useful filters][logger_filters]:
398
399- `domain` - allow filtering by metadata `:domain` field (remember as I said
400  that metadata is for filtering?). It supports multiple possible relations
401  between the log domain and defined domain.
402- `level` - allow filtering (in or out) messages depending on their level, in
403  both directions. It will allow you to filter messages with higher level for
404  some handlers. Just remember, that it will not receive messages that will not
405  pass primary/module level.
406- `progress` - filters all reports from `supervisor` and
407  `application_controller`. Simply, reduces startup/process shutdown chatter
408  that often is meaningless for most time.
409- `remote_gl` - filters messages coming from group leader on another node.
410  Useful when you want to discard/log messages coming from other nodes in
411  cluster.
412
413### Modifying a message
414
415Sometimes there is need to alter messages in the system. For example we may need
416to prevent sensitive information from being logged. When using "old" Elixir
417approach you could abuse translators, but that was error prone, as first
418successful translator was breaking pipeline, so you couldn't just smash one on
419top and then keep rest working as is. With "new" approach and structured logging
420you can just traverse the report and replace all occurrences of the unsafe data
421with anonymised data. For example:
422
423```elixir
424def filter_out_password(%{msg: {:report, report}} = event, _opts) do
425  %{event | msg: {:report, replace(report)}}
426end
427
428@filtered "[FILTERED]"
429
430defp replace(%{password: _} = map) do
431  for {k, v} <- %{map | password: @filtered}, into: %{} do
432    {k, replace(v)}
433  end
434end
435
436defp replace(%{"password" => _} = map) do
437  for {k, v} <- %{map | "password" => @filtered}, into: %{} do
438    {k, replace(v)}
439  end
440end
441
442defp replace(list) when is_list(list) do
443  for elem <- list do
444    case elem do
445      {:password, _} -> {:password, @filtered}
446      {"password", _} -> {"password", @filtered}
447      {k, v} -> {k, replace(v)}
448      other -> replace(other)
449    end
450  end
451end
452
453defp replace(other), do: other
454```
455
456This snippet will replace all occurrences of `:password` or `"password"` with
457filtered out value.
458
459The disadvantage of such approach - it will make all messages with such fields
460allowed in case if your filter has `:filter_default` set to `:stop`. That mean,
461that if you want to make some of them rejected anyway, then you will need to
462manually add additional step to reject messages that do not fit into your
463patterns. Alternatively you can use `filter_default: :log` and then use opt-out
464logging. There currently is no way to alter the message and make other filters
465decide whether log it or not (as of OTP 24).
466
467## Summary
468
469New features and possibilities with relation to logging in Elixir 1.11 can be
470overwhelming. Fortunately all of the new features are optional and provided in
471addition to "good 'ol `Logger.info("logging")`". But for the people who works on
472the observability in BEAM (EEF Observability WG, Sentry, Logflare, etc.) it
473brings a lot of new powerful capabilities.
474
475I am thrilled to see what will people create using all that power.
476
477[erl-log]: https://erlang.org/doc/man/logger.html
478[syslog]: https://en.wikipedia.org/wiki/Syslog#Severity_level
479[`LoggerJSON`]: https://github.com/Nebo15/logger_json
480[`Ink`]: https://hex.pm/packages/ink
481[logger_filters]: https://erlang.org/doc/man/logger_filters.html
482[logger-purge]: https://hexdocs.pm/logger/Logger.html#module-application-configuration
483[formatter_cb]: https://erlang.org/doc/man/logger.html#formatter-callback-functions