2015-06-17

The entropy of logging

A couple of weeks ago, my colleague António Vieira made an internal presentation called the “The entropy of logging”. In this post I share the main points of the presentation and my thoughts on the subject.

Logging

"Event logs record events taking place in the execution of a system in order to provide an audit trail that can be used to understand the activity of the system and to diagnose problems. They are essential to understand the activities of complex systems, particularly in the case of applications with little user interaction (such as server applications)." - Wikipedia
We all know what logging is but how to do it right?
 Some guidelines might help…

 Relevant

Captain's Log, Stardate 43125.8. We have entered a spectacular binary star system in the Kavis Alpha sector on a most critical mission of astrophysical research. Our eminent guest, Dr. Paul Stubbs, will attempt to study the decay of neutronium expelled at relativistic speeds from a massive stellar explosion which will occur here in a matter of hours.

Irrelevant

Captain's Log, Stardate 43125.8. Today I had baked beans for lunch. Lt Commander Data didn’t understand my irony. Lt Commander La Forge has the feeling unwell, probably the galactic flu.

Levels


7 guidelines

  1. Log relevant events
  2. Log the unpredictable
  3. 1 event = 1 log line
  4. Log the boundaries of the application
  5. Consistent and parsable
  6. End-to-end traceability
  7. Protect privacy

Should libraries include logging?

No! Logging is an application level concern
Libraries should expose all the relevant state as return values, exceptions or event listeners.
If you really need events on your library keep it very simple and plugable.

My (Sergio) conclusion

Logging is something that many of us, professional developers, take for granted. In every project I've worked, as soon as deploy an application to an environment, things start to fail (which is expected), but it’s rare that the existing log is enough to quickly debug the problem. Most often, critical information is missing, and the existing log entries are mostly just clutter.

Based on my experience, all of these guidelines are valid. The ones I hadn't applied in the past were point 3, as well as point 5 and 7, but simply because they didn't apply to my application's domain.

Having one log entry per event may seem strange at first because it seems a strong constrain, but I've recently have been applying it, and it has actually made the code cleaner. Basically, each domain event will produce a single log entry in case of success (INFO), or in the case of failure (ERROR)To me it feels the correct approach, particularly for a Domain Driven Design approach (although I haven't been using it currently), as you should have one log entry per domain event. Having that said, this shouldn't be applied too literally: it's still OK to have a random log entry at the DEBUG level, just not at INFO and ERROR (you are filtering DEBUG entries in production, right? :))

By applying the single log entry guideline, you'll probably have to log a lot of information, which leads to the parsable guideline. In our current development, we've found that the best approach is to log it as JSON. We are not constraining the event to an actual schema, although we try to be consistent. In practise, this means we are using Json.NET to serialize a dynamic object (in .NET), or Jackson to serialize an HashMap object. Log parsing tools (such as logstash) should have any problem with this.

Regarding privacy, we have been very careful. We try to use SecureString (in .NET) or char[] (in Java) types for holding passwords in memory, but for the remaining data the best approach is to remove sensitive fields from serialization (preferably in a declaratively form, to be clear about this in the code). When the sensitive information is relevant for logging issues, the standard approach is to log the first N characters of the text, and add place-holders (such as the asterisk character) to the remaining text.

Finally, regarding on the point of not logging on libraries, I can't say I totally agree with António. I mean, I agree if the library is a thin layer, such as in an infrastructure layer library; but this may harder to implement if the library is complex, such as a domain layer library (as I have implemented in the past). What I can add is that we should avoid logging in libraries if we possible; if not,we should use a logging framework facade, such as CommonLogging in .NET or SLF4J in Java, to reduce the change of friction.

Thanks to António Vieira for the interesting discussion and allowing me to share some of the content of his presentation.

No comments:

Post a Comment