vsupalov

Match and Handle Date/Time Formats in Td-Agent or Fluentd

January 9, 2016

When handling your log files with either td-agent or fluentd, the built-in formats they provide are sometimes not enough. See the ‘format (required)’ section of the in_tail plugin documentation for a complete list. But what can you do when you are working with logs that do not fit those common patterns? Do you have to switch to something like json, or skip structuring/parsing the data altogether with the none format?

Most of the built-in log formats, such as apache, apache2, apache_error, nginx or syslog, are backed by a combination of a regular expression and a specification of how to parse the timestamp part of each log line. The corresponding configuration lines of a source entry are format and time_format. When you need a little more flexibility, for example when parsing default Golang logs or the output of a fancier logging library, you can use these two fields to teach fluentd or td-agent to handle such logs just as well. Here is what a source block using them looks like:

<source>
  type tail
  format /^(?<time>[^ ]* [^ ]*) (?<message>.*)$/
  time_format %Y/%m/%d %H:%M:%S
  path /var/log/upstart/my-service.log
  pos_file /var/log/td-agent/my-service.log.pos
  tag my-service
</source>

When put into td-agent.conf, this source block makes sure that the default Golang-style logs emitted by the upstart-run my-service job are parsed properly. Its time format handles log lines such as the following:

2016/01/09 14:21:24 Hello!

Here, “Hello!” becomes the message, while the timestamp is obtained by parsing the part of the line captured by the time group of the regex with the time_format. The regex expects the time to consist of two runs of non-space characters, separated by a single space.

But how do you find out more about the percent-prefixed letters? They are a common time formatting notation, used among others by the Python functions time.strptime and time.strftime to parse and to create string representations of dates and times. The corresponding section of the Python documentation describes their behavior in detail, and strftime.org gives a compact overview with good examples.
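If you want to try out a format/time_format pair before deploying it, you can approximate what the source block above does in a few lines of Python. This is a rough sketch, not fluentd's own code (fluentd is written in Ruby): it assumes that the directives used here behave the same in Python's time.strptime, and it rewrites the Ruby-style named groups (?<name>...) as (?P<name>...), which is how Python's re module spells them. The helper name parse_line is made up for this example.

```python
import re
import time

# Same pattern as the fluentd "format" line, rewritten with Python's
# (?P<name>...) syntax for named groups.
LINE_RE = re.compile(r"^(?P<time>[^ ]* [^ ]*) (?P<message>.*)$")
# Same directives as the fluentd "time_format" line.
TIME_FORMAT = "%Y/%m/%d %H:%M:%S"


def parse_line(line):
    """Hypothetical helper: split a Go-style log line into (timestamp, message).

    A rough approximation of the source block above, not fluentd's implementation.
    """
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"line does not match the format regex: {line!r}")
    # Only the text captured by the "time" group is handed to strptime.
    parsed = time.strptime(match.group("time"), TIME_FORMAT)
    return parsed, match.group("message")


parsed, message = parse_line("2016/01/09 14:21:24 Hello!")
print(time.strftime("%Y-%m-%d %H:%M:%S", parsed), message)
# prints: 2016-01-09 14:21:24 Hello!

# strftime with the same format string turns the parsed value back into text.
print(time.strftime(TIME_FORMAT, parsed))
# prints: 2016/01/09 14:21:24
```

The round trip through strftime is a cheap way to confirm that every directive in a time_format means what you think it means.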

I’d like to conclude with a few examples which might save you some time when writing the time_format field. First, here is a way to parse the fancier log format of the go-json-rest library in its simplest form:

format /^(?<remoteaddress>[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*) - (?<remoteuser>.*) (?<time>[^ ]* [^ ]* [^ ]*) (?<message>.*)$/
time_format %d/%b/%Y:%H:%M:%S %z

The message could be further subdivided into relevant fields if needed.
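The time_format part can be checked in isolation, too. In the following Python sketch, the timestamp 09/Jan/2016:14:21:24 +0100 is a hypothetical sample in the layout that the format string describes. Python's %b matches locale-dependent month abbreviations, so the sketch assumes the default C locale, in which Jan is valid; %z turns the numeric offset into a timezone-aware value.

```python
from datetime import datetime, timezone

# time_format from the go-json-rest example above.
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# Hypothetical sample timestamp in that layout (assumes the default C locale for %b).
stamp = datetime.strptime("09/Jan/2016:14:21:24 +0100", TIME_FORMAT)

# %z yields an aware datetime that carries the +01:00 offset.
print(stamp.isoformat())                           # 2016-01-09T14:21:24+01:00
print(stamp.astimezone(timezone.utc).isoformat())  # 2016-01-09T13:21:24+00:00

# Characters the format does not describe, such as surrounding brackets,
# make strptime fail instead of being skipped.
try:
    datetime.strptime("[09/Jan/2016:14:21:24 +0100]", TIME_FORMAT)
except ValueError as err:
    print("rejected:", err)
```

Because Python's strptime raises ValueError as soon as the input contains anything the format does not account for, it is a strict way to check what exactly the time group of your regex needs to capture.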

According to the W3C (the HTTP/1.1 specification, RFC 2616, section 3.3.1), there are three historically allowed formats for representing date/time stamps in HTTP applications:

Sun, 06 Nov 1994 08:49:37 GMT  ; RFC 822, updated by RFC 1123
Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036
Sun Nov  6 08:49:37 1994       ; ANSI C's asctime() format

To match them, in the same order, the following time_format strings can be used:

time_format %a, %d %b %Y %H:%M:%S %Z
time_format %A, %d-%b-%y %H:%M:%S %Z
time_format %c
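To make sure these format strings do what they should, you can run them against the example dates from the specification. The Python sketch below is only an approximation of fluentd, which is written in Ruby. One deliberate deviation: it spells out the asctime() layout as %a %b %d %H:%M:%S %Y instead of using %c, because Python's %c follows the locale of the running process. CPython's strptime matches a space in the format against one or more whitespace characters, which takes care of the space-padded day in "Nov  6".

```python
from datetime import datetime

# The three HTTP date examples paired with a Python strptime format.
# The asctime() layout is spelled out instead of %c (assumption: the
# C-locale asctime layout), since Python's %c depends on the locale.
HTTP_DATE_FORMATS = [
    ("Sun, 06 Nov 1994 08:49:37 GMT", "%a, %d %b %Y %H:%M:%S %Z"),   # RFC 822 / RFC 1123
    ("Sunday, 06-Nov-94 08:49:37 GMT", "%A, %d-%b-%y %H:%M:%S %Z"),  # RFC 850
    ("Sun Nov  6 08:49:37 1994", "%a %b %d %H:%M:%S %Y"),            # ANSI C asctime()
]

for text, fmt in HTTP_DATE_FORMATS:
    # %Z accepts "GMT", but the resulting datetime stays naive (no tzinfo).
    # %y maps "94" to 1994.
    print(datetime.strptime(text, fmt))  # 1994-11-06 08:49:37 for each entry
```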

Have a good time getting the most out of your logs!
