
What is a good logging practice for distributed tasks?

Submitted by: @import:stackexchange-devops··

Problem

I have the following setting:


Create multiple workers, do a computation, and terminate them after the computation is done.

Every time, a different instance runs the task, so each host has its own log file. This results in a huge list of files.

Is this good practice? If not, what would be a better way to log task processing in this particular use case?

PS: My infrastructure is serverless, so for now I am logging to AWS CloudWatch. But please answer the question independently of AWS, and in a way that suits a serverless setup as much as possible.

Solution

"Serverless" mostly just means you've got relatively simple microservices, generally just a little webapp or a single function that is automatically connected to a REST frontend. The same concepts apply as for more traditional web services: usually some mix of remote syslog and Elasticsearch writers.

Networked or remote syslog has been around for a long time and has a fairly robust set of tools around it. You would have to run the central syslog server(s), but the protocol is very simple and there are pure client libraries in every language that you can use for sending logs. One common problem with remote syslog is that it has traditionally been based on UDP. This means that under heavy load, some log messages may be lost. This could be a good thing, since it helps avoid a cascade overload, but it is something to be aware of. Some newer syslog daemons also support a TCP-based protocol, but client support is less uniform, so check what your client library supports.
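As a rough illustration of the remote syslog approach, here is a minimal sketch using Python's standard-library `logging.handlers.SysLogHandler` over UDP. The function name `make_syslog_logger`, the logger naming scheme, and the `worker_id` tag are assumptions made for this example, as is the host name in the usage comment. Because the transport is UDP, a send succeeds even if no server is listening, which matches the "messages may be lost" caveat above.

```python
import logging
import logging.handlers
import socket


def make_syslog_logger(host, port, worker_id):
    """Return a logger that ships records to a central syslog server over UDP.

    worker_id is a hypothetical tag used to tell short-lived workers apart
    once all of their messages land in the same central log.
    """
    logger = logging.getLogger(f"tasks.{worker_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # do not also send records to the root logger

    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.handlers.SysLogHandler(
            address=(host, port),
            facility=logging.handlers.SysLogHandler.LOG_USER,
            socktype=socket.SOCK_DGRAM,  # UDP: fire and forget, may drop
        )
        # ident is prepended to every message sent by this handler.
        handler.ident = f"{worker_id}: "
        logger.addHandler(handler)
    return logger


# Usage (host name is a placeholder, 514 is the conventional syslog port):
#   log = make_syslog_logger("syslog.internal.example", 514, "worker-1")
#   log.info("task started")
```

Passing `socktype=socket.SOCK_STREAM` instead would switch the same handler to TCP, at the cost of a connection that can block or fail when the server is unreachable.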

More recent but very popular is logging to Elasticsearch. This is mostly useful because of the Kibana dashboard and Logstash toolkit (the combination is often called ELK: Elasticsearch, Logstash, Kibana). Amazon even offers a hosted Elasticsearch option, making it somewhat easier to get started. ES uses a relatively simple REST API, so any language with an HTTP client (read: all of them) should be okay with logging to ES. Be careful with blocking network operations during partial system outages, though: make sure your app can't get stuck in a logging call that will never succeed and stop servicing user requests.
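One way to keep a slow or unreachable log store from stalling the application is to hand records to a bounded in-memory queue and do the HTTP call on a background thread with a timeout. The sketch below uses only the Python standard library. The class `HttpJsonHandler`, the function `build_nonblocking_logger`, the JSON field names, and the URL in the usage comment are assumptions for illustration, not an official client. The request shape (a JSON document POSTed to an index's `_doc` endpoint) follows the Elasticsearch document index API.

```python
import json
import logging
import logging.handlers
import queue
import urllib.request


class HttpJsonHandler(logging.Handler):
    """POST each record as a JSON document, giving up after `timeout` seconds."""

    def __init__(self, url, timeout=2.0):
        super().__init__()
        self.url = url
        self.timeout = timeout

    def emit(self, record):
        body = json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }).encode("utf-8")
        req = urllib.request.Request(
            self.url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        try:
            # The timeout bounds how long a single send can hang.
            with urllib.request.urlopen(req, timeout=self.timeout):
                pass
        except Exception:
            self.handleError(record)  # report the failure, never raise


def build_nonblocking_logger(name, target_handler, max_pending=1000):
    """Route records through a bounded queue so logging calls return at once.

    If the queue is full (for example during an outage), QueueHandler drops
    the record instead of blocking the caller.
    """
    pending = queue.Queue(maxsize=max_pending)
    listener = logging.handlers.QueueListener(pending, target_handler)
    listener.start()

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(logging.handlers.QueueHandler(pending))
    return logger, listener


# Usage (URL is a placeholder):
#   handler = HttpJsonHandler("http://localhost:9200/app-logs/_doc")
#   log, listener = build_nonblocking_logger("tasks", handler)
#   log.info("task finished")
#   listener.stop()  # flush pending records before the worker exits
```

For short-lived workers, calling `listener.stop()` before exit matters: it drains whatever is still queued, which is otherwise lost when the instance terminates.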

More complex logging topologies are bounded only by your imagination, though these days you'll see a lot of use of the Kafka database/queue/whatever-you-want-to-call-it as a nexus point in very complex log distribution systems.

On the "serverless" side, you'll generally want to integrate with these systems directly at the network level: send log data straight to syslog or ES from your service/function rather than writing to local files (though you may want to echo to local files or the console as well for local debugging and development).
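A minimal sketch of that arrangement, assuming Python's standard `logging` module: one handler ships records over the network, and a second, optional handler echoes them to stderr during development. The function name `configure_logging`, the logger name `worker`, and the `LOG_ECHO_LOCAL` environment variable are hypothetical names chosen for this example.

```python
import logging
import os
import sys


def configure_logging(remote_handler, echo_local=None):
    """Attach a network handler, plus an optional local echo for debugging.

    remote_handler can be any logging.Handler that sends records off the
    host, such as a syslog or HTTP handler.
    """
    if echo_local is None:
        # Hypothetical switch: set LOG_ECHO_LOCAL=1 on a developer machine.
        echo_local = os.environ.get("LOG_ECHO_LOCAL", "") == "1"

    logger = logging.getLogger("worker")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()  # make repeated configuration idempotent

    logger.addHandler(remote_handler)
    if echo_local:
        logger.addHandler(logging.StreamHandler(sys.stderr))
    return logger
```

Keeping the echo behind a switch means the deployed function writes nothing to local disk or console unless asked to, while developers still see output immediately.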

Context

StackExchange DevOps Q#404, answer score: 12
