
How to implement the immutable server pattern without losing the ability to do post-mortems?

Submitted by: @import:stackexchange-devops

Problem

The immutable server pattern is a deployment discipline favouring the reproducibility of deployments. It is characterised by the fact that “a server that once deployed, is never modified, merely replaced with a new updated instance”, and implementing this discipline requires automating server deployment. This automation has numerous operational advantages; one of the most important is that failing instances in an infrastructure can be replaced quickly and reliably. It also implies that server deployment is described by versioned software artefacts and is subject to iterative improvements.

A popular aspect of implementations of this discipline is the removal of remote access methods to the server once it has been launched (esp. removing SSH access). Removing remote access is an easy way to ensure that the configuration of the server matches the configuration prepared by the deployment automation.

However, when investigating the causes of a software failure in a post-mortem, relying on structured monitoring is not always enough, and remote access to the machine may be necessary. In practice, server monitoring often does not cover all failure sources, or is itself impaired by the server failure, which is likely to be the case if the server runs out of memory or reaches its process limit.

How to implement the immutable server pattern without losing the ability to do post-mortems?

Solution

First of all, removing SSH from an immutable server doesn't guarantee that nothing will change. The point is rather that, since there should be no need to change anything, you reduce the attack surface by removing a remote access channel.

One way to keep the material for a post-mortem is log centralisation. There are many ways to achieve it: the ELK stack, Splunk, syslog...
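As a rough illustration of the syslog option, here is a minimal sketch that uses Python's standard `logging.handlers.SysLogHandler` to forward application logs to a remote collector over UDP. The function name `build_remote_logger`, the logger name and the collector address are assumptions made for the example, not something prescribed by the pattern.

```python
import logging
import logging.handlers


def build_remote_logger(name, host, port):
    """Return a logger that forwards its records to a syslog collector over UDP.

    `host` and `port` identify the central collector; they are placeholders
    to be replaced with whatever your log infrastructure exposes.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # SysLogHandler uses UDP by default when given a (host, port) address.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

A call such as `build_remote_logger("myapp", "logs.example.internal", 514)` (hypothetical host) would then ship every `log.warning(...)` off the machine as it is emitted, so the records survive the instance being replaced. UDP delivery is best effort, so a server that is already starved of resources may still drop messages.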

Another, cruder way to keep a post-mortem for an immutable server is to run a script during the shutdown process (a failing immutable server would be shut down and a new one spun up to replace it) that gathers a core dump of the program and a memory dump, and sends them to a remote system for analysis along with most of the logs.

The main advantage of this solution is that you only retrieve information from failing systems, at the time of the problem, which lets you gather more data than you could by collecting it periodically.
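The shutdown-time collection described above could be sketched roughly as follows. This is only an illustration under assumptions: the function name `collect_postmortem`, the glob patterns and the `ship` callback are hypothetical, and how the script is hooked into the shutdown sequence and how the archive reaches the remote system depend on the distribution and infrastructure.

```python
import glob
import os
import socket
import tarfile
import time


def collect_postmortem(patterns, dest_dir, ship=None):
    """Bundle files matching the glob patterns into one compressed archive.

    `ship` is an optional callable that receives the archive path and is
    expected to send it to a remote system (hypothetical; e.g. an upload).
    Returns the path of the archive that was written.
    """
    os.makedirs(dest_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime())
    name = f"postmortem-{socket.gethostname()}-{stamp}.tar.gz"
    archive = os.path.join(dest_dir, name)
    with tarfile.open(archive, "w:gz") as tar:
        for pattern in patterns:
            for path in sorted(glob.glob(pattern)):
                try:
                    tar.add(path)
                except OSError:
                    # A file may vanish or be unreadable during shutdown;
                    # skip it rather than lose the rest of the evidence.
                    continue
    if ship is not None:
        ship(archive)
    return archive
```

It might be called with patterns such as `["/var/log/myapp/*.log", "/var/crash/core.*"]` (example paths only) and a `ship` function that copies the archive to storage outliving the instance.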

It's hard to be more specific about how to achieve this: each distribution has its own way of doing these things, and I have no generic example.

Context

StackExchange DevOps Q#207, answer score: 9
