
Monitor program progress on multiple servers

Submitted by: @import:stackexchange-devops

Problem

We have three servers running Python programs that perform data analysis tasks inside tmux sessions. At the moment we SSH into each server, attach to its tmux session, and watch the output on the command line.

This method is tedious, so we are looking for a solution that automates monitoring of program progress (the output on the CLI) for multiple servers at the same time. We would ideally like a web UI, but a CLI would also be perfectly suitable.

Thank you for reading.

Solution

Any time you're running ad-hoc long-running commands, you should step back and rethink your process, because that work should be automated, including its error handling.

Rather than connecting to the servers to see status, a better approach is to push that information out. You can do a wide variety of things if you want to write a bunch of custom code, but the simplest thing is probably to start sending the output through syslog to a centralized logging system (syslog itself, or ELK, or whatever). That way you can monitor everything from a central location.
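As a rough illustration of pushing output to syslog from the Python programs themselves, here is a minimal sketch using the standard library's `logging.handlers.SysLogHandler`. The function name `make_progress_logger`, the `progress.` logger prefix, and the collector address `127.0.0.1:514` are assumptions made for the example, not anything from the original setup; point the host and port at whatever central collector you actually run.

```python
import logging
import logging.handlers


def make_progress_logger(job_name, syslog_host="127.0.0.1", syslog_port=514):
    """Return a logger that forwards records to a syslog collector over UDP.

    The host and port defaults are placeholders (assumptions); replace them
    with the address of your central logging system.
    """
    logger = logging.getLogger(f"progress.{job_name}")
    logger.setLevel(logging.INFO)
    # Avoid duplicating messages through the root logger's handlers.
    logger.propagate = False
    # Only attach the handler once, even if this function is called again.
    if not logger.handlers:
        handler = logging.handlers.SysLogHandler(
            address=(syslog_host, syslog_port)
        )
        # Prefix each message with the logger name so jobs can be told apart
        # on the collector side.
        handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger
```

Inside the analysis program, progress lines that used to be printed could then be logged instead, for example `make_progress_logger("analysis").info("step %d of %d complete", 3, 10)`. UDP is used here because it is the handler's default; it is fire-and-forget, so a collector that is down will not block the job, but messages can be lost.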

Thinking forward, if this isn't a one-off task, monitoring should be automated. That is, you should never have to just watch logs to see if things are progressing as they're supposed to. Instead, you should assume they are (and continue with other work) until your alerting fires. Getting reliable, wide-coverage alerting is an investment of time, but as your systems grow in complexity it pays off, because you no longer have to watch everything whenever something changes.
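To make the idea of alerting instead of watching a little more concrete, here is a small sketch of one possible rule: flag any job that has not reported progress for too long. The function name `find_stalled_jobs`, the server names, and the 15-minute threshold are all hypothetical; in practice a rule like this would more likely live in your logging or alerting system than in hand-written code.

```python
from datetime import datetime, timedelta, timezone


def find_stalled_jobs(last_seen, now, max_silence=timedelta(minutes=15)):
    """Return the sorted names of jobs that have been silent for too long.

    last_seen maps a job or server name to the time of its most recent
    progress message. The 15-minute default is an arbitrary example value.
    """
    return sorted(
        name for name, seen in last_seen.items() if now - seen > max_silence
    )


# Example input: timestamps as they might be read from the central log store.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "server-a": now - timedelta(minutes=2),
    "server-b": now - timedelta(minutes=40),
    "server-c": now - timedelta(minutes=15),
}
stalled = find_stalled_jobs(last_seen, now)
```

With these example timestamps only `server-b` exceeds the threshold, so it is the only one an alert would need to mention; `server-c` sits exactly at the limit and is not flagged because the comparison is strict.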

Context

StackExchange DevOps Q#2388, answer score: 8
