HiveBrain v1.2.0

Monitoring checklist - What things should I be monitoring?

Submitted by: @import:stackexchange-devops

Problem

We are building a (Zabbix-based) monitoring system for our applications; however, I'm having difficulty defining what to monitor.

I have so far come up with the following general categories:

  • hardware data: CPU, RAM, swap, etc.
  • middleware data: performance/health of MySQL instances, Tomcat instances, JVMs, etc.
  • logical or application data: the current status/health of the system, e.g. number of active users, page requests, etc.
  • KPI data: data for the business, e.g. user registrations over time.
  • dashboard: a quick overview of the system (e.g. whether the microservices are running or not).
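For the dashboard category ("are the microservices running or not?"), the most basic check is whether each service accepts connections at all. Zabbix has agent items for this (e.g. `net.tcp.port`); the following is only a minimal standalone Python sketch of the same idea, assuming a hypothetical `SERVICES` inventory of host/port pairs (3306 and 8080 are the usual MySQL and Tomcat defaults, not values from the question).

```python
import socket
from typing import Dict, Tuple

# Hypothetical inventory: service name -> (host, port).
# 3306 and 8080 are the usual MySQL/Tomcat defaults; replace with your own endpoints.
SERVICES: Dict[str, Tuple[str, int]] = {
    "mysql": ("127.0.0.1", 3306),
    "tomcat": ("127.0.0.1", 8080),
}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # Open and immediately close the connection; we only care that it succeeded.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unresolvable host and timeouts are all OSError subclasses.
        return False


def smoke_test(services: Dict[str, Tuple[str, int]]) -> Dict[str, bool]:
    """Map each service name to whether its port accepted a TCP connection."""
    return {name: is_port_open(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, up in smoke_test(SERVICES).items():
        print(f"{name}: {'UP' if up else 'DOWN'}")
```

An open port only shows that something is listening; a stronger smoke test would also call a health endpoint of each service and check the response.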



Are there any other fundamental categories to monitor? Or is there another categorization scheme I should use?

UPDATE: the purpose of the monitoring is to:

  • see whether the system functions correctly at a high level (e.g. no services are down), much like a smoke test;
  • see whether there are any indicators that the system is likely to crash (e.g. historical data predicts that we will run out of disk space);
  • if either of these occurs, send a warning to the appropriate staff (e.g. via e-mail).
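The second goal (predicting that disk space will run out) amounts to extrapolating a trend. Recent Zabbix versions provide predictive trigger functions such as `timeleft()` and `forecast()` for this; the sketch below only illustrates the underlying idea with an ordinary least-squares line fit in plain Python. The function names, the 72-hour warning horizon and the sample data are illustrative assumptions.

```python
from typing import List, Optional, Tuple

# Each sample is (timestamp_in_hours, used_space); used_space and capacity must share units.
Sample = Tuple[float, float]


def hours_until_full(samples: List[Sample], capacity: float) -> Optional[float]:
    """Estimate hours from the latest sample until usage reaches `capacity`.

    Fits used = slope * t + intercept by ordinary least squares and extrapolates.
    Returns None when the fitted trend is flat or decreasing (never fills up).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples need at least two distinct timestamps")
    cov = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    slope = cov / var_t  # growth in used space per hour
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    t_full = (capacity - intercept) / slope  # time at which the fitted line hits capacity
    last_t = max(t for t, _ in samples)
    return max(0.0, t_full - last_t)


def should_warn(samples: List[Sample], capacity: float, horizon_hours: float = 72.0) -> bool:
    """True if the disk is predicted to fill up within `horizon_hours`."""
    eta = hours_until_full(samples, capacity)
    return eta is not None and eta <= horizon_hours
```

For example, three daily samples of 50, 60 and 70 GB used on a 100 GB volume grow by 10 GB per day, so `hours_until_full([(0, 50), (24, 60), (48, 70)], 100)` comes out at about 72 hours. A straight line is a crude model; bursty or cyclic usage may need a longer history or a different fit.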



UPDATE: our system is not complex enough to need a separate application for reporting (e.g. for monitoring KPIs). Also, we run on local/local cloud infrastructure, so the cost of the application is not (that) relevant, but it might be someday :-)

Solution

I like this video: GOTO 2016 • Monitoring Microservices • Tom Wilkie

One of the key ideas (for me at least) is the difference between host monitoring and application monitoring. Host monitoring tells you that something is already badly wrong, whereas application monitoring should let you anticipate problems, for example by detecting a rising error rate or requests that take longer than usual, so you can fix them before your users notice.
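The application-monitoring idea above (catch a rising error rate or slower requests before users notice) can be sketched as a check over a recent window of requests, close to what is often called the RED method (Rate, Errors, Duration). This is a minimal illustration, not how any particular tool implements it: the `Request` record, the 1% error-rate and 500 ms p95 thresholds, and the function names are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Request:
    """One served request in the monitoring window (hypothetical record shape)."""
    duration_ms: float
    failed: bool


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, pct in (0, 100]."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def health_warnings(
    window: List[Request],
    max_error_rate: float = 0.01,  # hypothetical threshold: 1% failed requests
    max_p95_ms: float = 500.0,     # hypothetical threshold: p95 latency in ms
) -> List[str]:
    """Return human-readable warnings about errors and duration in a request window."""
    if not window:
        return []  # no traffic: nothing to judge (request rate could be a separate alert)
    error_rate = sum(1 for r in window if r.failed) / len(window)
    p95 = percentile([r.duration_ms for r in window], 95)
    warnings: List[str] = []
    if error_rate > max_error_rate:
        warnings.append(f"error rate {error_rate:.1%} above {max_error_rate:.1%}")
    if p95 > max_p95_ms:
        warnings.append(f"p95 latency {p95:.0f} ms above {max_p95_ms:.0f} ms")
    return warnings
```

Fixed thresholds are the simplest choice; comparing against the same metric's recent baseline (e.g. last week's p95) is often more robust, but requires stored history.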

(I'm not affiliated with Weaveworks or the GOTO conference in any way; I just like the content and think it contains some interesting ideas. Use the downvote button to let me know if this answer is not good :) )

Context

StackExchange DevOps Q#1516, answer score: 5
