Monit

  • https://linux.die.net/man/1/monit
  • https://mmonit.com/wiki/Monit/ConfigurationExamples

Install

sudo apt install monit

Optionally install postfix and mailutils so that alert mail can be delivered locally:

sudo apt install postfix mailutils

Config

Mail notifications

Set monit to send to root user:

set mailserver 127.0.0.1
set alert root@localhost
  but not on { instance }

set mail-format {
  from:    root@$HOST
  subject:  [$SERVICE] $DESCRIPTION
  message: Alert for $SERVICE
}

Unix socket

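The monit command-line client queries the running daemon through Monit's HTTP interface, which can be bound to a Unix socket instead of a TCP port: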

set httpd unixsocket /var/run/monit.sock
  allow user:pass

With the HTTP interface enabled, the status commands work:

sudo monit status
sudo monit summary

Minimal Config

An example control file; only the service checks still need to be added.

/etc/monit/monitrc

set daemon 60
  with start delay 30
set log /var/log/monit.log
set idfile /var/lib/monit/id
set statefile /var/lib/monit/state
set mailserver 127.0.0.1
set eventqueue
    basedir /var/lib/monit/events
    slots 250
set alert root@localhost
  but not on { instance, action }
set httpd unixsocket /var/run/monit.sock
  allow user:pass

set mail-format {
  from:    root@$HOST
  subject:  [$SERVICE] $DESCRIPTION
  message: Alert for $SERVICE
Date:        $DATE
Action:      $ACTION
Host:        $HOST
Description: $DESCRIPTION
}

include /etc/monit/monitrc.d/*
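
After editing the control file or any included fragment, it is usually worth checking the syntax before asking the daemon to pick up the change. The following is a minimal sketch, assuming monit is on the PATH and the script runs as root; the helper name apply_monit_config and the MONIT_BIN variable are hypothetical and only there for illustration.

```shell
#!/bin/bash
# MONIT_BIN is a hypothetical override; it defaults to the monit binary on PATH.
MONIT_BIN=${MONIT_BIN:-monit}

# Hypothetical helper: reload the daemon only if the control file parses.
apply_monit_config() {
  # "monit -t" performs a syntax check of the control file.
  if ! "$MONIT_BIN" -t; then
    echo "syntax check failed, not reloading" >&2
    return 1
  fi
  # "monit reload" tells the running daemon to re-read its configuration.
  "$MONIT_BIN" reload
}
```

When the syntax check fails, the helper returns non-zero and leaves the running daemon untouched.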

Alert example configurations

Each component or service is placed in a separate file in the monitrc.d directory to make the configuration more modular.

System and Disk Check

Basic system parameters:

/etc/monit/monitrc.d/sys

check system $HOST
  if loadavg (1min) per core > 2 for 5 cycles then alert
  if loadavg (5min) per core > 1.5 for 10 cycles then alert
  if cpu usage > 95% for 5 cycles then alert
  if memory usage > 90% then alert
  if swap usage > 50% then alert

check device root with path /
  if space usage > 90% then alert
  if inode usage > 90% then alert
  if changed fsflags then alert
  if service time > 250 milliseconds for 5 cycles then alert
  if read rate > 500 operations/s for 5 cycles then alert
  if write rate > 200 operations/s for 5 cycles then alert

Network Interface check

Network status, and usage/throughput info:

/etc/monit/monitrc.d/network

check network public with interface eth0
  if failed link then alert
  if changed link then alert
  if saturation > 90% then alert
  if download > 10 MB/s then alert
  if total uploaded > 1 GB in last hour then alert

Network Reachability check

Test if the host is able to ping an internet host:

/etc/monit/monitrc.d/network

check host REACHABILITY with address 1.1.1.1
  if failed ping with timeout 10 seconds then alert

SSH server check

OpenSSH service status:

/etc/monit/monitrc.d/sshd

check process sshd with pidfile /var/run/sshd.pid
  start program = "/usr/bin/systemctl start sshd"
  stop program  = "/usr/bin/systemctl stop sshd"
  if failed port 22 protocol ssh then restart
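
The process checks in this section depend on the pidfile path being correct for the installed service, and those paths vary between distributions and versions. Before wiring a pidfile into Monit, it may help to confirm that it points at a live process. This is a rough sketch with a hypothetical helper name, and it is not how Monit itself implements the check.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the pidfile names a running process.
pidfile_alive() {
  local pidfile pid
  pidfile=$1
  [ -r "$pidfile" ] || return 1
  # Take the first line and strip any whitespace.
  pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
  # Reject empty or non-numeric content.
  case $pid in
    ''|*[!0-9]*) return 1 ;;
  esac
  # Signal 0 tests for existence without affecting the process;
  # it needs permission to signal the target, so run this as root.
  kill -0 "$pid" 2>/dev/null
}
```

For example, running pidfile_alive /var/run/sshd.pid as root should succeed while the SSH daemon is up and writes its pid to that path.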

Nginx check

Nginx status, including an HTTP probe:

/etc/monit/monitrc.d/nginx

check process nginx with pidfile /var/run/nginx.pid
  start program = "/usr/bin/systemctl start nginx"
  stop program  = "/usr/bin/systemctl stop nginx"
  if failed port 80 protocol  http  request "/healthz" for 2 cycles then restart
  if failed port 443 protocol https request "/healthz" for 2 cycles then restart

PHP-FPM check

Check if the PHP daemon is running, and the socket is functional:

/etc/monit/monitrc.d/php-fpm

check process php-fpm with pidfile /var/run/php/php7.4-fpm.pid
  start program = "/usr/bin/systemctl start php7.4-fpm"
  stop program  = "/usr/bin/systemctl stop  php7.4-fpm"
  if failed unixsocket /var/run/php/php7.4-fpm.sock for 2 cycles then restart
  if cpu > 60% for 2 cycles then alert
  if cpu > 90% for 5 cycles then restart
  if memory usage > 1024 MB for 2 cycles then alert
  if memory usage > 8192 MB for 5 cycles then restart

MySQL and MariaDB

Check that the process is running and that its CPU and memory usage stay within expected limits:

/etc/monit/monitrc.d/mysqld

check process mysqld with pidfile /var/run/mysqld/mysqld.pid
  start program = "/usr/bin/systemctl start mysqld"
  stop program  = "/usr/bin/systemctl stop  mysqld"
  if cpu > 60% for 2 cycles then alert
  if cpu > 90% for 5 cycles then restart
  if memory usage > 1024 MB for 2 cycles then alert
  if memory usage > 8192 MB for 5 cycles then restart

DNS & DHCP server checks

Bind9 status:

/etc/monit/monitrc.d/named

check process bind9 with pidfile /var/run/named/named.pid
  start program = "/usr/bin/systemctl start bind9"
  stop program  = "/usr/bin/systemctl stop  bind9"
  if failed host 127.0.0.1
    port 53 type udp for 2 cycles then restart

DHCP server status:

/etc/monit/monitrc.d/dhcpd

check process dhcpd with pidfile /var/run/dhcpd.pid
  start program = "/usr/bin/systemctl start isc-dhcp-server"
  stop program  = "/usr/bin/systemctl stop  isc-dhcp-server"

Hardware checks

Install the sensor reading utilities:

sudo apt install smartmontools lm-sensors hddtemp

Then, create a CPU temperature checker script at /opt/proctemp.sh:

#!/bin/bash
# sysfs reports millidegrees Celsius; integer-divide to get whole degrees.
TEMP=$(( $(cat /sys/class/thermal/thermal_zone0/temp) / 1000 ))
exit $TEMP

And a disk temperature checker script at /opt/disktemp.sh:

#!/bin/bash
TEMP=$(/usr/sbin/hddtemp -n /dev/sda)
exit $TEMP

Note that both scripts return the temperature as their exit code rather than printing it. An exit code can only carry values from 0 to 255.
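
Because the thermal zone file holds millidegrees Celsius and an exit status is limited to 0-255, the conversion deserves a little care. The sketch below shows one way to do it; the function name millideg_to_status is hypothetical, and clamping out-of-range readings is an assumption about the desired behaviour rather than anything Monit requires.

```shell
#!/bin/bash
# Hypothetical helper: millidegrees Celsius -> whole degrees, clamped to 0-255.
millideg_to_status() {
  local deg
  # Integer division drops the fractional part.
  deg=$(( $1 / 1000 ))
  # Keep the result inside the range an exit status can represent.
  if [ "$deg" -lt 0 ]; then deg=0; fi
  if [ "$deg" -gt 255 ]; then deg=255; fi
  echo "$deg"
}

# Possible use inside a checker script (sensor path as in the script above):
# exit "$(millideg_to_status "$(cat /sys/class/thermal/thermal_zone0/temp)")"
```

Without the clamp, a value above 255 would wrap around modulo 256 and could slip under the alert threshold.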

The Monit checks then compare each script's exit status against a threshold and alert when it is exceeded:

/etc/monit/monitrc.d/hw

check program PROCTEMP with path "/opt/proctemp.sh"
  every 5 cycles
  if status > 75 then alert

check program DISKTEMP with path "/opt/disktemp.sh"
  every 5 cycles
  if status > 60 then alert

check program DISKHEALTH with path "/usr/sbin/smartctl -H /dev/sda"
  every 5 cycles
  if status > 0 then alert
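
The DISKHEALTH check alerts on any non-zero status. smartctl's exit status is a bitmask, so a non-zero value can mean anything from a device that could not be opened to a failing disk. The following sketch decodes a status value into its set bits; the function name is hypothetical and the descriptions are paraphrased from the smartctl manual page, so check them against the installed version.

```shell
#!/bin/bash
# Hypothetical helper: print one line per bit set in a smartctl exit status.
# Descriptions are paraphrased from the smartctl manual page.
decode_smartctl_status() {
  local status bit desc
  status=$1
  for bit in 0 1 2 3 4 5 6 7; do
    # Skip bits that are not set in the status value.
    [ $(( status & (1 << bit) )) -ne 0 ] || continue
    case $bit in
      0) desc="command line did not parse" ;;
      1) desc="device open failed" ;;
      2) desc="a SMART or other ATA command failed" ;;
      3) desc="SMART status check returned DISK FAILING" ;;
      4) desc="prefail attributes at or below threshold" ;;
      5) desc="attributes were at or below threshold in the past" ;;
      6) desc="device error log contains errors" ;;
      7) desc="device self-test log contains errors" ;;
    esac
    echo "bit $bit: $desc"
  done
}
```

A possible use is to run /usr/sbin/smartctl -H /dev/sda by hand after an alert and pass $? to decode_smartctl_status to see which condition triggered it.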