By Alister West
A Systems Monitoring Solution
Monitoring multiple hosts involves several levels of monitoring and alerts, both local and external.
Monitor services for a host. Either user-space or root-space monitoring, depending on how the box is set up.
- user-space is left mostly to the user.
- ubic monitor for perl services. (this in turn is monitored by a cron script)
- custom background process monitors root services (crond, ftpd, apache,
mail (qmail), antivirus (clamd), spam (spamassassin), sshd, freespace, mount, load, memory, ntpd)
- also monitor user services installed in root space but run as a different user:group (mysql, modperl, varnish, solr, memcached)
- restart a service if its process can't be found. notify on problems. escalate as per config.
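The restart/notify/escalate cycle above can be sketched roughly as follows. This is a minimal illustration, not the actual monitor: `ServiceCheck`, `run_check`, and the `escalate_after` threshold are hypothetical names, and the probe, restart, and notify callables stand in for whatever the real config defines.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ServiceCheck:
    name: str
    is_running: Callable[[], bool]  # hypothetical probe, e.g. a pid lookup
    restart: Callable[[], None]     # hypothetical restart action
    failures: int = 0               # consecutive failed checks


def run_check(check: ServiceCheck, notify: Callable[[str], None],
              escalate_after: int = 3) -> str:
    """Run one pass for a service and return the action taken."""
    if check.is_running():
        check.failures = 0
        return "ok"
    check.failures += 1
    check.restart()
    if check.failures >= escalate_after:
        # Assumed policy: escalate after N consecutive failures.
        notify(f"ESCALATE: {check.name} missing on {check.failures} checks in a row")
        return "escalated"
    notify(f"{check.name} not found, restart attempted")
    return "restarted"
```

Calling `run_check` from a loop or from cron once per interval gives the behaviour described: a quiet pass when the service is up, a restart plus notice when it is missing, and an escalation when restarts keep failing.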
- snmp service for remote monitoring.
- simple snmp stats in realtime (diskstats, httpd-stat)
- services (apache, qmail) dump info to file for quick lookup (for when connection issues are possible)
- run apache stats every 1min
- run mail stats every 5min
- run system processes dump every 1min into stats-HH-MM.out (each file is kept until the next write to it, 24hrs later)
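The stats-HH-MM.out naming gives a rolling 24-hour window without any cleanup job: the name depends only on the hour and minute, so each file is overwritten a day later. A small sketch of that idea, where `stats_filename` and `write_snapshot` are hypothetical helper names and the snapshot text is supplied by the caller:

```python
from datetime import datetime
from pathlib import Path


def stats_filename(now: datetime) -> str:
    """Name depends only on hour and minute, so it repeats every 24 hours."""
    return now.strftime("stats-%H-%M.out")


def write_snapshot(directory: str, text: str, now: datetime) -> Path:
    """Write (or overwrite) the snapshot file for this minute of the day."""
    path = Path(directory) / stats_filename(now)
    path.write_text(text)
    return path
```

At one file per minute this settles at 1440 files per host, which keeps disk use bounded.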
- monitor-box: monitors services with ping, telnet, etc. alerts/escalates on errors.
- mon sends emails to the monitor machine, which handles escalating/alerts/etc.
- for clustered boxes look at snmp ping (99% good enough to check host-up/down)
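A telnet-style check from the monitor-box amounts to "can I open a TCP connection to this port". One possible sketch using only the standard library; `port_is_open` is a hypothetical helper name, and the 2 second timeout is an arbitrary assumption:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

This only proves the port accepts connections; it says nothing about whether the service behind it is answering correctly, so it complements the on-host checks rather than replacing them.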
- machines send their syslogs to a centralised server (network support is built into syslog-ng and configured in syslog-ng.conf)
- syslog-ng can also combine error-log of multiple web-apps into one log.
- ErrorLog "|/usr/bin/logger -p local6.info -t mysite" # apache ErrorLog
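The `-p local6.info` argument to logger selects a syslog facility and severity, which the syslog protocol encodes as a single priority number (facility * 8 + severity). The sketch below works that out for the local facilities; `syslog_pri` is a hypothetical helper name, while the numeric codes are the standard syslog ones.

```python
# Standard syslog codes: local0..local7 are facilities 16..23.
FACILITIES = {f"local{i}": 16 + i for i in range(8)}
SEVERITIES = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}


def syslog_pri(selector: str) -> int:
    """Convert a 'facility.severity' selector into the syslog PRI value."""
    facility, severity = selector.split(".")
    return FACILITIES[facility] * 8 + SEVERITIES[severity]


print(syslog_pri("local6.info"))  # 22 * 8 + 6 = 182
```

Giving each web-app its own tag (the `-t` value) while sharing one facility is what lets the central server filter or combine the logs.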
- graphing-box: custom multi-worker app uses snmp data from all hosts.
- host list sync'd from master server.
- pull snmp data from all hosts
- data stored in $data/$hosts/$service.rrd files.
- images generated from .rrd files.
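The graphing-box flow (many workers polling hosts, one .rrd file per host and service) could look something like this. It is a sketch under assumptions: `rrd_path`, `poll_all`, and the `fetch` callable are hypothetical, `fetch` stands in for the real SNMP query, and the actual write into the .rrd file is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import PurePosixPath
from typing import Callable, Dict, Iterable


def rrd_path(data_dir: str, host: str, service: str) -> PurePosixPath:
    """Build the $data/$host/$service.rrd location for one data series."""
    return PurePosixPath(data_dir) / host / f"{service}.rrd"


def poll_all(data_dir: str, hosts: Iterable[str], services: Iterable[str],
             fetch: Callable[[str, str], float],
             workers: int = 4) -> Dict[PurePosixPath, float]:
    """Poll every (host, service) pair in parallel; map rrd path -> value."""
    services = list(services)
    jobs = [(h, s) for h in hosts for s in services]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves job order, so values line up with jobs.
        values = list(pool.map(lambda job: fetch(*job), jobs))
    return {rrd_path(data_dir, h, s): v for (h, s), v in zip(jobs, values)}
```

Threads suit this workload because the workers spend most of their time waiting on the network rather than computing.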