YES! Netmar uses one of the most comprehensive network monitoring systems in place today.
Most network monitoring tools simply monitor a set of hosts by "pinging" them to make sure they are still up and running. The problem with that approach is that a machine with a problem will usually stop serving web pages or email LONG BEFORE it stops running altogether.
This means that on other networks, your website could be down for a very long time before the problem is detected by a network monitor.
Netmar's NIMS network monitor uses an entirely different approach, however. Custom-written here at Netmar more than 4 years ago, the NIMS network monitoring tools not only "ping" network hosts to make sure they are there, but also connect to them and perform sample transactions every 15-30 seconds to ensure that they are fully functional. The responses to these transactions are checked against expected answers. If so much as a single character is out of place or incorrect, alarms are set off.
A system administrator is instantly and automatically paged by one of the multiple monitoring systems, allowing us to address potential problems BEFORE they affect your website.
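To illustrate the idea of checking a sample transaction against an expected answer, here is a minimal sketch in Python. It is not NIMS code: the function names, the use of a SHA-256 digest as the "expected answer", and the alarm strings are all assumptions made for the example. The `fetch` argument stands in for whatever actually performs the transaction (for instance, an HTTP request).

```python
import hashlib
from typing import Callable


def response_matches(body: bytes, expected_sha256: str) -> bool:
    """Return True only if the body hashes to the expected digest.

    A single changed character produces a different digest.
    """
    return hashlib.sha256(body).hexdigest() == expected_sha256


def run_check(fetch: Callable[[], bytes], expected_sha256: str) -> str:
    """Perform one sample transaction and classify the result.

    `fetch` is a hypothetical callable that returns the response body.
    """
    try:
        body = fetch()
    except OSError as exc:  # refused connection, timeout, DNS failure, ...
        return f"ALARM: no response ({exc})"
    if not response_matches(body, expected_sha256):
        return "ALARM: response differs from expected answer"
    return "OK"


if __name__ == "__main__":
    expected = hashlib.sha256(b"<html>hello</html>").hexdigest()
    print(run_check(lambda: b"<html>hello</html>", expected))
    print(run_check(lambda: b"<html>hellO</html>", expected))
```

A real monitor would call something like `run_check` on a timer (every 15-30 seconds in the description above) and page an administrator on any result other than "OK".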
The following subsystems are monitored by NIMS and rechecked every 15-30 seconds:
- ICMP ("ping")
  - All machines are checked for responsiveness.
- HTTP Checksum
  - Sample webpages are accessed and checked for accuracy.
- SMTP (email)
  - Sample email transmission dialogues are conducted with the mail server and checked for proper responses.
- DNS (Domain Name Service)
  - Sample queries are made and checked for accuracy. Unresponsive servers can be automatically restarted by the monitoring system itself.
- Tape Backup
  - State of the backup system is checked to ensure that the nightly backup will run without problems.
- Disk Usage
  - The usage of critical filesystems is checked to ensure that they are not approaching their capacity. This eliminates surprises caused by unexpectedly large data storage requirements.
- Facility power is monitored continuously. A system administrator is paged if an outage occurs. In case of a prolonged outage and failure of all of our power protection equipment, the monitoring system can execute an orderly shutdown of the entire network to ensure that no data is lost to an unexpected loss of power.
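As one concrete example from the list above, a disk usage check can be sketched in a few lines of Python. This is only an illustration, not the NIMS implementation: the function name, the 90% threshold, and the `usage_fn` parameter are assumptions. `shutil.disk_usage` is the standard-library call that reports total, used, and free bytes for a path.

```python
import shutil


def filesystems_near_capacity(paths, threshold=0.90, usage_fn=shutil.disk_usage):
    """Return (path, fraction_used) for each filesystem at or above threshold.

    `threshold` is a fraction between 0 and 1 (0.90 is an assumed default).
    `usage_fn` can be replaced for testing; it must return an object
    with `total` and `used` attributes, as shutil.disk_usage does.
    """
    flagged = []
    for path in paths:
        usage = usage_fn(path)
        if usage.total == 0:  # skip pseudo-filesystems that report no size
            continue
        fraction = usage.used / usage.total
        if fraction >= threshold:
            flagged.append((path, fraction))
    return flagged


if __name__ == "__main__":
    # Check the filesystem holding the current directory.
    for path, fraction in filesystems_near_capacity(["."], threshold=0.90):
        print(f"WARNING: {path} is {fraction:.0%} full")
```

Any path returned by this function would be reported to an administrator before the filesystem actually fills up.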
If any of the above checks returns an unexpected value, a system administrator is immediately paged and alerted to the subsystem involved and the exact details of the error. This level of monitoring usually means that if a problem SHOULD occur, we are made aware of it long before it affects your website.
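The way a failed check turns into an alert naming the subsystem and the error details can be sketched as follows. Everything here is hypothetical (the `Alert` class, the `run_checks` function, and the convention that a check returns `None` when healthy); the actual paging step is left out because the original does not describe how it is done.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Alert:
    subsystem: str  # which check failed, e.g. "SMTP"
    details: str    # exact description of what went wrong


def run_checks(checks: Dict[str, Callable[[], Optional[str]]]) -> List[Alert]:
    """Run every check once and collect an Alert for each failure.

    Each check returns None when healthy, or a string describing the problem.
    A check that raises an exception is also treated as a failure.
    """
    alerts = []
    for subsystem, check in checks.items():
        try:
            problem = check()
        except Exception as exc:
            problem = f"check raised {exc!r}"
        if problem is not None:
            alerts.append(Alert(subsystem, problem))
    return alerts


if __name__ == "__main__":
    sample = {
        "DNS": lambda: None,                      # healthy
        "SMTP": lambda: "expected 220, got 554",  # unexpected reply code
    }
    for alert in run_checks(sample):
        print(f"PAGE ADMIN: {alert.subsystem}: {alert.details}")
```

In a real deployment the loop body would hand each `Alert` to a paging or notification service instead of printing it.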