Improved Alert Notification


Catchpoint sends Improved alerts to indicate that a condition that previously triggered an alert has improved. Depending on the alert settings, an Improved alert may indicate different levels of improvement.

These are the alert statuses available in Catchpoint's alerting system:

  • Warning (optional) – Indicates a condition which warrants concern, but is less severe than Critical.
  • Critical – Indicates a condition that is considered unacceptable and needs attention.
  • Improved – Indicates that a condition that previously generated a Warning or Critical alert has improved and is back within the configured threshold.

Having three different notification levels provides the flexibility to set up alerts for performance situations that are not necessarily binary. An alert setting with a binary condition would only indicate that the test either failed completely or ran successfully (Critical vs. Improved). This might be appropriate in many circumstances, but it is often preferable to provide more gradations of failure and improvement.

A non-binary alert has two different thresholds indicating different levels of severity, so the status sent when the alert condition improves could indicate complete or partial improvement. For example, if we set up an alert to be triggered when five nodes fail in five minutes, we will receive a Critical alert notification when five nodes fail. Suppose that in the following five minutes only three nodes fail. In this situation, we cannot say the issue is 100% fixed, but we can say that the test improved and no longer matches the alert criteria. This is why we use the term "Improved" to describe the change in performance, and not a more definitive term like "Resolved".
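The node-failure example above can be sketched in a few lines of Python. This is only an illustration of the logic, not Catchpoint's implementation: the function name next_status and its parameters are hypothetical, and the sketch ignores Warning thresholds and any reminder or repeat-notification settings.

```python
from typing import Optional


def next_status(failed_nodes: int, threshold: int, was_alerting: bool) -> Optional[str]:
    """Return the notification for one evaluation window, or None if nothing is sent.

    Hypothetical helper: 'threshold' is the number of failed nodes that
    triggers a Critical alert; 'was_alerting' says whether the previous
    window had triggered an alert.
    """
    if failed_nodes >= threshold:
        return "Critical"
    if was_alerting:
        # Below the threshold after an alert: the condition improved,
        # even though some nodes may still be failing.
        return "Improved"
    return None


# First window: five nodes fail, which meets the threshold of five.
print(next_status(failed_nodes=5, threshold=5, was_alerting=False))  # Critical
# Next window: three nodes still fail, but the criteria are no longer met.
print(next_status(failed_nodes=3, threshold=5, was_alerting=True))   # Improved
```

Note that the second call returns "Improved" even though three nodes are still failing, which is why a term like "Resolved" would overstate the change.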

Another example is an alert for webpage response time. Suppose we set up a trailing-value alert that compares the 95th percentile of response time for the last hour to that for the previous week, and triggers an alert when the hourly value exceeds the weekly value by 25% or more. The alert is triggered when the 95th percentile rises to 10,000 ms against a weekly value of 8,000 ms (a 25% increase). One hour later, the hourly value drops to 9,500 ms. We would receive an "Improved" alert because the response time is now less than 25% above the weekly value (9,500 ms is 18.75% above 8,000 ms), but this obviously doesn’t mean the issue has completely resolved itself.
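The arithmetic behind this response-time example can be checked with a short Python sketch. The function names percent_above and breaches are hypothetical and are not part of any Catchpoint API; the 25% threshold and the millisecond values are taken from the example above.

```python
def percent_above(current_ms: float, baseline_ms: float) -> float:
    """Percentage by which the current value exceeds the baseline."""
    return (current_ms - baseline_ms) / baseline_ms * 100.0


def breaches(current_ms: float, baseline_ms: float, threshold_pct: float = 25.0) -> bool:
    """True when the current value is at least threshold_pct above the baseline."""
    return percent_above(current_ms, baseline_ms) >= threshold_pct


weekly_p95 = 8000.0  # 95th percentile for the previous week, in ms

# Hour 1: 10,000 ms is exactly 25% above 8,000 ms, so the alert triggers.
print(percent_above(10000.0, weekly_p95), breaches(10000.0, weekly_p95))  # 25.0 True
# Hour 2: 9,500 ms is 18.75% above 8,000 ms, so the condition has improved.
print(percent_above(9500.0, weekly_p95), breaches(9500.0, weekly_p95))    # 18.75 False
```

The second hour is still well above the weekly baseline, yet it no longer meets the alert criteria, so the notification reports improvement rather than resolution.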

Note: Improved alerts are not sent if the alert was paused and then unpaused after the original alert was triggered.