PagerDuty Integration Guide

Introduction

PagerDuty is an alarm aggregation and dispatching service for system administrators and support teams. It collects alerts from your monitoring tools, gives you an overview of all of your monitoring alerts, and informs an on-duty engineer if there’s a problem.

Catchpoint’s integration with PagerDuty enables you to detect and fix issues faster. Trigger workflows and route alerts to the correct team members so they can respond quickly and effectively. A quick response to an incident is vital to ensure customer and employee expectations are not just met, but exceeded. Catchpoint integrates with PagerDuty to accelerate troubleshooting using proven workflows, notifications, and benchmarks, reducing mean time to resolve (MTTR).

PagerDuty Capabilities:

  • Aggregate, classify, correlate, and manage what matters.
  • Guaranteed alert delivery to the right person with the right information every time.
  • Configure custom on-call schedules, rotations, and escalations.
  • Manage incident workflow on the go with a brilliant user experience.
  • Built-in integrations with popular ChatOps tools and helpdesk services.
  • Analyze system efficiency and track employee productivity.

Catchpoint Integration

Catchpoint’s Alert Webhook feature enables Catchpoint to push alert data to other tools in real time. Any tool that supports webhooks or provides a URL to POST data to can be used. Alert Webhook templates can be customized with macros to fit a tool’s formatting and content-type requirements. The guide below covers setting up and customizing templates; the example provided represents a standard workflow created by Catchpoint, focused on creating and confirming a webhook template to be consumed by PagerDuty. Catchpoint does its best to ensure the relevance of these guides, and if you need help modifying or implementing a template, Catchpoint Support will be happy to assist.

Integration Steps

There are two ways to feed Catchpoint alerts to PagerDuty: Email and Webhook.

Email

  1. Choose or create an email address that will be used to receive alerts in PagerDuty.
  2. Configure one or more alerts in Catchpoint to send alerts to this email address.
  3. In PagerDuty, navigate to Configuration > Services and create a new Service with the Integration type set to Integrate via email.
  4. Set the Integration Email in PagerDuty to the same email address.

If you are using email parsing in PagerDuty, you can leverage the Initial Trigger time at the bottom of the alert email to link the Catchpoint reminder and improved alerts to the original alert.

Webhook

  1. In the Catchpoint portal, navigate to Integrations > Webhook.

  2. Create a new alert webhook by clicking Add URL > Alert Webhook.

  3. Enter a name for this webhook and set the status to Active.

  4. Set the Endpoint URL to one of the following, based on the Events API version you are using in PagerDuty:
    • Events API v1: https://events.pagerduty.com/generic/2010-04-15/create_event.json
    • Events API v2: https://events.pagerduty.com/v2/enqueue

  5. In the Alerts Webhook Format section, select Template.

  6. Click New Template in the template selection drop-down menu. (Existing templates can be edited in the menu by hovering over the template name and selecting the Edit/View Properties icon.)

  7. A lightbox will appear where you can name the template and define its contents. PagerDuty expects valid JSON and, for Events API v1, has three required fields: service_key, event_type, and description. The maximum JSON payload PagerDuty will accept is 512 KB.
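
For reference, a minimal Events API v1 template that fills in only those three required fields could look like the sketch below; the integration key is a placeholder and the description simply reuses the TestName macro, while the full samples later in this guide add much richer detail:

{
	"service_key": "Your-Integration-Key",
	"event_type": "trigger",
	"description": "Catchpoint alert for ${TestName}"
}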

Alert Macro Usage

The JSON values can be hardcoded or filled in dynamically with data provided by the system at runtime. This is useful for including variable data such as the test name, alert severity, the conditions that triggered the alert, the location of the node whose test run triggered the alert, etc.

Macros are formatted using this syntax: ${macroName}

A common use is to set the incident_key value with the AlertInitialTriggerDateLocal or AlertInitialTriggerDateUtc macro:

"incident_key": "${AlertInitialTriggerDateUtc}",

This way, related alerts from Catchpoint will also be linked in PagerDuty.
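
The samples below also use the switch macro to translate Catchpoint’s NotificationLevelId into the values PagerDuty expects. For example, the following expression from the v1 sample yields trigger for warning (0) and critical (1) notifications and resolve when the alert clears (3):

"event_type": "${switch("${NotificationLevelId}","0","trigger","1","trigger","3","resolve")}",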

Sample for API v1:

{
	"service_key": "Your-Integration-Key",
	"event_type": "${switch("${NotificationLevelId}","0","trigger","1","trigger","3","resolve")}",
	"description": "${switch("${NotificationLevelId}","0","WARNING","1","CRITICAL","3","OK")}: ${TestUrl}",
	"incident_key": "${AlertInitialTriggerDateUtc}",
	"client": "${TestName}",
	"client_url": "${TestUrl}",
	"details": {
		"NodeName": "${NodeDetails("${NodeName}")}",
		"NodeClientAddress": "${NodeDetails("${NodeClientAddress}")}",
		"NodeMean": "${NodeDetails("${NodeMean}")}",
		"Test Name": "${TestName}",
		"Test URL": "${TestUrl}"
	}
}

Sample for API v1 (with multiple PagerDuty services):

{
	"service_key": "${if("'${testName}' =~ test1|test2|test3","Your-Integration-Key-1","'${testName}' =~ test10|test11", "Your-Integration-Key-2","'${testName}' =~ test4|test5|test6","Your-Integration-Key-3","'${testName}' =~ test7|test8|test9","Your-Integration-Key-4","'${testName}' =~ test9|test10","Your-Integration-Key-5","default-Integration-Key")}",
	"event_type": "${switch("${NotificationLevelId}","0","trigger","1","trigger","3","resolve")}",
	"description": "${switch("${NotificationLevelId}","0","WARNING","1","CRITICAL","3","OK")}: ${TestUrl}",
	"incident_key": "${AlertInitialTriggerDateUtc}",
	"client": "${TestName}",
	"client_url": "${TestUrl}",
	"details": {
		"NodeName": "${NodeDetails("${NodeName}")}",
		"NodeClientAddress": "${NodeDetails("${NodeClientAddress}")}",
		"NodeMean": "${NodeDetails("${NodeMean}")}",
		"Test Name": "${TestName}",
		"Test URL": "${TestUrl}"
	}
}

Sample for API v2:

{
	"payload": {
		"summary": "${TestName} Alert",
		"source": "Catchpoint",
		"severity": "info",
		"component": "API",
		"group": "prod-datapipe",
		"class": "deploy",
		"custom_details": {
			"Link to Test": "${testLink}",
			"label": "${testLabels}",
			"event_type": "${switch('${NotificationLevelId}','0','trigger','1','trigger','3','resolve')}",
			"description": "${switch('${NotificationLevelId}','0','WARNING','1','CRITICAL','3','OK')}",
			"UrlTested": "${TestUrl}",
			"Initial Trigger Time": "${AlertInitialTriggerDateUtcEpoch}",
			"Scatterplot Url": "${ScatterplotChartUrl}",
			"Product Name": "${ProductName}",
			"NodeName": "${NodeDetails('${NodeName}')}"
		}
	},
	"routing_key": "<Integration Key>",
	"dedup_key": "${AlertInitialTriggerDateLocalEpoch}",
	"event_action": "${switch('${NotificationLevelId}','0','trigger','1','trigger','3','resolve')}"
}

Sample for API v2 (with multiple PagerDuty services):

{
	"payload": {
		"summary": "${TestName} Alert",
		"source": "Catchpoint",
		"severity": "info",
		"component": "API",
		"group": "prod-datapipe",
		"class": "deploy",
		"custom_details": {
			"Link to Test": "${testLink}",
			"label": "${testLabels}",
			"event_type": "${switch('${NotificationLevelId}','0','trigger','1','trigger','3','resolve')}",
			"description": "${switch('${NotificationLevelId}','0','WARNING','1','CRITICAL','3','OK')}",
			"UrlTested": "${TestUrl}",
			"Initial Trigger Time": "${AlertInitialTriggerDateUtcEpoch}",
			"Scatterplot Url": "${ScatterplotChartUrl}",
			"Product Name": "${ProductName}",
			"NodeName": "${NodeDetails('${NodeName}')}"
		}
	},
	"routing_key": "${if("'${testName}' = 'test2'",'<Integration-key-1>'," '${testName}' = 'test4'",'<Integration-key-2>', '<Default-key>')}",
	"dedup_key": "${AlertInitialTriggerDateLocalEpoch}",
	"event_action": "${switch('${NotificationLevelId}','0','trigger','1','trigger','3','resolve')}"
}

De-duplicating Incidents:

For services with API integrations, if multiple alerts are triggering for the same issue, your team will be notified for each duplicate incident. To group these incidents, you can include the dedup_key (Events API v2) or incident_key (Events API v1) in your parameters for triggering incidents.

PagerDuty de-duplicates incidents based on the dedup_key/incident_key parameter — this identifies the incident to which a trigger event should be applied. If there are no open (unresolved) incidents with this key, a new incident will be created.

If there is already an open incident with a matching key, this event will be appended to that incident's alert log as an additional Trigger log entry.

If no dedup_key or incident_key is provided, PagerDuty will automatically open a new incident with a unique key.
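
For instance, because the templates above derive the dedup_key from the alert’s initial trigger time, a reminder notification for the same Catchpoint alert carries the same key and is folded into the already-open incident. A hedged sketch of such a follow-up trigger event, using illustrative placeholder values rather than real data:

{
	"routing_key": "<Integration Key>",
	"event_action": "trigger",
	"dedup_key": "1714068000",
	"payload": {
		"summary": "Example Test Alert",
		"source": "Catchpoint",
		"severity": "info"
	}
}

Because an open incident already exists with dedup_key "1714068000", PagerDuty appends this event to that incident's alert log as an additional Trigger entry instead of creating a new incident.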

Event Actions:

Trigger: Create a new alert if no alert exists with the same dedup_key, or log this event under the existing alert. (Use this event action when a new problem has been detected.)

Acknowledge: The incident referenced with the dedup_key will enter the "Acknowledged" state.

Resolve: The incident referenced with the dedup_key will enter the "Resolved" state. Once an incident is resolved, it won't generate any additional notifications. Additional alerts will result in creation of a new incident.
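
In the v2 templates above, the event_action is produced by the switch macro, so the same template handles the whole lifecycle: when the alert clears (NotificationLevelId 3), the template emits a resolve event carrying the same dedup_key as the original trigger, which closes that incident. A hedged sketch of the resulting resolve event, again with illustrative placeholder values:

{
	"routing_key": "<Integration Key>",
	"event_action": "resolve",
	"dedup_key": "1714068000",
	"payload": {
		"summary": "Example Test Alert",
		"source": "Catchpoint",
		"severity": "info"
	}
}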

You can find the PagerDuty documentation for alert events here.

Note: Alerts may be merged with unrelated existing incidents if PagerDuty's time-based Alert Grouping setting is enabled. This setting can cause multiple alerts (even from different tests) to be grouped into one incident if they are triggered within 2 minutes of each other.

You can find the details of different event types and JSON fields here.

Alert Macro Index

A full list of Alert Webhook Macros can be found here.