@benbridts
We were seeing pagerduty_summary_incident_acknowledge_duration_sum increase constantly, even though there were no open incidents in PagerDuty.

We realized this happened because an incident was resolved before it was acknowledged.

This PR caps the acknowledge_duration at the resolved_duration.

Tested locally:

Before

finch run --rm -ti -p 8080:8080 webdevops/pagerduty-exporter:latest --pagerduty.authtoken=$PD_TOKEN --log.level=trace
INFO starting pagerduty-exporter v25.12.1 (f9856f4; go1.25.5; by webdevops.io at 2025-12-25T11:34:58Z)
INFO {"Logger":{"Level":"trace","Format":"logfmt","Source":"","Color":"","Time":false},"PagerDuty":{"AuthTokenFile":"","MaxConnections":4,"Schedule":{"OverrideTimeframe":172800000000000,"EntryTimeframe":259200000000000,"EntryTimeFormat":"Mon, 02 Jan 15:04 MST"},"Incident":{"Statuses":["triggered","acknowledged"],"TimeFormat":"Mon, 02 Jan 15:04 MST","Limit":5000},"Teams":{"Disable":false,"Filter":null},"Summary":{"Since":2628000000000000}},"Server":{"Bind":":8080","ReadTimeout":5000000000,"WriteTimeout":10000000000},"Cache":{"Path":""},"ScrapeTime":{"General":300000000000,"MaintenanceWindow":300000000000,"Schedule":300000000000,"Service":300000000000,"Team":300000000000,"User":300000000000,"Summary":900000000000,"System":900000000000,"Live":60000000000}}
...
INFO finished metrics collection collector=Incident duration=0.992165954 nextRun=2026-01-14T16:00:18.336Z
...
INFO finished metrics collection collector=Summary duration=24.240854676 nextRun=2026-01-14T16:14:41.585Z
http localhost:8080/metrics | grep pagerduty_summary_incident_acknowledge_duration_sum
pagerduty_summary_incident_acknowledge_duration_sum{priority="",serviceID="REDACTED",urgency="high"} 1.86483893249303e+06

After

finch run --rm -ti -p 8080:8080 pagerduty-exporter:25.12.1-dirty --pagerduty.authtoken=$PD_TOKEN --log.level=trace
INFO starting pagerduty-exporter v25.12.1-dirty (f9856f4; go1.25.5; by webdevops.io at 2026-01-14T16:01:49Z)
INFO {"Logger":{"Level":"trace","Format":"logfmt","Source":"","Color":"","Time":false},"PagerDuty":{"AuthTokenFile":"","MaxConnections":4,"Schedule":{"OverrideTimeframe":172800000000000,"EntryTimeframe":259200000000000,"EntryTimeFormat":"Mon, 02 Jan 15:04 MST"},"Incident":{"Statuses":["triggered","acknowledged"],"TimeFormat":"Mon, 02 Jan 15:04 MST","Limit":5000},"Teams":{"Disable":false,"Filter":null},"Summary":{"Since":2628000000000000}},"Server":{"Bind":":8080","ReadTimeout":5000000000,"WriteTimeout":10000000000},"Cache":{"Path":""},"ScrapeTime":{"General":300000000000,"MaintenanceWindow":300000000000,"Schedule":300000000000,"Service":300000000000,"Team":300000000000,"User":300000000000,"Summary":900000000000,"System":900000000000,"Live":60000000000}}
...
INFO finished metrics collection collector=Incident duration=0.807753905 nextRun=2026-01-14T16:03:30.896Z
...
DEBUG Overwriting acknowledgedAt based on resolvedAt collector=Summary incident=REDACTED from=2026-01-14T16:02:33.892Z to=2025-12-24T02:05:28.000Z
...
INFO finished metrics collection collector=Summary duration=24.437684963 nextRun=2026-01-14T16:17:52.527Z
http localhost:8080/metrics | grep pagerduty_summary_incident_acknowledge_duration_sum
pagerduty_summary_incident_acknowledge_duration_sum{priority="",serviceID="REDACTED",urgency="high"} 404

This keeps the MTTA metrics less than or equal to the MTTR metrics.