Prometheus mediator: unable to process alerts
1) Prometheus alerts cannot be processed.
2) The Prometheus mediator webhook receives the alert, and the following error appears in swpallinone.log:
2020-07-02 09:38:09,506 ERROR 58096904 <application-akka.actor.default-dispatcher-13> <com.j9tech.prometheus.PrometheusEventCollectorActor> <ControllableActor.scala:281>:Major Core - :::Exception during handling of Event(PushData({"receiver":"alert-emailer","status":"firing","alerts":
Investigation:
The out-of-the-box Prometheus alert HTTP configuration does not include a "sev" or "severity" label. The Alertmanager webhook POSTs alerts with the following JSON structure:
{
  "version": "4",
  "groupKey": <string>,              // key identifying the group of alerts (e.g. to deduplicate)
  "truncatedAlerts": <int>,          // how many alerts have been truncated due to "max_alerts"
  "status": "<resolved|firing>",
  "receiver": <string>,
  "groupLabels": <object>,
  "commonLabels": <object>,
  "commonAnnotations": <object>,
  "externalURL": <string>,           // backlink to the Alertmanager.
  "alerts": [
    {
      "status": "<resolved|firing>",
      "labels": <object>,
      "annotations": <object>,
      "startsAt": "<rfc3339>",
      "endsAt": "<rfc3339>",
      "generatorURL": <string>       // identifies the entity that caused the alert
    },
    ...
  ]
}
However, the alert labels must include a "sev" or "severity" label for the alert to be routed, since the severity level is the criterion for selecting destinations. An alert carrying the label looks like this:
{
  "status": "firing",
  "labels": {
    "alertname": "ExcessiveRoutines",
    "instance": "localhost:9090",
    "job": "prometheus",
    "severity": "page"
  },
  "annotations": {
    "description": "localhost:9090 of job Prometheus has over 20 routines running.",
    "summary": "Instance localhost:9090 with excessive routines"
  },
  "startsAt": "2019-04-23T09:33:55.050285364-04:00",
  "endsAt": "0001-01-01T00:00:00Z",
  "generatorURL": "http://MacBook-Pro.local:9090/graph?g0.expr=go_goroutines+%3E+20&g0.tab=1"
}
Resolution:
Configure the alerting rule so that it includes a "sev" (or "severity") label.
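As a sketch, a Prometheus alerting rule that attaches the severity label could look like the following (the rule name, expression, and threshold are taken from the example payload above; the 5m hold duration is illustrative):

```yaml
groups:
  - name: example
    rules:
      - alert: ExcessiveRoutines
        expr: go_goroutines > 20
        for: 5m
        labels:
          severity: page   # the label the mediator uses to select a destination
        annotations:
          summary: "Instance {{ $labels.instance }} with excessive routines"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has over 20 routines running."
```

Once the rule fires, the label appears under "labels" in the webhook payload, as in the example above, and the mediator can route the alert by severity.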