Add a k8up_last_schedule_job_succeded metric #1200

@danielpodwysocki

Summary

As a platform engineer
I want to be able to alert only when the last job didn't succeed
So that I reduce alert fatigue in my team

Context

What I'm after is essentially alerting only when the last job for a schedule did not pass.

I'd also like the alert to be per-schedule, so that it can carry details such as the schedule's name and namespace.

I was thinking of a k8up_schedule_last_job_succeded gauge, with a value of 1 when the last job for the schedule succeeded and 0 when it failed.

Out of Scope

No response

Further links

No response

Acceptance Criteria

  • A metric exists with enough labels to allow a user to alert on:
    • last job failure - meaning that if a backup succeeded an hour ago and failed 23h ago, I get no alert
    • specific namespace
    • specific schedule name
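
To make the criteria above concrete, here is a minimal sketch of the "last job wins" semantics. It is only a model in plain Python, not k8up code: the `JobResult` type, the helper functions, and the `namespace` / `schedule` label names are assumptions for illustration, and the metric name is the one proposed in this issue.

```python
from dataclasses import dataclass

# Metric name as proposed in this issue; the label names are assumptions.
METRIC = "k8up_schedule_last_job_succeded"


@dataclass(frozen=True)
class JobResult:
    """Hypothetical record of one finished job for a schedule."""
    namespace: str
    schedule: str
    finished_at: float  # Unix timestamp of job completion
    succeeded: bool


def last_job_gauge(results):
    """Return {(namespace, schedule): 1 or 0} based on the most recent job only."""
    latest = {}
    for r in results:
        key = (r.namespace, r.schedule)
        if key not in latest or r.finished_at > latest[key].finished_at:
            latest[key] = r
    return {key: 1 if r.succeeded else 0 for key, r in latest.items()}


def render(gauge):
    """Render the gauge in Prometheus text exposition format, one line per schedule."""
    return [
        f'{METRIC}{{namespace="{ns}",schedule="{name}"}} {value}'
        for (ns, name), value in sorted(gauge.items())
    ]


now = 1_700_000_000
results = [
    # Failed 23h ago, succeeded 1h ago: the gauge should read 1 (no alert).
    JobResult("prod", "nightly", now - 23 * 3600, succeeded=False),
    JobResult("prod", "nightly", now - 1 * 3600, succeeded=True),
    # Most recent job failed: the gauge should read 0 (alert).
    JobResult("dev", "hourly", now - 600, succeeded=False),
]
gauge = last_job_gauge(results)
lines = render(gauge)
```

With a metric shaped like this, an alert could be written as a PromQL expression along the lines of `k8up_schedule_last_job_succeded{namespace="prod"} == 0`, which would fire only for schedules whose most recent job failed.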

Implementation Ideas

It'd be another metric. Let me know if that sounds good and fits the project well; I'd also be happy to contribute it.

Labels: enhancement
