BaseTrigger Max Retries + Pre-Ack Handling #1969
| return | ||
| } | ||
| maxAttempts := b.maxRetries(ctx) |
Could you make the naming consistent? Use either maxAttempts or maxRetries, not both.
| activeRegistrations metric.Int64UpDownCounter | ||
| pendingEvents metric.Int64UpDownCounter | ||
| stuckEvents metric.Int64UpDownCounter | ||
| gaveUpCount metric.Int64Counter |
Shouldn't gaveUp be named something more like stopResendingEvents? Especially since the other metrics are stuckEvents, pendingEvents, etc.
| var toGiveUp []gaveUpEvent | ||
| for triggerID, pendingForTrigger := range b.pending { | ||
| for eventID, rec := range pendingForTrigger { | ||
| if maxAttempts > 0 && rec.Attempts >= maxAttempts { |
Replace maxAttempts > 0 with a function to make the intent clear.
Also, could you move this new logic ("stop resending and fire a metric") into its own method, and put the old behaviour into a separate one, "appendToTryResending"? That way there's a clear split:
if reachedMaxAttempts(rec.Attempts) {
"stop resending and fire a metric"()
} else {
"appendToTryResending"()
}
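A minimal sketch of the split suggested above, not the PR's actual code. The `scanner` and `pendingRecord` types, the `gaveUp` counter standing in for the metric, and every method name here are hypothetical; treating `maxAttempts <= 0` as "no limit" is an assumption read off the `maxAttempts > 0` guard in the diff.

```go
package main

import "fmt"

// pendingRecord is a hypothetical stand-in for the PR's pending-event record.
type pendingRecord struct {
	TriggerID string
	EventID   string
	Attempts  int
}

// scanner is a hypothetical holder for the retry limit and the scan results.
type scanner struct {
	maxAttempts int             // assumption: <= 0 means "retry forever"
	gaveUp      int             // stands in for the real metric counter
	toResend    []pendingRecord // events that will be resent
}

// retryLimitEnabled names the bare `maxAttempts > 0` check.
func (s *scanner) retryLimitEnabled() bool {
	return s.maxAttempts > 0
}

// reachedMaxAttempts reports whether an event has used up its attempts.
func (s *scanner) reachedMaxAttempts(attempts int) bool {
	return s.retryLimitEnabled() && attempts >= s.maxAttempts
}

// stopResendingAndRecord is the new behaviour: stop resending and count it.
func (s *scanner) stopResendingAndRecord(rec pendingRecord) {
	s.gaveUp++
}

// appendToTryResending is the old behaviour: queue the event for a resend.
func (s *scanner) appendToTryResending(rec pendingRecord) {
	s.toResend = append(s.toResend, rec)
}

// handle is the single branch point between the two behaviours.
func (s *scanner) handle(rec pendingRecord) {
	if s.reachedMaxAttempts(rec.Attempts) {
		s.stopResendingAndRecord(rec)
		return
	}
	s.appendToTryResending(rec)
}

func main() {
	s := &scanner{maxAttempts: 3}
	s.handle(pendingRecord{TriggerID: "t1", EventID: "e1", Attempts: 1})
	s.handle(pendingRecord{TriggerID: "t1", EventID: "e2", Attempts: 3})
	fmt.Println(s.gaveUp, len(s.toResend))
}
```

With the limit hidden behind `reachedMaxAttempts`, the scan loop no longer needs to know that a non-positive limit disables it.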
| triggerID: triggerID, | ||
| eventID: eventID, | ||
| attempts: rec.Attempts, | ||
| wasCritical: wasCritical, |
Maybe for another PR, but this criticality metric somewhat collides with the AddPendingEvents() one.
I believe criticality shouldn't be handled by the code here but in alerts, so you could drop the critical metric and just keep a resending metric.
In an ideal scenario, resending should be close to 0.
| if ev.wasCritical { | ||
| b.metrics.DecStuckEvent(ev.triggerID, ev.eventID) | ||
| } | ||
| if err := b.store.DeleteEvent(ctx, ev.triggerID, ev.eventID); err != nil { |
I'm really wondering whether we want to do this DB deletion here, or have another long-lived process that prunes old data from the DB.
Especially since, if this happens for a payload that's unrecoverable, such as the HTTP Trigger, you have no means to restore this data for the customer (just a thought).
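One way to picture the alternative raised above, as a rough sketch under stated assumptions: instead of deleting on give-up, the row is only marked, and a separate long-lived loop deletes rows after a retention period. `memStore`, `storedEvent`, `GaveUpAt`, `markGaveUp`, `pruneOnce`, and `runPruner` are all hypothetical names, and an in-memory map stands in for the real DB.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// storedEvent is a hypothetical persisted row; GaveUpAt is zero while retrying.
type storedEvent struct {
	GaveUpAt time.Time
}

// memStore is an in-memory stand-in for the real event store.
type memStore struct {
	mu     sync.Mutex
	events map[string]storedEvent
}

// markGaveUp keeps the payload but records when resending stopped.
func (s *memStore) markGaveUp(id string, at time.Time) {
	s.mu.Lock()
	defer s.mu.Unlock()
	ev := s.events[id]
	ev.GaveUpAt = at
	s.events[id] = ev
}

// pruneOnce deletes events given up before cutoff and returns how many.
func (s *memStore) pruneOnce(cutoff time.Time) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	pruned := 0
	for id, ev := range s.events {
		if !ev.GaveUpAt.IsZero() && ev.GaveUpAt.Before(cutoff) {
			delete(s.events, id)
			pruned++
		}
	}
	return pruned
}

// runPruner is the long-lived process: it prunes on every tick until ctx ends.
func (s *memStore) runPruner(ctx context.Context, every, retention time.Duration) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case now := <-ticker.C:
			s.pruneOnce(now.Add(-retention))
		}
	}
}

// demo runs one prune pass over three events with a 24h retention.
func demo() (pruned, remaining int) {
	now := time.Date(2025, 1, 2, 12, 0, 0, 0, time.UTC)
	s := &memStore{events: map[string]storedEvent{"old": {}, "recent": {}, "active": {}}}
	s.markGaveUp("old", now.Add(-48*time.Hour))
	s.markGaveUp("recent", now.Add(-time.Hour))
	pruned = s.pruneOnce(now.Add(-24 * time.Hour))
	return pruned, len(s.events)
}

func main() {
	fmt.Println(demo())
}
```

The retention window is what leaves room to recover an unrecoverable payload before it is gone for good.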
| b.mu.Unlock() | ||
| if inMemory { | ||
| // Still actively tracked — scanPending will handle it (gave-up or ACK). |
This should technically never happen, right?
If the prune time is set to 24h and max attempts to 20, with a retry interval of 30 seconds, that's 10 minutes of retries, so the event should have been removed from memory long before the prune runs.
I believe you should return an error here, or even emit a metric to show there's an inconsistency.
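The arithmetic behind that comment can be checked with a short sketch. It assumes a fixed retry interval with no backoff; `retryWindow` and `pruneOutlivesRetries` are hypothetical helpers, not functions from the PR.

```go
package main

import (
	"fmt"
	"time"
)

// retryWindow is how long an event keeps being retried before giving up,
// assuming a fixed interval between attempts.
func retryWindow(maxAttempts int, retryInterval time.Duration) time.Duration {
	return time.Duration(maxAttempts) * retryInterval
}

// pruneOutlivesRetries reports whether any event old enough to be pruned must
// already have exhausted its retries. With no limit (maxAttempts <= 0) an
// event can stay in memory forever, so the answer is false.
func pruneOutlivesRetries(pruneAge time.Duration, maxAttempts int, retryInterval time.Duration) bool {
	return maxAttempts > 0 && retryWindow(maxAttempts, retryInterval) < pruneAge
}

func main() {
	// The values quoted in the review comment: 20 attempts, 30s apart, 24h prune age.
	fmt.Println(retryWindow(20, 30*time.Second))
	fmt.Println(pruneOutlivesRetries(24*time.Hour, 20, 30*time.Second))
}
```

When `pruneOutlivesRetries` holds, finding a prunable event still tracked in memory points to a bug rather than a normal state, which is the case for surfacing it as an error or a metric.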
| } | ||
| cutoff := time.Now().Add(-age) | ||
| recs, err := b.store.List(b.ctx) |
It might be better to have a dedicated query that asks the DB only for events that have been modified recently, and potentially also pass maxAttempts to it as a parameter.
| } | ||
| for _, rec := range recs { | ||
| if rec.FirstAt.After(cutoff) { |
Why the FirstAt field and not LastSeenAt?
Shouldn't you remove rows based on the last time you modified the row in the DB, rather than the first time you inserted it?
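To make the difference concrete, here is a small sketch comparing the two cutoffs on a single record. The `eventRecord` type is hypothetical, and the `LastSeenAt` field is taken from the comment above rather than from the diff; the staleness check mirrors the `rec.FirstAt.After(cutoff)` guard shown in the snippet.

```go
package main

import (
	"fmt"
	"time"
)

// eventRecord is a hypothetical stored row with both timestamps.
type eventRecord struct {
	EventID    string
	FirstAt    time.Time // when the row was first inserted
	LastSeenAt time.Time // when the row was last modified (assumed field)
}

// staleByFirstAt mirrors the diff: a record is kept only if FirstAt is after cutoff.
func staleByFirstAt(rec eventRecord, cutoff time.Time) bool {
	return !rec.FirstAt.After(cutoff)
}

// staleByLastSeenAt applies the same rule to the last-modified time instead.
func staleByLastSeenAt(rec eventRecord, cutoff time.Time) bool {
	return !rec.LastSeenAt.After(cutoff)
}

// example builds a record inserted 48h ago but touched a minute ago,
// together with a 24h cutoff.
func example() (eventRecord, time.Time) {
	now := time.Date(2025, 1, 2, 12, 0, 0, 0, time.UTC)
	rec := eventRecord{
		EventID:    "e1",
		FirstAt:    now.Add(-48 * time.Hour),
		LastSeenAt: now.Add(-time.Minute),
	}
	return rec, now.Add(-24 * time.Hour)
}

func main() {
	rec, cutoff := example()
	fmt.Println(staleByFirstAt(rec, cutoff), staleByLastSeenAt(rec, cutoff))
}
```

A long-lived but recently active record is pruned under the FirstAt rule and kept under the LastSeenAt rule, which is the behaviour the question is probing.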
…m/smartcontractkit/chainlink-common into CRE-3248-basetrigger-attempts-max