feat: Improve flexibility of the MaxRetriesFailureHandler #554

beilCrxmarkets · 2024-11-12T16:30:59Z

Enhance Flexibility in MaxRetriesExceededHandler Configuration

This pull request modifies the MaxRetriesFailureHandler to give the maxRetriesExceededHandler more room to act. Previously, executionOperations.stop() was always invoked, even when a maxRetriesExceededHandler was defined, which ruled out follow-up actions such as rescheduling the task.
Key Changes:

The maxRetriesExceededHandler can now decide independently whether to remove the task, allowing for more nuanced handling (e.g., rescheduling tasks).
These modifications represent a breaking change, affecting the behavior of the MaxRetriesFailureHandler. With no defined handler, the behavior remains unchanged via a default handler executing executionOperations.stop().
We considered adding a RetriesFailureHandler, which would function similarly to MaxRetriesFailureHandler but without automatic invocation of executionOperations.stop(). However, to avoid user confusion on choosing between handlers, this was not pursued.

Fixes

#538

Reminders

Added/ran automated tests
Update README and/or examples
Ran mvn spotless:apply

cc @kagkarlsson

# Conflicts: # README.md

kagkarlsson · 2024-11-29T21:32:03Z

db-scheduler/src/main/java/com/github/kagkarlsson/scheduler/task/FailureHandler.java

-        executionOperations.stop();
        maxRetriesExceededHandler.accept(executionComplete, executionOperations);


I am unsure about the original intent of the maxRetriesExceededHandler. The handler should not have taken executionOperations as a parameter since the execution had already been stopped.

kagkarlsson · 2024-11-29T21:34:37Z

What is your use-case if I might ask? (what operation rather than stop() are you considering)

beilCrxmarkets · 2024-12-02T09:17:25Z

> What is your use-case if I might ask? (what operation rather than stop() are you considering)

If the execution has not been stopped yet, we could pass new FailureHandler.OnFailureReschedule<Void>(schedule)::onFailure or new FailureHandler.OnFailureRetryLater<Void>(schedule)::onFailure as the maxRetriesExceededHandler.

Our RecurringTask is triggered once a day. If it fails, we want to retry it. If we exceed maxRetries, we still want the recurring task to be triggered the next day.

kagkarlsson · 2024-12-12T11:51:56Z

Not sure I understand. Why are you using MaxRetriesFailureHandler if you do not want the "max"-behavior?

You can always implement your own fully custom failurehandler..

beilCrxmarkets · 2024-12-12T12:46:53Z

> Not sure I understand. Why are you using MaxRetriesFailureHandler if you do not want the "max"-behavior?
>
> You can always implement your own fully custom failurehandler..

At the moment we use our own implemented FailureHandler to get the desired behavior.

However, I assume that we are not the only ones who need the described behavior, which is why it makes sense to provide it in the library itself rather than in each project.

I agree with you that the name “MaxRetriesFailureHandler” implies that after the maximum number of attempts is reached, the instance is terminated. We have therefore also considered proposing to add an additional FailureHandler with the name “RetriesFailureHandler”. However, the existence of a “MaxRetriesFailureHandler” and a “RetriesFailureHandler” can be confusing for beginners at first glance, as the difference is not 100% obvious.

If we were to start this project from scratch, it would make sense to add only the “RetriesFailureHandler” instead of the “MaxRetriesFailureHandler”, as the former offers more flexibility. Since the library and thus the “MaxRetriesFailureHandler” have been around for a while and are used by several users as they are, the compromise would be to adapt the “MaxRetriesFailureHandler” accordingly so that it offers more flexibility.

kagkarlsson · 2024-12-12T12:50:30Z

Are you looking for behavior like:

up until X retries, use this FailureHandler
after X retries, use this other FailureHandler
?

beilCrxmarkets · 2024-12-12T13:03:37Z

> Are you looking for behavior like:
>
> up until X retries, use this FailureHandler
> after X retries, use this other FailureHandler
> ?

Yes, our task will be triggered once a day. If this fails, we want to try again 5 times shortly after (RetriesFailureHandler). After all 5 attempts are used up, we want to try again tomorrow at the usual time (OnFailureReschedule).

However, with the current implementation of the MaxRetriesFailureHandler, we cannot do this because the task is deleted from the DB after the 5 attempts are exhausted.

We also combine this with the ExponentialBackoffFailureHandler to get a delay between the 5 additional attempts. This does not cause any problems.
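
To make the two-phase rule concrete, here is a minimal sketch that is independent of the library. It is not db-scheduler code: TwoPhaseSketch, afterTries, decide, and the string results are hypothetical stand-ins for FailureHandler instances and their effects. It only assumes that consecutiveFailures counts the failures that happened before the current one.

```java
import java.util.function.IntFunction;

/** Hypothetical stand-in: picks an action name from the number of previous consecutive failures. */
final class TwoPhaseSketch {

  /** Delegates to primary while fewer than maxRetries failures preceded this one, else to secondary. */
  static IntFunction<String> afterTries(
      int maxRetries, IntFunction<String> primary, IntFunction<String> secondary) {
    return consecutiveFailures ->
        consecutiveFailures < maxRetries
            ? primary.apply(consecutiveFailures)
            : secondary.apply(consecutiveFailures);
  }

  /** Example wiring: five quick retries, then wait for the next regular run. */
  static String decide(int consecutiveFailures) {
    IntFunction<String> handler =
        afterTries(5, n -> "retry-later", n -> "reschedule-next-day");
    return handler.apply(consecutiveFailures);
  }
}
```

With this wiring, the first five failures (consecutiveFailures 0 to 4) select the retry action and the sixth selects the reschedule action.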

beilCrxmarkets · 2025-01-07T13:28:20Z

@kagkarlsson How should we continue with this topic?

As stated in #538 (comment) @adamalexandru4 has the same issue.

kagkarlsson · 2025-03-08T12:31:13Z

I don't want to add this functionality to MaxRetries...

And it is not meant to handle a secondary FailureHandler, the signature is:

    public MaxRetriesFailureHandler(
        int maxRetries,
        FailureHandler<T> failureHandler,
        BiConsumer<ExecutionComplete, ExecutionOperations<T>> maxRetriesExceededHandler) {
      this.maxRetries = maxRetries;
      this.failureHandler = failureHandler;
      this.maxRetriesExceededHandler = maxRetriesExceededHandler;
    }

i.e. BiConsumer<ExecutionComplete, ExecutionOperations<T>>

We might add a warning if the type supplied is actually a FailureHandler?

(It might be there is a better name for MaxRetries..., not sure but since you both made that assumption...)

Your requirement I think should be captured in its own handler. If you want to supply a standard one for the project that is fine 👍. But what should it be called 😅. CompositeFailureHandler.primary(handler1).afterTries(5, handler2) (.afterTime(Duration.ofHours(1), handler3)) (just toying with ideas)

beilCrxmarkets · 2025-03-11T15:10:16Z

I have reverted the changes made to the MaxRetriesFailureHandler and introduced a new SequenceFailureHandler equipped with builder functionality.

The SequenceFailureHandler facilitates the creation of a sequence involving two FailureHandlers. Initially, the primaryFailureHandler is invoked on the first failure, followed by the secondaryFailureHandler for subsequent failures.

While we can consider alternate names, I believe "sequence" aptly describes the concept of two failure handlers operating sequentially. In contrast, "composite" suggests a combination, which may not convey the intended functionality.

However, even with this setup, we cannot achieve the desired behavior. The limitation arises because the MaxRetriesFailureHandler within the SequenceFailureHandler removes the task execution. As a workaround, I've introduced an optional parameter primaryHandlerRetryCount to allow retries for the primaryFailureHandler.

This yields the desired behavior via:

SequenceFailureHandler.builder()
  .primary(new ExponentialBackoffFailureHandler<>(Duration.ofSeconds(3), 1.0))
  .afterTries(5, new OnFailureRescheduleUsingTaskDataSchedule<>())
  .build()

Despite this solution, I remain uncertain if it constitutes the optimal API, given its inherent limitations.
An alternative approach might be the adoption of a list instead of the primaryFailureHandler, illustrated as follows:

  class SequenceFailureHandler<T> implements FailureHandler<T> {

    // One handler per consecutive failure, in order.
    private final List<FailureHandler<T>> failureHandlerSequence = new ArrayList<>();
    // Used once the sequence is exhausted.
    private final FailureHandler<T> fallbackFailureHandler;

    public SequenceFailureHandler(
        List<FailureHandler<T>> failureHandlerSequence, FailureHandler<T> fallbackFailureHandler) {
      this.failureHandlerSequence.addAll(failureHandlerSequence);
      this.fallbackFailureHandler = fallbackFailureHandler;
    }

    @Override
    public void onFailure(
        ExecutionComplete executionComplete, ExecutionOperations<T> executionOperations) {
      int consecutiveFailures = executionComplete.getExecution().consecutiveFailures;
      if (consecutiveFailures < failureHandlerSequence.size()) {
        FailureHandler<T> failureHandler = failureHandlerSequence.get(consecutiveFailures);
        failureHandler.onFailure(executionComplete, executionOperations);
      } else {
        fallbackFailureHandler.onFailure(executionComplete, executionOperations);
      }
    }

    // Convenience helper: repeats the same handler the given number of times.
    public static <T> List<FailureHandler<T>> times(int times, FailureHandler<T> failureHandler) {
      List<FailureHandler<T>> failureHandlers = new ArrayList<>();
      for (int i = 0; i < times; i++) {
        failureHandlers.add(failureHandler);
      }
      return failureHandlers;
    }
  }

This could be used as follows:

  public static void main(String[] args) {
    new SequenceFailureHandler<>(
      times(5, new ExponentialBackoffFailureHandler<>(Duration.ofSeconds(3), 1.0)),
      new OnFailureReschedule<>(new CronSchedule("*****")));
  }

What do you think?

beilCrxmarkets · 2025-03-11T16:45:26Z

I've discovered that our current approach will only partially resolve our issues. We need specific error handling behavior:

Retry the operation 5 times with a delay of 3 seconds between attempts. If the operation still fails, reschedule it for the next execution time, typically the following day, and repeat the process.

Currently, the consecutiveFailures count does not reset to 0 after rescheduling to the next day. As a result, if there is a failure the next day, the task is rescheduled immediately instead of attempting it 5 times.

You don't encounter this issue with the MaxRetriesFailureHandler, as it terminates the task when a certain number of consecutive failures is reached.

This problem isn't resolved by the SequenceFailureHandler, which utilizes a list. Once the end of the list is reached, it simply triggers the fallbackFailureHandler, causing the task to be rescheduled immediately.
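
The effect can be reproduced with a small model of the list-plus-fallback selection. This is a hedged sketch, not library code: SequenceFallbackSketch, select, decide, and the action strings are hypothetical stand-ins, and the model assumes the failure counter keeps increasing across the reschedule as described above.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical model of the list-plus-fallback selection; strings stand in for handlers. */
final class SequenceFallbackSketch {

  /** In-range counts use the list entry at that index, everything else uses the fallback. */
  static String select(List<String> sequence, String fallback, int consecutiveFailures) {
    return consecutiveFailures < sequence.size() ? sequence.get(consecutiveFailures) : fallback;
  }

  /** Five retry entries with a reschedule fallback. */
  static String decide(int consecutiveFailures) {
    List<String> sequence = Collections.nCopies(5, "retry-in-3s");
    return select(sequence, "reschedule-next-day", consecutiveFailures);
  }
}
```

On the first day, counts 0 to 4 select the retry entry and count 5 selects the fallback. If the counter is not reset, the first failure on the next day arrives with count 6 and goes straight to the fallback again.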

I believe we need a LoopFailureHandler which accepts a list of FailureHandler objects and starts over with the first handler once the end of the list is reached.
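
A rough sketch of that looping selection, under stated assumptions: LoopSketch, select, and fiveRetriesThenReschedule are hypothetical names, the strings stand in for FailureHandler instances, and the failure counter is assumed to keep increasing until the task succeeds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the looping idea; action names stand in for FailureHandler instances. */
final class LoopSketch {
  private final List<String> cycle;

  LoopSketch(List<String> cycle) {
    if (cycle.isEmpty()) {
      throw new IllegalArgumentException("cycle must not be empty");
    }
    this.cycle = new ArrayList<>(cycle);
  }

  /** Wraps around with the remainder so the sequence restarts after its last entry. */
  String select(int consecutiveFailures) {
    return cycle.get(consecutiveFailures % cycle.size());
  }

  /** Five short retries followed by one reschedule, repeated for as long as failures continue. */
  static LoopSketch fiveRetriesThenReschedule() {
    List<String> cycle = new ArrayList<>(Collections.nCopies(5, "retry-in-3s"));
    cycle.add("reschedule-next-day");
    return new LoopSketch(cycle);
  }
}
```

Because the cycle has six entries, count 5 selects the reschedule entry and count 6 starts over with the first retry entry.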

Alternatively, we could implement a retry functionality using the modulo/remainder operator as follows:

    public void onFailure(
        final ExecutionComplete executionComplete,
        final ExecutionOperations<T> executionOperations) {
      int consecutiveFailures = executionComplete.getExecution().consecutiveFailures;
      int totalNumberOfFailures = consecutiveFailures + 1;
      if (totalNumberOfFailures % maxRetries == 0) {
        LOG.error(
            "Execution has failed {} times for task instance {}. Cancelling execution.",
            totalNumberOfFailures,
            executionComplete.getExecution().taskInstance);
        executionOperations.stop();
        maxRetriesExceededHandler.accept(executionComplete, executionOperations);
      } else {
        this.failureHandler.onFailure(executionComplete, executionOperations);
      }
    }
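
To see which failures this remainder check would route to the "exceeded" branch, the condition can be run in isolation. ModuloCheckSketch and exceededAt are hypothetical helpers written only for this illustration; nothing here touches the library.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists which failures (1-based) hit the "exceeded" branch. */
final class ModuloCheckSketch {

  /** Applies the same check as the snippet: (consecutiveFailures + 1) % maxRetries == 0. */
  static List<Integer> exceededAt(int maxRetries, int failuresToSimulate) {
    List<Integer> hits = new ArrayList<>();
    for (int consecutiveFailures = 0;
        consecutiveFailures < failuresToSimulate;
        consecutiveFailures++) {
      int totalNumberOfFailures = consecutiveFailures + 1;
      if (totalNumberOfFailures % maxRetries == 0) {
        hits.add(totalNumberOfFailures);
      }
    }
    return hits;
  }
}
```

With maxRetries = 5 and twelve simulated failures, the branch is taken on the 5th and 10th failure, so each cycle consists of four calls to failureHandler followed by one call to the exceeded branch. As posted, that branch still calls executionOperations.stop(), so a repeating cycle would presumably also require dropping that call.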

JBabinskas · 2025-04-24T16:47:38Z

Hey,

@kagkarlsson cool library, using it instead of good old Quartz and loving it!

On topic, I think the current MaxRetriesFailureHandler implementation is indeed confusing. It not only confused me: I also asked a few colleagues in the office what would happen if one passed any FailureHandler as the argument for maxRetriesExceededHandler, and no one said it would end in an error because MaxRetriesFailureHandler always executes executionOperations.stop();. I think this limits the flexibility of MaxRetriesFailureHandler for no reason.

It is an extremely common case to retry some nightly job X times at night, then give up and just re-schedule it for the next night. MaxRetriesFailureHandler would fit perfectly for that, but I also recognize that changing this behavior would basically be a breaking change, so I am not sure what the best approach would be. Currently, for those retry-then-re-schedule cases, we just have a custom FailureHandler where I basically copy-pasted MaxRetriesFailureHandler and removed the forced executionOperations.stop(); 😅

I think part of the confusion comes from the fact that one might look at MaxRetriesFailureHandler and think it is a terminating handler, but for most people it just means "retry X times, and if all fail, then do Y", especially when it takes in something named maxRetriesExceededHandler. Even if it is a breaking change, I would vote for changing MaxRetriesFailureHandler to give the user more freedom to decide what to do, including terminating the job completely.

beilCrxmarkets · 2025-05-19T14:02:02Z

@kagkarlsson I agree with @JBabinskas.

We should adjust the MaxRetriesFailureHandler. There are multiple ways. How should we proceed?

omarfi · 2025-06-16T08:36:40Z

We have the exact same use case.

I also solved it by using a custom "max" retries handler:

    @Override
    public FailureHandler<OrderFulfilmentTaskData> getFailureHandler() {
        return (executionComplete, executionOperations) -> {
            if (executionComplete.getExecution().consecutiveFailures < maxRetries) {
                // Still within the retry budget: try again after a fixed delay.
                new OnFailureRetryLater<OrderFulfilmentTaskData>(Duration.ofSeconds(fixedDelayMinutes))
                        .onFailure(executionComplete, executionOperations);
            } else {
                // Retries exhausted: replace this execution with a newly scheduled one.
                executionOperations.removeAndScheduleNew(TASK_DESCRIPTOR
                        .instance(executionComplete.getExecution().taskInstance.getId())
                        .data((OrderFulfilmentTaskData) executionComplete.getExecution().taskInstance.getData())
                        .scheduledTo(Instant.now().plus(Duration.ofSeconds(24))));
            }
        };
    }

I'm also hoping that the library can be shipped with a "CompositeFailureHandler" as proposed in the thread (composed of a RetryLater and a Reschedule strategy).

Nice library btw.

beilCrxmarkets · 2025-07-04T12:56:23Z

@kagkarlsson How do we proceed?

As you can see, the problem affects several users.

tstavinoha · 2025-07-14T07:02:50Z

My team came to a similar conclusion as some commenters in this thread.
The MaxRetries handler stops/removes the execution, making it impossible to reschedule and thus to handle this scenario.

This thread alone already shows several workarounds (modulo checks, rescheduling, etc.), and there could be other, dirtier hacks in the wild, all for what is actually a fairly common use case.

While we can and do implement a custom failure handler, having first-class support for this scenario without needing to reinvent the wheel and work around the internal details of the library would be a big improvement.

beilCrxmarkets added 2 commits November 12, 2024 17:12

Improve flexibility of the MaxRetriesFailureHandler

1432e9d

Merge branch 'master' into ImproveFlexibilityOfMaxRetriesFailureHandler

b56c7ee

# Conflicts: # README.md

beilCrxmarkets changed the title ~~Improve flexibility of the MaxRetriesFailureHandler~~ feat: Improve flexibility of the MaxRetriesFailureHandler Nov 29, 2024

kagkarlsson reviewed Nov 29, 2024


beilCrxmarkets added 2 commits March 11, 2025 10:12

Merge branch 'master' into ImproveFlexibilityOfMaxRetriesFailureHandler

135fd19

Improve flexibility of the MaxRetriesFailureHandler

26105b5
