How to Retry in Spring WebFlux

A guide to adding and configuring retry logic in the Spring WebFlux WebClient.

Overview

Spring WebFlux provides a reactive API for non-blocking processing. Internally, it works on a publisher and subscriber model, in which the subscriber reacts to events from the publisher. In this tutorial, we will see how to apply retry behaviour in WebFlux.

WebFlux includes a reactive, non-blocking HTTP client: WebClient. We will walk through examples of configuring retry logic on WebClient calls, starting with a basic retry with a maximum number of attempts. We will then look at retry with a fixed delay, retry with backoff, and retry with jitter.

What is a Retry?

In WebFlux, the subscriber requests new events from the publisher. If the publisher signals an error, the subscriber is notified and the subscription terminates. That means the success or failure of a subscriber depends directly on the publisher. In distributed environments, where we use WebClient to access an external service, errors in that service are outside our control. The best a client can do is be prepared for either outcome.

However, publisher or upstream service errors are often transient: a brief network interruption, for example, or an upstream service that is just recovering from a fatal error. In other words, not all failures are permanent, and when a failure is transient, a repeated attempt may succeed. For this reason, the WebFlux publishers, Mono and Flux, provide a mechanism to apply and configure retry behaviour.

There are two main methods, retry() and retryWhen(retrySpec). With them we can enable retries, set a maximum number of retries, add a fixed or exponentially increasing delay between retries, and filter the errors we want to retry on.

Retry N Times

The most basic way to enable retries is to call the retry() method with a maximum number of retry attempts. No matter what error the service throws, this immediately re-subscribes, up to the given number of times.

Retry a fixed number of times

    WebClient.create()
        .get()
        .uri(GET_STUDENTS_URL)
        .retrieve()
        .bodyToFlux(Student.class)
        .retry(3L);

Alternatively, we can add retries with the retryWhen() method, as shown next.

    WebClient.create()
        .get()
        .uri(GET_STUDENTS_URL)
        .retrieve()
        .bodyToFlux(Student.class)
        .retryWhen(Retry.max(3));

It is important to understand that a retry re-subscribes to the publisher, so even if the failure happens in the middle of a response, the retried request starts again from the beginning. If all of the specified attempts fail, the subscriber fails permanently.
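
To make the "starts again from the beginning" behaviour concrete, here is a minimal plain-Java sketch of the same idea. It is not how Reactor implements retry(). The class SimpleRetry and its retry helper are hypothetical names used only for illustration. Each attempt calls the supplier again from scratch, which mirrors a retry re-subscribing to the publisher.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of Reactor or Spring.
final class SimpleRetry {

    /** Calls the supplier once, then up to maxRetries more times if it throws. */
    static <T> T retry(Supplier<T> call, long maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must not be negative");
        }
        RuntimeException lastError = null;
        for (long attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                // Every attempt invokes the supplier from scratch,
                // just as a retry re-subscribes from the beginning.
                return call.get();
            } catch (RuntimeException e) {
                lastError = e;
            }
        }
        throw lastError; // the first call and every retry failed
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        String result = retry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("temporarily unavailable");
            }
            return "students";
        }, 3);
        System.out.println(result + " after " + calls.get() + " calls");
    }
}
```

In this run the supplier fails twice and succeeds on the third call, so only two of the three permitted retries are used.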

Retry N Times in a Row

Using Retry.max(long), as shown in the previous example, puts a limit on the total number of retries before a RetryExhaustedException is thrown. The retry counter increments on every error, whether the errors are consecutive or intermittent.

Using Retry.maxInARow(long) puts a similar limit on the number of retries, but the retry count increments only for consecutive errors. Whenever a retry succeeds, the retry counter is reset to zero. The RetryExhaustedException is thrown only if N retries fail in a row.

Retry N number of times in a Row

    WebClient.create()
        .get()
        .uri(GET_STUDENTS_URL)
        .retrieve()
        .bodyToFlux(Student.class)
        .retryWhen(Retry.maxInARow(3));

This throws a RetryExhaustedException only when 3 retry attempts fail in a row.
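
The difference between a total budget and an in-a-row budget is easiest to see on a sequence of outcomes. The following plain-Java sketch is a simplified model of the two counting rules, not Reactor's implementation. RetryBudget and exhaustedAt are hypothetical names, and the boolean array (true for an error, false for an emitted value) is an assumption made for the example.

```java
// Hypothetical model for illustration; not part of Reactor or Spring.
final class RetryBudget {

    /**
     * Returns the index of the signal at which retries are exhausted, or -1 if never.
     * signals: true = the source failed, false = the source emitted a value.
     */
    static int exhaustedAt(boolean[] signals, long maxRetries, boolean inARow) {
        long retries = 0;
        for (int i = 0; i < signals.length; i++) {
            if (signals[i]) {
                if (retries >= maxRetries) {
                    return i; // budget used up: this error is not retried
                }
                retries++;
            } else if (inARow) {
                retries = 0; // a value arrived, so the streak of errors is broken
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // error, value, error, value, error, value, error
        boolean[] intermittent = {true, false, true, false, true, false, true};
        System.out.println(exhaustedAt(intermittent, 3, false)); // 6: the fourth error overall
        System.out.println(exhaustedAt(intermittent, 3, true));  // -1: the streak never reaches the limit
    }
}
```

With intermittent errors, the total budget of 3 is spent by the fourth error, while the in-a-row budget is never spent because every emitted value resets the count.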

Retry Infinitely

Under normal circumstances, we don’t need to retry indefinitely. If we do, however, we can omit the max attempts parameter from the retry() method.

Retry infinite times

    WebClient.create()
        .get()
        .uri(GET_STUDENTS_URL)
        .retrieve()
        .bodyToFlux(Student.class)
        .retry();

Passing Long.MAX_VALUE is equivalent to passing nothing, so it is also treated as retrying indefinitely.

We can also use the Retry#indefinitely() method to retry immediately and indefinitely.

Retry Infinite Times with retryWhen()

    WebClient.create()
        .get()
        .uri(GET_STUDENTS_URL)
        .retrieve()
        .bodyToFlux(Student.class)
        .retryWhen(Retry.indefinitely());

Retry with Fixed Delay

The whole point of retrying a failed operation is the expectation that the upstream service will recover. However, an immediate retry will most likely return the same error, because the upstream service may need some time to recover. Moreover, immediate retries may keep the service busy and prevent it from recovering.

Thus, it is a good idea to allow some time before we retry. To do that in WebFlux, we need to use the retryWhen() function, which accepts a retry specification and is more configurable.

Retry with Fixed Delay

    WebClient.create()
        .get()
        .uri(GET_STUDENTS_URL)
        .retrieve()
        .bodyToFlux(Student.class)
        .retryWhen(Retry.fixedDelay(4, Duration.ofSeconds(5)));

This will retry up to 4 times, with a delay of 5 seconds before each retry.
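
A delayed retry in a non-blocking setting should wait without holding a thread. As a rough illustration of that idea outside Reactor, the sketch below schedules each new attempt on a ScheduledExecutorService instead of sleeping. FixedDelayRetry is a hypothetical helper, the call itself is a plain synchronous Supplier to keep the example small, and the 50 ms delay is chosen only to keep the demo fast.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of Reactor or Spring.
final class FixedDelayRetry {

    static <T> CompletableFuture<T> retry(Supplier<T> call, int maxRetries,
                                          Duration delay, ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(call, maxRetries, delay, scheduler, result);
        return result;
    }

    private static <T> void attempt(Supplier<T> call, int retriesLeft, Duration delay,
                                    ScheduledExecutorService scheduler,
                                    CompletableFuture<T> result) {
        try {
            result.complete(call.get());
        } catch (RuntimeException e) {
            if (retriesLeft == 0) {
                result.completeExceptionally(e); // out of retries: surface the last error
                return;
            }
            // Schedule the next attempt instead of sleeping,
            // so no thread is blocked during the delay.
            scheduler.schedule(
                    () -> attempt(call, retriesLeft - 1, delay, scheduler, result),
                    delay.toMillis(), TimeUnit.MILLISECONDS);
        }
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger calls = new AtomicInteger();
        String value = retry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("not ready");
            }
            return "students";
        }, 4, Duration.ofMillis(50), scheduler).join();
        System.out.println(value + " after " + calls.get() + " calls");
        scheduler.shutdown();
    }
}
```

The same delay is used before every retry, which is the defining property of a fixed-delay strategy.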

Retry with Backoff

Backoff is a strategy in which each retry waits progressively longer than the previous one. The assumption is that if a service call has failed several times, the next call will most likely fail too. Thus, before each retry attempt, the backoff strategy delays the retry for a longer period than the previous delay.

Retry with Backoff

    WebClient.create()
        .get()
        .uri(GET_STUDENTS_URL)
        .retrieve()
        .bodyToFlux(Student.class)
        .retryWhen(Retry.backoff(4, Duration.ofSeconds(3)));

In this case, a maximum of 4 retries will happen, with an initial delay of about 3 seconds and subsequent delays of about 6, 12, and 24 seconds. The delays are approximate because Retry.backoff() applies jitter by default.
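
The doubling schedule can be worked out by hand, or with a few lines of plain Java. The sketch below is a simplified model that ignores jitter. BackoffSchedule is a hypothetical helper, and the maxDelay cap of one minute is an assumption added for the example, not a value from the snippet above.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Reactor or Spring.
final class BackoffSchedule {

    /** Delay before each retry: initial, then doubled each time, capped at maxDelay. */
    static List<Duration> delays(int maxRetries, Duration initial, Duration maxDelay) {
        List<Duration> result = new ArrayList<>();
        Duration next = initial;
        for (int i = 0; i < maxRetries; i++) {
            result.add(next.compareTo(maxDelay) > 0 ? maxDelay : next);
            if (next.compareTo(maxDelay) < 0) {
                next = next.multipliedBy(2); // stop doubling once the cap is reached
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Duration> delays = delays(4, Duration.ofSeconds(3), Duration.ofMinutes(1));
        System.out.println(delays); // [PT3S, PT6S, PT12S, PT24S]
        Duration total = delays.stream().reduce(Duration.ZERO, Duration::plus);
        System.out.println(total);  // PT45S
    }
}
```

In this model, four failed retries add 45 seconds of waiting on top of the time spent in the requests themselves.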

Retry with Backoff and Jitter

Using jitter along with backoff adds some randomness to the retry times. Consider multiple clients that hit a service at the same time. If they all use the same retry strategy, they will storm the server with retries at the same moments.

To avoid that, we can add some jitter to the backoff strategy. The jitterFactor ranges from 0 to 1, where 0 corresponds to no jitter and 1 corresponds to 100% jitter of the originally computed value. The default jitterFactor is 0.5, which is a jitter of 50% of the originally computed delay.

Backoff with Jitter

    WebClient.create()
        .get()
        .uri(GET_STUDENTS_URL)
        .retrieve()
        .bodyToFlux(Student.class)
        .retryWhen(Retry.backoff(4, Duration.ofSeconds(3)).jitter(0.7));

The example shows a backoff delay strategy with a jitter factor of 0.7 (70% jitter of the computed value).
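
To get a feel for what a jitter factor means in numbers, the following plain-Java sketch models jitter as a random offset of up to plus or minus factor times the computed delay. This is a simplified model: JitterModel is a hypothetical class, and clamping the result so that it never drops below the initial delay is an assumption of the sketch. Reactor's exact bounds may differ.

```java
import java.util.Random;

// Hypothetical model for illustration; not part of Reactor or Spring.
final class JitterModel {

    /** Lowest delay the model can produce: base minus the jitter span, never below minMillis. */
    static long lowerBoundMillis(long baseMillis, long minMillis, double factor) {
        return Math.max(minMillis, baseMillis - Math.round(baseMillis * factor));
    }

    /** Highest delay the model can produce: base plus the jitter span. */
    static long upperBoundMillis(long baseMillis, double factor) {
        return baseMillis + Math.round(baseMillis * factor);
    }

    /** Picks a delay roughly uniformly between the two bounds (both inclusive). */
    static long jittered(long baseMillis, long minMillis, double factor, Random random) {
        if (factor < 0.0 || factor > 1.0) {
            throw new IllegalArgumentException("factor must be between 0 and 1");
        }
        long low = lowerBoundMillis(baseMillis, minMillis, factor);
        long high = upperBoundMillis(baseMillis, factor);
        return low + (long) (random.nextDouble() * (high - low + 1));
    }

    public static void main(String[] args) {
        // Second retry of a 3-second backoff: computed delay 6000 ms, jitter factor 0.7.
        System.out.println(lowerBoundMillis(6000, 3000, 0.7)); // 3000
        System.out.println(upperBoundMillis(6000, 0.7));       // 10200
        System.out.println(jittered(6000, 3000, 0.7, new Random()));
    }
}
```

Because every client draws its own random offset, clients that failed at the same moment are unlikely to retry at the same moment.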

Retry on Specific Errors

When we apply a retry policy, it retries on any error or exception by default. However, in a real-life scenario we may not want to retry on certain errors. For example, client errors are caused by the request rather than by the server, so repeating the same request will not help. Thus we shouldn’t retry upon such failures.

The retry specification allows specifying the exceptions that we want to retry on.

Retry on Specific Exception

    WebClient.create()
        .get()
        .uri(GET_STUDENTS_URL)
        .retrieve()
        // Map 5xx responses to a custom exception that the retry filter can match.
        // On Spring Framework 6 and later, the predicate is HttpStatusCode::is5xxServerError.
        .onStatus(
            HttpStatus::is5xxServerError,
            response -> Mono.error(new StudentServiceException(
                "Got " + response.statusCode() + " while executing " + GET_STUDENTS_URL)))
        .bodyToFlux(Student.class)
        .retryWhen(
            Retry.backoff(4, Duration.ofSeconds(3))
                .jitter(0.7)
                // Retry only the failures raised by the onStatus handler above.
                .filter(throwable -> throwable instanceof StudentServiceException));

First, it throws a StudentServiceException when an HTTP status code of 5xx is received. Then, in the retry specification, the filter() method specifies a predicate that matches that specific exception. With this in place, the retry will only happen when the server response status is 5xx.
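
The effect of an error filter can be shown in isolation with a small plain-Java model. It is a sketch, not Reactor's implementation: FilteredRetry and UpstreamFailure are hypothetical names, with UpstreamFailure standing in for a server-side failure such as the one mapped from a 5xx status.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical exception standing in for a server-side (5xx) failure.
final class UpstreamFailure extends RuntimeException {
    UpstreamFailure(String message) {
        super(message);
    }
}

// Hypothetical helper for illustration; not part of Reactor or Spring.
final class FilteredRetry {

    /** Retries only while the thrown error matches the predicate and the budget allows. */
    static <T> T retry(Supplier<T> call, long maxRetries, Predicate<Throwable> retryable) {
        long retries = 0;
        while (true) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                // Give up at once on non-matching errors, or when the budget is spent.
                if (!retryable.test(e) || retries >= maxRetries) {
                    throw e;
                }
                retries++;
            }
        }
    }

    public static void main(String[] args) {
        try {
            retry(() -> {
                throw new IllegalArgumentException("bad request");
            }, 4, error -> error instanceof UpstreamFailure);
        } catch (IllegalArgumentException e) {
            System.out.println("not retried: " + e.getMessage());
        }
    }
}
```

An error that does not match the predicate reaches the caller after a single call, while matching errors are retried until the budget runs out.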

Handle Retry Exhausted

WebFlux throws a RetryExhaustedException when all of the specified retries have been executed and have failed. This may happen when the upstream service is unable to recover within the retry period.

The Retry Specification provides a convenient way to handle this case and throw a different exception instead.

Throw Exception when Retries are exhausted

    WebClient.create()
        .get()
        .uri(GET_STUDENTS_URL)
        .retrieve()
        .onStatus(
            HttpStatus::is5xxServerError,
            response -> Mono.error(new StudentServiceException(
                "Got " + response.statusCode() + " while executing " + GET_STUDENTS_URL)))
        .bodyToFlux(Student.class)
        .retryWhen(
            Retry.backoff(4, Duration.ofSeconds(3))
                .jitter(0.7)
                .filter(throwable -> throwable instanceof StudentServiceException)
                // Replace the default exhaustion error with a domain-specific one.
                .onRetryExhaustedThrow((retryBackoffSpec, retrySignal) ->
                    new StudentServiceException(
                        "Service failed to respond, after max attempts of: "
                            + retrySignal.totalRetries())));

With this, when retries are exhausted, a StudentServiceException with a detailed message is thrown instead of the default RetryExhaustedException.
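
The idea of translating "retries exhausted" into an exception of our own choosing can also be sketched without Reactor. In the plain-Java model below, ExhaustionAwareRetry is a hypothetical helper, and the callback receives the number of retries performed together with the last failure, loosely mirroring the information a retry signal carries.

```java
import java.util.function.BiFunction;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of Reactor or Spring.
final class ExhaustionAwareRetry {

    /** Retries up to maxRetries times, then throws whatever onExhausted builds. */
    static <T> T retry(Supplier<T> call, long maxRetries,
                       BiFunction<Long, Throwable, RuntimeException> onExhausted) {
        long retries = 0;
        while (true) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                if (retries >= maxRetries) {
                    // Let the caller decide which exception represents "gave up".
                    throw onExhausted.apply(retries, e);
                }
                retries++;
            }
        }
    }

    public static void main(String[] args) {
        try {
            retry(() -> {
                throw new IllegalStateException("service down");
            }, 4, (count, lastError) -> new RuntimeException(
                    "Service failed to respond after " + count + " retries", lastError));
        } catch (RuntimeException e) {
            System.out.println(e.getMessage() + " (cause: " + e.getCause().getMessage() + ")");
        }
    }
}
```

Passing the last failure along as the cause keeps the original error available for logging and diagnosis.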

Summary

In this tutorial, we learned how to add retries in Spring WebFlux. A retry mechanism helps to account for transient errors in the upstream service. Both of the main publishers of Spring WebFlux, Mono and Flux, support retries through the retry() and retryWhen() methods.

We have seen examples of adding a basic retry with max attempts, indefinite retry, retry with fixed delay, and retry with backoff and jitter. We also learned how to retry only on specific errors. Lastly, we learned how to handle the retry-exhausted scenario and throw a custom exception.