Downloading Large Files using Spring WebClient

A quick tutorial on how to efficiently download large files with Spring WebClient. It contains an example of using WebClient to read a very large file as a stream and store it on disk.

Overview

Spring WebClient offers a non-blocking, reactive way of interacting with other HTTP resources. This tutorial focuses on accessing a large file from an external service using Spring WebClient.

We will first use a Mono publisher to download a file in the form of a byte[] (byte array). We will also see why this method is not suitable for downloading large files.

Next, we will look at the Spring DataBuffer classes and their role in the data transfer process. We will also learn why we get a DataBufferLimitException and how we can avoid it by configuring the DataBuffer capacity.

Lastly, we will use a Flux publisher to download a very large file in chunks of DataBuffer.

Setup WebClient

To begin with, we will create a WebClient instance that we will use to download files.

Dependency

To use WebClient in a Spring Boot project, include the starter dependency for Spring WebFlux.

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-webflux</artifactId>
</dependency>

This starter transitively brings in all the required dependencies, including Reactor Netty, which backs both the embedded server and the default HTTP client.

WebClient Instance

Let’s create a WebClient instance using its own builder. We provide the base URL of a file server.

@Bean
public WebClient webClient() {
  return WebClient.builder()
    .baseUrl(props.getFileServerUrl())
    .build();
}

Downloading as a Byte Array

When we read a file in Java, its content is typically held as a byte array (byte[]). Hence, reading the response content as a byte[] is the simplest way.

Example of WebClient downloading file as a byte[]

public void downloadUsingByteArray(Path destination) 
        throws IOException {

  Mono<byte[]> monoContents = webClient
    .get()
    .uri("/largefiles/1")
    .retrieve()
    .bodyToMono(byte[].class);

  Files.write(
    destination,
    Objects.requireNonNull(monoContents.share().block()),
    StandardOpenOption.CREATE);
}

Here, we used Spring WebClient to access a file from a URL, read the file contents in the form of a byte array, and write it to a file on disk.

Although we have covered it as an example, we do not recommend the byte[] approach for large files. It reads the entire content of the file into memory and can lead to an OutOfMemoryError if the data exceeds the available memory.

Also, when WebClient aggregates a response body into a single object, it buffers the content in memory up to a predefined limit, which is 256 KB (262,144 bytes) by default. For the download to succeed, the whole file must fit within that limit. However, the limit can be raised, as we will see in a later section.

What is Spring DataBuffer?

The WebClient internally uses data buffers to hold the data transmitted over the network. At a high level, Spring DataBuffer provides a useful abstraction over the Java NIO ByteBuffer. It also offers some of the benefits of Netty's ByteBuf.

Some of the features of DataBuffer are:

  • Unlike ByteBuffer, a DataBuffer has separate read and write positions, so it does not need a flip to switch between reading and writing.
  • Supports pooling (through PooledDataBuffer), where a predefined pool of DataBuffer objects is reused.
  • Allows dynamic expansion and contraction of the DataBuffer capacity.
  • Can be viewed as a ByteBuffer, InputStream, or OutputStream.
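To make the first point concrete, here is a minimal sketch that uses only the plain JDK ByteBuffer (no Spring classes) to show the flip step that a DataBuffer spares us. The class and method names are hypothetical and exist only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative only: demonstrates the single shared position of a NIO ByteBuffer.
public class ByteBufferFlipDemo {

    // Writes the text into a ByteBuffer and reads it back.
    // flip() is required in between because reads and writes share one position.
    public static String writeThenRead(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(64);
        buffer.put(bytes);   // position advances past the written bytes
        buffer.flip();       // limit = position, position = 0: ready for reading
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return new String(out, StandardCharsets.UTF_8);
    }

    // Without flip(), remaining() reports the unwritten tail of the buffer,
    // not the bytes that were just written.
    public static int remainingWithoutFlip(String text) {
        ByteBuffer buffer = ByteBuffer.allocate(64);
        buffer.put(text.getBytes(StandardCharsets.UTF_8));
        return buffer.remaining();
    }

    public static void main(String[] args) {
        System.out.println(writeThenRead("hello"));
        System.out.println(remainingWithoutFlip("hello"));
    }
}
```

A DataBuffer tracks its read position and write position independently, so this kind of mode switch is not needed there.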

It is important to know that while using Spring WebClient we usually do not need to deal with DataBuffers directly. Spring offers DataBufferUtils, which provides a number of DataBuffer utility methods. We may, however, need to change the DataBuffer capacity if we expect to transfer a larger amount of data in one go, for example when downloading a file as a byte[] or through a Mono publisher.

DataBufferLimitException

The DataBufferLimitException occurs when WebClient tries to buffer more data in memory than the configured limit allows. We can reproduce this exception by transferring a large file with our byte[] example above.

We also know that a Mono is a publisher that emits at most one element. Thus, with Mono<DataBuffer> the whole body has to be aggregated into a single buffer, and we get the same exception.

Mono<DataBuffer> dataBuffer = webClient
  .get()
  .uri("/largefiles/1")
  .retrieve()
  .bodyToMono(DataBuffer.class);

DataBufferUtils.write(dataBuffer, destination,
  StandardOpenOption.CREATE)
    .share().block();

Here, we are using DataBufferUtils to subscribe to the DataBuffer contents and write them to a file as a whole. When we run this to download a file larger than the limit, we get the following exception.

org.springframework.web.reactive.function.client.WebClientResponseException: 200 OK from GET http://localhost:8182/largefiles/1; nested exception is 
org.springframework.core.io.buffer.DataBufferLimitException: Exceeded limit on max bytes to buffer : 262144
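The mechanism behind this message can be modeled in a few lines of plain Java. The sketch below is not Spring's implementation; it is a simplified stand-in, with a hypothetical LimitExceededException class and aggregate method, showing how joining chunks into one in-memory array fails once a byte limit is crossed.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

// Simplified model of "aggregate everything in memory, up to a limit".
// All names here are hypothetical; this is not Spring code.
public class BufferLimitSketch {

    // Hypothetical stand-in for DataBufferLimitException.
    public static class LimitExceededException extends RuntimeException {
        public LimitExceededException(String message) {
            super(message);
        }
    }

    // Joins the chunks into one array and fails as soon as the
    // running total would exceed maxBytes.
    public static byte[] aggregate(List<byte[]> chunks, int maxBytes) {
        ByteArrayOutputStream joined = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            if (joined.size() + chunk.length > maxBytes) {
                throw new LimitExceededException(
                    "Exceeded limit on max bytes to buffer : " + maxBytes);
            }
            joined.write(chunk, 0, chunk.length);
        }
        return joined.toByteArray();
    }

    public static void main(String[] args) {
        byte[] chunk = new byte[100 * 1024]; // one 100 KB chunk
        // Two chunks (200 KB) fit under the 256 KB limit.
        System.out.println(aggregate(Arrays.asList(chunk, chunk), 262144).length);
        // Three chunks (300 KB) do not.
        try {
            aggregate(Arrays.asList(chunk, chunk, chunk), 262144);
        } catch (LimitExceededException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the model is that the failure depends on the total size of the body, not on the size of any individual chunk.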

Configuring DataBuffer Capacity

We can avoid the DataBufferLimitException by increasing the in-memory buffer limit. To do that, we need to configure the default codecs while building the WebClient.

Example of Configuring DataBuffer size in WebClient

public WebClient webClientWithLargeBuffer() {
  return WebClient.builder()
      .baseUrl("http://localhost:8182")
      .exchangeStrategies(ExchangeStrategies.builder()
          .codecs(configurer ->
              configurer.defaultCodecs()
                  .maxInMemorySize(2 * 1024 * 1024) // byte count: 2 MB
          )
          .build())
      .build();
}

Here, we are building a WebClient with a custom in-memory buffer limit. With it, we can download files of up to 2 MB in one go.

Remember that increasing the buffer size increases the application's memory footprint, because each response that is aggregated in memory may hold up to that much data. We should do so only when we have a specific requirement.
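Since maxInMemorySize takes a plain byte count, it is easy to be off by a factor of 1024. The following small sketch spells out the arithmetic; the class and constant names are hypothetical, and the default value is the one shown in the exception message above.

```java
// Illustrative byte-size arithmetic for buffer limits; names are hypothetical.
public class BufferSizes {

    public static final int KB = 1024;
    public static final int MB = 1024 * KB;

    // The limit reported in the exception message: 256 KB.
    public static final int DEFAULT_LIMIT = 256 * KB;

    // A 2 MB limit expressed in bytes.
    public static final int TWO_MB = 2 * MB;

    public static void main(String[] args) {
        System.out.println(DEFAULT_LIMIT); // 262144
        System.out.println(TWO_MB);        // 2097152
        System.out.println(2 * KB);        // 2048, only 2 KB
    }
}
```

Named constants like these make the intended unit visible at the call site.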

Downloading a Large File with WebClient

The best way to download large files using WebClient is to download the file in chunks. To do that, we need to use a Flux publisher, which can emit zero to N elements.

Example of using WebClient to download a large file in chunks and write it to disk

Flux<DataBuffer> dataBuffer = webClient
  .get()
  .uri("/largefiles/1")
  .retrieve()
  .bodyToFlux(DataBuffer.class);

DataBufferUtils.write(dataBuffer, destination,
    StandardOpenOption.CREATE)
    .share().block();

This will download a large file in parts and write each part to a file on disk. The Flux delivers any number of DataBuffer instances, each filled with a part of the downloaded content.

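Alternatively, we can write the contents of the downloaded file straight to an OutputStream. This overload returns the written buffers as a Flux and does not release them, so we release each buffer ourselves and wait for completion with blockLast().

DataBufferUtils.write(dataBuffer, outputStream)
    .map(DataBufferUtils::release)
    .blockLast();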

Finally, remember that to download a file as a stream we do not need to change the default DataBuffer capacity. However, we can still configure the DataBuffer size to balance performance and memory consumption.
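The reason chunking keeps memory usage flat can be shown with a plain-JDK analogy. The sketch below does not use WebClient or DataBuffer; it copies a stream through one fixed-size byte array, so only a single chunk is held in memory at a time regardless of the total size. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Plain-JDK analogy for chunked transfer; not Spring code.
public class ChunkedCopySketch {

    // Copies the input to the output one chunk at a time and
    // returns the total number of bytes copied.
    public static long copyInChunks(InputStream in, OutputStream out, int chunkSize) {
        byte[] chunk = new byte[chunkSize]; // the only buffer we allocate
        long total = 0;
        try {
            int read;
            while ((read = in.read(chunk)) != -1) {
                out.write(chunk, 0, read);
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] source = new byte[1_000_000]; // stands in for a large download
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copyInChunks(new ByteArrayInputStream(source), sink, 8192);
        System.out.println(copied);
    }
}
```

In the WebClient version, each DataBuffer emitted by the Flux plays the role of the chunk, and DataBufferUtils.write takes care of the loop.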

Summary

In this quick tutorial, we demonstrated how to download a large file as a stream using Spring WebClient. We learned that we can download a file as a whole or in parts, and that downloading a file as a whole has a big impact on memory. We also studied the role of DataBuffer and DataBufferUtils and configured the default DataBuffer capacity to avoid the DataBufferLimitException.

For the complete source code of the examples used in this tutorial, please visit our GitHub repository.