Java 8 Streams – Laziness and Performance

This tutorial discusses the laziness of Java 8 Streams and how it helps optimise performance.

Overview

We had a quick overview of the Java 8 Streams API in the last post. We looked at the power and simplicity of the Java 8 Streams API, briefly covered intermediate and terminal operations over streams, and saw the different ways of building streams (e.g. from collections or numerical ranges). Continuing that discussion, in this post we move ahead and look at the most important property of Java 8 Streams: laziness.
If you are new to the concept of Java 8 streams, please go back and read Understanding Java 8 Streams API.

Does Laziness Improve Performance?

This is really a tricky question. If laziness is utilised in the right manner, the answer is ‘yes‘. Consider that you are on an online shopping site and you search for a particular type of product. Most websites show a few of the matching products immediately, with a ‘loading more’ message at the bottom, and eventually all of the search results are loaded in parts. The intent behind this is to keep the user interested by immediately showing some of the results. While the user is browsing through the loaded products, the rest of the products are fetched in the background. The site is deliberately delaying the loading of the complete product list. If the site loaded all of the products eagerly, the response time would increase and the user might get distracted and move on to something else.

When dealing with big data or infinite streams, laziness is a real boon. When data is processed, we do not know in advance how the processed data will be used. Eager processing always processes the entire data set up front at the cost of performance, and the client might end up utilising only a small chunk of it, or, depending on some condition, might not need the data at all. Lazy processing is based on a ‘process only on demand‘ strategy.
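For instance, laziness is what makes it safe to even describe an infinite stream. The pipeline below (a small self-contained sketch, not from the original post) declares an infinite sequence of numbers, yet only as many elements are generated as findFirst actually needs:

import java.util.Optional;
import java.util.stream.Stream;

public class LazyDemo {
    public static void main(String[] args) {
        //An infinite stream of natural numbers; nothing is generated yet
        Optional<Integer> firstBigSquare = Stream.iterate(1, n -> n + 1)
            .map(n -> n * n)                    //squares, computed on demand
            .filter(square -> square > 50_000)  //keep squares above 50,000
            .findFirst();                       //short-circuits after one match

        //Prints 50176 (224 * 224); only 224 elements were ever produced
        firstBigSquare.ifPresent(System.out::println);
    }
}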

Laziness and Java 8 Streams:

The current era is all about big data, parallel processing, and being real time. A large number of systems are being redesigned to sustain the future challenges of consistently growing amounts of data and high expectations of performance and scalability. It is no surprise that the processing model of the Java Collections API has been empowered to meet these expectations. The Java 8 Streams API is fully based on the ‘process only on demand‘ strategy and hence supports laziness.

In the Java 8 Streams API, the intermediate operations are lazy, and their internal processing model is optimised to be capable of processing large amounts of data with high performance. Let’s see it live with an example.
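All the examples in this post assume a simple Student class and a students list along these lines (a minimal sketch; the original post does not show this class, so the field names and sample data here are assumptions):

import java.util.Arrays;
import java.util.List;

//Hypothetical Student class, just enough for the examples below
class Student {
    private final int id;
    private final String name;
    private final int age;

    Student(int id, String name, int age) {
        this.id = id;
        this.name = name;
        this.age = age;
    }

    int getId() { return id; }
    String getName() { return name; }
    int getAge() { return age; }
}

//Sample data (the ids and names vary between the examples below)
List<Student> students = Arrays.asList(
    new Student(1, "Tom", 22),
    new Student(2, "Chris", 25),
    new Student(3, "Dave", 19));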

//Create a stream over the students list
//and attach a map operation to it
Stream<String> streamOfNames = students.stream()
    .map(student -> {
        System.out.println("In Map - " + student.getName());
        return student.getName();
    });

//Just to add some delay (Thread.sleep needs the enclosing
//method to declare or handle InterruptedException)
for (int i = 1; i <= 5; i++) {
    Thread.sleep(1000);
    System.out.println(i + " sec");
}

//Call a terminal operation on the stream
streamOfNames
    .collect(Collectors.toList());

Output:

1 sec 
2 sec 
3 sec 
4 sec 
5 sec 
In Map - Tom 
In Map - Chris 
In Map - Dave

Here, a map operation is called on a stream, then we put in a delay of 5 seconds, and then a collect operation (a terminal operation) is called. The delay is there purely to demonstrate the laziness. The output clearly shows that the map operation was executed only after the collect method was called. Think of stream pipelines created in one place and possibly never used in the entire program: Java 8 Streams do not process the intermediate operations until a terminal operation actually consumes the stream.
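To see that last point in isolation, drop the terminal operation entirely (a small sketch reusing the same students list): the pipeline below prints nothing at all, because the map lambda is never pulled.

//Pipeline declared, but no terminal operation is ever called,
//so the map lambda never runs and nothing is printed
Stream<String> neverConsumed = students.stream()
    .map(student -> {
        System.out.println("In Map - " + student.getName());
        return student.getName();
    });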

Performance Optimisation

As discussed above, the internal processing model of streams is designed to optimise the processing flow. In the processing flow we usually create a pipeline of various intermediate operations with a terminal operation at the end. Because of the optimisation considerations given to the processing model, the various intermediate operations can be clubbed together and processed in a single pass.

List<String> firstThreeNames = students.stream()
    .filter(s -> {
        System.out.println("filter - " + s.getId());
        return s.getAge() > 20;
    })
    .map(s -> {
        System.out.println("map - " + s.getId());
        return s.getName();
    })
    .limit(3)
    .collect(Collectors.toList());

Output:

filter - 8 
map - 8 
filter - 9 
map - 9 
filter - 10 
filter - 11 
map - 11

The above example demonstrates this behaviour, where we have two intermediate operations, namely filter and map. The output shows that neither the filter nor the map is executed independently over the entire stream. First, the student with id 8 passed the filter and immediately moved on to the map. The same happened for id 9, while id 10 did not pass the filter test. Each element, once through the filter, was immediately available to the map operation, no matter how many elements were still queued in the stream before the filter.

Short Circuit Methods

The Java 8 Streams API also optimises stream processing with the help of short-circuiting operations. Short-circuit methods end the stream processing as soon as their conditions are satisfied. In simple words, short-circuit operations, once their condition is satisfied, stop pulling elements through all of the intermediate operations lying earlier in the pipeline. Some intermediate as well as terminal operations have this behaviour.

To see it working, try the example below, where there is a list of String names. The first stream operation is a (deliberately trivial) map, which returns each name in upper case. The second operation is a filter, which keeps only the names starting with “B”. If we simply call the collect operation on this stream somewhere down the line, the map and the filter process every name in the list, exactly as you would expect.

//List of names
List<String> names = Arrays.asList("barry", "andy", "ben", "chris", "bill");


//map and filter are piped and the stream is stored
Stream<String> namesStream = names.stream()
    .map(n -> {
        System.out.println("In map - " + n);
        return n.toUpperCase();
    })
    .filter(upperName -> {
        System.out.println("In filter - " + upperName);
        return upperName.startsWith("B");
    });

But if instead we put a limit operation before the collect, the output changes dramatically.
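The post does not show that terminal call itself; given the output below, it would have been along these lines, with limit(2) since only the first two matching names are wanted:

//limit makes the pipeline short-circuit after two matches
List<String> firstTwoBNames = namesStream
    .limit(2)
    .collect(Collectors.toList()); //[BARRY, BEN]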

Output:

In map - barry 
In filter - BARRY 
In map - andy 
In filter - ANDY 
In map - ben 
In filter - BEN

We can clearly see that the limit (even though it is applied later, as the last intermediate operation in the pipeline) has an influence over the map and filter operations. The entire pipeline says: we want the first two names that start with the letter “B”. As soon as the pipeline has produced those first two names, the map and filter do not process the rest of the names at all.

Now, this can turn out to be a huge performance gain. Consider a list of a few thousand names from which we want just the first couple of names matching a certain filter condition: processing of the rest of the elements is simply skipped once we have the intended elements.

Operations like anyMatch, allMatch, noneMatch, findFirst, findAny, and limit are such short-circuit methods in the Streams API.
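As a quick illustration of one of them (a sketch reusing the names list from above, not from the original post), anyMatch stops pulling elements as soon as a single element satisfies the predicate:

//anyMatch short-circuits: elements after the first match are never inspected
boolean hasBName = names.stream()
    .peek(n -> System.out.println("checking - " + n))
    .anyMatch(n -> n.startsWith("b"));
//Prints only "checking - barry" and then yields true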

