Count and Remove Duplicates from a Java Stream

Examples of finding, counting and removing duplicate elements from a Java Stream.

Overview

Java Streams are a lazily processed sequence of elements that supports sequential and parallel operations through a Stream pipeline. A Stream won’t process elements from the source until a terminal operation of the Stream’s pipeline runs.
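The lazy behaviour described above can be observed with a small experiment. The following is a minimal sketch, not part of the original examples: the class name LazyStreamDemo and the use of peek() as a probe are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, used only to illustrate lazy evaluation.
class LazyStreamDemo {

  // Returns three lists: the elements seen before the terminal operation,
  // the elements seen after it, and the final result of the pipeline.
  static List<List<String>> run() {
    List<String> seen = new ArrayList<>();

    Stream<String> pipeline = Stream.of("a", "b", "a")
        .peek(seen::add)  // intermediate operation: records each element that flows through
        .distinct();      // intermediate operation: nothing has been processed yet

    List<String> before = new ArrayList<>(seen);  // snapshot taken before any terminal operation

    // collect() is a terminal operation, so the source is consumed only now.
    List<String> result = pipeline.collect(Collectors.toList());

    return List.of(before, new ArrayList<>(seen), result);
  }

  public static void main(String[] args) {
    List<List<String>> snapshots = run();
    System.out.println("seen before terminal operation: " + snapshots.get(0));
    System.out.println("seen after terminal operation:  " + snapshots.get(1));
    System.out.println("result:                         " + snapshots.get(2));
  }
}
```

The first snapshot is empty because building the pipeline does not pull any elements; they are recorded only once collect() runs.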

This tutorial provides quick examples of finding, counting and removing duplicate elements from a Stream of built-in Java types or custom objects.

Remove Stream Duplicates using distinct()

The Java Stream interface provides several intermediate operations to process and filter elements in a Java Stream. The distinct() method is one of them: it removes duplicate elements and returns a new Stream of the unique elements.

Example of using distinct() to remove Stream duplicates

Stream<String> stream = Stream.of("a", "b", "c", "b", "d", "a", "d");
Stream<String> output = stream.distinct();

output.forEach(System.out::print);

//prints:
//abcd
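As a self-contained variant of the snippet above, the following sketch collects the distinct elements into a List and also derives how many duplicates were dropped. The class and method names (DistinctDemo, unique, duplicateCount) are hypothetical and chosen only for this illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original examples.
class DistinctDemo {

  // Returns the unique elements; for an ordered source such as a List,
  // distinct() keeps the first occurrence of each element in encounter order.
  static List<String> unique(List<String> input) {
    return input.stream().distinct().collect(Collectors.toList());
  }

  // Number of elements that distinct() would drop from the input.
  static long duplicateCount(List<String> input) {
    return input.size() - input.stream().distinct().count();
  }

  public static void main(String[] args) {
    List<String> input = List.of("a", "b", "c", "b", "d", "a", "d");
    System.out.println(unique(input));          // [a, b, c, d]
    System.out.println(duplicateCount(input));  // 3
  }
}
```

Note that a Stream can be consumed only once, which is why each method opens a fresh Stream from the List.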

Remove Stream Duplicates using Set

Alternatively, we can use a Java Set to remove duplicates from a Stream. As Java Sets contain unique elements, we can collect our Stream into a Set and create a new Stream with all duplicates removed.

Example of using Java HashSet to remove duplicate elements from a Stream.

Stream<String> stream = Stream.of("a", "b", "c", "b", "d", "a", "d");
Stream<String> output = stream
    .collect(Collectors.toSet())
    .stream();

output.forEach(System.out::print);

//prints (iteration order is not guaranteed):
//abcd

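Please note that Collectors.toSet() makes no guarantee about the type or ordering of the Set it returns. In practice it is a HashSet, which is an unordered collection, so the original order of the elements may not be preserved.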
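If the original order matters, one option is to collect into a LinkedHashSet, which keeps insertion order. The sketch below assumes this is acceptable for your use case; the class and method names (OrderedSetDemo, toOrderedSet) are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the original examples.
class OrderedSetDemo {

  // Collects into a LinkedHashSet, which drops duplicates
  // while remembering the order in which elements were first added.
  static Set<String> toOrderedSet(Stream<String> stream) {
    return stream.collect(Collectors.toCollection(LinkedHashSet::new));
  }

  public static void main(String[] args) {
    Set<String> unique = toOrderedSet(Stream.of("d", "a", "c", "a", "b", "d"));
    System.out.println(unique);  // [d, a, c, b]
  }
}
```

Collectors.toCollection() accepts any Collection supplier, so the same pattern works with a TreeSet when sorted output is wanted instead.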

Remove Duplicates from a Stream of Custom Objects

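The distinct() method uses the equals() method to check if two elements are equal. In practice, the implementation tracks the elements it has already seen in hash-based collections, so hashCode() must be consistent with equals(). To remove duplicates from a Stream of custom objects, our custom class must therefore provide the equality logic by overriding both methods.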

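// Lombok annotations (assumed here, as in the wrapper class later on) generate
// the getters, the toString() output and the constructor used in the examples.
@Getter
@ToString
@RequiredArgsConstructor
public class Student {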
  private final Long studentId;
  private final String firstName;
  private final String lastName;
  private final Integer age;

  @Override
  public boolean equals(Object other) {
    if (!(other instanceof Student student2)) {
      return false;
    }
    return student2.studentId.equals(this.studentId);
  }

  @Override
  public int hashCode() {
    return studentId.hashCode();
  }
}

The equals() method in our custom class uses the studentId field to decide if two class instances are equal. Now, we can use the distinct() method on a Stream of the Student objects.

Stream<Student> stream = Stream.of(
    new Student(1L, "Bob", "Jack", 12),
    new Student(2L, "Nick", "Stephen", 14),
    new Student(3L, "Bob", "Holden", 14),
    new Student(2L, "Nick", "Stephen", 14)
);

Stream<Student> output = stream.distinct();

output.forEach(System.out::println);

//prints:
//Student(studentId=1, firstName=Bob, lastName=Jack, age=12)
//Student(studentId=2, firstName=Nick, lastName=Stephen, age=14)
//Student(studentId=3, firstName=Bob, lastName=Holden, age=14)
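The snippets above leave out the constructor, accessors and toString() that the example relies on. For readers who want to run the idea without any code-generation library, here is a minimal plain-Java sketch. PlainStudent and PlainStudentDemo are hypothetical names, and the class is trimmed to two fields for brevity.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical, trimmed-down stand-in for the Student class.
final class PlainStudent {
  private final Long studentId;
  private final String firstName;

  PlainStudent(Long studentId, String firstName) {
    this.studentId = studentId;
    this.firstName = firstName;
  }

  Long getStudentId() {
    return studentId;
  }

  // Two students are considered equal when their ids match.
  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (!(other instanceof PlainStudent)) {
      return false;
    }
    return Objects.equals(studentId, ((PlainStudent) other).studentId);
  }

  // Must agree with equals(): equal ids produce equal hash codes.
  @Override
  public int hashCode() {
    return Objects.hashCode(studentId);
  }

  @Override
  public String toString() {
    return "PlainStudent(studentId=" + studentId + ", firstName=" + firstName + ")";
  }
}

class PlainStudentDemo {

  static List<PlainStudent> distinctStudents(Stream<PlainStudent> students) {
    return students.distinct().collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<PlainStudent> unique = distinctStudents(Stream.of(
        new PlainStudent(1L, "Bob"),
        new PlainStudent(2L, "Nick"),
        new PlainStudent(3L, "Bob"),
        new PlainStudent(2L, "Nick")));
    unique.forEach(System.out::println);  // three students remain: ids 1, 2 and 3
  }
}
```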

Using Stream distinct() by a Particular Field

Sometimes, we cannot modify the equals() method in our custom class, or we want to use a different comparison logic than the one provided by the equals() method.

We can create a wrapper class around our custom object for such cases. The wrapper class will provide our custom comparison logic in the form of its equals() and hashCode() implementations.

Example of using a wrapper class to remove duplicates from a Java Stream based on a specific field or two.

@Getter
@RequiredArgsConstructor
class StudentWrapper {
  private final Student student;

  @Override
  public boolean equals(Object other) {
    if (!(other instanceof StudentWrapper wrapper2)) {
      return false;
    }
    return wrapper2.student.getFirstName()
        .equals(this.student.getFirstName());
  }

  @Override
  public int hashCode() {
    return student.getFirstName().hashCode();
  }
}

Now, we can map the Stream of our custom object into a Stream of the wrapper class and use the distinct() on it.

Stream<Student> stream = Stream.of(
    new Student(1L, "Bob", "Jack", 12),
    new Student(2L, "Nick", "Stephen", 14),
    new Student(3L, "Bob", "Holden", 14),
    new Student(2L, "Nick", "Stephen", 14)
);

Stream<Student> output = stream
    .map(StudentWrapper::new)
    .distinct()
    .map(StudentWrapper::getStudent);

output.forEach(System.out::println);

//prints:
//Student(studentId=1, firstName=Bob, lastName=Jack, age=12)
//Student(studentId=2, firstName=Nick, lastName=Stephen, age=14)
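A different, commonly used technique (not the wrapper approach shown above) is a stateful filter that remembers which keys it has already seen. The sketch below is an illustration under assumptions: distinctByKey is a helper written here and not a JDK method, and Person is a hypothetical stand-in for the Student class.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the original examples.
class DistinctByKeyDemo {

  // Hypothetical stand-in for the tutorial's Student class.
  static final class Person {
    final long id;
    final String firstName;

    Person(long id, String firstName) {
      this.id = id;
      this.firstName = firstName;
    }

    String getFirstName() {
      return firstName;
    }
  }

  // Returns a predicate that accepts an element only the first time its key is seen.
  // The key set is concurrent, but it does not accept null keys.
  static <T, K> Predicate<T> distinctByKey(Function<? super T, ? extends K> keyExtractor) {
    Set<K> seen = ConcurrentHashMap.newKeySet();
    return element -> seen.add(keyExtractor.apply(element));
  }

  // Keeps the first person for each first name and returns their ids.
  static List<Long> idsDistinctByFirstName(Stream<Person> people) {
    return people
        .filter(distinctByKey(Person::getFirstName))
        .map(person -> person.id)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Long> ids = idsDistinctByFirstName(Stream.of(
        new Person(1L, "Bob"),
        new Person(2L, "Nick"),
        new Person(3L, "Bob"),
        new Person(2L, "Nick")));
    System.out.println(ids);  // [1, 2]
  }
}
```

The trade-off is that the predicate holds state, so each call to distinctByKey() should be used for a single pipeline, and on a parallel Stream the element that survives for a given key is not deterministic.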

Count Duplicates in a Stream

We have seen how we can remove duplicates from a Stream using the distinct() method. However, sometimes we may wish to count the duplicates. To do that, we can use the toMap() collector to build a Map from each element to the number of times it occurs in the Stream.

Example of counting the duplicates in a Stream

Stream<Integer> stream = Stream.of(22, 31, 22, 34, 25, 31, 34);
Map<Integer, Long> map = stream
    .collect(Collectors.toMap(Function.identity(), x -> 1L, Long::sum));

map.entrySet().forEach(System.out::println);

//prints:
//34=2
//22=2
//25=1
//31=2
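Once the occurrences are counted, finding the elements that are actually duplicated is a matter of keeping the entries whose count is greater than one. The following sketch uses groupingBy() with counting() as an alternative to toMap(), and a LinkedHashMap so that results follow first-occurrence order. The class and method names (DuplicateFinderDemo, countOccurrences, findDuplicates) are hypothetical.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original examples.
class DuplicateFinderDemo {

  // Maps each element to the number of times it occurs,
  // keeping keys in the order in which they were first seen.
  static <T> Map<T, Long> countOccurrences(Collection<T> items) {
    return items.stream().collect(
        Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()));
  }

  // Returns the elements that occur more than once.
  static <T> List<T> findDuplicates(Collection<T> items) {
    return countOccurrences(items).entrySet().stream()
        .filter(entry -> entry.getValue() > 1)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Integer> numbers = List.of(22, 31, 22, 34, 25, 31, 34);
    System.out.println(countOccurrences(numbers));  // {22=2, 31=2, 34=2, 25=1}
    System.out.println(findDuplicates(numbers));    // [22, 31, 34]
  }
}
```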

Summary

We learned how to use Java Stream’s distinct() method in different scenarios to remove duplicate elements from a Stream. The distinct() method compares elements using their equals() method and returns a new Stream containing the unique elements.

We also learned that the equals() method should provide the equality logic to deduplicate a Stream of custom objects. If we want to remove duplicates from a Stream using specific fields not covered by the equals() method, we can use the wrapper class workaround. Lastly, we learned how to count duplicate elements in a Stream using the toMap() collector.

You can refer to our GitHub Repository for the complete source code of the examples used in this tutorial.