Welcome to the Introduction to Java HashSets Tutorial. Learn HashSet features, methods and also learn When and How to use HashSets with the help of examples.
HashSet is an implementation of Set Collection. Therefore, HashSet is a collection of unique data. In other words, if you try to put an object in a HashSet and that object is already present, the HashSet will ignore it.
HashSet allows you add one object at a time or bulk in a form of a collection. However, there is no way to access a specific element directly. If you want to retrieve a specific element from HashSet, you have to iterate through all the elements until you reach to the desired one.
You can use HashSets anytime you want to store unique elements or deduplicate an existing set of data. However, you should always keep in mind that HashSets are Unordered and UnSorted collections. Therefore HashSets do not guarantee the elements will be retrieved in the order of insertion or retrieved in a specific order.
You can learn more about Java Set Collection at our dedicated tutorial Introduction to Java Set with Examples.
Major Features of HashSet
- HashSets allow unique Elements.
- They do not support Sorting and do not guarantee the order of iteration.
- They allow one and only one null value.
- You can not retrieve a specific element from HashSet. But, you can use iterate to access all the elements sequentially.
- HashSets use HashTable (HashMap) in the background. Hence the add, remove, contains methods are constant in time.
- They are not synchronized. If you want thread safety, you have to handle on your own.
- If you modify HashSet after creating an Iterator, you get ConcurrentModificationException.
- The Iterators in HashSet are fail fast. In order words, if another thread tries to modify a HashSet which iterators are iterating, they will throw above exception. However, they won’t return arbitrary, or dirty data.
Firstly, create a HashSet and put some elements. Notice, we are adding the String “one” twice.
Set<String> hashSet = new HashSet<>(); hashSet.add("one"); hashSet.add("two"); hashSet.add("three"); hashSet.add("four"); hashSet.add("one");
Now, print the HashSet elements.
hashSet.forEach(System.out::println); // Output // four // one // two // three
We have added 5 elements but HashSet has only 4. Because, It ignored the second “one”.
We added elements in the incremental order of “one”, “two” etc. But, the output has different order.
The output is not sorted (for example: alphabetically).
Hence, with such an easy example, we have proved HashSets allow Unique Elements, they Do not Guarantee Order and Do not support Sorting.
When to Use HashSet
Below are the scenarios where you can use HashSets.
- Store Unique Records.
- Records do not have any specific order.
- De-duplicate records.
- You do not want to retrieve a specific record out of HashSet.
Let’s try to understand this with a real-life example. Consider, you have a large collection of Users activity. Which has the details about the activity type, time, place and id of the user who performed the activity. Your task is to find Names of all the users who performed at least one activity.
Firstly, you will have to grab ids of all the users from the activity collection. Then, get a unique list of user ids (each user may have performed multiple activities). Finally, retrieve the names of the users by ids.
Set<Long> uniqueUserIds = activities .stream() .map(Activity::getUserId) .collect(Collectors.toSet());
That’s it ! You have already got a Set of unique user ids.
This section focuses on Instantiating HashSets using Constructors. There are more ways of creating and initializing HashSets.
- HashSet(): Creates an Empty and Mutable HashSet. In other words you can add or remove elements to it. The initial size of such HashSets is 16 with a load factor of 0.75.
- HashSet(Collection<? extends E> c): Creates a new mutable HashSet, containing all the elements from the given collection.
- HashSet(int initialCapacity): Creates an empty and mutable HashSet of the given capacity. The load factor of 0.75 remains the same.
- HashSet(Int initialCapacity, float loadFactor): Creates an empty and mutable HashSet of the given capacity and load factor.
// Adds the specified element to this set if it is not already present. boolean add(E e); // Removes all of the elements from this set. void clear(); // Returns a shallow copy of this HashSet instance: the elements themselves are not cloned. Object clone(); // Returns true if this set contains the specified element. boolean contains(Object o); // Returns true if this set contains no elements. boolean isEmpty(); // Returns an iterator over the elements in this set. Iterator<E> iterator(); // Removes the specified element from this set if it is present. boolean remove(Object o); // Returns the number of elements in this set (its cardinality). int size(); // Creates a late-binding and fail-fast Spliterator over the elements in this set. Spliterator<E> spliterator();
Internals of HashSet
HashSets use HashTable (HashMap) to store the elements. The hash tables have concept of buckets, where an objects hashCode is used to to derive a key of the table. After that it stores the object in associated bucket.
When you put any object into an HashSet. It finds hashCode of the object. If the bucket associated with that hashCode is already filled, objects are compared using equals. If they match the new object is ignored, else it is stored.
HashSet and Performance
HashSets are excellent when you want to store large number of collections. Because, the basic add, remove, contains operations are constant time operations. In other words, putting an object in an empty set is same as putting it in a set having n records.
Again the underlying Hash table and the bucketing system maintains this constant time. To explain, every time you do add, remove or contains check it simply calculates the hashCode and reach to the respective bucket. Hence it is irrespective of how many elements are there in the set.
However, iterating a Set is not time constant. In other words, you can iterate a HashSet of 10 elements much faster than a HashSet of hundreds of elements.
HashSet Capacity and Load Factor
This is an important topic no matter which collection you are working on. Consider you have to store only few records and you create a Collection (or even an array) of much larger capacity. This will be heavy on the memory as well as performance. The HashSets have a certain Capacity and Load Factor.
The Capacity of a HashSet defines how many elements it can hold. However, the Load Factor defines How full a HashSet is. The default capacity of a HashSet is 16 and default load factor is 0.75. The capacity and load factors provide optimal usage experience in terms of memory and performance.
When a HashSet reaches to its load factor capacity, the hashtable in the background, start finding larger space. Also, it will pick each element from current bucket, rehash it and store to the new bucket at new location. This is called as rehashing of elements. When a HashSet gets rehashed, its capacity is increased. Also, the rehashing affects the performance and create more work to GC.
Hence, when you work on memory and performance critical applications and you have to pay extra attention to the amount of data you want to store and what capacity you are setting.
We have reached to the end of Introduction to Java HashSet Tutorial.
HashSet is an implementation of Java Set Interface. It has unique elements are does not guarantee order or sorting. HashSet uses buckets to store data and hence most of the operations are constant in time. You can use HashSets when you want to do de-duplication of elements or store elements in away where you don’t want to retrieve a specific element in specific order.