Introduction to the Java Virtual Machine (JVM)

Welcome to the Introduction to Java Virtual Machine (JVM) tutorial. We are going to see what the JVM is and what its main components are, and understand how the JVM is the backbone of Java's portability and its Write Once, Run Anywhere (WORA) feature.


Overview

When Java was being developed, portability was one of its most anticipated features. The number of devices was increasing, and more and more systems and types of equipment were being controlled by microcomputers. There was a need for a programming platform that let us write a program once on one machine and then run it on any other system, device, or piece of equipment.

This gave birth to the Java Virtual Machine (JVM). As the name indicates, it is a virtual machine that stands in for the underlying system. When a Java program is compiled, it is not transformed into machine instructions; instead, it is compiled into a generic set of instructions called bytecode. The JVM behaves like a virtual processor that reads these instructions and transforms them into system or hardware instructions.

Hence, developers don't need to know anything about operating-system-level details. They write a generic Java program, and the JVM is capable of running it on any system.

What is Java bytecode?

Bytecode is the compiled output of a Java program. The Java compiler scans through the program to check its correctness and completeness, and reports compile-time errors if it finds any issues. When no problems are found, the compiler transforms the program statements into a set of instructions and stores them in a .class file. This .class file is said to contain the bytecode of the corresponding .java file. These instructions are meant specifically for the JVM to understand.

Bytecode is platform-independent, which means it can be transferred to any other machine and executed there. In Java, bytecode is not fully compiled to native machine code; it is the same program expressed in instructions the JVM understands.
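To make the .class file a little more concrete, here is a small sketch that reads the first eight bytes of a class file. Per the JVM specification, every class file starts with the magic number 0xCAFEBABE, followed by a minor and a major version number (a major version of 52 means Java 8, and 61 means Java 17). The class name ClassFileHeader and its methods are hypothetical, chosen only for this illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: reads the fixed-size header found at the start of every .class file.
class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE;

    final int magic;
    final int minorVersion;
    final int majorVersion;

    ClassFileHeader(int magic, int minorVersion, int majorVersion) {
        this.magic = magic;
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    // Class files are big-endian, and DataInputStream reads big-endian values.
    static ClassFileHeader read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();            // u4 magic
        int minor = data.readUnsignedShort();  // u2 minor_version
        int major = data.readUnsignedShort();  // u2 major_version
        return new ClassFileHeader(magic, minor, major);
    }

    boolean hasValidMagic() {
        return magic == MAGIC;
    }

    public static void main(String[] args) throws IOException {
        // Read this class's own bytecode from the classpath, if it exists there as a file.
        try (InputStream in = ClassFileHeader.class.getResourceAsStream("ClassFileHeader.class")) {
            if (in == null) {
                System.out.println("ClassFileHeader.class not found on the classpath");
                return;
            }
            ClassFileHeader header = read(in);
            System.out.printf("magic=0x%X valid=%b major=%d minor=%d%n",
                    header.magic, header.hasValidMagic(), header.majorVersion, header.minorVersion);
        }
    }
}
```

Compiling it with javac and running it with java should report the magic number CAFEBABE and a major version that depends on the JDK you compiled with.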

The JVM has three main components: Class Loader, Memory, and an Execution Engine.

Class Loader

The Class Loader is responsible for dynamically loading classes into the system and initializing them. Given the name of a class or interface, the Class Loader finds its binary representation, identifies its members, links it, prepares it for initialization, and initializes it.

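Class loaders form a hierarchy: the bootstrap class loader loads Java's core classes, the platform class loader (called the extension class loader before Java 9) loads other Java library classes, and the application class loader loads the application or program-level classes from the classpath. Classes are loaded on demand, when they are first needed, rather than by loading every .class file on the classpath up front. When a class loader loads a .class file, it records the class's name, its kind (class, interface, or enum), and its members.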
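We can observe this hierarchy from Java code with the standard Class.getClassLoader() and ClassLoader.getParent() methods. In this sketch (the class name LoaderHierarchy is just for illustration), the bootstrap loader shows up as null, which is how the Java API represents it.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderHierarchy {
    // Collects the loader that defined 'type' and each of its parents.
    // The bootstrap loader is represented as null, so it ends the chain.
    static List<ClassLoader> chain(Class<?> type) {
        List<ClassLoader> loaders = new ArrayList<>();
        for (ClassLoader loader = type.getClassLoader(); loader != null; loader = loader.getParent()) {
            loaders.add(loader);
        }
        return loaders;
    }

    public static void main(String[] args) {
        // Core classes such as java.lang.String come from the bootstrap loader: empty chain.
        System.out.println("String:          " + chain(String.class));
        // Our own class: the application loader, then its parent(s).
        System.out.println("LoaderHierarchy: " + chain(LoaderHierarchy.class));
    }
}
```

On Java 9 and later, the second line typically lists the application loader followed by the platform loader; the exact toString output varies between JDKs.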

Consider a tiny HelloJvm program like this.

package com;

public class HelloJvm{
  public static void main(String[] args){
    System.out.println("Hello JVM !!");
  }
}

Let’s compile the class.

~ javac com/HelloJvm.java

After this, you should see the HelloJvm.class file in the same directory as the source file. This .class file is the bytecode representation of the source class. I compiled it on macOS; however, I can now transfer it to any other machine, say Windows, and execute it using the java command. Let's see how the above program is executed.

~ java com.HelloJvm
Hello JVM !!

Our program executed and printed the desired output. In the background, the class loader subsystem performed Loading, Linking, and Initialization, which we will look at below.

Loading

When we executed the java command, the JVM was initialized, and the bootstrap class loader inside the JVM loaded the core Java library classes it needed to start up.

Then, from the command, it knew which class to execute. The JVM found the class by its fully qualified name, com.HelloJvm (the file com/HelloJvm.class), and loaded it. When it loads a class, it records the class name, the superclass name, and method and field details in a special memory area called the Method Area.

One thing to remember here is that when we say ClassLoader, it is not just one class; more precisely, it is a Class Loader Subsystem made up of several class loaders. Before loading the HelloJvm class, a class loader checks whether the class has already been loaded and delegates the request to its parent loader; it loads the class itself only if the parent cannot. All this ensures no class is loaded twice.
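Here is a small sketch of that behavior using the standard ClassLoader.loadClass method. Asking the application class loader for the same class twice returns the identical Class object, and because the request was delegated up the hierarchy, the class's defining loader turns out to be the bootstrap loader (reported as null). The names LoadOnce and loadedOnce are hypothetical.

```java
class LoadOnce {
    // Asks the same loader for the same class twice and checks whether both calls
    // return the identical Class object (i.e. the class was not loaded a second time).
    static boolean loadedOnce(ClassLoader loader, String name) throws ClassNotFoundException {
        return loader.loadClass(name) == loader.loadClass(name);
    }

    public static void main(String[] args) throws ClassNotFoundException {
        ClassLoader app = ClassLoader.getSystemClassLoader();
        System.out.println(loadedOnce(app, "java.util.ArrayList"));   // true

        // The application loader delegated the request up to the bootstrap loader,
        // which is why the defining loader is reported as null.
        Class<?> list = app.loadClass("java.util.ArrayList");
        System.out.println(list.getClassLoader());                     // null
        System.out.println(list == java.util.ArrayList.class);         // true
    }
}
```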

Linking

The linking phase is a three-step process: Verification, Preparation and Resolution. Let’s look at them briefly.

Verification

Once a class is loaded, the next step is Verification. The binary representation of the class file is checked for correctness: it must be well formed, and its bytecode must obey the JVM's type and safety rules. This step prevents the JVM from running malformed or unsafe class files, such as ones generated by non-trusted compilers or modified after compilation.
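We can watch the JVM reject a malformed class file by handing it bytes that are not a valid class. This sketch assumes a tiny custom loader (the names RejectBadClassFile and RawLoader are hypothetical) that exposes the standard protected ClassLoader.defineClass method. Strictly speaking, a wrong magic number is caught by the structural format check that happens when a class is defined, before full bytecode verification, but it illustrates the same idea: untrusted bytes are checked before they can run.

```java
class RejectBadClassFile {
    // A minimal loader that exposes the protected defineClass method.
    static class RawLoader extends ClassLoader {
        Class<?> define(byte[] bytes) {
            return defineClass(null, bytes, 0, bytes.length);
        }
    }

    static String tryDefine(byte[] bytes) {
        try {
            new RawLoader().define(bytes);
            return "accepted";
        } catch (ClassFormatError e) {
            // Thrown when the bytes do not form a valid class file.
            return "rejected: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Not a class file: the first four bytes should be 0xCAFEBABE.
        byte[] garbage = {0x00, 0x01, 0x02, 0x03, 0x00, 0x00, 0x00, 0x34};
        System.out.println(tryDefine(garbage));
    }
}
```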

Preparation

Next is the Preparation phase, where a class or interface is prepared. Memory is allocated for its static fields, and they are set to their default values (such as 0, false, or null).
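The default values from Preparation are normally invisible, because Initialization overwrites them right away. The sketch below makes them visible by reading a static field before its initializer has run: INSTANCE is created first, while count still holds its prepared default of 0. The names PreparationDefaults and Counter are hypothetical.

```java
class PreparationDefaults {
    static class Counter {
        // Initializers run in source order, so INSTANCE is created first,
        // while 'count' still holds the default value assigned during Preparation.
        static final Counter INSTANCE = new Counter();
        static int count = 5;   // assigned only after INSTANCE has been constructed

        final int countSeenByConstructor;

        Counter() {
            countSeenByConstructor = count;   // reads the prepared default value
        }
    }

    public static void main(String[] args) {
        System.out.println(Counter.INSTANCE.countSeenByConstructor); // 0
        System.out.println(Counter.count);                           // 5
    }
}
```

Note that count is deliberately not final. A static final int initialized with a literal is a compile-time constant, which the constructor would see as 5.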

Resolution

Then the Resolution process happens, during which symbolic references are replaced with actual ones. When a program is compiled, references to fields, methods, classes, and so on are not stored as memory addresses; they are stored as symbolic references in the class file's constant pool. These symbolic references are resolved here. Suppose a class refers to another class or interface. In that case, the referenced class or interface is loaded (as described in the steps above), if it is not loaded already, and its own references are loaded and resolved recursively.

To summarize, during the linking phase, memory for all the static variables is allocated and set to default values, and references to other classes and interfaces are identified. Because the compiler stores these as symbolic references, they need to be replaced by actual references. This replacement does not have to happen during Linking itself; the JVM may postpone resolution until a reference is first used.
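Exactly when a JVM loads and resolves a referenced class is implementation-specific, but the on-demand nature is easy to observe through initialization, which must not happen before a class's first active use. In this sketch (the names LazyInitialization and Helper are hypothetical), obtaining Helper's Class object does not initialize it; the first static method call does.

```java
import java.util.Arrays;

class LazyInitialization {
    static boolean helperInitialized;   // prepared to its default, false

    static class Helper {
        static {
            helperInitialized = true;   // runs only when Helper is initialized
        }

        static String greet() {
            return "hello";
        }
    }

    static boolean[] observe() {
        boolean beforeAnyUse = helperInitialized;
        Class<?> type = Helper.class;   // obtaining the Class object does not initialize it
        boolean afterClassLiteral = helperInitialized;
        Helper.greet();                 // first static method call triggers initialization
        boolean afterFirstCall = helperInitialized;
        return new boolean[] {beforeAnyUse, afterClassLiteral, afterFirstCall};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(observe()));   // [false, false, true]
    }
}
```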

Initialization

This phase is the static initialization phase, in which all the static variables are assigned their actual values. Remember that all the static variables were set to their default values during the Preparation step of Linking. Static code blocks run in this phase too, together with the static variable initializers, in the order they appear in the source.

Initialization always proceeds top-down through the class hierarchy: superclass static variables and blocks are initialized before those of the child class. At the end of the initialization phase, the class is ready to be used by the JVM.
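The following sketch records each initialization step in a list so we can see the top-down order. The class names InitializationOrder, Parent and Child and the record helper are hypothetical. Reading Child.childField is the first active use of Child, so the JVM initializes Parent first, and within each class the static field initializer and static block run in source order.

```java
import java.util.ArrayList;
import java.util.List;

class InitializationOrder {
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static String parentField = record("Parent static field");
        static { record("Parent static block"); }
    }

    static class Child extends Parent {
        static String childField = record("Child static field");
        static { record("Child static block"); }
    }

    // Appends a step to the log and returns it, so it can be used as a field initializer.
    static String record(String step) {
        LOG.add(step);
        return step;
    }

    public static void main(String[] args) {
        String value = Child.childField;   // first use of Child; Parent is initialized first
        LOG.forEach(System.out::println);
        // Prints: Parent static field, Parent static block, Child static field, Child static block
    }
}
```

One subtlety: reading Child.parentField instead would initialize only Parent, because a static field access initializes only the class that actually declares the field.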

Memory

Another essential component of JVM is the Memory area. All the classes, methods, metadata and objects are stored in memory, and JVM classifies them into Method Area, Heap Area, Stack Area, PC Registers, and Native Method Stack. Let’s have a look at each of them.

Method Area

Each JVM has one method area, which is created when the JVM starts. The method area is shared across all the threads in the JVM, and it is logically considered part of the Heap Area. (In HotSpot since Java 8, it is implemented as Metaspace, which lives in native memory.)

The Method Area stores the Runtime Constant Pool, the class name, immediate parent (superclass) information, all the static variables, and the actual bytecode of methods and constructors, including the special methods used for class and instance initialization. Whether the method area is garbage collected is left to each JVM implementation.
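The reflection API gives a read-only view of this kind of class-level metadata. The sketch below (ClassMetadata is a hypothetical class) prints its own name, superclass, fields and methods. It only shows what the JVM knows about the class; where that data physically lives is up to the implementation.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class ClassMetadata {
    static int instances;          // class-level data: one copy per class
    private final String name;     // instance data: one copy per object, on the heap

    ClassMetadata(String name) {
        this.name = name;
        instances++;
    }

    String name() {
        return name;
    }

    public static void main(String[] args) {
        Class<?> type = ClassMetadata.class;
        System.out.println("Class name: " + type.getName());
        System.out.println("Superclass: " + type.getSuperclass().getName());
        for (Field f : type.getDeclaredFields()) {
            String kind = Modifier.isStatic(f.getModifiers()) ? "static" : "instance";
            System.out.println("Field:      " + f.getName() + " (" + kind + ")");
        }
        for (Method m : type.getDeclaredMethods()) {
            System.out.println("Method:     " + m.getName());
        }
    }
}
```

The order in which getDeclaredFields and getDeclaredMethods return their results is not specified, so the printed order may differ between JVMs.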

Heap Area

Heap Area, similar to Method Area, is one per JVM and created upon initialization of JVM. Heap Area is also shared across all the threads of JVM. Unlike Method Area, which stores all the Class level information, the Heap Area is used to store Instance level information.

When a class is instantiated, its instance variables and data structures need a share of memory, which is allocated on the Heap. The initial and maximum heap sizes can be set when the JVM starts (in HotSpot, with the -Xms and -Xmx options), and the heap grows or shrinks dynamically within that range. The Heap Area is managed by the Garbage Collector, which actively reclaims memory from objects that are no longer in use.
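The standard Runtime class reports the current heap figures, which is a handy way to see the effect of the heap size options. This is a minimal sketch; the class name HeapInfo and the usedBytes helper are hypothetical, and the exact numbers depend on your JVM, its defaults, and the flags you pass.

```java
class HeapInfo {
    static final long MB = 1024 * 1024;

    // Heap in use = heap currently reserved minus the unused part of it.
    static long usedBytes(Runtime runtime) {
        return runtime.totalMemory() - runtime.freeMemory();
    }

    public static void main(String[] args) {
        Runtime runtime = Runtime.getRuntime();
        // Upper limit the heap may grow to (Long.MAX_VALUE if there is no inherent limit).
        System.out.println("max heap:     " + runtime.maxMemory() / MB + " MB");
        // Heap currently reserved by the JVM; it can grow or shrink up to the maximum.
        System.out.println("current heap: " + runtime.totalMemory() / MB + " MB");
        System.out.println("used heap:    " + usedBytes(runtime) / MB + " MB");
    }
}
```

For example, running it with java -Xms64m -Xmx256m HeapInfo on HotSpot should report a maximum close to 256 MB.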

Stack Area

The Stack Area is allocated one per thread and is not shared. The thread here is a thread of execution, like our program's main thread. A running thread may call a method, which in turn calls another. For each such invocation, a Stack Frame holding the method's short-lived local variables and partial results is pushed onto the thread's stack.

Method invocations happen on a First In, Last Out basis. When method A calls method B, which calls method C, method C finishes first, and method A finishes last. That is why a stack is used to store method-level variables.
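We can peek at the current thread's stack with the StackWalker API (available since Java 9). In this sketch, the hypothetical methods methodA, methodB and methodC mirror the example above; inside methodC, the three innermost frames are C on top, then B, then A, which is exactly the order in which they will return.

```java
import java.util.List;
import java.util.stream.Collectors;

class CallStackDemo {
    static List<String> methodA() { return methodB(); }

    static List<String> methodB() { return methodC(); }

    static List<String> methodC() {
        // Walk this thread's stack from the top frame (methodC, the caller of walk)
        // and keep the method names of the three innermost frames.
        return StackWalker.getInstance()
                .walk(frames -> frames.limit(3)
                        .map(StackWalker.StackFrame::getMethodName)
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(methodA());   // [methodC, methodB, methodA]
    }
}
```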

PC Register

The PC Register, short for Program Counter Register, is an area of memory created once per thread of execution and not shared. At any point, a thread of execution runs the code of only one method, its current method. While that method is a Java method, the thread's PC Register holds the address of the JVM instruction currently being executed; for a native method, its value is undefined.

The JVM uses the PC Register to know which instruction a thread is currently executing. Once the current instruction is executed, the PC Register is updated to point to the next one.

Native Method Stack

The Native Method Stack is created one per thread (only if needed), is not shared, and is pretty much similar to the Stack Area. Java methods can call methods written in native languages such as C and C++, typically through the Java Native Interface (JNI). Native methods don't run in Java stack frames, so the JVM may use separate stacks for them, colloquially called C stacks.
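Native methods are more common than they look: several core JDK methods are declared native and implemented inside the JVM. This sketch uses reflection and Modifier.isNative to check a few of them. The results reflect the OpenJDK class library, where System.currentTimeMillis and Object.hashCode are declared native; other implementations could differ. The names NativeMethodsDemo and isNative are hypothetical.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class NativeMethodsDemo {
    // Looks up a public method and reports whether it is declared 'native'.
    static boolean isNative(Class<?> type, String name, Class<?>... params) throws NoSuchMethodException {
        Method method = type.getMethod(name, params);
        return Modifier.isNative(method.getModifiers());
    }

    public static void main(String[] args) throws NoSuchMethodException {
        System.out.println("System.currentTimeMillis: " + isNative(System.class, "currentTimeMillis"));
        System.out.println("Object.hashCode:          " + isNative(Object.class, "hashCode"));
        System.out.println("String.length:            " + isNative(String.class, "length"));
    }
}
```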

Execution Engine

The JVM Execution Engine is responsible for using everything we discussed above to execute the actual code. The Execution Engine reads the program's bytecode from the various memory areas, instruction by instruction, interprets it, and runs it.
The JVM Execution Engine has three components: the Interpreter, the JIT Compiler, and the Garbage Collector.

Interpreter

Java bytecode is a compact, binary set of instructions meant for the JVM rather than for any real processor (tools such as javap can disassemble it into a human-readable form). To execute these instructions on a particular system or operating system, they need to be transformed into a format that the machine understands. That is where the interpreter comes into the picture.

The interpreter is bound both to the underlying machine and to the language of the instructions it reads. To elaborate, think of bytecode as a code language: if the instructions were written in a different format, the interpreter would have to change to understand that format and still translate it into machine instructions. This is why other JVM languages, such as Kotlin, Scala, and Groovy, compile their code into standard Java bytecode: the JVM's existing interpreter and JIT compiler can then run them unchanged.
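To make the idea of an interpreter concrete, here is a toy stack-machine interpreter. It is only a sketch: its three opcodes (PUSH, ADD, MUL) are hypothetical and far simpler than real JVM instructions such as iconst, iadd and imul, and the real HotSpot interpreter is implemented very differently. What it does share with the JVM is the basic loop: fetch the instruction at the program counter, execute it against an operand stack, and advance the counter.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ToyInterpreter {
    // Hypothetical opcodes for this sketch only.
    static final int PUSH = 0;   // PUSH n: push the constant n
    static final int ADD = 1;    // pop two values, push their sum
    static final int MUL = 2;    // pop two values, push their product

    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;   // program counter: index of the instruction being executed
        while (pc < code.length) {
            int opcode = code[pc];
            switch (opcode) {
                case PUSH:
                    stack.push(code[pc + 1]);
                    pc += 2;   // skip the opcode and its operand
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());
                    pc += 1;
                    break;
                case MUL:
                    stack.push(stack.pop() * stack.pop());
                    pc += 1;
                    break;
                default:
                    throw new IllegalStateException("Unknown opcode " + opcode + " at " + pc);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // (2 + 3) * 4
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL};
        System.out.println(run(program));   // 20
    }
}
```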

Just In Time (JIT) Compiler

The problem with an interpreter is that it is slow, especially when a method is executed many times: on every execution, it interprets the same bytecode again. To overcome this, the JVM has a JIT Compiler.

The JIT Compiler compiles frequently executed bytecode, such as an entire method, into native machine instructions. From then on, the JVM runs that native code directly instead of interpreting the bytecode again. Executing native instructions is faster than having the interpreter execute bytecode instruction by instruction. On top of this, the compiled native code is cached, so every later call to the method benefits from it.

However, if a method is called only once, compiling it costs more than simply interpreting it, so the interpreter outperforms the JIT Compiler. Therefore, the JVM profiles the running program to identify methods that are called many times (hot methods), and only those methods are sent to the JIT Compiler.
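A simple way to watch this happen is to call a small method many times in a loop. In the sketch below, square is called 100,000 times, which should make it hot enough for the JIT Compiler. The names HotMethod, square and sumOfSquares are hypothetical, and the exact compilation thresholds depend on the JVM and its settings.

```java
class HotMethod {
    // A tiny method that becomes "hot" when called many times.
    static long square(long x) {
        return x * x;
    }

    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += square(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(100_000));   // prints 333328333350000
    }
}
```

On HotSpot, running java -XX:+PrintCompilation HotMethod prints a line for each method the JIT compiles; you should likely see HotMethod::square among them, while code that runs only once generally stays interpreted.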

Garbage Collector

The Garbage Collector is the part of the JVM that runs in the background and reclaims unused memory. As a program runs, depending on its functionality, it creates many objects that eventually become unusable. The Garbage Collector keeps an eye on these unused objects, destroys them, and frees their memory.

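References are how the JVM knows whether an object is still being used: an object that can no longer be reached through any reference is garbage. Suppose an instance of class A holds a reference to an instance of class B. When the A instance is no longer reachable, its reference goes with it, but the B instance, if nothing else refers to it, remains in memory even though nothing can use it anymore. Thanks to the Garbage Collector, such memory is continuously freed for us.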
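We can observe reachability with a java.lang.ref.WeakReference, which does not keep its target alive by itself. In this sketch (the names ReachabilityDemo, survivesWhileReferenced and clearedAfterDroppingReference are hypothetical), an object that is still strongly referenced always survives a collection. Once no strong reference remains, it becomes eligible for collection, but because System.gc() is only a request, whether it is cleared within the few attempts made here is not guaranteed. Reference.reachabilityFence requires Java 9 or later.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

class ReachabilityDemo {
    static boolean survivesWhileReferenced() {
        Object payload = new byte[1024 * 1024];
        WeakReference<Object> weak = new WeakReference<>(payload);
        System.gc();                            // a request; payload is strongly reachable anyway
        boolean alive = weak.get() != null;     // must still be there
        Reference.reachabilityFence(payload);   // keep payload strongly reachable up to here
        return alive;
    }

    static boolean clearedAfterDroppingReference() {
        // No strong reference to the array exists, so it is eligible for collection.
        WeakReference<Object> weak = new WeakReference<>(new byte[1024 * 1024]);
        for (int attempt = 0; attempt < 10; attempt++) {
            System.gc();                        // only a hint: collection is not guaranteed
            if (weak.get() == null) {
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("alive while referenced:   " + survivesWhileReferenced());
        System.out.println("cleared once unreachable: " + clearedAfterDroppingReference());
    }
}
```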

The Goods

It frees the developer from manually destroying the objects created during the program execution.
No need to allocate or deallocate memory manually.
Safety. There is no explicit memory management, which rules out errors such as freeing memory twice or using memory after it has been freed.

The Bads

It needs additional CPU usage.
Memory leaks are still possible, for example when objects are unintentionally kept reachable.
When a program runs low on memory, the JVM runs GC more often and at a higher priority, which can make the program hang or slow down.
There is no control over on-request garbage collection: calling System.gc() is only a request, and a collection is not guaranteed.

Summary

In this Introduction to Java Virtual Machine (JVM) tutorial, we have learnt:

What is JVM?
We understood what bytecode is and that it is just a language of instructions for the JVM; other JVM languages, such as Kotlin and Scala, compile to the same bytecode and run on the same JVM.
The JVM has three components: Class Loader, Memory and Execution Engine.
Class Loading is done by Loading, Linking, and Initialization.
Memory is divided into Method Area, Heap, Stack, Native Stack, and PC Registers.
