Introduction to Java Virtual Machine (JVM)

Welcome to the Introduction to Java Virtual Machine (JVM) tutorial. In this tutorial we will see what the JVM is and what its main components are, and understand how the JVM is the backbone of Java's portability and its Write Once, Run Anywhere (WORA) feature.

Java Virtual Machine | amitph

Overview

When Java was being developed, portability was one of its most anticipated features. Devices were multiplying, and more and more systems and equipment were being controlled by microcomputers. There was a need for a programming platform that lets you write a program once on one machine and then run it on any other system, device, or piece of equipment.

This gave birth to the Java Virtual Machine (JVM). As the name indicates, it is a virtual machine that stands in for the underlying system. When a Java program is compiled, it is not transformed into machine instructions; instead, it is compiled into a generic set of instructions called bytecode. The JVM behaves like a virtual processor that reads these instructions and translates them into instructions for the underlying system and hardware.

Hence, developers don't need to know operating-system-level details. They write a generic Java program, and the JVM is capable of running it on any system that has a JVM.

What is Java bytecode?

Bytecode is the compiled output of a Java program. The Java compiler (javac) checks the program for correctness and completeness and reports compile-time errors if it finds any issue. When no issues are found, the compiler transforms the program statements into a set of instructions and stores them in a .class file. This .class file is said to contain the bytecode of the corresponding .java file. The instruction set that the source code is transformed into is designed specifically for the JVM to understand.

Bytecode is platform independent, which means it can be transferred to any other machine and executed there. In Java, bytecode is not a fully compiled output in the sense of native machine code; it is the same program expressed in an instruction set the JVM understands.
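To make "bytecode stored in a .class file" a little more concrete, here is a minimal sketch, assuming Java 9 or later, that reads the fixed header every class file starts with: the magic number 0xCAFEBABE followed by the minor and major version numbers, all stored big-endian so they read the same on every platform. The class name ClassFileHeader is hypothetical, and the sketch reads the bytecode of java.lang.Object from the running JDK instead of a HelloJvm.class file so that it runs on its own.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: reads the header at the start of a .class file.
// Layout per the JVM specification: u4 magic, u2 minor_version, u2 major_version (big-endian).
class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE;

    // Returns {magic, minor, major} read from the start of a class file.
    static int[] readHeader(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();           // 4 bytes
        int minor = data.readUnsignedShort(); // 2 bytes
        int major = data.readUnsignedShort(); // 2 bytes
        return new int[] {magic, minor, major};
    }

    // Reads the header of java.lang.Object's bytecode shipped with the running JDK.
    static int[] objectClassHeader() {
        try (InputStream in = Object.class.getResourceAsStream("Object.class")) {
            if (in == null) {
                throw new IllegalStateException("Object.class resource not found");
            }
            return readHeader(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = objectClassHeader();
        System.out.printf("magic=0x%X minor=%d major=%d%n", header[0], header[1], header[2]);
    }
}
```

The major version identifies the class file format version and differs between JDK releases, so the exact number printed depends on your JDK. To see the instructions themselves in a readable form, the JDK's javap -c tool can disassemble a .class file.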

The JVM has three main components: the Class Loader, Memory, and the Execution Engine.

Class Loader

The Class Loader is responsible for dynamically loading classes into the JVM and initialising them. It finds the binary representation of a class or interface by its name, identifies its members, links it, prepares it for initialisation, and initialises it.

Class loading starts with Java's core (bootstrap) classes, then other Java library classes, and finally the application or program level classes. For application classes, the class loader searches the application classpath for the .class file of each class as it is needed, rather than loading every class up front. When a class loader loads a .class file, it records its namespace, the kind of type (class, interface, or enum), and the members of the class.
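The loading order above follows a hierarchy of class loaders. A minimal sketch (the class name LoaderHierarchy is hypothetical) that walks that hierarchy with the standard ClassLoader.getParent() method; core classes such as java.lang.String are loaded by the bootstrap loader, which the Java API represents as null.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: lists the class loaders from the one that loaded 'type'
// up to the bootstrap loader (represented as null by the Java API).
class LoaderHierarchy {
    static List<String> loaderChain(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        chain.add("bootstrap (null)"); // every chain ends at the bootstrap loader
        return chain;
    }

    public static void main(String[] args) {
        System.out.println("LoaderHierarchy: " + loaderChain(LoaderHierarchy.class));
        System.out.println("String:          " + loaderChain(String.class));
    }
}
```

On JDK 9 and later, running this class from the classpath typically shows an application class loader, then a platform class loader, then the bootstrap loader; the exact loader class names are JDK implementation details.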

Consider a very small HelloJvm program like this:

```java
package com;

public class HelloJvm {
    public static void main(String[] args) {
        System.out.println("Hello JVM !!");
    }
}
```

As shown, the class resides in the com package, so its source file is com/HelloJvm.java. Let's compile it:

```bash
~ javac com/HelloJvm.java
```

After this, you should see a `HelloJvm.class` file in the com directory, next to the source file. This .class file is simply the bytecode representation of the class above. I compiled this file on macOS, but I can now transfer it to any other machine, say a Windows machine, and run it there with the java command. Let's see how the program is executed:

```bash
~ java com.HelloJvm
Hello JVM !!
```

Our program ran and printed the desired output. Behind the scenes, the class loader carried out Loading, Linking, and Initialisation, which we will look at in the sections below.

Loading

When we executed the java command, the JVM was initialised, and the class loader inside the JVM loaded the core Java library classes it needs to start up.

From the command, the JVM knew which class to run. It located the class by its fully qualified name, com.HelloJvm (stored as com/HelloJvm.class on the classpath), and loaded it. While loading the class, it recorded the class name, the superclass name, and method and field details in a special area of memory called the Method Area.
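The metadata the JVM records when it loads a class (its name, superclass, methods, and so on) is what Java's reflection API reads back. A small sketch, where ClassMetadataDemo and its nested Greeter class are hypothetical stand-ins for HelloJvm:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;

class ClassMetadataDemo {
    // Hypothetical sample class used only for this illustration.
    static class Greeter {
        String greet(String name) { return "Hello " + name; }
        static Greeter create() { return new Greeter(); }
    }

    // Builds a short description from the class metadata available at runtime.
    static String describe(Class<?> type) {
        StringBuilder out = new StringBuilder();
        out.append("class: ").append(type.getName()).append('\n');
        Class<?> parent = type.getSuperclass(); // null for Object, interfaces, primitives
        out.append("superclass: ").append(parent == null ? "none" : parent.getName()).append('\n');
        Method[] methods = type.getDeclaredMethods();
        // getDeclaredMethods() returns methods in no particular order, so sort by name
        Arrays.sort(methods, Comparator.comparing(Method::getName));
        for (Method m : methods) {
            boolean isStatic = Modifier.isStatic(m.getModifiers());
            out.append(isStatic ? "static method: " : "method: ").append(m.getName()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe(Greeter.class));
    }
}
```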

One thing to remember is that the Class Loader is not a single class; it is more precisely a Class Loader Subsystem made up of several class loaders arranged in a parent-child hierarchy. Before a class loader loads the HelloJvm class, it checks whether the class has already been loaded and first delegates the request to its parent; it loads the class itself only if the parent cannot find it. All this ensures that no class is loaded twice.
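Java's class loaders implement the "don't load it twice" rule through parent-first delegation: the standard ClassLoader.loadClass first calls findLoadedClass, then asks its parent, and only calls findClass if the parent fails. A minimal sketch with a hypothetical logging loader that keeps that standard behaviour; asking it for java.lang.String or for an already loaded class returns the very same Class object the parent chain already has.

```java
// Hypothetical demo of parent-first class loader delegation.
class DelegationDemo {
    // Logs each request, then defers to the standard ClassLoader logic:
    // findLoadedClass -> parent.loadClass -> findClass (only if the parent fails).
    static class LoggingClassLoader extends ClassLoader {
        LoggingClassLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            System.out.println("loadClass request: " + name);
            return super.loadClass(name, resolve);
        }
    }

    static Class<?> loadThroughChild(String className) {
        LoggingClassLoader child = new LoggingClassLoader(DelegationDemo.class.getClassLoader());
        try {
            return child.loadClass(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> s = loadThroughChild("java.lang.String");
        System.out.println("same String class: " + (s == String.class));            // true
        System.out.println("loaded by bootstrap: " + (s.getClassLoader() == null)); // true
        Class<?> self = loadThroughChild(DelegationDemo.class.getName());
        System.out.println("same DelegationDemo class: " + (self == DelegationDemo.class)); // true
    }
}
```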

Linking

The Linking phase is a three-step process that includes Verification, Preparation, and Resolution. Let's look at them briefly.

Verification

Once a class is loaded, the next step is Verification. The binary representation of the class is checked to make sure it is well formed and follows the rules of the JVM, regardless of which compiler generated it. This step prevents the JVM from running malformed or malicious class files, for example ones produced by untrusted compilers.
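Strictly speaking, the JVM specification separates format checking of the raw class file bytes (which fails with ClassFormatError) from bytecode verification proper (which fails with VerifyError); both exist to reject class files the JVM should not trust. The format check is the easy one to show from plain Java. A minimal sketch with hypothetical names that hands bytes which are not a class file to ClassLoader.defineClass:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the JVM rejects bytes that are not a valid class file.
class BadClassBytes {
    // Minimal loader exposing the protected defineClass method.
    static class RawLoader extends ClassLoader {
        RawLoader() {
            super(BadClassBytes.class.getClassLoader());
        }

        Class<?> define(byte[] bytes) {
            // name == null: the JVM takes the class name from the bytes themselves
            return defineClass(null, bytes, 0, bytes.length);
        }
    }

    // Returns the error raised for the given bytes, or null if they were accepted.
    static LinkageError tryDefine(byte[] bytes) {
        try {
            new RawLoader().define(bytes);
            return null;
        } catch (LinkageError e) { // ClassFormatError and VerifyError are both LinkageErrors
            return e;
        }
    }

    public static void main(String[] args) {
        byte[] notAClass = "definitely not bytecode".getBytes(StandardCharsets.US_ASCII);
        System.out.println(tryDefine(notAClass));
    }
}
```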

Preparation

Next is the Preparation phase, in which the class or interface is prepared: memory is allocated for the static fields, and they are set to their default values (0, false, null, and so on).
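Default values set during Preparation are normally invisible, but you can observe one if a static initialiser reads a field before that field's own initialiser has run. A small sketch with hypothetical names: static initialisers run in textual order, so observed is computed while value still holds its default.

```java
// Hypothetical demo: a static field read before its initialiser runs
// still holds the default value assigned during Preparation.
class PreparationDefaults {
    static int observed = peek(); // runs first: reads 'value' before it is assigned
    static int value = 42;        // assigned during Initialisation, after 'observed'

    static int peek() {
        return value; // still 0, the default from Preparation
    }

    public static void main(String[] args) {
        System.out.println("observed = " + observed + ", value = " + value); // observed = 0, value = 42
    }
}
```

Note that value is deliberately not final: a static final int initialised with a constant is a compile-time constant and would be inlined, hiding the default.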

Resolution

Then comes Resolution, during which symbolic references are replaced with actual references. When a program is compiled, references to fields, methods, classes, and so on are not stored as memory addresses; they are stored as symbolic references in the class file's constant pool. These symbolic references are resolved here. If a class refers to another class or interface that has not been loaded yet, the referenced class or interface is loaded (as described in the steps above) and linked in the same way.

To recap, during linking memory is allocated for all the static variables and they receive default values, and references to other classes and interfaces are identified. Because the compiler stores these references as symbolic references, they have to be replaced with actual references. This replacement (resolution) may be done lazily, so the JVM can postpone it until a reference is first used.

Initialisation

This phase is the static initialisation phase, in which all the static variables are assigned the values given in the code. Remember, the static variables were set to their default values during the Preparation step of Linking. Along with the static variables, all the static initialiser blocks are executed in this phase, in the order they appear in the source.
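Initialisation happens only when a class is first actively used, not merely when it is loaded. A minimal sketch with hypothetical names: it loads a nested class through its class literal, which per the Java Language Specification does not initialise it, and then initialises it explicitly with Class.forName(name, true, loader).

```java
// Hypothetical demo: loading a class does not run its static initialiser;
// initialisation is a separate, later step.
class LazyInitDemo {
    static boolean lazyInitialised = false;

    static class Lazy {
        static {
            lazyInitialised = true; // runs only when Lazy is initialised
        }
    }

    // Returns {initialisedAfterLoad, initialisedAfterForName}.
    static boolean[] run() {
        Class<?> lazy = Lazy.class;          // loads Lazy, does not initialise it
        boolean afterLoad = lazyInitialised; // false
        try {
            // initialize = true forces static initialisation
            Class.forName(lazy.getName(), true, lazy.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        boolean afterInit = lazyInitialised; // true
        return new boolean[] {afterLoad, afterInit};
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("after loading: " + r[0] + ", after forName: " + r[1]);
    }
}
```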

Initialisation always happens from the top of the class hierarchy downwards: the superclass's static variables and blocks are initialised before those of the child class. At the end of the initialisation phase, the class is ready to be used by the JVM.
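The superclass-first rule can be seen with two nested classes whose static blocks append to a shared log. A minimal sketch with hypothetical names: calling a static method of Child triggers Child's initialisation, and the JVM initialises Parent first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: superclass static initialisation runs before the subclass's.
class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static { LOG.add("Parent static block"); }
    }

    static class Child extends Parent {
        static { LOG.add("Child static block"); }

        static void touch() { } // invoking a static method initialises Child (and Parent first)
    }

    static List<String> run() {
        Child.touch();
        return LOG;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [Parent static block, Child static block]
    }
}
```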

Memory

Another essential component of the JVM is its memory. All classes, methods, metadata, and objects are stored in memory, which the JVM divides into the Method Area, Heap Area, Stack Area, PC Registers, and Native Method Stack. Let's look at each of them.

Method Area

Each JVM has its own Method Area, which is created when the JVM starts. The Method Area is shared across all threads in the JVM; there is only one Method Area per JVM, and it is logically considered part of the Heap Area.

The Method Area stores per-class information: the Runtime Constant Pool, the class name, immediate parent (superclass) information, all the static variables, and the actual bytecode of methods and constructors, including the special methods used for class and instance initialisation. Whether the Method Area is garbage collected is left to each JVM implementation.

Heap Area

Heap Area, similar to Method Area, is one per JVM and created upon initialisation of JVM. Heap Area is also shared across all the threads of JVM. Unlike Method Area, which stores all the Class level information, the Heap Area is used to store Instance level information.

When a class is instantiated, memory has to be allocated for all of its instance variables and data structures. This memory is allocated on the Heap. The initial and maximum Heap sizes can be set when starting the JVM, and the Heap grows or shrinks dynamically within that range. The Heap is managed by the Garbage Collector, which actively reclaims memory from it.
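On HotSpot-based JVMs the initial and maximum heap sizes are set with the -Xms and -Xmx options. A minimal sketch (the class name HeapInfo is hypothetical) that prints the heap figures the running JVM reports through java.lang.Runtime, so you can compare runs with different limits:

```java
// Hypothetical demo: prints heap figures reported by the running JVM.
// Try, for example:  java -Xms64m -Xmx256m HeapInfo
class HeapInfo {
    // Returns {free, total, max} in bytes.
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        return new long[] {rt.freeMemory(), rt.totalMemory(), rt.maxMemory()};
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] s = snapshot();
        // total = heap currently reserved, free = unused part of it,
        // max = upper limit the heap may grow to (roughly the -Xmx value)
        System.out.printf("free=%d MB, total=%d MB, max=%d MB%n", s[0] / mb, s[1] / mb, s[2] / mb);
    }
}
```

The numbers vary from run to run, since the heap grows and shrinks as objects are allocated and collected.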

Stack Area

The Stack Area is created one per thread and is not shared. A thread here means a thread of execution, such as our program's main thread. A running thread may call a method, which in turn calls another method; for each such invocation, the method and its short-lived variables are stored on the thread's stack in the form of a stack frame.

Method invocations happen on a First In, Last Out basis. When method A calls method B, which in turn calls method C, method C finishes first and method A finishes last. That is why a stack is used to store method-level variables.
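The A, B, C example can be written out directly. A minimal sketch with hypothetical names that records when each method is entered and exited; the exit order is the reverse of the entry order, just like frames being pushed onto and popped off the thread's stack.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: a() calls b(), which calls c(); frames are popped in reverse order.
class CallStackDemo {
    static void a(List<String> log) { log.add("enter a"); b(log); log.add("exit a"); }

    static void b(List<String> log) { log.add("enter b"); c(log); log.add("exit b"); }

    static void c(List<String> log) {
        log.add("enter c");
        int local = 42; // lives in c's stack frame and is gone once c returns
        log.add("exit c");
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        a(log);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [enter a, enter b, enter c, exit c, exit b, exit a]
    }
}
```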

PC Register

The PC Register, short for Program Counter Register, is an area of memory that is created one per thread of execution and is not shared. At any point, a thread of execution is executing the code of a single method. While the thread is executing a Java method, its PC Register holds the address of the JVM instruction currently being executed.

The JVM uses the PC Register to know which instruction a thread is currently executing. Once the current instruction has been executed, the PC Register is updated to point to the next instruction to execute. For native methods, the value of the PC Register is undefined.

Native Method Stack

The Native Method Stack is created one per thread (only if it is needed), is not shared, and is much like the Stack Area. Java allows Java methods to call methods written in native languages such as C and C++. Those native methods can't run on Java stack frames, so they need separate stacks, often called C stacks.

Execution Engine

The JVM Execution Engine is responsible for using everything discussed above to execute the actual code. It fetches the program's bytecode from the various memory areas instruction by instruction, interprets it, and runs it.
The Execution Engine has three components: the Interpreter, the JIT Compiler, and the Garbage Collector.

Interpreter

Java bytecode is a compact, binary set of instructions meant for the JVM rather than for any particular processor. To execute these instructions on a given system, they need to be transformed into a form the machine understands. That is where the interpreter comes into the picture.

An interpreter is bound both to the underlying machine and to the language of the instructions it reads. To elaborate, think of bytecode as a language. If the instructions were written in a different format, we would have to change the interpreter for it to understand that format and still translate it into machine instructions. This is also why other JVM languages, such as Kotlin, Scala, and Groovy, can run on the JVM: their compilers produce standard JVM bytecode, which the existing JVM already understands.
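The fetch, decode, and execute loop of an interpreter, together with the program counter described in the PC Register section, can be sketched with a toy stack machine. This is only an illustration: the opcodes PUSH, ADD, MUL, and HALT are hypothetical and merely loosely modelled on JVM instructions such as iconst, iadd, and imul; they are not real JVM bytecode.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy interpreter: NOT the JVM's instruction set.
class ToyInterpreter {
    static final int PUSH = 0; // PUSH n: push the constant n
    static final int ADD = 1;  // pop two values, push their sum
    static final int MUL = 2;  // pop two values, push their product
    static final int HALT = 3; // stop and return the top of the stack

    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>(); // operand stack
        int pc = 0; // program counter: index of the current instruction
        while (true) {
            int opcode = code[pc]; // fetch
            switch (opcode) {      // decode and execute
                case PUSH:
                    stack.push(code[pc + 1]);
                    pc += 2; // skip the opcode and its operand
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());
                    pc += 1;
                    break;
                case MUL:
                    stack.push(stack.pop() * stack.pop());
                    pc += 1;
                    break;
                case HALT:
                    return stack.pop();
                default:
                    throw new IllegalStateException("unknown opcode " + opcode + " at pc " + pc);
            }
        }
    }

    public static void main(String[] args) {
        // computes (2 + 3) * 4
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program)); // 20
    }
}
```

Every execution of the program walks the same instructions again one by one, which is exactly the cost the JIT Compiler addresses.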

Just In Time (JIT) Compiler

The problem with the interpreter is that it is slow, especially when a method is executed many times: on every execution, it interprets the same bytecode again. To overcome this, the JVM has a JIT Compiler.

The JIT compiler compiles an entire block of bytecode, such as a method, into native instructions, so the Interpreter no longer interprets it; the compiled native code is executed directly. Executing native instructions is faster than the instruction-by-instruction execution of the Interpreter. On top of this, the compiled native code is cached and reused as the method is called again and again, which makes repeated calls even faster.

However, if a method is called only once, the Interpreter outperforms the JIT Compiler, because compiling the method would cost more than interpreting it once. Therefore, the JVM identifies the methods that are called many times and sends only those methods to the JIT Compiler.
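To watch this happen, you can run a program with a frequently called method under HotSpot's -XX:+PrintCompilation option, which logs methods as the JIT compiles them. A minimal sketch with hypothetical names; whether and when square gets compiled, and the exact log format, depend on the JVM version and its internal thresholds.

```java
// Hypothetical demo: square() is called a million times, making it a "hot" method.
// Run with:  java -XX:+PrintCompilation HotMethodDemo
class HotMethodDemo {
    static long square(long x) {
        return x * x;
    }

    // Sum of squares 1^2 + 2^2 + ... + n^2, calling square() n times.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += square(i);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000_000));
    }
}
```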

Garbage Collector

The Garbage Collector is a part of the JVM that runs in the background and reclaims unused memory. As a program runs, it creates objects depending on its functionality, and many of those objects later become unusable. The Garbage Collector keeps track of such unused objects, destroys them, and frees their memory.

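References are how the JVM knows whether an object is still being used: an object that can still be reached through references is in use. Suppose an object of class A holds a reference to an instance of class B. When the A object is no longer reachable, its reference goes with it, and if nothing else refers to the B instance, that instance is unreachable too, yet it still occupies memory. The Garbage Collector continuously finds such objects and frees their memory for us.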
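Reachability can be observed with java.lang.ref.WeakReference, which does not keep its target alive. A minimal sketch, assuming Java 9 or later for Reference.reachabilityFence, with hypothetical names: while a strong reference is still in use the object cannot be collected; once it is gone the object becomes eligible for collection, but System.gc() is only a request, so actual clearing is not guaranteed.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

// Hypothetical demo of reachability and garbage collection.
class ReachabilityDemo {
    // Hypothetical object that takes up some heap space.
    static class Payload {
        final byte[] data = new byte[1024 * 1024];
    }

    // While a strong reference is still in use, the weak reference must still return the object.
    static boolean aliveWhileStronglyReachable() {
        Payload strong = new Payload();
        WeakReference<Payload> weak = new WeakReference<>(strong);
        System.gc();                         // even if a collection runs here...
        boolean alive = weak.get() != null;  // ...the object is still reachable
        Reference.reachabilityFence(strong); // keeps 'strong' reachable up to this point
        return alive;
    }

    // With no strong reference left, the object is eligible for collection.
    // System.gc() is only a request, so this is NOT guaranteed to return true.
    static boolean clearedAfterRequestedGc() {
        WeakReference<Payload> weak = new WeakReference<>(new Payload());
        System.gc();
        return weak.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("alive while referenced: " + aliveWhileStronglyReachable());
        System.out.println("cleared after System.gc(): " + clearedAfterRequestedGc());
    }
}
```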

The Goods

  • It frees the developer from manually destroying the objects created during the program execution.
  • No need to allocate or deallocate memory manually.
  • Safety. There is no explicit memory management, which rules out errors such as freeing memory twice or using memory after it has been freed.

The Bads

  • It needs additional CPU usage.
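  • Memory leaks are still possible: objects that stay reachable are never collected, even if the program no longer needs them.
  • When a program runs low on memory, the JVM has to run GC more often, which can make the program pause or slow down.
  • No control over when garbage collection happens. Calling System.gc() is only a request, and it is not guaranteed to trigger a collection.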
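A common way such a leak happens is a collection that only ever grows. A minimal sketch of a hypothetical cache: every buffer stays reachable from a static field, so the Garbage Collector is not allowed to reclaim any of them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example of a memory leak in a garbage-collected program.
class LeakyCache {
    // Static field: stays reachable for as long as the class is loaded.
    private static final List<byte[]> CACHE = new ArrayList<>();

    static void handleRequest() {
        byte[] buffer = new byte[10_000];
        CACHE.add(buffer); // bug: entries are never removed, so they can never be collected
    }

    static int size() {
        return CACHE.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            handleRequest();
        }
        System.out.println("buffers still reachable: " + size()); // buffers still reachable: 1000
    }
}
```

Bounding the cache or removing entries once they are no longer needed makes them unreachable again, so the Garbage Collector can reclaim them.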

Summary

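In this Introduction to Java Virtual Machine (JVM) tutorial, we learnt what the JVM is and how bytecode makes Java portable, and we looked at the JVM's three main components: the Class Loader (Loading, Linking, and Initialisation), the Memory areas (Method Area, Heap Area, Stack Area, PC Register, and Native Method Stack), and the Execution Engine (Interpreter, JIT Compiler, and Garbage Collector).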
