Know the JVM Series -3- When Weaker is Better: Understanding Soft, Weak and Phantom References

How many times have we created object instances and assigned them to reference variables? We all know very well that Java has automatic garbage collection, so we just play around with the reference variables, and once those variables are assigned null or fall out of scope, the JVM takes care of the rest. No need to worry about ‘free’ as in C / C++. It’s a headache-free approach that minimizes the risk of introducing memory leaks into our programs, and it works out great, day in and day out, in billions of Java applications running out there 24×7. Kudos to John McCarthy for inventing GC for Lisp, and to all the folks who implemented the concept in Java.

But there are times when we need a little more control over the process of garbage collection. I’m not talking about the dark art of tuning the garbage collector (which I might cover in a later article). This is about programmatic situations where we expect some object instances to become eligible for garbage collection, so that unwanted memory that accumulates over time can be released. Well, the classic solution of explicitly assigning null could help us out, provided that the object is referred to only through that particular reference variable. But what if assigning null doesn’t work for the problem at hand?

Consider a scenario where you are required to implement an object cache. You have some objects that are pretty expensive to build. We would like to keep those objects in the cache as long as we can (just in case a component of the application needs them), but we want unused objects to be released when we need memory for other important operations of the application. With standard (strong) references, this is quite difficult to implement. The moment we add an object to a collection, we maintain a (strong) reference to the instance, making it ineligible for garbage collection. If the cache continues to grow, we could run out of memory, which turns the cache into a memory leak in the application.

The obvious solution would be to limit the size of the cache and to drop older objects from it, making those instances eligible for GC. That is the mechanism used by many cache implementations out there, and it works out fine. The drawback is that the cache is limited by size, so even when more free memory is available, we cannot make use of it. Also, since the cache will always hold some references, it keeps a significant block of memory occupied for the lifetime of the application (e.g. if the cache is fixed at 1024 entries, the memory consumed by those 1024 instances will not be released). Yes, there are ways to address each of these problems (e.g. dynamically growing / shrinking the cache), but that requires a fair deal of coding. This was a practical issue that I came across in one of my projects in the past.
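For comparison, the size-limited approach described above could be sketched roughly as follows. This is only an illustration, not the cache from the project mentioned: the class name BoundedCache and the least-recently-used eviction policy are assumptions, built on the standard LinkedHashMap.removeEldestEntry hook.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-limited cache: holds at most 'maxEntries' strong
// references and evicts the least recently used entry beyond that.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {

    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // 'true' selects access order, so iteration starts at the
        // least recently used entry
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by put(); returning true drops the eldest entry,
        // which makes its value eligible for GC
        return size() > maxEntries;
    }
}
```

Note that every entry still inside the limit is strongly referenced, so its memory stays occupied no matter how scarce memory becomes elsewhere in the application.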

If only there was a better (and easier) way to get this done…

The solution has been part of the JDK for a long time, since the days of Java 2. Meet the java.lang.ref package, where Soft, Weak and Phantom references can be used to resolve such problems. The references that we create using the assignment operator are known as strong references, because the instance is strongly referenced by the application, making it ineligible for garbage collection.

Object obj = new Object(); // Strong Ref

Soft, Weak and Phantom references are the weaker counterparts of strong references: the garbage collector is allowed to reclaim an instance even though such a reference to it exists. What this means is that, even though you hold a weak reference to a particular instance, the JVM can sweep it out of memory if no stronger reference to it remains. This works out great for the problem we discussed before, since instances in our cache can be released automatically when the JVM decides to reclaim them.

A weak reference can be created to an instance as follows (all the reference types are available in java.lang.ref package).

WeakReference<Object> weakRef = new WeakReference<Object> (obj);

When we create a weak reference like this, the instance referred to by the variable obj becomes eligible for garbage collection as soon as no strong reference to it remains. But if some part of the application needs to use this particular object instance, we can get a strong reference back to it as follows (provided that it has not been gc’d in the meantime).

Object strongRef = weakRef.get();

If the referenced instance has already been garbage collected, calling the get method will return null.
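Because the instance can disappear between two calls to get(), a common idiom is to call get() once, keep the result in a local variable, and check it for null. The following is a minimal sketch of that idiom. The class WeakGetIdiom and the helper describe are hypothetical names used only for illustration, and clear() is called here to simulate, in a deterministic way, what the garbage collector may do on its own.

```java
import java.lang.ref.WeakReference;

class WeakGetIdiom {

    // Hypothetical helper: returns the referent's text, or a fallback
    // if the referent has already been cleared.
    static String describe(WeakReference<StringBuilder> ref) {
        // Call get() exactly once. The local variable is a strong
        // reference, so the instance cannot be collected while we use it.
        StringBuilder target = ref.get();
        if (target == null) {
            return "<collected>";
        }
        return target.toString();
    }

    public static void main(String[] args) {
        StringBuilder data = new StringBuilder("expensive data");
        WeakReference<StringBuilder> ref =
                new WeakReference<StringBuilder>(data);

        // 'data' is used again below, so it is still strongly reachable here
        String first = describe(ref);
        System.out.println(first + " (length " + data.length() + ")");

        // clear() drops the referent explicitly, simulating a collection
        ref.clear();
        System.out.println(describe(ref));
    }
}
```

Writing `if (ref.get() != null) { ref.get().toString(); }` instead would be unsafe, since the second call may already return null.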

Below is a fully working example of using weak references to demonstrate what we have covered so far.

import java.lang.ref.WeakReference;

public class TestRef {

    public static void main(String[] args) {

        // Initial Strong Ref
        Object obj = new Object();  

        System.out.println("Instance : " + obj);

        // Create a Weak Ref on obj
        WeakReference<Object> weakRef 
                  = new WeakReference<Object>(obj);
        
        // Make obj eligible for GC !
        obj = null;     
        
        // Get a strong reference again. Now it's not eligible for GC
        Object strongRef = weakRef.get();  

        System.out.println("Instance : " + strongRef);

        // Make the instance eligible for GC again
        strongRef = null;

        // Keep your fingers crossed    
        System.gc();    

        // should be null if GC collected
        System.out.println("Instance : " + weakRef.get()); 
    }
}

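On a typical run, the output of the program would look like this (the hash code will differ, and the last line is null only if the GC actually ran):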

Instance : java.lang.Object@a90653
Instance : java.lang.Object@a90653
Instance : null

Now that we have covered why we need weak references, and a practical example of using weak references, let’s cover some theory behind the five degrees of reachability. The following is based on the JDK API Docs.

  1. Strongly Reachable – If we have a strong reference to a particular instance, then it is said to be strongly reachable. Hence, it is not eligible for garbage collection.
  2. Softly Reachable – If we do not have a strong reference to an instance, but we can access the object through a SoftReference (more on that later) to it, then the instance is said to be softly reachable.
  3. Weakly Reachable – If we have neither a strong reference nor a soft reference, but the object can be accessed through a WeakReference, then the instance is said to be weakly reachable.
  4. Phantomly Reachable – If we don’t have any of the strong, soft or weak references to a particular instance (which has not been finalized), but, if we do have a PhantomReference (explained in a while) to the instance, then the instance is said to be phantomly reachable.
  5. Unreachable – If we do not have any of the above references to an instance, then it is unreachable from the program.

At this point, you must be wondering about the difference, and the need, to have three different levels of weaker referencing mechanisms. In the order of strength, the references can be arranged as,

Strong References > Soft References > Weak References > Phantom References

Each of these referencing mechanisms serves a specific purpose. We will look at each of these references, and some related constructs in the API next.

1. Soft References

According to the Java API Specification, JVM implementations are encouraged not to clear a soft reference as long as the JVM has enough memory. That is, if free heap space is available, chances are that a softly reachable instance will survive a garbage collection cycle. However, before throwing an OutOfMemoryError, the JVM is guaranteed to clear all soft references to softly reachable instances, so that their memory can be reclaimed. This makes Soft References ideal for implementing memory-sensitive caches (as in our example problem).

Consider the following example.

import java.lang.ref.SoftReference;

public class TestSoftRef {
    public static void main(String[] args) {

        // Initial Strong Ref
        Object obj = new Object();  
        System.out.println("Instance : " + obj);
        
        // Make a Soft Reference on obj
        SoftReference<Object> softReference = 
                    new SoftReference<Object>(obj); 

        // Make obj eligible for GC !
        obj = null;     
        
        System.gc();    // Run GC

        // should be null if GC collected
        System.out.println("Instance : " + softReference.get());
    }
}

And the output will be…

Instance : java.lang.Object@de6ced
Instance : java.lang.Object@de6ced

As we expected, since the JVM had enough memory, it did not reclaim the memory consumed by our softly referenced instance.
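To tie this back to the cache problem from the beginning of the article, a memory-sensitive cache could be sketched along the following lines. This is a minimal illustration under stated assumptions rather than a production design: the class name SoftCache and its methods are hypothetical, keys are held strongly, and only the values are held through SoftReference.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical memory-sensitive cache. Values are reachable only through
// SoftReferences, so the GC may reclaim them when memory runs low.
class SoftCache<K, V> {

    private final Map<K, SoftReference<V>> map =
            new HashMap<K, SoftReference<V>>();

    public synchronized void put(K key, V value) {
        map.put(key, new SoftReference<V>(value));
    }

    // Returns the cached value, or null if it was never cached or has
    // been reclaimed by the GC in the meantime.
    public synchronized V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            // The referent was cleared: drop the stale entry
            map.remove(key);
        }
        return value;
    }

    public synchronized int size() {
        return map.size();
    }
}
```

A caller treats a null result as a cache miss and rebuilds the expensive object. In this sketch, stale entries are removed only when they are looked up again; registering the references with a ReferenceQueue would allow them to be cleaned up eagerly.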

2. Weak References

Unlike Soft References, Weak References can be reclaimed by the JVM during any GC cycle, even when there is enough free memory available. Our first example on weaker reference models was based on Weak References. As long as the instance has not been collected, we can retrieve a strong reference out of a weak reference by calling the get() method on it.

3. Phantom References

Phantom references are the weakest form of referencing. Unlike with Soft / Weak references, an instance referred to via a phantom reference cannot be retrieved through the get() method, which always returns null.

Instead, we need to rely on Reference Queues to make use of Phantom References. We will take a look at reference queues in a while. One use case of Phantom references is to keep track of active instances within an application, and to know when those instances are garbage collected. If we used strong references for that, the instances would never become eligible for GC because of the strong references we maintain. Instead, we can rely on a phantom reference with the support of a reference queue to handle the situation. An example of Phantom References is provided under Reference Queues below.
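Since get() is of no use here, tracking code usually has to carry its own bookkeeping information alongside the reference. One way to do that, sketched below, is to subclass PhantomReference. The class LabeledPhantomRef and its label field are hypothetical names chosen for this illustration.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;

// Hypothetical helper: a phantom reference that remembers a label for
// its referent, since the referent itself is never available via get().
class LabeledPhantomRef extends PhantomReference<Object> {

    private final String label;

    LabeledPhantomRef(Object referent, ReferenceQueue<Object> queue,
                      String label) {
        super(referent, queue);
        this.label = label;
    }

    String label() {
        return label;
    }
}

class PhantomGetDemo {

    public static void main(String[] args) {
        ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
        Object resource = new Object();
        LabeledPhantomRef ref =
                new LabeledPhantomRef(resource, queue, "resource-1");

        // get() returns null even though 'resource' is still in use
        System.out.println("get()  : " + ref.get());
        System.out.println("label  : " + ref.label());

        // Nothing is enqueued yet; poll() does not block
        System.out.println("queued : " + queue.poll());

        // 'resource' is read here, which keeps it strongly reachable above
        System.out.println("in use : " + (resource != null));
    }
}
```

When such a reference eventually appears on the queue, the code that takes it off can cast it back to LabeledPhantomRef and use the label to decide which bookkeeping entry to update.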

4. Reference Queues

ReferenceQueue is the mechanism provided by the JVM for being notified when a referenced instance is being garbage collected. A Reference Queue can be used with any of the reference types by passing the queue to the constructor of the reference. When creating a PhantomReference, providing a Reference Queue is a must.

The use of reference queue is as follows.

import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

public class TestPhantomRefQueue {

    public static void main(String[] args)
            throws InterruptedException {

        Object obj = new Object();
        final ReferenceQueue<Object> queue =
                new ReferenceQueue<Object>();

        // Keep the reference object itself reachable,
        // otherwise it will never be enqueued
        PhantomReference<Object> pRef =
                new PhantomReference<Object>(obj, queue);

        obj = null;

        new Thread(new Runnable() {
            public void run() {
                try {
                    System.out.println("Awaiting for GC");

                    // This will block till it is GCd
                    Reference<?> enqueued = queue.remove();

                    System.out.println("Referenced GC'd");

                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }).start();

        // Wait for 2nd thread to start
        Thread.sleep(2000);

        System.out.println("Invoking GC");
        System.gc();
    }
}

The output would typically be as follows (assuming the JVM honours the System.gc() request):

Awaiting for GC
Invoking GC
Referenced GC'd
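The same mechanism works with weak references too. The sketch below registers a WeakReference with a queue and waits for it with a timeout instead of blocking forever, using ReferenceQueue.remove(long). The class WeakRefQueueDemo and the helper awaitEnqueue are hypothetical names, and since System.gc() is only a request, the helper may well report false.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

class WeakRefQueueDemo {

    // Hypothetical helper: requests GC repeatedly and returns true if
    // 'ref' shows up on 'queue' within roughly 'timeoutMillis'.
    static boolean awaitEnqueue(ReferenceQueue<?> queue,
                                Reference<?> ref,
                                long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            while (System.currentTimeMillis() < deadline) {
                // Only a request; the JVM is free to ignore it
                System.gc();

                // Wait up to 100 ms for the next enqueued reference
                Reference<?> polled = queue.remove(100);
                if (polled == ref) {
                    return true;
                }
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and give up waiting
            Thread.currentThread().interrupt();
        }
        return false;
    }

    public static void main(String[] args) {
        ReferenceQueue<Object> queue = new ReferenceQueue<Object>();

        Object obj = new Object();
        WeakReference<Object> ref = new WeakReference<Object>(obj, queue);

        // Drop the only strong reference
        obj = null;

        boolean enqueued = awaitEnqueue(queue, ref, 2000);
        System.out.println(enqueued
                ? "Referent collected, reference enqueued"
                : "No GC notification within the timeout");
    }
}
```

By the time a weak reference is handed out by the queue, it has already been cleared, so calling get() on it returns null.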

5. WeakHashMap

java.util.WeakHashMap is a special version of HashMap that holds its keys through weak references. Therefore, when a particular key is no longer in use elsewhere and is garbage collected, the corresponding entry will magically disappear from the map. The magic relies on the ReferenceQueue mechanism explained before to identify when the key behind a particular weak reference has been garbage collected. This is useful when you want to build a cache based on weak references. For more sophisticated requirements, it is better to write your own cache implementation.
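The disappearing act can be observed with a small experiment like the one below. It is a sketch only: the class WeakHashMapDemo and the helper awaitEmpty are hypothetical names, and because System.gc() is just a request, the entry may still be present when the program ends.

```java
import java.util.Map;
import java.util.WeakHashMap;

class WeakHashMapDemo {

    // Hypothetical helper: requests GC up to 'attempts' times and reports
    // whether the map ended up empty.
    static boolean awaitEmpty(Map<?, ?> map, int attempts) {
        try {
            for (int i = 0; i < attempts && !map.isEmpty(); i++) {
                System.gc();
                Thread.sleep(100);
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stop waiting
            Thread.currentThread().interrupt();
        }
        return map.isEmpty();
    }

    public static void main(String[] args) {
        Map<Object, String> metadata = new WeakHashMap<Object, String>();

        Object key = new Object();
        metadata.put(key, "some metadata");

        // The key is in use here, so the entry must be present
        System.out.println("Lookup while key is in use : "
                + metadata.get(key));

        // Drop the only strong reference to the key
        key = null;

        System.out.println("Map empty after GC request : "
                + awaitEmpty(metadata, 10));
    }
}
```

Two things are worth keeping in mind. Only the keys are weak, so a value that strongly refers to its own key will keep the entry alive. Also, keys that are strongly held elsewhere for the lifetime of the application, such as string literals, will never be removed.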

In this rather long article, we have covered the basics of the Reference API provided by the Java platform. For more details, you might find it helpful to glance through the Javadoc of the java.lang.ref package.

12 comments

  1. Great write up!
    But I think your example is a little bit incorrect, I don’t think a call to System.gc(); will trigger the GC immediately. It depends on the JVM implementation and depends on the parameters passed to the JVM at startup. Moreover, the javadoc of System.gc() says:
    ——-
    Runs the garbage collector.
    Calling the gc method suggests that the Java Virtual Machine expend effort toward recycling unused objects in order to make the memory they currently occupy available for quick reuse. When control returns from the method call, the Java Virtual Machine has made a best effort to reclaim space from all discarded objects.
    ——
    Another point is the order (based on “strong reachability”) of the those kinds of references
    You said: Strong References > Soft References > Weak References > Phantom References
    It’s only correct on Sun JVM. On JRockit, soft, weak and phantom are treated the same (IIRC).

    1. Hi Truong,

      Great write up!

      Thanks !

      I don’t think a call to System.gc(); will trigger the GC immediately. It depends on the JVM implementation and depends on the parameters passed to the JVM at startup

      Yes, like you have mentioned, System.gc() does not guarantee that the JVM will do a GC cycle. That’s why I mentioned in the comment ‘keep your fingers crossed’ :) . But generally, JVM tries to honour the request, so for a small application like this, it is highly likely (depending on the VM implementation again. One can even ignore such a request, according to spec). So it works for most of the cases (if the VM is not busy with something else). That’s why I used it as an example.

      Another point is the order (based on “strong reachability”) of the those kinds of references
      You said: Strong References > Soft References > Weak References > Phantom References
      It’s only correct on Sun JVM. On JRockit, soft, weak and phantom are treated the same.

      You are correct again. The specification does not mandate that a VM should wait until all the memory has been consumed to finalize soft referenced instances. So VM implementers do not have to follow that, and in such cases, they can treat soft references as same as weak references. The Java Doc for Soft References reflects that. I was aware about the relaxed nature of the specification, but I wasn’t aware about specific implementations that ignored this (like you have mentioned, JRockit). Thanks for sharing this information.

  2. Usually I don’t comment on blog, but because I’ve just finished the book JRockit – The Definitive Guide, it said clearly about everything you wrote here. I just want to emphasize some specific JVM implementations. Stuff like these are the corner cases for JVM implementor, they are varied by vendors.
    Anyway, many thanks for a great article.

  3. Hm, I haven’t programmed any thing related to these special references yet in my projects, but your article is exciting me to do that, :)

    Can you suggest some more areas where it or WeakHashMap can be taken advantage of?

    Too much thanks for this article.

    –Deepak

  4. Hi Yohan Liyanage

    I have read all your Know the JVM Series and found it very useful many thanks for such a wonderful series.your way of explanation is crisp and clear looking forward for your upcomming articles.

    -Anil
