It's not perfect. Azul's C4 does a lot of work in read barriers, so code that lo...

BenoitP · on July 6, 2016

I heard the read storm problem has been solved by Shenandoah, a GC developed by Christine Flood (who was in the original GC G1 team at Sun in 2001). It is under the RedHat umbrella and should be merged in OpenJDK [1].

Shenandoah uses a forwarding pointer in each object, adding overhead but limiting the problem only to write barriers. Here is Christine commenting on Azul vs Shenandoah [2]

From the talk: average pause is 6-7ms, max is 15ms, and the talk is one year old.

She hints at further developments in a version 2 which would make it entirely pauseless.

She has made another talk at RedHat's DevNation conference a few days ago, but they just won't put the video online arg!

[1] http://openjdk.java.net/jeps/189

[2] https://youtu.be/4d9-FQZZoVA?t=13m11s

astrange · on July 6, 2016

Does Java still have a word in every object allocation for locking it? Adding a 64-bit pointer sounds terribly inefficient.

Did you know Objective-C does locks and retain counting without allocating any extra fields in objects?

sbanach · on July 6, 2016

On hotspot: There are two bits in the header of every object. This is enough for an object that's never been used as a contended lock, CAS operations on the header can be used to handle the locking and that's that. As soon as you actually block on it, a 'real' lock is created (you can't get around the need for a list of threads to wake up as the lock is released) and the header grows to accomodate it. The process is called 'monitor inflation'. At a later date this might be cleaned up by a 'monitor deflation'.

jdmichal · on July 6, 2016

There's a certain amount of work that has to be done for GC, and that work is going to be done somewhere. The question is just what trade-offs you want.

Don't want compacting? You'll pay for it in allocation.

Don't want pausing? You'll pay for it in application threads.

pcwalton · on July 6, 2016

Precisely, and this is what is so often missed in these discussions. Most of the time, when you see claims of GC silver bullets, there's some hidden downside that's being papered over. Latency wins (i.e. "max pause time" or whatnot) tend to be throughput losses. Less copying results in more fragmentation. Value types can result in more copying, reducing performance over pointer indirections through nursery allocations. And so forth.

weberc2 · on July 6, 2016

I don't know that anyone disputes this. The discussions I participate in don't deny this; they mostly talk about whether or not the tradeoffs result in a net gain (if you sacrifice a little from the minority of cases to gain the same amount in the majority of cases, you do indeed have a net gain).

> Value types can result in more copying, reducing performance over pointer indirections through nursery allocations

Having value types means you can pass by copy, but it also means you can allocate on the stack and pass by reference--in other words, you get performant passing without involving the GC.

cliffc · on July 8, 2016

Nearly all the work you allude to is done in other threads - which indeed consume machine resources (CPU cycles, memory bandwidth). If your application does not burn all cores/bandwidth then the GC work is all done on the idle/spare machine resources. At the limit though, indeed you'll have to slow down the Application so that GC can play catchup - and bulk/batch stop-the-world style GC's require less overhead than the background incremental GCs Cliff

needusername · on July 6, 2016

> C4 never pauses

To my knowledge this is false. AFAIK while the C4 algorithm is pauseless the C4 implementation is not. It's just that the pauses are really short.

sievebrain · on July 7, 2016

My understanding: C4 does not pause but the JVM still does for various other reasons and a part of the work on Zing has been forcing down those pause times too.

cliffc · on July 8, 2016

Sorta kinda all of the above. C4 the algo has no pauses, but individual threads stop to crawl their own stacks. i.e., threads stop doing mutator work, but only 1-by-1, and only for a very short partial self-stack crawl. C4 the impl I believe now has no pauses also. HotSpot the JVM has pauses, and yes much Zing was on forcing these pause times down. Cliff