It's hard to square these articles with the reality I see on the ground: our baseline memory usage for a common Java service is 1 GB, versus 50 MB for Go. We do have a few mammoth servers at the top end in both languages, though (e.g. 75 GB heaps).
The deploy JARs have 100+ MB of class files, so perhaps it's a function of all the dependencies that you "need" for an enterprise Java program and not something more fundamental.
These blog posts also present AOT as if it's just another option you can toggle, while my impression is that it's incompatible with common Java libraries and requires swapping in / maintaining a parallel compiler in the build toolchain, configuring separate rules to build AOT, etc. I don't have actual experience with it, though, so I could be missing something.
> baseline memory usage for common types of Java service is 1 GB, vs 50 MB for Go.
There's nothing inherent to the JVM that would need 1 GB of memory footprint - the jars are compressed and classes are loaded on demand, so unless your server needs all of them instantiated at once, that doesn't explain the memory usage. Typically the region where class metadata is stored is the PermGen (Metaspace since Java 8), and it is not more than 256 MB for even the most heavyweight programs. My guess is that in your case Xms (the minimum heap) is set to 1 GB, and so that's what you see as the baseline.
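You can check this directly: the runtime will report its configured heap bounds, and the "baseline" tracks the -Xms/-Xmx configuration rather than anything inherent to the JVM. A minimal sketch (the class name is just for illustration):

```java
// Run with e.g.:  java -Xms64m -Xmx256m HeapBounds
// and compare against running with no flags at all.
public class HeapBounds {
    static long maxMb()       { return Runtime.getRuntime().maxMemory()   / (1024 * 1024); }
    static long committedMb() { return Runtime.getRuntime().totalMemory() / (1024 * 1024); }

    public static void main(String[] args) {
        System.out.printf("max heap (-Xmx): %d MB%n", maxMb());
        System.out.printf("committed heap (>= -Xms): %d MB%n", committedMb());
    }
}
```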
OpenJ9 is designed for lower memory usage than the hotspot VM and also supports AOT by the way if memory usage is your concern.
I'm not disputing that the JVM needs more memory than most runtimes; it comes with a lot of features and it is very easy to use more. But many J2EE apps can be made to work within a gig, and you do get a lot of features for that. And memory is cheap, so you can decide what to optimize for.
You're almost certainly right about Xms being set to 1GB. However, even if you can experimentally set it lower, the first time the JVM app hits GC pressure, the first thing anyone is going to try is bumping that back up over 1GB to give it some breathing room.
Memory may be "cheap", but wasting 950MB of memory per process because the GC might flip out at the wrong time isn't cheap when you multiply it out to many processes.
Also, I find the claim that the JVM starts and is running in 90ms very dubious. I would like to see an all-in timer, something like run each in a container, and see which completes first from container startup to shutdown.
> first time the JVM app hits GC pressure, the first thing anyone is going to try is bumping that back up over 1GB to give it some breathing room.
Yeah but in that case your app legitimately needs that much. Not the JVM. Secondly unless your app is extremely performance sensitive setting Xms to lower value doesn't make the GC flip out - it's not as if the GC is going to collect repeatedly to stay within your lower bound - on the contrary it will expand the heap upwards until Xmx limit. Sure there will be a cost to expand the heap above Xms up to Xmx but it is not at all significant due to how clever the GC is.
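You can watch this lazy expansion happen: committed heap starts near -Xms and is grown on demand toward -Xmx as live data accumulates. A rough sketch (class name and sizes are mine; exact numbers depend on the collector and flags):

```java
import java.util.ArrayList;
import java.util.List;

// Run with e.g.:  java -Xms16m -Xmx512m HeapGrowth
// The committed heap starts near -Xms and is expanded on demand toward
// -Xmx as live data grows; the GC does not thrash to stay under -Xms.
public class HeapGrowth {
    // Returns {committed before, committed after} while retaining ~mb MB of live data.
    static long[] grow(int mb) {
        Runtime rt = Runtime.getRuntime();
        long before = rt.totalMemory();
        List<byte[]> live = new ArrayList<>();
        for (int i = 0; i < mb; i++) {
            live.add(new byte[1024 * 1024]); // stays reachable
        }
        long after = rt.totalMemory();
        if (live.size() != mb) throw new AssertionError(); // keeps 'live' alive
        return new long[] { before, after };
    }

    public static void main(String[] args) {
        long[] c = grow(100);
        System.out.printf("committed before: %d MB, after: %d MB%n",
                c[0] / (1024 * 1024), c[1] / (1024 * 1024));
    }
}
```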
The OS tells processes when it needs resources freed and the JVM will tidy up then. Otherwise it's lazy, and that is correct. A JVM can run on tens of MB of RAM and start in milliseconds. This was as of JVM 8 (hence the vagueness; it was years ago), and it should only be better now if modules are in.
The module system actually makes start-up slower (the designers admitted as much in a presentation) because of the checks it has to perform. Java 9 was a lot slower than 8, and subsequent versions have gotten better but are still slower than 8.
Those numbers are without CDS on Java 8. This is what I get on my machine – Java 13 is 10% faster than Java 8 if you use a jlinked JRE, or 15% slower otherwise:
$ JAVA=/usr/lib/jvm/java-8-openjdk/bin/java; for i in {1..100}; do time -p "$JAVA" -Xshare:on Hello; done 2>&1|grep real|awk 'BEGIN { sum=0 } { sum += $2 } END { print 1000 * sum / NR " ms" }'
90.7 ms
$ JAVA=/usr/lib/jvm/java-13-openjdk/bin/java; for i in {1..100}; do time -p "$JAVA" -Xshare:on Hello; done 2>&1|grep real|awk 'BEGIN { sum=0 } { sum += $2 } END { print 1000 * sum / NR " ms" }'
106.4 ms
$ JAVA=/tmp/jlinked-java13-jre/bin/java; for i in {1..100}; do time -p "$JAVA" -Xshare:on Hello; done 2>&1|grep real|awk 'BEGIN { sum=0 } { sum += $2 } END { print 1000 * sum / NR " ms" }'
82.3 ms
Interestingly, Java 7 is faster than any of the newer versions.
$ JAVA=/usr/lib/jvm/java-7-openjdk/bin/java; for i in {1..100}; do time -p "$JAVA" -Xshare:on Hello; done 2>&1|grep real|awk 'BEGIN { sum=0 } { sum += $2 } END { print 1000 * sum / NR " ms" }'
80.6 ms
...and Java 6 is even faster:
$ JAVA=/opt/java6/bin/java; for i in {1..100}; do time -p "$JAVA" -Xshare:on Hello; done 2>&1|grep real|awk 'BEGIN { sum=0 } { sum += $2 } END { print 1000 * sum / NR " ms" }'
62.2 ms
So I appreciate that they're finally working on the startup performance regressions, but they apparently have some way to go before achieving parity with the famously lightning-fast startup time of older Java releases.
What's funny about these comments is that I originally ran the JVM on machines with something like 16MB of RAM, and I've run it on smaller devices with much less. ;-)
You can run the JVM on very small heaps. It's just that nobody does.
This is really odd. One of the biggest criticisms of Java is that it consumes so much memory, and the rebuttal is that the JVM can be tuned to use less! But no one does this in practice, so I assume there must be a reason that renders the "tuning" argument penny wise and pound foolish, i.e., you end up giving up something more valuable in exchange for that lower memory usage. It seems like these Java apologists are trying to give the appearance that Java competes with (for example) Go in memory usage and startup performance and runtime performance, when in reality it's probably more like "you get to choose one of the three", especially with respect to the top-level comment about how the AOT story deceptively requires hidden tradeoffs.
Most developers don't think about tuning the runtime because performance is not one of their acceptance criteria... at best what happens is you have a JVM savvy ops engineer who looks at it in production and recommends some tuning options... these often then get rejected by the devs because they don't understand the features and are afraid tweaking things will break and cause them problems. So they tell the ops team to throw more/bigger servers at the problem.
"nobody" was deliberately an overly extreme statement. As implied by my statement, obviously some people do tune their apps, but the people complaining that the JVM needs gigabytes of memory just to run are clearly not in that group.
In the late 90s I ran our JUG website, with a homegrown CMS written in Java with servlets, on a Slackware Linux server that also ran MySQL, and it had only 16MB of physical memory for everything. We are _very_ spoiled nowadays, and tuning is simply not necessary for most tasks.
The current default collector doesn't give memory back to the OS. So if you have several peaky memory usage apps, you can't try and get them to elastically negotiate heap size tradeoffs with one another - you need to pack them in with max heap limits manually. That requires a lot of tuning, and it's still less than theoretically optimal.
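The used-vs-committed gap being described is visible through the standard management API: `committed` is what the process holds from the OS even after `used` drops back down. A sketch (class name is mine, purely illustrative):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// After a burst of short-lived garbage, "used" drops back down, but
// "committed" (what the process actually holds from the OS) often stays
// high under default collector settings - the per-process cost the
// parent is describing.
public class CommittedVsUsed {
    static MemoryUsage heapAfterChurn() {
        for (int i = 0; i < 50; i++) {
            byte[] garbage = new byte[1024 * 1024]; // dies immediately
        }
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage heap = heapAfterChurn();
        System.out.printf("used: %d MB, committed: %d MB%n",
                heap.getUsed() / (1024 * 1024), heap.getCommitted() / (1024 * 1024));
    }
}
```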
We fork a child JVM to run our peakiest jobs for just this reason. It also helps keep services up when something OOMs.
> The current default collector doesn't give memory back to the OS.
That's a pretty irrelevant point, as the current default collector in Sun's JVM does reduce the Java heap based on tuneable parameters. While it doesn't return the virtual address space to the OS, that generally doesn't impact memory consumption on the "current default" OSes. (Certainly there are specialized cases where you might care about that, and for those there are other collectors, and other JVMs for that matter.)
> So if you have several peaky memory usage apps, you can't try and get them to elastically negotiate heap size tradeoffs with one another - you need to pack them in with max heap limits manually.
That's simply not true. The default GC does adjust heap size based on utilization, so you absolutely can run peaky apps that manage to negotiate different times for their peaks in a constrained memory space.
> We fork a child JVM to run our peakiest jobs for just this reason.
Well, I guess that's one way to address the problem, but you've unfortunately misunderstood how your tool works.
> Well, I guess that's one way to address the problem, but you've unfortunately misunderstood how your tool works.
No, I don't think you have the context.
The peaky process will be killed for OOM by Linux; we explicitly don't want services to die, which they would if they lived in the same process. So the services live in the parent process, and the peaky allocation happens in the child process. For context, at steady state the services consume about 2GB, whereas the peaky process may consume 30GB for 30 minutes or a couple of hours. We use resource-aware queuing / scheduling to limit the number of these processes running concurrently.
It's true that G1 will, under duress (e.g. under micro-benchmark scenarios with explicit calls to System.gc()), give up some heap to the OS, but it's not what you see in practice, without exceptional attention paid to tuning. Process exit is particularly efficient as a garbage collector though.
The OOM killer kicks in when the system runs out of physical memory and swap, not when address space is merely reserved. If you genuinely have processes that only periodically need their heap to be large, but don't return unused memory to the OS, you can simply allow the OS to page out the parts of the address space that aren't currently in use. There are subtle differences between returning address space to the OS and simply not using it, but they aren't the kind of differences that impact your problem.
G1's heap sizing logic is readily adjustable. The old defaults did rarely return memory to the OS, but you could tune them to suit your needs. Either way, this is no longer an accurate representation of G1's behaviour, as the runtime has adapted to changing execution contexts: https://bugs.openjdk.java.net/browse/JDK-8204089
If the full amount paid for your developer (including office space, taxes, salary, and benefits) is 200k, which pays for 48 weeks x 40 hours, you are paying 104 dollars per hour. RAM probably costs you 2 to 4 dollars per GB.
Saving 1GB of memory is worth it if it does not cost your developer more than 2 minutes to figure out.
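Spelling out the arithmetic above (the 200k and $2-4/GB figures are the parent's hypotheticals, not market data):

```java
// 200k fully loaded over 48 weeks * 40 hours works out to ~$104/hour,
// so 2 minutes of developer time costs about the same as a few GB of RAM.
public class BreakEven {
    // Fully loaded cost per developer-hour: 200k / (48 weeks * 40 hours).
    static double hourlyCost() {
        return 200_000.0 / (48 * 40); // ~= $104.17/hour
    }

    // What 2 minutes of that developer's time costs.
    static double twoMinutesCost() {
        return hourlyCost() / 30; // ~= $3.47
    }

    public static void main(String[] args) {
        System.out.printf("hourly: $%.2f, two minutes: $%.2f%n",
                hourlyCost(), twoMinutesCost());
    }
}
```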
RAM is billed by the hour (or minute?) by cloud providers, and it’s 1GB per process, not 1GB total. If you’re running 20 virtual servers, that’s 20 GB. Moreover, if you’re shipping a desktop app, it’s 1GB * number of licenses. Finally, the “it’s not worth tuning” argument proves my point—Java proponents will tell you that Java doesn’t need to consume that much memory—you just have to tune it, but no one tunes it because it’s too hard/not worth it.
Cloud providers generally don't charge for RAM independently of other resources like CPU... and RAM isn't generally purchasable in 1GB increments.
Accordingly, shaving 1GB off all your runtimes won't save you much money.
There are more recently developed exceptions to that rule: container packing and FaaS offerings like AWS Lambda. Unsurprisingly, this has led to the emergence of Java runtimes, frameworks, and libraries that are significantly more miserly with their use of memory (and are also designed for faster startup times).
That said, while a lot of people complain about their cloud bill, most places I've seen have payrolls and/or software licensing costs that make their cloud bill look like a rounding error. Sure, when you reach a certain size it is worth trying to squeeze out some extra ducats with greater efficiency, but more often than not, your efficiency concerns lie elsewhere.
Saying "no one tunes it" was deliberately overstating the case. If "everyone thinks the JVM needs 1GB just to run", then yes, "no one tunes it". Neither statement is true, but they both likely reflect some people's context.
But this, of course, applies to every project in any language and is in no way limited to Java or OOP. It is always a balance between delivering functionality now with some solution, or later with a MAYBE better-optimized one.
Then in round two, the optimized solution may be harder to maintain and extend, or further optimization may be de-prioritized in favor of some new functionality with higher business value. We all know it.
You are trying to project your belief onto all Java applications, and this simply does not work. There are both good apps and not-good apps, and there are many metrics by which to evaluate "good".
It appears to be specific to Java. Other languages don’t seem to exhibit high memory usage with the same frequency or severity as Java, and that’s not because developers of other languages spend more time optimizing.
If indeed this observed memory bloat is just a matter of poorly written Java apps, then that’s even more interesting. Why does it seem like Java has such a high incidence of poorly written apps relative to other languages? Is it OOP or some other cultural element?
> It appears to be specific to Java. Other languages don’t seem to exhibit high memory usage with the same frequency or severity as Java, and that’s not because developers of other languages spend more time optimizing.
Clearly you haven't looked at the memory overhead in scripting languages. ;-) They generally have far more object overhead, but their runtimes are designed for a very different use case, so their base runtime tends to be simple and tuned for quick startup. There are JVMs designed for similar cases with similar traits. It's just not the common choice.
> If indeed this observed memory bloat is just a matter of poorly written Java apps, then that’s even more interesting. Why does it seem like Java has such a high incidence of poorly written apps relative to other languages? Is it OOP or some other cultural element?
Your prejudice is showing in the other possibilities you haven't considered: perhaps memory intensive apps are more likely to be written in Java than other languages? Perhaps Java is more often selected in cases where memory utilization isn't a significant concern?
You can find a preponderance of poorly written apps in a lot of languages... JavaScript and PHP tend to be the butt of jokes due to their notoriety. Poorly written apps aren't a language-specific phenomenon.
For a variety of reasons that don't involve memory (the design of the language, the thread vs. process execution model, the JIT'd runtimes, the market penetration of the language), as well as some that do involve memory (threaded GC offers a great opportunity to trade memory for faster execution), Java applications are often long running applications that execute in environments with comparatively vague memory constraints, and so the common runtimes, frameworks, libraries, and programming techniques, etc., have evolved to trade memory for other advantages.
But if you look at what people do with Java in constrained memory environments, or even look at the hardware that Java has historically run on, you'll plainly see that what you are observing isn't intrinsic to the language.
> It appears to be specific to Java. Other languages don’t seem to exhibit high memory usage with the same frequency or severity as Java, and that’s not because developers of other languages spend more time optimizing.
It's because Java has strict memory limits. The limit of a bad C++ app is your machine's whole memory (in theory more), so most people never notice if an app continues to leak memory or has weird memory spikes where it needs ten GB instead of one for a minute before going back to normal. Java forces you to either look at it or go the lazy route and just allocate more RAM to the JVM. Whichever you choose, you at least have to acknowledge it, so people tend to notice.
Sun's JVM has a setting for maximum heap size, but there are of course lots of other JVMs, and there are lots of other ways to consume memory.
> The limit of a bad C++ app is your machine's whole memory (in theory more)
Well, that depends. Most people run operating systems that can impose limits, and you can certainly set a maximum heap size for your C++ runtime that works similarly to Java's limit. You just don't tend to do it, because you're already explicitly managing the memory, so there's no reason for setting a generalized limit for your execution environment.
> so most people never notice if an app continues to leak memory or has weird memory spikes where it needs ten GB instead of one for a minute before it goes back to normal
It also helps that short running apps and forking apps tend to hide the consequences of a lot of memory leaks, and in the specific case of C++, where memory mismanagement is often a symptom or a cause of severe bugs, you tend to invest a lot of time up front on memory management.
Just have a look at the outbreak of Electron apps. People choose to use a language they know and with which they can deliver value effectively, instead of C or assembler.
This is actually a very good point, but I don't know how this breaks down exactly. Can you give an example of a virtual server suitable for Go vs Java and the respective price points from a common provider?
I think you misunderstand. The argument is simply that if it were as important as some suggest it is, there'd be an effort to use memory much more efficiently. Java does run on incredibly small memory footprints, but the runtime that most people use deliberately trades memory for other advantages, and even then people choose to operate with far more memory than it requires.
That seems like empirical evidence that other factors are far more important.
> One of the biggest criticisms of Java is that it consumes so much memory, for which the rebuttal is the JVM can be tuned to use less! But no one does this in practice, so I assume there must be a reason that renders the “tuning” argument to be penny wise and pound foolish
Nope, the main reason is simply that memory is cheap and plentiful, so there is simply no reason to spend any effort to tune base memory usage when writing the kind of applications Java is typically used for.
It's possible that your 4MB machine went into swap if it really consumed minutes of time. Java 1.2 would have been circa 1998, when one might've expected 32MB or better on a desktop-class machine.
To understand the numbers on specific hardware, I have in the past started Tomcat bare (not even the manager app), and then Tomcat with my specific web app. The startup time has indeed been in the milliseconds (I happen to embed Tomcat in products, so I have had to understand such things).
In my use cases, startup time has been of less importance than execution time, the JIT pause, and the GC pauses. So I tend to run the JVM in server mode.
5000 sq ft?? My family of four lives in 1000 and it’s only a bit cramped. Each kid has their own room. I knew homes in the US are larger than in Europe but I’m surprised about the difference.
I live in the US, and 5000 sq ft is unheard of even in the most affordable places. 3000 is more like it; still big, but only in cheaper places. As you get into costlier real estate markets, 1800 sq ft is not uncommon.
New houses for a family of 6 tend to be 2300-3000 sq feet. I live in an older house with a family of 6 that is 1800 sq feet. 5,000 sq feet built for a family of 4 tends to be limited to penthouse suites that cost tens of millions of dollars.
It’s been the case ever since I can remember that “Java’s basically as efficient as native, it’s super fast” but all the actual java software I encounter is a slowish, bloated memory hog. I don’t know why, but that’s how it is.
The style of programming for the dominant JVM languages (Java, Scala, Kotlin) involves an over-reliance on churning through a lot of short-term garbage. I think this is partially due to how annoying the platform is to actually use: incredibly complex and chock-full of programmer pitfalls (like type erasure). In fact, the majority of devs have no clue how the JVM works, treating it as just "magic."
I would imagine that Go's GC is worse than the tunable JVM ones, but Go isn't powerful enough that one would ever be tempted to program in an abstraction-heavy style. While I would argue it takes a lot more work to program in Go than, say, Scala, I think the constraints imposed lead to better software. I for one have never seen a JVM desktop or business app that worked super well. The JVM manages to be a highly optimized, efficient platform on which almost exclusively slow, laggy, and memory-hungry apps are produced. With that said, it's a highly productive platform (if you're using something besides Java), especially for backend business apps.
GC is complicated because what works well for one thing might not work well for other things. GC can be both about clever things and about hard trade-offs. That said, I'll try to talk about GC without having a religious war erupt. Keep in mind, something here might be wrong.
Go's GC is tuned for low latency. People hate pause times. They hate it even more than they hate slowness. Go makes a trade-off for short pause times, but in doing so they do sacrifice throughput. That's great for a web server. With a web server, you care a lot about pauses offering a bad experience. A 500ms pause is going to give a customer a bad experience. That's less good for batch processing. Let's say you're running a backfill job that you expect to take 10-15 hours. You don't really care if it pauses for 1 minute to GC every 30 minutes. No one will even know. However, you will know whether it took 10 hours or 15 hours.
Go's GC is meant to be simple for users. I think the only option is "how much memory should I be targeting to use?" That makes it dead simple for users. I think the Go authors are right that too many tuning knobs can be a bad thing. People can spend months doing nonsense work. Worst is that knobs are hard to verify that anything is really different. If you're running a web server, was the traffic really the same? What about other things happening on the machine? Wouldn't you rather be writing software instead?
Go's GC is a non-copying GC. One thing this means is that Go's (heap) memory allocation ends up being very different because Go's memory is going to become fragmented. So Go needs to keep a map of free memory and allocations are a bit more expensive. Java (with GCs like G1) can just bump allocate which is insanely cheap. This is because Java is allocating everything contiguously so it just needs to move a pointer. How does that work once something becomes freed? Java's G1 (the default in the latest LTS Java) will copy everything that's still alive to a new portion of memory and the old portion is then just empty. You kind of see this in Go's culture. Web frameworks obsess about not making heap allocations. Libraries often have you pass in pointers to be filled in rather than returning something.
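The bump-allocation idea is simple enough to sketch. Here's a toy model (in Java, purely illustrative; real allocators work on raw memory and handle concurrency via thread-local allocation buffers):

```java
// Toy model of bump allocation: reserving n bytes is just advancing a
// pointer, which is why allocation under a compacting collector is so
// cheap. A free-list allocator instead has to search for a suitable hole.
public class BumpArena {
    private final byte[] space;
    private int top = 0; // next free offset

    BumpArena(int capacity) { space = new byte[capacity]; }

    /** Returns the offset of an n-byte block, or -1 if the arena is full. */
    int allocate(int n) {
        if (top + n > space.length) return -1; // a real GC would collect here
        int offset = top;
        top += n; // the whole "allocation" is this pointer bump
        return offset;
    }

    public static void main(String[] args) {
        BumpArena arena = new BumpArena(1024);
        System.out.println(arena.allocate(100)); // prints 0
        System.out.println(arena.allocate(100)); // prints 100
    }
}
```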
Go misses out on the generational hypothesis. The generational hypothesis is one of the more durable observations we have about programming - that most things that are allocated die really quickly. C# and Java both use generational collectors by default and they've done way better than what came before. C# and Java don't have pause times as low as Go's, but part of that is that they're targeting other things, like throughput or heap overhead, more.
Go doesn't need GC as much. Go can allocate more on the stack than Java can and, well, Go programmers are sometimes a bit obsessed with stack allocations even when it makes for more complicated code. Having structs means creating something where you can just have contiguous memory rather than allocating separate things for the fields in your object. Go's authors have observed that a lot of their objects that die-young are stack allocated and so while the generational hypothesis holds, it's a bit different. Go has put a good amount of effort into escape analysis to get more stuff stack allocated.
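For what it's worth, HotSpot does escape analysis too: a short-lived object that never escapes a method can be scalar-replaced and never touch the heap. Below is the kind of shape where that can happen; whether it actually fires depends on inlining and the JIT tier, so treat this as illustrative (class and method names are mine):

```java
// The Point below never escapes sum(): after inlining, HotSpot's escape
// analysis can replace it with two locals (scalar replacement), so the
// hot loop may allocate nothing - similar in spirit to Go's stack
// allocation of non-escaping values.
public class EscapeDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static long sum(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1); // candidate for scalar replacement
            total += p.x + p.y;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(1000)); // prints 1000000
    }
}
```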
Java has two new algorithms ZGC and Shenandoah which are available in the latest Java. They're pretty impressive and usually get down to sub-millisecond pause times and even 99th percentile pauses of 1-2ms.
Go's new GC was constrained by the fact that "Go also desperately needed short term success in 2015" (Rick Hudson from Google's Go team) and the fact that they wanted their foreign function interface to be simple - if you don't move objects in memory, you don't have to worry about dealing with the indirection you'd need between C/C++ expecting them to be in one place and Go moving them around. Google's going to have a lot of code they want to use without replacing it with Go code and so C/C++ interop is going to be huge in terms of their internal goals (and in terms of what the team targeted regardless of whether it's useful to you). And I think once they had shown off sub-millisecond pause times, they were really hesitant to do something that might introduce things like 5ms pause times. I think they might have also said that Google had an obsession with the long-tail at the time. Especially at Google, there's going to be a very long tail and if that's what people are all talking about and caring about, you end up wanting to target that.
Go has tried other algorithms. They had a request-oriented collector and it worked well, but it slowed down certain applications that Go programmers care about, namely the compiler. They tried a non-copying generational GC, but didn't have a lot of success there.
Ultimately, Go wanted fast success and to solve the #1 complaint people had: extreme pause times (pause times that would make JVM developers feel sorry for Go programmers). Going with a copying GC might have offered better performance, but would have meant a lot more work. And Go gets away with some things because more stuff gets stack allocated and Go programmers try to avoid heap allocations which would be more expensive given Go's choices (programming for the GC algorithm).
I don't think that JVM languages have a programming style that lends themselves to an over-reliance on churning through short-term garbage. Well, Clojure probably since I think it does go for the functional/immutable allocate-a-lot style. Maybe Kotlin and Scala if you're creating lots of immutable stuff just to re-allocate/copy when you want to change one field. That doesn't really apply to most Java programs. And I have covered the way that Go potentially leads to more stack allocations. However, I don't think most people know how their Go programs work any more than their JVM programs and this really just seems to be a "I want to dislike Java" kind of thing rather than something about memory.
Java programs tend to start slow because of the JVM and JIT compilation. Java has been focused on throughput more than Go has (at the expense of responsiveness). That is changing with the two latest Java GC algorithms (and even G1 which is really good). Java is also working on modularizing itself so that you won't bring along as much of what you don't need (Jigsaw) and AOT compilation. Saying that Java is slow just isn't really true, but it might feel true - things like startup times and pause times can inform our opinions a lot. There's absolutely no question that Java is a lot faster than Python, but Python can feel faster for simple programs that aren't doing a lot (or are just small amounts of Python doing most of the heavy work in C).
I mean, are you including Android in "all Java apps"?
Java, C#, and Go are all really wonderful languages/platforms (including things hosted on them like Kotlin). They're all around the same performance, but they do have some differences. I think Go should re-visit their GC decisions in the future, especially as ZGC and Shenandoah take shape, but their GC works pretty well. But there are certainly trade-offs being made (and it isn't around language features that make the platform productive for programmers). I think GC is very interesting, but ultimately Java, C#, and Go all have very good GC that offers a good experience.
If a programmer really cares about performant batch processing (not sure that's a direction we're going in anymore, but), they'll just reuse their objects and dataspace. Understanding Go well enough to minimize GC is pretty trivial, and the standard library includes packages and tooling exactly for such purposes. Most likely there are only a few hotspots that need to be tended to like this.
So this sounds like Golang is optimized for what people love about computing: fast response time. It also provides programmers with good baseline performance "out of the box", which is another good tradeoff for me.
The tradeoff works best when you make simple designs, not huge behemoths: spending more time iterating on clever designs rather than jamming on the keyboard until done.
Here are measurements for an extreme case of a short-running simple program:
$ cat Hello.java
class Hello { public static void main(String[] args) { System.out.println("Hello from Java!"); } }
$ cat hello.py
print("Hello from Python!")
$ time /usr/lib/jvm/java-13-openjdk/bin/java -Xshare:on -XX:+TieredCompilation -XX:TieredStopAtLevel=1 Hello
Hello from Java!
real 0m0.102s
user 0m0.095s
sys 0m0.025s
$ time python3 -S hello.py
Hello from Python!
real 0m0.034s
user 0m0.020s
sys 0m0.013s
It's a bit faster if you create a custom modular JRE with jlink:
$ /usr/lib/jvm/java-13-openjdk/bin/jlink --add-modules java.base --output /tmp/jlinked-java13-jre
$ /tmp/jlinked-java13-jre/bin/java -Xshare:dump
$ time /tmp/jlinked-java13-jre/bin/java -Xshare:on -XX:+TieredCompilation -XX:TieredStopAtLevel=1 Hello
Hello from Java!
real 0m0.087s
user 0m0.050s
sys 0m0.035s
Yes, I often write small command-line tools and that's what I found. Profiling seemed to indicate that doing anything at all in Java, such as reading a config file, is fast the second time but super slow the first time.
You can test my assertion simply by writing a "hello world" in Python and Java.
> I would imagine that Go's GC is worse than the tunable JVM ones, but go isn't powerful enough that one would ever be tempted to program in an abstraction heavy style.
That is a compliment.
One of the design criteria for Go was to push programmers away from being architecture astronauts whose abstraction-heavy style results in having no idea how much work you are hiding in all of the layers. And that design criterion was based on analysis of the actual failure modes of significant software projects.
Therefore the fact that real-world Go apps generally manage to avoid that failure mode is a testament that they succeeded in this design criterion.
Interestingly though, some of the worst excesses of overdesigned architectures in Java happened before Java even had generics, and heavy use of interfaces was a standout feature.
So Go has absolutely everything a true architecture astronaut needs. In my view, the big difference is cultural and has nothing to do with language features.
Also, the lack of generics often forces you to use reflection to avoid code duplication, which makes for very complex, error-prone code.
As an example, consider the sort package in Go's standard library and what they had to write in order to generically swap two elements in a []interface{}.
I suspect a problem with Java is that its culture and ecosystem evolved in the early 90s, when memory, disk, and processor speeds were doubling every few months.
Go came along in the late 2000s, after Amdahl's law had been bitch-slapping everyone for 10 years. Network bandwidth increased but latency didn't. Processor transistor counts increased but clock speeds didn't. Memory increased but memory bandwidth didn't keep up.
C++ templates are not the gold standard of generics. In fact, they are a glorified text substitution system, totally broken and flawed.
Generics are still absolutely necessary for ANY serious programming language. They are required by the DRY principle: algorithms must not need to be reimplemented for every new data structure. To call generics architecture astronautics is simply unprofessional.
>One of the design criteria for Go was to push programmers away from being architecture astronauts whose abstraction heavy style results in having no idea how much work you are hiding in all of the layers.
So they chose to create a language where there's no way to get around creating vast amounts of boilerplate. This boilerplate heavy code style results in having no idea of the true logic you are hiding in all of the boilerplate.
Right: you only pay for the memory still reachable when the slow/blocking part of the garbage collection runs, so a lean program that churns through small objects can be very responsive.
Work well? I program professionally in Scala and IntelliJ is fucking terrible. Everyone on my team, me included, yells "fucking IntelliJ" and restarts the damn thing at least once a day.
Practically every large site you use (Google, Amazon, FB, GitHub, Apple, Twitter, and many more) is using Java/JVM in the backend in some non-trivial capacity.
A big part of that is that this benchmark is not representative of what general software is like, not in Go, not in Java.
For one thing, most software isn't churning on an array of integers. It's allocating and managing various types of objects.
I haven't written a ton of Java, but I suspect that the various reflection-heavy frameworks also introduce a lot of overhead into the resulting software that isn't necessarily inherent to Java the language.
Reflection actually isn't too inefficient, especially compared to using dynamic languages. I think a lot of the inefficiency is many small inefficiencies combining to make it very heavyweight.
Obtaining a Method object is, I think, pretty slow. But calling through one is fast. I'm not sure what happens exactly, but after some number of calls some sort of JITting happens, and a call via a Method object becomes as fast as a handwritten equivalent (which does the same unpacking of arguments, etc.).
The overhead of, for example, invoking a method via reflection is measured in single-digit nanoseconds. Still not as fast as direct invocation, but unlikely to be a bottleneck unless you perform tens of millions of them per second.
Although I can't tell you why, it's been my experience that firing a simple method via reflection in both .NET and the JVM is roughly 2x the time of direct calls (or of calls via the 'dynamic' keyword in C#).
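For what it's worth, the gap is easy to measure yourself. Here's a rough sketch of such a micro-benchmark (class and method names are made up, and naive timing loops like this are easily distorted by the JIT; use a harness like JMH for serious numbers):

```java
import java.lang.reflect.Method;

public class ReflectCost {
    static volatile long sink; // keeps the loops from being optimized away

    public static int add(int a, int b) { return a + b; }

    static void warmup(int n, Method m) throws Exception {
        // Lets the JIT compile both paths and lets reflection settle
        // into its fast generated-accessor mode.
        long s = 0;
        for (int i = 0; i < n; i++) s += add(i, 1) + (int) m.invoke(null, i, 1);
        sink = s;
    }

    public static void main(String[] args) throws Exception {
        Method m = ReflectCost.class.getMethod("add", int.class, int.class);
        int n = 5_000_000;
        warmup(n, m);

        long t0 = System.nanoTime();
        long direct = 0;
        for (int i = 0; i < n; i++) direct += add(i, 1);
        long t1 = System.nanoTime();
        sink = direct;

        long t2 = System.nanoTime();
        long refl = 0;
        for (int i = 0; i < n; i++) refl += (int) m.invoke(null, i, 1);
        long t3 = System.nanoTime();
        sink = refl;

        System.out.printf("direct:     ~%d ns/call%n", (t1 - t0) / n);
        System.out.printf("reflective: ~%d ns/call%n", (t3 - t2) / n);
    }
}
```

On my understanding the reflective path stays within a small constant factor of the direct one after warmup, which is consistent with the "roughly 2x" observation above.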
Java has the misfortune (especially in older versions of the language) of limited syntax that encourages bloated frameworks that promise to marginally decrease boilerplate, combined with a VM that's robust enough to actually handle the bloat fairly gracefully.
If you ignore that stuff and write Java programs in a straightforward way (accepting its verbosity, or possibly using code generation instead of runtime frameworks in places where lisp folks would use a bunch of macros), and avoid unnecessary heap allocations, they can be pretty fast.
It also depends on how much effort was put into optimizing the system. Quite often, people abuse -Xms and -Xmx by setting them to unnecessarily high values, or don't check their code for memory leaks. I am often surprised by how many Java programmers have never run their code through a profiler.
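Before reaching for a profiler, it's worth checking what heap the process actually ended up with. A minimal sketch using the standard Runtime API (class name is made up; run it with different -Xms/-Xmx flags to see the effect):

```java
public class HeapInfo {
    public static void main(String[] args) {
        final int MB = 1024 * 1024;
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects -Xmx; totalMemory() is what the JVM has
        // actually committed so far, which starts at roughly -Xms.
        System.out.println("max heap (-Xmx):  " + rt.maxMemory() / MB + " MB");
        System.out.println("committed heap:   " + rt.totalMemory() / MB + " MB");
        System.out.println("free (committed): " + rt.freeMemory() / MB + " MB");
    }
}
```

If "committed heap" is already near 1 GB in an idle service, that points at the -Xms explanation from the parent thread rather than anything your code is doing.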
I doubt that's true. The actual Java software you encounter includes much of Amazon, Apple, Google, and Netflix. I mean, everything is relative, but it's faster than Go and less bloated (what does that even mean?) than Python or JS.
I just mean all the developer tools, consumer-facing GUI software, and a wide variety of server software & daemons. All of those that I’ve seen in Java over the last 20ish years. Perhaps the tech giants are all writing Java that’s nearly as low-footprint and fast as C, I dunno, no way to tell from this side. I can only sample programs I’ve had access to.
[edit] bloated = much higher memory requirements than I’d expect for what they’re doing (yes I know how to configure jvm memory and such), affects the performance of other software on the same machine more than I’d expect, if running on battery, fairly likely to chew through it while doing nothing.
You have access to Java programs all the time; much of the software you depend on is Java. You're talking about software you run locally on your desktop -- that's something quite different. Desktop applications are a relatively exceptional use of Java these days, although they're still doing better in both performance and footprint than the fashionable JS apps.
> affects the performance of other software on the same machine more than I’d expect, if running on battery, fairly likely to chew through it while doing nothing.
> You're talking about software you run locally on your desktop -- that's something quite different.
> > and a wide variety of server software & daemons
Mind, I don't hate Java or anything, I'm just saying, I've been hearing folks tell me it's definitely fast & nimble while watching it not actually be fast or nimble for north of two decades. I don't know where theory and practice diverge, but with significant frequency when it comes to Java, they very noticeably do, and always have. And no, not just on desktop applications, but sure, exceptionally so there.
And god, I'd hope it'd do better than any scripting language whatsoever by practically any performance measure. If not, what's the point? Those are basically never reasonably fast at real work unless they're just tying together C libraries that're doing the heavy lifting.
As for the reference, yes, there are always benchmarks, always papers, always articles, always folks who'll patiently explain why Java must, certainly, be incredibly speedy. Seriously, 20+ years of reading that. Meanwhile when I use Java software it's usually not half as impressive on the performance and respectfulness to hardware fronts as those imply it must be. Again, I am not sure why this is, but it's been so consistently true for so long across so many types of Java program I've used that I don't think I've just managed to somehow use every single one of the worst-written Java programs in existence, or something.
> I've been hearing folks tell me it's definitely fast & nimble while watching it not actually be fast or nimble for north of two decades
You're going to have to get over some perceptual bias to ever understand the people saying these things. It's absolutely true that Java's weak points sit in many of the areas that are directly observable (startup time, UI like Swing, etc.) while its strong points mostly sit in the ones that aren't (how fast code runs after 10 cycles of JIT profiling, etc.). But my suggestion is that it's worth doing, because just like a million other things in the world that you can't directly observe, their underlying properties are very valuable to understand nonetheless.
Just for a reference point, I work in a space where performance is very critical and the code I write takes ages to start (where ages = 1-2 seconds) but then competes head to head with C code in efficiency. It's probably half as efficient as well optimised C code but it brings the huge other benefits that the JVM has so it's ultimately easily worth that tradeoff to me.
If you're aware of a language/platform with a better productivity/performance ratio, I'd love to know about it. I have not found one. The only languages/platforms that can beat Java at the workloads it excels at -- large concurrent server applications -- can only do so with great effort. If Java is slow and sluggish, then JS, Python, Go, Haskell, C# and Erlang are more so, and there's a reason companies prefer it to C/C++ for a great many applications. That Java is speedy isn't theory. The world's largest tech companies and largest non-tech companies stake many of their critical applications and a lot of money on it, and depend on it being speedy. Smaller companies (like Twitter, Soundcloud, Spotify, Uber and recently GitHub) regularly migrate to Java when they need more speed or better hardware utilization. So you're not personally convinced because you just don't feel it? That's OK. Those that need to be convinced are.
I’m not saying Java is slow or fast, but just because it is used by large companies doesn’t say anything. They have the resources to throw hardware at it.
They also need to serve a lot of people. Java also uses less energy than Go, significantly less than JS, and a whole lot less than Python [1]. Those big companies also need to care about that (not to mention about Java's combination of performance, productivity, serviceability and stability).
Well, at least there's a comparison. It's incredibly annoying when zealots characterize their preferred <whatever> with "high performance", "lightweight", or, worst of all, "powerful".
In addition to the inherent lack of specificity, all of those are relative terms to <something else> but that is always omitted.
I think it's a combo of several different things, some of them real, but mostly wrong:
- Most people judge by their personal realtime interaction with Java, which is generally dominated by startup times (command line / desktop) or UI (Swing), both of which are Java's weak points. They have no way to experience the efficiency that happens silently under the covers after JIT has occurred in a server-side application, so they assume it doesn't exist.
- A huge amount of Java software is enterprise garbage that would be garbage in every language; Java just happened to win the lottery of adoption there.
- The ecosystem DOES encourage sucking in a HUGE amount of libraries. Java developers will pull in a 10 MB dependency because it gives them a slightly nicer API to something, because doing that is so easy. But the result is an enormous runtime library footprint.
About 80% of this is perception bias, about 20% of it is real IMHO. Java is amazing to use for high-performance work, and it really does compete with fully compiled languages while maintaining many of the benefits of interpreted languages (true cross-platform, hot-swap code, debugging, etc.)
I think it has much more to do with the fact that there isn't a really great desktop UI framework for Java. Electron is a slow memory hog and people tolerate that fine.
The only Java deskop app I've used that doesn't feel bloated is Charles. It _still_ has that Swing "almost looks like OSX" look to it but otherwise, it's completely usable. I use IntelliJ nearly every day, and while it's insanely powerful, it feels so incredibly bloated. I'm running on a 6th gen Thinkpad X1C (Arch Linux), a 2017 Macbook Pro (both with 16G of RAM) and a brand new Ryzen 7 with 32G of Ram. It just isn't snappy. I've always wanted to write a JavaFX app and compare it across platforms. I've seen some really nice demos.
The slowest part of IntelliJ is scanning the project. I much prefer Rider over VisualStudio though. It doesn't feel any slower to me despite being Java.
You are comparing apples to oranges, then. A modern Java framework like Spring Boot, Micronaut or Quarkus can start up with a 20mb heap and will require a Docker container or jar in the range of tens of megabytes.
Now, clearly to do more complex processing you'll want to give it more than tens of megabytes of heap, but if you are running Java applications that needs hundreds of megs of dependencies and 1gb of memory then you're probably running some legacy application server.
The company I worked for a couple of jobs ago used to host approximately 130-140 Java-based web applications, all using Spring, on just 4 servers (primarily hosted on 2 of those). With only a few exceptions, each Java webapp was configured with a 256 MB heap. Devs followed good practices to avoid creating excess garbage, and kept the number of libraries down to a minimum.
They also struck a good balance between doing transformations in-memory, and on the database side.
side note: There was _one_ web app that we had to host that wasn't developed in-house. It was absolutely god-awful, badly scaling, websphere based crap (I have no idea if Websphere is crap, but that application sure was bloated and underperforming). Websphere was irritating to deal with. What they did with it was worse. We eventually got the source code from them so we could fix it up, after a bunch of legal wrangles. They had 4 different custom created ORMs in it, each clearly developed by a different dev. Then for reports they never used the ORM, instead they'd make a "one query spits out the final report" query. Those queries would often be 30 - 40 subselects deep. `SELECT thing FROM foo WHERE bar in (SELECT bars IN monkey WHERE ....` and so on down the line.
It is absolutely possible to write amazingly performant Java / JVM applications, and absolutely possible to write horrible blobby slow Go applications.
But given most of the standard corporate devs that I've worked with, it is infinitely easier to write crappy Java and not-bad Go than the other way around.
It's also way easier to find a Java developer. Go is still in the early adoption stage, and people don't start programming in Go because they've decided to change their job from marketing to IT. As soon as any platform becomes popular, the volume of crappy code increases dramatically.
> It's hard to square these articles with the reality I see on the ground
This is not a new phenomenon: back in 2000, the IBM System's Journal had a special issue on Java. Most of it was filled with articles detailing how amazingly awesome performance now was and getting better all the time.
The last article in that issue was from IBM's San Francisco Project, the only one with real-world usage. Which reported much more, er, "mixed" results. More accurately: atrocious performance on their real-world tasks, and a lot of extra engineering effort to get it to work at least half reasonably.
It's a problem that is inherent in the way the JVM's GCs work. They never release memory even if you have low heap usage. I have a micronaut based microservice. Visual VM reports less than 30MB memory usage. Sometimes I get request spikes and it goes to 200MB. It will allocate 200MB from the OS and once the spike is gone it will hang onto the 200MB.
There is also a fixed overhead. Even if I set the heap maximum to 30MB at the cost of running out of memory during spikes the total memory used by the JVM is still around 130MB.
If you want to run 10 microservices that each do very little, you will need anywhere from 1.3 GB to 3.3 GB (the latter as soon as you have one traffic spike, and then forever after). With Go you could probably get away with just 300 MB; during temporary traffic spikes your maximum will be 2.3 GB at most, but it is unlikely that all your microservices will spike at the same time, so you will need significantly less than 2.3 GB.
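You can watch this retention behavior directly with a toy program like the sketch below (class name made up; whether the committed heap shrinks back depends heavily on the collector and JVM version, and System.gc() is only a hint):

```java
public class HeapRetention {
    public static void main(String[] args) throws InterruptedException {
        final int MB = 1024 * 1024;
        Runtime rt = Runtime.getRuntime();
        System.out.println("committed before spike: " + rt.totalMemory() / MB + " MB");

        // Simulate a request spike: hold ~100 MB of live objects briefly.
        byte[][] spike = new byte[100][];
        for (int i = 0; i < spike.length; i++) spike[i] = new byte[MB];
        System.out.println("committed during spike: " + rt.totalMemory() / MB + " MB");

        spike = null;  // drop every reference from the spike
        System.gc();   // only a hint; the JVM may ignore it
        Thread.sleep(1000);

        // On many collectors this stays at the high-water mark rather
        // than shrinking back toward the pre-spike value.
        System.out.println("committed after spike:  " + rt.totalMemory() / MB + " MB");
    }
}
```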
It depends on the Java version you use. Recent GCs return unused memory to the OS [1], and class data sharing can speed up the startup of your microservices and save memory, since shared classes are loaded only once for all JVMs [2].
It's not "free", but there are frameworks that may help with that; e.g. Quarkus [1] provides a common EE/MicroProfile stack that does a lot of AOT codegen to get faster boot times and a lower memory footprint; it is also compatible with GraalVM AOT, for an even lower footprint and faster bootup. Compile times are still long, but you get the benefit of a familiar stack. Note, however, that if your workload is throughput-intensive, you may still want to go with JVM mode (which is still pretty fast because of the codegen).
(full disclosure: I work at Red Hat and I have contributed to the project)
I've written a small project in Quarkus and I have enjoyed it.
My main problems are that the Windows support still isn't there yet, and the support for Graal 19.3 (the one that supports Java 11) isn't done yet either.
> These blog posts also present AOT as if its just another option you can toggle, while my impression is that it's incompatible with common Java libraries and requires swapping / maintaining a parallel compiler in the build toolchain, configuring separate rules to build it AOT, etc.
No, the blog post doesn't, and I quote: "Java AOT comes at a price.
Huge compilation times (may slow-down your CI/CD pipeline).
Very limited reflection API, limiting the usage of several frameworks or requiring of extra, complex, configuration: JPA, Spring... (This aspect will be treated in a future blog post)."
At one of my recent employers, we wondered why some of our services consume whopping 200 MB when a normal service would run on 50 MB. A service requiring 1-2 GB would be considered a behemoth.
Unless your code is extremely complicated, or you are running a Jack-of-all-trades monolith, the only way to reasonably consume gobs of RAM is to keep an in-memory database / cache. This trades RAM consumption for speed. This is what e.g. IntelliJ does.
Running a heap larger than you need slows you down in the long term, because GC over a large heap consumes more time and CPU.
I don't know what you're using, but I've built apps using Spring (known as a resource hog) which take up 10 MB of memory and start up super fast. You only need some fine-tuning to get these numbers. Alternatively you can also use things like Quarkus: https://quarkus.io/ which are hilariously cheap to operate and can work with Lambda as well.
To get super-fast Spring start-up, do you have to turn off classpath scanning? That seems like it would always inherently be slow. What else do you tune?
> These blog posts also present AOT as if its just another option you can toggle, while my impression is that it's incompatible with common Java libraries
I agree. I experimented with turning one of my smaller applications (~20 Java classes) into a Graal native image. After 4-5 days of trying I gave up. Most of the commonly used libraries, like JDBC drivers, Kafka, log4j, etc., work in ways totally at odds with how Graal native image wants things to work.
The way I see it, this is J2EE all over again. Various vendors will treat the complexity of native images for practical business Java apps as a consulting or business opportunity. We can already see Quarkus, Micronaut and others trying to offer native images for their frameworks. In practice, this will mean only a limited set of libraries is compatible with native images.
The problem is enterprise Java and the Java community, not the JVM. If people wrote more Go-like code in Java, we'd have much leaner Java processes. Instead, we have heavyweight, performance-insensitive frameworks like Spring and Hibernate used everywhere, even where they shouldn't be.
Look into Micronaut and GraalVM. Simple apps have shown startup times of sub-20 ms and memory footprints as low as 18 MB.
It's simply not a fair comparison to Go when you drag in heavy frameworks like Spring, Hibernate, etc.
I'd also like to add that even without GraalVM and Micronaut, an app that requires 1 gig of memory is indicative of its architecture, not the choice of language.
Micronaut and Quarkus are fantastic! Those startup times are exactly what you're looking for if you want to run a cloud function or lambda. I will say that for a standard microservice, where startup time is not as important, I'd not compile with GraalVM. The CPU/memory overhead to create a GraalVM image is pretty intense. Furthermore, GraalVM doesn't optimize as it warms up like HotSpot does. Whether that matters depends on your use case. I'm currently running Micronaut services in production that take around 5 seconds to start on JDK 1.8. The Spring Boot equivalent..... well.... 10 times that amount of time, maybe?
If you want to square it with what you see on the ground then you'll have to take a heap dump and analyze where the memory is actually being used. (The number of class files is unlikely to be a factor.)
Go has no minimum heap size setting; garbage collector ballast is the hack that businesses you've heard of are using in the wild instead. See https://blog.twitch.tv/en/2019/04/10/go-memory-ballast-how-i... and https://github.com/golang/go/issues/23044. A Go team member's reasons for not adding a minimum heap size include that it would require more testing each release, and that they might want a max heap size hint instead.
I posted a comment similar to this one recently, but it seems more relevant here.
I've noticed similar things while profiling my tests - making a large static allocation or bumping GOGC=1000 can cause them to run more than 2x faster (my favorite took a 12 second test suite and dropped it to about 2.5 seconds). So much time is spent assisting the GC to keep memory at like an 8-12mb range, as if that small of a heap was somehow the most important thing Go could be doing with the CPU.
I heard that gcc didn't free() memory because compilation speed is important and the compiler process was expected to be short-lived. I don't know if that is (still) true, but it sounds reasonable to me.
D doesn't do this (there's a flag now, not sure how well it works), if you've got a lot of compile time functions running the compiler process will rack up GBs of memory pretty quickly, especially if you're compiling in parallel. I upgraded to 32GB of RAM so I wouldn't hit swap when compiling.
Initializing a GC has almost no overhead (compared to e.g. loading some classes). The only reason to introduce this is to intentionally avoid overhead for having it actually reclaim memory under the assumption that it is in any case not going to ask for more than there is. This does not make sense for long running servers but could make sense for short running things like lambda functions, command line stuff etc. Of course if this overhead is substantial that probably means this is not a good solution since you apparently have a lot of memory allocation happening.
“Last-drop throughput improvements. Even for non-allocating workloads, the choice of GC means choosing the set of GC barriers that the workload has to use, even if no GC cycle actually happens. All OpenJDK GCs are generational (with the notable exceptions of non-mainline Shenandoah and ZGC), and they emit at least one reference write barrier. Avoiding this barrier can bring the last bit of throughput improvement. There are locality caveats to this, see below.”
When I was looking for my first car in early 90s, I knew nothing about cars or brands. One thing I noticed was that all the TV ads for most of the cars would say "more room than a Camry". I knew what I needed to buy.
If Go doesn't survive another decade, I would still be happy about what it triggered.
100 ms is nowhere near what I'd call "negligible" for a process that might only live a second or two!
`python -c "print 'hello'"` starts up and shuts down an entire Python interpreter in less than 50 ms on my machine, whereas the equivalent Java program never takes less than 120 ms. That seems pretty sad for a language that's had at least an order of magnitude more resources thrown at it.
In one particular case, when a process will live for a second or two, it's indeed a visible delay. But that's just one of many scenarios, and definitely not the top-priority one for applying all those engineering resources. When you build a server application, you just don't care much about startup time; in situations where it matters, blue-green deployment will do a better job of reducing downtime (and anyway, it's not always the platform startup that contributes to the delay before full availability).
That's certainly good news and the renewed focus on start-up performance is very welcome. I wonder what the numbers look like for small Kotlin programs.
(Although I see now that the comparison disabled CDS on Java 8, so the actual improvement is perhaps not as large as it seemed.)
The JVM is primarily used for backend server processes that run 24/7. Instant startup time is irrelevant in that context and so it was never a design objective. The HotSpot VM is specifically designed and optimized for long running server processes https://www.oracle.com/technetwork/java/whitepaper-135217.ht...
Are you sure it "was never a design objective"? Clearly the objective was never achieved, but since Java was initially aimed at set-top boxes, tablet-like devices and web browser applets (all interactive client-side applications) it would be strange to design it exclusively for server-side batch processing.
I've run small services built with Java, Go and Crystal to achieve good startup and throughput performance while minimizing memory usage. My experience with Crystal is limited but has been positive thus far.
The sweet spot for me is using Java with OpenJ9 which has very fast startup time while sacrificing only a bit of top-end throughput.
I would only choose AOT if packaging/deployment were issues with the JIT approach. In the case of Go, AOT is practically free but I prefer not being limited to array/slice, map, and channel generics.
I'd kill for a comparison with the IBM J9 AOT flag. It essentially just caches JITted code for startup. Also, if startup time is super important, then you can try -Xquickstart.
Interesting numbers. I have been out of the Java world for a few years and I’m unfamiliar with GraalVM but I’m curious how compatible it is with Oracle Java or OpenJDK.
Of course for small scale stuff like a QuickSort implementation, JVM starts are fast-ish, but for a nontrivial service, library load times during boot can balloon quickly depending on your build discipline.
Incredibly compatible, albeit with some limitations. The compiler is quite aggressive, so e.g. you can't do reflection on "any" class; you have to tell the compiler what you are going to use at runtime (the so-called "closed-world assumption"). Also most initialization code is forced to run at "compile time" to keep boot time low. Pretty incredible piece of code.
I'm looking at a project that makes "The main benefits of doing so is to enable polyglot applications (e.g., use Java, R, or Python libraries)" sound attractive.
This article reiterates that the belief that JVM start-up time is slow is somewhat of a myth. When I measured it years ago, I found JVM start-up time to be roughly equivalent to Node.js.
What makes start-up time bad in any interpreted language, including Java, Python, and JavaScript, is code-loading time. This time is O(n), where n is the total size of your app, including transitive dependencies. It takes time to load, parse, and validate non-native code. This far dwarfs any VM start-up time.
As an experiment, write a hello world Node.js app and time its execution. Then add an import statement for the AWS SDK. Don't actually use it, just import it! When I last measured this, it caused start-up time to go from 30 ms to something like 300 ms. The extra time mostly comes from loading code.
For a native app, the binary itself just gets mmapped into memory. Shared library loading is more expensive, but not much, and way less than loading source code or byte code.
The tl;dr is that if you want a fast-starting non-native app, you have to shrink the transitive dependency closure your app loads to do its job. This is easy for toy benchmark apps but can be harder for real apps. It also goes against most devs' philosophy of relying on third-party libraries for everything. For instance, if you care about start-up time, it may be worth reimplementing that function you'd normally get from Guava or Apache Commons. You can alternatively use a tool like ProGuard to shrink your dependency closure.
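One crude way to see how much time went to VM init plus class loading before your code even ran is the standard management API. A small sketch (class name made up; add an import of a heavyweight library and watch the number grow):

```java
import java.lang.management.ManagementFactory;

public class StartupCost {
    public static void main(String[] args) {
        // getStartTime() is the wall-clock time the JVM process started,
        // so this delta covers VM init plus loading every class needed
        // to reach main(). It grows with your dependency closure.
        long jvmStart = ManagementFactory.getRuntimeMXBean().getStartTime();
        long delta = System.currentTimeMillis() - jvmStart;
        System.out.println("ms from JVM start to main(): " + delta);
    }
}
```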
It would be nice to have some automated tools to convert from Java to Go or Rust, something like c2rust [1] but for Java. There exists [2] some kind of automation, but it's too basic to be practical.
Why can't the JVM preload on startup the majority of the core runtime as a shared library architecture in a sort of super-permanent generation, and then the JVMs piggyback on that?
I remember solaris boxes had little of the startup cost and someone told me they preloaded the java runtime.
Wouldn't the JIT'ed code be optimized for the specific CPU, whereas the AOT version perhaps cannot make use of certain instructions? Also, when comparing the different models one must keep in mind that the software processing the code has different characteristics.
JIT'd code isn't just about specific CPUs, but about optimizing hot paths. For example, you don't want to inline a commonly used function all over the place, but if there's one area that calls it 10,000 times per second while the others call it once a minute at most, you can re-write the program at runtime having observed that hot code path.
To get an intuition for JITs: there are pieces of your code that you know will always be run in a certain way (or mostly run in that way), but it isn't provable at compile time that it will always be that way. JITs can notice that pattern and optimize it (or even provide optimizations for common, but not exclusive paths).
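A toy illustration of that kind of pattern (class names made up): the call site below is polymorphic in the source, but if at runtime it only ever sees one concrete type, a profiling JIT like HotSpot can speculatively devirtualize and inline it, deoptimizing later if another implementation ever shows up. An AOT compiler can't prove this at compile time.

```java
interface Shape {
    double area();
}

class Circle implements Shape {
    final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

public class JitDemo {
    // The static type here is Shape, so s.area() is a virtual call.
    // If the runtime profile shows every element is a Circle, the JIT
    // can inline Circle.area() directly into this loop.
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) sum += s.area();
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = new Shape[100_000];
        for (int i = 0; i < shapes.length; i++) shapes[i] = new Circle(1.0);
        System.out.printf("total area: %.1f%n", total(shapes));
    }
}
```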
Interesting write-up - would be nice to see not just the quicksort code, but the harness/scripts used for benchmarking. As far as I can tell it's not included?
I wonder if AOT would speed up Groovy as well. My intuition is that it doesn't matter, since the runtime will get bundled and execution time will be the same.
If you have short-lived tasks popping off at that rate, the JVM looks some orders of magnitude better if you don't spin up a fresh process for every task.
You can convert it to Kotlin and just use the LLVM-based native[1] compilation, which is even more efficient. It's completely automated, and gives you a better and more modern language without much hassle.
Your question presupposes that the main reason for Java's widespread adoption was that it runs in a VM. That is not true. Java as a language was designed to be significantly more productive than C++ in the context of enterprise software development, regardless of whether its compiled or not.