Discussion:
Number of Parallel GC Threads
Nicolas Michael
2009-01-23 07:26:29 UTC
Permalink
Hi,

on server-class machines, you are using the ParallelGC collector as the default
collector. This collector by default uses as many parallel GC threads as there
are CPUs in the system. As far as I remember, you are doing this because you
assume that people usually run just one JVM on their server, so you're
optimizing for that case -- and telling people to explicitly decrease their
number of GC threads when running multiple JVMs. (As some of you know, we do
just that for the "most important" JVM instances we run.)

However, we also have some little Java-based agents running on our systems which
just use the defaults. On our T5240s and T5440s they end up with 128 and 256
parallel GC threads, respectively! I'm wondering whether this is really a
"reasonable" default behavior?! Would someone really run just one JVM on such a
large system? Does it make sense to have 256 parallel GC threads as a *default*
on such servers? I was thinking that it may be better to limit the maximum
number of parallel GC threads created by default to a more "conservative" number
(whatever that number would be... 16? 32? ...?).

Well, this is just a thought. I could imagine that there are lots of Java-based
tools around (management GUIs, ...) that people run on their servers. And many
of those tools don't set any JVM parameters except perhaps the max heap size.
If I were starting such a tool as an operator, I wouldn't want it to interrupt
my workload by doing garbage collection with 256 threads in parallel.

Another idea: The demand for many gc threads also depends a little on the heap
size. An application with a default 64m heap will most likely not benefit much
from 256 gc threads, while an application with 4g of heap might. So the heap
size could be another indication of how many parallel gc threads would make
sense as a default.

I'd be interested to hear your opinion on that.

Thanks,
Nick.
Thomas Viessmann
2009-01-23 07:49:32 UTC
Permalink
Hi Michael,


I totally agree with what you're saying. With the success of the
UltraSPARC T1 and T2 processor families, this is becoming more and more of an
issue. I'm in the customer service group, and an increasing number of service
requests are due to misconfiguration or non-configuration of the parallel GC
threads. Let's see what the development team thinks about this. Many thanks.
--
---
mit freundlichen Gruessen / with kind regards


Thomas Viessmann

Global Sales and Services - Software Support Engineering

Sun Microsystems GmbH Phone: +49 (0)89 46008 2365 / x62365
Sonnenallee 1 Mobile: +49 (0)174 300 5467
D-85551 Kirchheim-Heimstetten Pager: Thomas.Viessmann at sun.itechtool.com
Germany/Deutschland mailto: Thomas.Viessmann at sun.com
http://www.sun.de

Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
Clemens Eisserer
2009-01-23 15:36:23 UTC
Permalink
Hi Nicolas,
Post by Nicolas Michael
However, we also have some little Java-based agents running on our systems which
just use the defaults. On our T5240's and T5440's they end up having 128 resp.
256 parallel gc threads! I'm wondering whether this is really a "reasonable"
default behavior?! Would someone really run just one JVM on such a large system?
Does it make sense to have 256 parallel gc threads as a *default* on such
servers?
Well, except for a bit of wasted memory (because each thread has its own
private data structures), I think it does not hurt a lot.
It should be a rather rare case that two JVMs run a GC cycle at the same time,
in which case performance will be a bit worse; however, when there are no
GC-cycle "collisions", each JVM will have the lowest possible GC pause time.

- Clemens
Tony Printezis
2009-01-23 15:33:57 UTC
Permalink
Nick,

First, a correction. For very large machines, we don't set the number of GC
threads to be the same as the number of CPUs (there are diminishing returns if
we do that). Currently, you get 1 parallel GC thread per CPU for up to 8 CPUs,
and 5/8 of a thread per CPU after that (so for 16 CPUs you get 8 + 5/8 x 8 = 13
GC threads).
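
(For illustration only, here is that heuristic as a small Java-style sketch;
the method name is made up, and the exact cutoffs and rounding in the real
HotSpot code may differ.)

    // Rough sketch of the default described above: one GC thread per CPU up
    // to 8 CPUs, then 5/8 of a thread for each additional CPU.
    static int defaultParallelGcThreads(int ncpus) {
        if (ncpus <= 8) {
            return ncpus;
        }
        return 8 + ((ncpus - 8) * 5) / 8;  // e.g. 16 CPUs -> 8 + 5 = 13
    }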

Now, regarding what we pick for the defaults: whatever we do, someone is going
to be happy, and someone will not be. I've always seen the defaults as trying
to do a not-so-embarrassing job for naive users. If someone is running several
JVMs per box, then I assume that they know what they are doing, so playing
around with the # of parallel GC threads should be straightforward for
them! :-) And, incidentally, folks who do run more than one JVM per large box
typically use processor sets or zones to partition the machine; in that case,
our algorithm should do the right thing.

Tony
--
----------------------------------------------------------------------
| Tony Printezis, Staff Engineer | Sun Microsystems Inc. |
| | MS BUR02-311 |
| e-mail: tony.printezis at sun.com | 35 Network Drive |
| office: +1 781 442 0998 (x20998) | Burlington, MA01803-0902, USA |
----------------------------------------------------------------------
e-mail client: Thunderbird (Solaris)
Jon Masamitsu
2009-01-23 15:54:10 UTC
Permalink
Post by Tony Printezis
Nick,
First, a correction. For very large machines, we don't set the number of
GC threads to be the same as the CPUs (there are diminishing returns if
we do that). Currently, you get 1 parallel GC thread per CPU for up to 8
CPUs, and 5/8 after that (so for 16 CPUs you get: 8 + 5/8 x 8 = 13 GC
threads).
Tony,

This is a bit of an embarrassment on my part. What you say is true
for all the collectors except UseParallelGC. In the early JDK 6 releases
and before, we used all the CPUs for GC with UseParallelGC
(the default for server-class machines). That was fixed in
JDK 6 update 10 (I believe).

Jon
kirk
2009-01-23 17:39:45 UTC
Permalink
Post by Jon Masamitsu
Post by Tony Printezis
Nick,
First, a correction. For very large machines, we don't set the number of
GC threads to be the same as the CPUs (there are diminishing returns if
we do that). Currently, you get 1 parallel GC thread per CPU for up to 8
CPUs, and 5/8 after that (so for 16 CPUs you get: 8 + 5/8 x 8 = 13 GC
threads).
Tony,
This is a bit of embarassment on my part. What you say is true
for all the collectors except UseParallelGC. In the early jdk6's
and before, we used all the cpu's for GC for UseParallelGC
(the default for server-class machines). That was fixed in
jdk6 update 10 (I believe).
Jon
I've come into a couple of situations where I need to throttle back on
GC threads, so I think the request to throttle based on the -Xmx setting
seems reasonable.

Regards,
Kirk
Jon Masamitsu
2009-01-23 18:09:42 UTC
Permalink
Post by kirk
...
I've come into a couple of situations where I need to throttle back on
GC threads so I think the request to throttle based on the -Xmx setting
seem reasonable.
How do you decide the number of GC threads given a maximum heap size?
Do you scale the number of GC threads linearly with the maximum
heap size?
kirk
2009-01-23 19:19:47 UTC
Permalink
Post by Jon Masamitsu
Post by kirk
...
I've come into a couple of situations where I need to throttle back on
GC threads so I think the request to throttle based on the -Xmx setting
seem reasonable.
How do you decide the number of GC threads given a maximum heap size?
Do you scale the number of GC threads linearly with the maximum
heap size?
Good question. As Tony pointed out, there seems to be a limit to the useful
number of threads to allocate, and so his formulation deviates from linear in
cases where there are a large number of CPUs. In this case I guess I would cap
at the minimum of that value and one determined by memory. I don't have a good
feeling for what that other formula would look like, but a good starting point
could be something like one GC thread for every 64 MB of maximum heap. The
actual value could then be adjusted using some observations about how GC was
behaving.
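
(Purely to illustrate Kirk's suggestion -- this is not an existing JVM
heuristic, and the method name is invented -- such a cap might look roughly
like this, taking the smaller of a CPU-derived and a heap-derived count:)

    // Hypothetical: take the minimum of the CPU-based default and a
    // heap-based limit of one GC thread per 64 MB of maximum heap.
    static int cappedGcThreads(int ncpus, long maxHeapMb) {
        int byCpu = (ncpus <= 8) ? ncpus : 8 + (int) (((ncpus - 8) * 5L) / 8);
        int byHeap = (int) Math.max(1, maxHeapMb / 64);
        return Math.min(byCpu, byHeap);   // e.g. 512 MB heap -> at most 8 threads
    }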

Kirk
Jon Masamitsu
2009-01-23 21:12:25 UTC
Permalink
Post by kirk
Post by Jon Masamitsu
Post by kirk
...
I've come into a couple of situations where I need to throttle back on
GC threads so I think the request to throttle based on the -Xmx setting
seem reasonable.
How do you decide the number of GC threads given a maximum heap size?
Do you scale the number of GC threads linearly with the maximum
heap size?
Good question. As Tony pointed out, there seems to be a useful number of
threads to allocate and so his formulation deviates from linear in cases
where there are a large number of CPUs. In this case I guess I would cap
at the min offered by that value and one determined by memory. I don't
have a good feeling for what that other formula would look like but a
good starting point could be something like 1 for something like every
64mb. The actual value could be adjusted using some observations about
how GC was behaving.
So linear with the max size of the heap (e.g., 8 GC threads for a 512m
max heap)
up to some cap (e.g., using the 5/8's rule Tony described).

What do you mean by GC behavior? And adjusting for it?
Y Srinivas Ramakrishna
2009-01-23 21:29:17 UTC
Permalink
Post by Jon Masamitsu
Post by kirk
Good question. As Tony pointed out, there seems to be a useful number of
threads to allocate and so his formulation deviates from linear in cases where
there are a large number of CPUs. In this case I guess I would cap at the min
offered by that value and one determined by memory. I don't have a good feeling
for what that other formula would look like but a good starting point could be
something like 1 for something like every 64mb. The actual value could be
adjusted using some observations about how GC was behaving.
So linear with the max size of the heap (e.g., 8 GC threads for a 512m
max heap) up to some cap (e.g., using the 5/8's rule Tony described).
What do you mean by GC behavior? And adjusting for it?
Not to answer the question for Kirk, but basically, as Tony said and we all
agree, no single functional form and set of coefficients will work well for all
applications and all workloads. I am guessing what Kirk means here is some kind
of dynamic learning and readjustment of the coefficients -- or, in a model-free
case, some kind of probing in the vicinity of the current state (i.e., the
number of GC threads), where, based on whether a probe improved performance or
not, you either keep moving in that direction or move back. Basically, a
reinforcement-learning approach towards finding -- and tracking -- an optimal
value dynamically.

If that's what Kirk was getting at, I would expect any such adjustment to
happen slowly, and one would have to be extremely careful in large ensembles of
such JVMs to keep from getting into oscillations/instability. I am sure (ok, I
am guessing) that control theorists have solved this ensemble control problem
for simple (homogeneous) cases, but I fear that it may be difficult to get
right at low cost in the kinds of (non-homogeneous, bursty) situations we would
expect to encounter.
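
(To make the probing idea concrete, here is a deliberately naive hill-climbing
sketch -- purely hypothetical, the class and method names are made up, and
nothing like this exists in HotSpot today.)

    // Hypothetical probe-and-adjust controller: after each collection, keep
    // moving the GC thread count in whichever direction last helped, and
    // reverse direction when a move makes the pause worse.
    class GcThreadProber {
        private final int maxGcThreads;
        private int gcThreads;
        private int step = 1;                      // +1 or -1
        private double lastPauseMs = Double.MAX_VALUE;

        GcThreadProber(int initialThreads, int maxGcThreads) {
            this.gcThreads = initialThreads;
            this.maxGcThreads = maxGcThreads;
        }

        // Called with the pause time of the collection that just finished;
        // returns the thread count to try for the next collection.
        int afterGc(double pauseMs) {
            if (pauseMs > lastPauseMs) {
                step = -step;                      // last move hurt; back off
            }
            lastPauseMs = pauseMs;
            gcThreads = Math.max(1, Math.min(maxGcThreads, gcThreads + step));
            return gcThreads;
        }
    }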

Just thinking out loud ...
-- ramki
kirk
2009-01-24 01:18:37 UTC
Permalink
I didn't know I was so clever ;-)

I was thinking about this a bit, but unless you understand what is going on
with the competing JVMs... I think that could be a bigger influence. At any
rate, I'd consider this a gross optimization; trying to fine-tune it may not
make sense.

Regards,
Kirk
Nicolas Michael
2009-01-24 14:40:58 UTC
Permalink
Hi,

I didn't expect to kick off that many emails! :-) So first of all:
Thanks a lot! I'll try to reply to some of them in this "compound" mail.

I was referring to Java 1.6.0_07 (and some Java 5 builds): with these releases,
we really *do* have 256 GC threads on our T5440! I didn't know that you had
changed that in 1.6.0_10. Having "just" 80 GC threads with the 5/8 rule already
sounds much better, but it would still be overkill for an -Xmx128m process (see
below).
Post by Tony Printezis
If someone is running several JVMs per box, then I assume that they
know what they are doing, so playing around with the # of parallel GC
threads should be straightforward for them!
Hmmm... I'll try to explain in a little more detail: You know our
configuration, and that's exactly what we do: without going into details, we
configure the number of parallel GC threads across *all* our JVMs that are part
of our workload such that they don't exceed the number of CPUs. Works very
well!

But that's just part of the story. Nowadays, lots of programs are developed in
Java, and you run such programs in the background on the same servers you run
your workload on. Those programs are not necessarily part of your workload, but
perform monitoring/administrative tasks in the background. They are often not
developed by your own department, but in other parts of your company, or they
come from OEM partners. So you often have no way to change the JVM settings for
them. (I'm writing this in a very general fashion since this is a public
list...)

As an example, take the management and supervision tools of Sun's
StorageTek arrays (CAM software). There's this "Sun StorageTek Fault
Management Services" process, running Java 1.5.0_11-b03 (of course, it's
coming with its own JVM...).

$ jinfo 763
java.vm.version = 1.5.0_11-b03
java.vm.name = Java HotSpot(TM) Server VM
VM Flags:
-Xms8m -Xmx128m ...
...

Currently, this FM service agent has done 792 young GCs within 3 days (that's
about once every 6 minutes) -- with 256 parallel GC threads
(__1cMGCTaskThreadDrun6M_v_ being the Solaris-mangled name of
GCTaskThread::run(), so this counts the GC worker threads):

$ pstack 763 | grep __1cMGCTaskThreadDrun6M_v_ | wc -l
256
Post by Clemens Eisserer
It should be a rather rare case that two JVMs run a GC cycle at the same
time, in which case performance will be a bit worse
If a server runs at high CPU utilization during peak load (there are
applications that can do this even with 256 CPUs in the server...), a
256-threaded GC cycle of a monitoring agent could be quite disruptive for such
a workload (especially when it is sensitive to response times).

I'm not saying that we *do* have a critical problem here (in my particular
case). I wrote my first mail just to point out that in my opinion there are
situations where JVMs may be using too many GC threads on large systems. Of
course, I totally agree that it's difficult (if not impossible) for the JVM to
come up with the best default settings for *any* imaginable situation. Usually
the user will need to adjust some of the settings. In cases like the one I'm
describing above, this is unfortunately not possible: AFAIK, there is no
external interface to change the JVM settings for this CAM software agent. (OK,
we could consider this the fault of the agent, which could detect in a startup
script how many CPUs there are in the system and limit its number of GC threads
to, let's say, 4 -- which should be sufficient for such a process with a 128m
heap. Or it could use the Client VM instead...) Unfortunately, I believe there
are many programs around which fail to do such things...
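
(For reference, and purely as an illustration -- the agent's real startup
script is not under our control -- the relevant switches on the java command
line would look something like this:)

    java -XX:ParallelGCThreads=4 -Xmx128m ...   (cap the parallel GC worker threads)
    java -client -Xmx128m ...                   (or use the Client VM, whose default
                                                 collector is the serial collector)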

Therefore I thought it might help if the JVM would limit the number of GC
threads on large systems when it is likely that the process would not benefit
from more GC threads. It's surely difficult to tell, but I've seen lots of
ideas in these mails. A 128m heap certainly doesn't need 256 GC threads (or 80
with the 5/8 rule). Btw, this CAM agent has 274 threads in total. Subtracting
256 GC threads leaves 18 threads. I don't know how many of them really do
anything, but an application with < 18 mutator threads, a heap of 128m, and
minor GC intervals of 6 minutes certainly doesn't need many GC threads -- not
even on a 256-way system... ;-) Of course, the number of mutator threads and
the GC intervals are dynamic parameters and can't be determined by the JVM at
startup.

Nick.
Tony Printezis
2009-01-24 22:13:09 UTC
Permalink
Nick,

Hi. You do bring up, as always(!), a good point. Maybe we can come up with some
better defaults; my guess would be that this might deal a bit better with a
small percentage of cases, but it will not be the magic wand that automagically
solves all related issues. We have considered doing something like trying to
monitor how many JVMs are running on a particular machine and increasing /
decreasing the resources dedicated to each as needed. But that's not trivial,
and I don't think it's high on our list.

Regarding the CAM agent, you correctly said that they should somehow tune its
parameters a bit better. And you are right; I would have reported it as a bug.

Tony
--
----------------------------------------------------------------------
| Tony Printezis, Staff Engineer | Sun Microsystems Inc. |
| | MS BUR02-311 |
| e-mail: tony.printezis at sun.com | 35 Network Drive |
| office: +1 781 442 0998 (x20998) | Burlington, MA01803-0902, USA |
----------------------------------------------------------------------
e-mail client: Thunderbird (Solaris)
Martin Buchholz
2009-01-25 22:53:58 UTC
Permalink
Post by Tony Printezis
Regarding the CAM agent, you correctly said that they should somehow tune its
parameters a bit better. And you are right; I would have reported it as a bug.
It's hard to fault an application for using the defaults.
Tuning should only be necessary to achieve top performance,
not to achieve reasonable performance.
If an application ends up not being a good citizen as a result of
decisions made by the JVM, that's a bug in the JVM.

In general, JDK engineers think in terms of peak performance
in benchmark-like settings. I've been guilty of that myself.
We should try harder to have reasonable performance without
unreasonable resource consumption, by default.
Relatively few apps are in the "take over the machine" category.
Those few apps should be required to ask for such status via
a JVM flag.

I like the GC work done in the past few years by Matthew Hertz,
with an emphasis on finding the right balance for resource use
(particularly memory) by GC, especially adaptively.

Martin
Tony Printezis
2009-01-26 15:12:28 UTC
Permalink
Martin,
Post by Martin Buchholz
Post by Tony Printezis
Regarding the CAM agent, you correctly said that they should somehow tune its
parameters a bit better. And you are right; I would have reported it as a bug.
It's hard to fault an application for using the defaults. Tuning should only be
necessary to achieve top performance, not to achieve reasonable performance. If
an application ends up not being a good citizen as a result of decisions made
by the JVM, that's a bug in the JVM.
I don't think so. We need a reasonable balance between giving reasonable
performance and being a good citizen. Once upon a time we tried to do what you
are suggesting, i.e., being a good citizen by default and requiring tuning for
performance. And that was not a big success, as users kept asking why our
settings were so conservative, why they kept getting OOMs given the small heap,
why performance was not the best, etc. So: we've been there, done that, it
didn't work.
Post by Martin Buchholz
In general, JDK engineers think in terms of peak performance
in benchmark-like settings. I've been guilty of that myself.
We should try harder to have reasonable performance without
unreasonable resource consumption, by default.
Relatively few apps are in the "take over the machine" category.
Those few apps should be required to ask for such status via
a JVM flag.
I like the GC work done in the past few years by Matthew Hertz,
with an emphasis on finding the right balance for resource use
(particularly memory) by GC, especially adaptively.
Even if we monitor what's happening on the machine and, say, start taking
resources away from a JVM because the load of the machine is going up, we might
get users asking why we're taking resources away from the "important" JVM in
favor of not-so-important background processes. So I still insist: whatever we
do, it will not work in some setting, and we'll have folks complaining.

Tony
--
---------------------------------------------------------------------
| Tony Printezis, Staff Engineer | Sun Microsystems Inc. |
| | MS UBUR02-311 |
| e-mail: tony.printezis at sun.com | 35 Network Drive |
| office: +1 781 442 0998 (x20998) | Burlington, MA 01803-2756, USA |
---------------------------------------------------------------------
e-mail client: Thunderbird (Linux)
Nicolas Michael
2009-01-26 20:26:18 UTC
Permalink
Hi Tony,
Post by Tony Printezis
Even if we monitor what's happening in the machine and, say, start
taking resource away from a JVM because the load of the machine is going
up, we might get users asking why we're taking away resources from the
"important" JVM in favor of not so important background processes. So I
still insist: whatever we do, it will not work in some setting and we'll
have folks complaining.
I absolutely agree with you: such "dynamic" optimizations have a high potential
of going in the wrong direction. And what I dislike even more about them: they
make system behavior very difficult to predict. When you look at your system,
you may find everything working just fine. Then one application goes nuts, and
suddenly all JVMs start adapting their resources. Or you start another JVM
(let's say, the Sun Studio Analyzer), and suddenly your workload JVMs reduce
their GC threads... I wouldn't want to be the poor guy who has to analyze,
after such a situation, what has happened when a customer complains about weird
system behavior. Or make performance predictions when I don't know which
strange situations may make the JVM rethink its settings. I'm just imagining a
situation where the server hits overload because of a burst of incoming
requests, and the JVMs think "Oh, there's so much load on the server, let's
reduce our resource usage and kill some of them GC threads" -- and afterwards
they suffer from longer GC pauses, which makes it even more difficult for them
to keep up with the load.

On the other hand, there may be some better default settings that the JVM could
derive from "static" parameters. I believe Martin is right that most apps are
not in the "take over the machine" category, especially when it comes to really
large servers. Are there really people who run just one JVM instance on a
64/128/256-CPU CMT server, for example? And even if they do, they certainly
have lots of mutator threads (otherwise it wouldn't make sense to run just one
JVM on such a server), so they would have lots of object creation and would
need quite a large heap for reasonable operation.

Those "agent" processes that I was referring to (CAM just being one of
them) are rather "small" processes (128m heap in case of the CAM agent).
And I would assume that most of those "performance-uncritical background
processes" that you might run on large servers come with small heaps (I
sure hope -- I don't want them to use up my memory!). And even
performance-hungry apps with small heaps don't need too many gc threads
(and if they are really performance-hungry *and* have small heaps *and*
run on very large servers, they are likely to be deployed in more than
one instance... if they run single-instance, they should need *large*
heaps). So it should be possible for the JVM to come up with an upper
limit for the number of gc threads depending on the server size (cpus,
memory) and max heap size of the application. Probably there are even
other or better indications that I currently don't think of. And
probably there's more than just the number of gc threads that could be
sized that way...?


That's all for now, and sorry for my long mails... ;-)

Thanks,
Nick.
Martin Buchholz
2009-01-26 22:31:44 UTC
Permalink
Post by Tony Printezis
Post by Martin Buchholz
It's hard to fault an application for using the defaults. Tuning should only
be necessary to achieve top performance, not to achieve reasonable
performance. If an application ends up not being a good citizen as a result
of decisions made by the JVM, that's a bug in the JVM.
I don't think so. We need a reasonable balance between giving reasonable
performance and being a good citizen. Once upon a time we tried to do what you
are suggesting, i.e., being a good citizen by default and requiring tuning for
performance. And that was not a big success, as users kept asking why our
settings were so conservative, why they kept getting OOMs given the small heap,
why performance was not the best, etc. So: we've been there, done that, it
didn't work.
Being a good citizen does not include failing due to OOME.

Being a good citizen, to me, means things like:
- gc'ing when the heap has grown from the previous collection by a factor of
  2-3, instead of a factor of 20
- using 5 concurrent gc threads instead of 100

In general, don't double resource consumption to get an additional 1%
performance, by default.

I agree with Tony that customers are going to have
a harder time managing applications the more dynamic
that resource management becomes.
But I think this comes with the territory.
It is much harder to do performance measurement
of Java code in general, because the execution model
with multiple runtimes has become so complex and dynamic.
But that's called progress. We should be aiming for all parts of our runtimes
to become more dynamic in the same way. Lots of difficult work remains to be
done by VM engineers.

Martin
Florian Weimer
2009-01-26 21:33:16 UTC
Permalink
Post by Martin Buchholz
In general, JDK engineers think in terms of peak performance
in benchmark-like settings. I've been guilty of that myself.
We should try harder to have reasonable performance without
unreasonable resource consumption, by default.
I would be happy to have some sort of system daemon mode, which
minimizes footprint while maintaining relatively decent performance
(for things like the occasional TLS handshake). I think most such
applications wouldn't even need dynamic tuning.

Right now, I fear that I need to put all these system management tasks into a
single VM (MVM-style) because it seems that, without lots of tuning, separate
VMs would consume too many resources on some of our smaller hosts. (Of course,
this is a psychological issue to some extent, and it's not very likely that the
code will actually exist before those machines are phased out. 8-/)
Y Srinivas Ramakrishna
2009-01-26 21:37:27 UTC
Permalink
As Jon pointed out towards the start of this thread, "-client" may be
considered the less resource-hungry mode that you are looking for below (or at
least a close enough approximation for now).

-- ramki

kirk
2009-01-23 19:22:07 UTC
Permalink
Post by Jon Masamitsu
Post by kirk
...
I've come into a couple of situations where I need to throttle back on
GC threads so I think the request to throttle based on the -Xmx setting
seem reasonable.
How do you decide the number of GC threads given a maximum heap size?
Do you scale the number of GC threads linearly with the maximum
heap size?
I should add that I think the bigger problem is that GC doesn't play nice when
you have many VMs running on the same hardware (point taken about partitioning
and all that). Somehow it has to figure out whether it has free rein or whether
it needs to share.

Kirk
Clemens Eisserer
2009-01-23 19:41:54 UTC
Permalink
I should add that I think that the bigger problem is that GC doesn't play nice
when you have many VMs running on the same hardware.
(point taken about partitioning and all that).
Some how it has to figure out if it has free reign or if it need to share.
In theory there are slowdowns when three or more JVMs run their GC at exactly
the same time, but how often does that happen? Do you have any evidence that
this is a problem in real-world usage?

- Clemens
Y Srinivas Ramakrishna
2009-01-23 20:00:56 UTC
Permalink
Depends on how heavy the load is and whether there are
convoying/synchronizing effects between the JVM's.
Typically this will manifest as large variance in the
scavenge times (or if you plot pause distributions
they will be at least bimodal and possibly combinatorially
multi-modal -- you will see a "ringing" tail).
I have seen this when customers do not pbind to psets
and load is heavy. If you use PrintGCStats/CompareGCStats
it'll show up as a larger "std-dev" reading.
When you are concerned about pause times (as Nick is)
the longer, fatter, ringing tail can hurt.
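
(For anyone who wants to check this on their own system: if I remember right,
PrintGCStats works over an ordinary verbose GC log, so flags along these
lines -- adjust for your JDK -- produce the input it needs:)

    java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log ...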

-- ramki

kirk
2009-01-23 22:52:32 UTC
Permalink
Post by Clemens Eisserer
I should add that I think that the bigger problem is that GC doesn't play nice
when you have many VMs running on the same hardware.
(point taken about partitioning and all that).
Some how it has to figure out if it has free reign or if it need to share.
In theory there are slowdowns when three or more JVMs run their GC at
exactly the same time, but how often does that happen.
Do you have any evidence that this is a problem in real-world useage?
- Clemens
Yes, I do have customers where the only way to solve the problem was to
"de-tune" the GC threads. Sorry, NDA prevents me from saying a lot more, though
I can talk in generalities.

Regards,
Kirk
Jon Masamitsu
2009-01-23 15:37:02 UTC
Permalink
Nick,

The simplest solution for your problem is to run the
small applications with -client.

This is a basic "one size does not fit all" problem. For
many years we ran the JVM with -client as the default
and we got plenty of complaints about poor
performance. Some of it went along the lines of
"You're running on a huge system, why don't you
use it".

In the latest JDK 6 updates the default number of GC threads is less than the
number of CPUs. For a T5240 (128 hardware threads) I think it will be about 40
GC threads. Of course that is still too many for the small applications.

By the way, on a server class machine the default
heap size depends on the amount of physical memory
(use 1/4 of the physical memory as the default maximum
heap size). So we cannot use the maximum heap size
as a way to decide how many GC threads to use. A
customer would have to add a different maximum
heap size. If they are willing to do that, they should
just add -client when appropriate.
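
(To make the first point concrete: a JVM started without any -Xmx on a machine
with, say, 2 GB of physical memory would default to a maximum heap of roughly
512 MB -- the default reflects the machine, not the application, so a "small"
agent doesn't look small by its heap settings unless someone explicitly sets
-Xmx.)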

When the JVM initializes, it knows nothing about the application, so it cannot
tell whether the application is "big" or "small". We don't maintain a database
of applications in order to tell what good numbers to use for each application
would be, and there is no project in the works to do that.

We've wanted to add to the JVM the ability to monitor the resources of the
system it is running on and to modify its behavior accordingly (e.g., if there
are lots of processes running, reduce the maximum heap size and the number of
GC threads), but we have no one to work on it.

Jon
Jon Masamitsu
2009-01-23 18:03:30 UTC
Permalink
Post by Jon Masamitsu
...
In the latest jdk 6 updates the default number of threads
is less than the number of CPU's. For a T5240 (128 hardware
threads) I think it will be about 40 GC threads.
Of course that is still too many for the small applications.
So this 40 for 128 hardware threads is not the 5/8's rule as Tony
described. Niagaras with > 32 hardware threads use a more
conservative 5/16's rule.
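
(Very roughly, and assuming the rule has the same shape as Tony's formula with
16 in the denominator instead of 8: 8 + 5/16 x 120 = about 45 GC threads for
128 hardware threads, which is in the ballpark of the "about 40" Jon mentions;
the exact cutoffs and rounding in the JVM may differ.)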