-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[improve][broker] Remove ForkJoin common pool in caffeine cache #24745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Java 17.0.17 will include the fix and the planned release date is October 21st: https://wiki.openjdk.org/display/JDKUpdates/JDK+17u |
why is this a problem if there are different ways to use Caffeine?
Since the fix for Java 17 will be released in 17.0.17 on October 21st, it will soon be addressed.
I understand that it's severe when that happens. How often have you observed this bug? Does it require that that JVM has been running for very long or is it just random when there's heavy load? |
It’s not necessarily a problem, but I’d like to highlight that certain parts of the code explicitly specify a thread pool, whereas other parts continue to use the default thread pool.
It occurs very rarely, only after running for a long time, and it doesn’t seem to be related to the load. |
what is "long time" here? days? weeks? |
For blocking IO, ForkJoinPool is a reasonable solution to avoid blocking Netty IO threads or the Metadata store thread. VirtualThreads in Java 21 could be a better option eventually. Instead of removing ForkJoinPool usage blindly, there should be more analysis of each case so that we don't cause performance regressions. In Pulsar, the execution model isn't clean and more effort should be put in designing how thread pools and task execution should behave in general. This problem has existed for a long time so it's waiting for contributors. :) |
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java
Outdated
Show resolved
Hide resolved
We’ve gone through both weeks and months. Add more info: @forkjoinpool[ |
I agree that VirtualThreads is better. However, given the current situation, ForkJoin common pool is not very suitable.
#22589 this pr change executor from ForkJoinPool to pulsar service executor. It seems there’s no analysis regarding performance regression.
Look forward to it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, it used ForkJoinPool
, but switching to MoreExecutors.directExecutor()
avoids unnecessary thread context switching. In general, specifying an executor is beneficial for clarity and control
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't support making this change, since it's blindly making changes across the code base. For example in JwksCache, it's necessary to have an actual executor to run tasks in background. The refreshAfterWrite
will handle that. This PR breaks existing behavior.
Added a new executor to address it. Please take a review again. Thanks. |
Since the fix for Java 17 will be released in 17.0.17 on October 21st, it will soon be addressed.
I saw this comment just now, it should be fine to wait for the JDK patch, so I dismissed my approval. Using a dedicated executor is reasonable. But the
|
Waiting for the JDK upgrade and using the latest version can indeed solve this problem, but wouldn't it be better if we could solve this issue without relying on a JDK upgrade? If the user doesn't know that they need to upgrade the JDK to a specified version, there will always be a risk of blocking. That's really scary!
That's the purpose of this PR.
Use |
Technically, there are many references in Pulsar that use To prevent this use cases, we also have to disallow any It's a JDK standard API that should be widely used. Many projects should be affected rather than Pulsar itself. So I guess it's why @lhotari asked |
A simple question, if I'm going to change the dedicated thread pool to |
If use ForkJoinPool as the dedicated thread pool but not the common pool, even if it is blocked, the impact is localized; Unlike now when using the ForkJoin common pool, where the impact is global. |
Hi, @BewareMyPower |
In principal, a Java project's code base should assume the JDK is reliable.
Now, your question might become which Apache project uses parallel stream, virtual threads, Caffine cache (or any unknown dependency that uses the common I don't have a tool to detect usages of
However, such a bug from JDK that rarely happens should not be a key factor that changes the code base a lot. |
In short, using a dedicated pool is okay. Continue using I'd like to hear @lhotari's voice again |
@BewareMyPower I think your previous comment already covered most and was a really good answer. Just adding a few items:
|
Motivation
Same to #24383
Currently, the usage of Caffeine in pulsar code is inconsistent. Some asynchronous builds use the default ForkJoin common pool, while others specify a custom thread pool, and there are also synchronous builds.
The ForkJoin pool has a specific bug that can prevent tasks from being executed after being submitted to the thread pool. This issue has been fixed in Java 21.0.8, but it still exists in Java 17.
https://bugs.openjdk.org/browse/JDK-8330017
https://bugs.openjdk.org/browse/JDK-8351933
If the ForkJoin common pool gets blocked, it can prevent all dependent caches from fetching data.
Modifications
The choice of thread pool may not be the most appropriate. Please provide suggestions for improvement, thanks.
Verifying this change
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: