-
Notifications
You must be signed in to change notification settings - Fork 6.9k
Add nullability checks to the Agent's OpenTelemetry plugin #37659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request enhances the robustness of the Agent's OpenTelemetry plugin by adding null safety checks for span operations and aligns the Agent's Dockerfile with the Proxy's JDK version.
- Added null checks for
parentSpanbefore callingsetParent()to prevent NPEs when no parent span exists - Added null checks for
spanobjects before calling lifecycle methods (setStatus(),end(),recordException()) - Updated Agent's Dockerfile from JDK 23 to JDK 25 to match the Proxy's configuration
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| distribution/agent/Dockerfile | Updates base image from eclipse-temurin:23-jdk to eclipse-temurin:25-jdk to align with Proxy's Dockerfile |
| agent/plugins/tracing/type/opentelemetry/src/main/java/org/apache/shardingsphere/agent/plugin/tracing/opentelemetry/advice/OpenTelemetrySQLParserEngineAdvice.java | Adds null checks for parentSpan before setting parent context and for span before lifecycle operations |
| agent/plugins/tracing/type/opentelemetry/src/main/java/org/apache/shardingsphere/agent/plugin/tracing/opentelemetry/advice/OpenTelemetryJDBCExecutorCallbackAdvice.java | Adds null checks for parentSpan before setting parent context and for span before lifecycle operations |
| agent/plugins/tracing/type/opentelemetry/src/test/java/org/apache/shardingsphere/agent/plugin/tracing/opentelemetry/advice/OpenTelemetrySQLParserEngineAdviceTest.java | Adds test case for null parent span scenario and refactors test assertions to support both valid and invalid parent span IDs |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...phere/agent/plugin/tracing/opentelemetry/advice/OpenTelemetryJDBCExecutorCallbackAdvice.java
Show resolved
Hide resolved
terrymanu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update. Direction is right, but a few changes are still needed before merging:
- Good parts: The null guards in both advices address the NPE root cause reported in #32793. Adding the no-parent-span test for OpenTelemetrySQLParserEngineAdvice is the right move.
- Issues to fix:
- Missing symmetric coverage for the JDBC path: please add a no-parent-span/null-attachment test for OpenTelemetryJDBCExecutorCallbackAdvice to match the SQL parser coverage and prevent regressions.
- The base image bump in distribution/agent/Dockerfile from eclipse-temurin:23-jdk to 25-jdk is unrelated to the NPE fix and introduces unvalidated compatibility risk (ByteBuddy/OTel/agent on non-LTS). Please revert it in this PR; if you want the upgrade, raise a separate PR with validation evidence.
- Scope vs. issue #32793: Elastic APM 9.2.3 connectivity/backpressure noted in the issue comments is not covered here. If the intent is NPE-only, please clarify that the remaining APM/metrics problem will be handled separately; otherwise, add the fix and validation.
- Newly introduced risk: When parentSpan is absent, spans become orphaned (invalid parent). If that is intended, document it (or add assertions). If not, consider using the current context or emitting a warning.
Please address the above, and we can re-review quickly.
1fd3fd2 to
a5ed67f
Compare
linghengqian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Missing symmetric coverage for the JDBC path: please add a no-parent-span/null-attachment test for OpenTelemetryJDBCExecutorCallbackAdvice to match the SQL parser coverage and prevent regressions.
- Solve.
The base image bump in distribution/agent/Dockerfile from eclipse-temurin:23-jdk to 25-jdk is unrelated to the NPE fix and introduces unvalidated compatibility risk (ByteBuddy/OTel/agent on non-LTS). Please revert it in this PR; if you want the upgrade, raise a separate PR with validation evidence.
- It has been rolled back.
Scope vs. issue https://github.com/apache/shardingsphere/issues/32793 : Elastic APM 9.2.3 connectivity/backpressure noted in the issue comments is not covered here. If the intent is NPE-only, please clarify that the remaining APM/metrics problem will be handled separately; otherwise, add the fix and validation.
-
This is essentially beyond the scope of the original issue and remains unresolved, as there aren't even any trace-related error logs. Either the ShardingSphere JDBC is using an outdated version of the Otel SDK; or the ShardingSphere agent's Dockerfile uses JDK 23, which is incompatible with older versions of the Otel SDK; or the Elastic Agent doesn't allow sending trace-only data, while Jaeger does; or the Otel SDK has a bug. Overall, I would say this definitely deserves a new issue. Because I can't figure out how to write integration tests for this overly complex reproduction process.
-
ShardingSphere agent lacks graceful shutdown functionality when using the Otel SDK; forcibly shutting down the main application will cause the agent to throw an exception. This has gone beyond the scope of the original issue.
Newly introduced risk: When parentSpan is absent, spans become orphaned (invalid parent). If that is intended, document it (or add assertions). If not, consider using the current context or emitting a warning.
- When there is no parent span, the Otel SDK should start a root span (because the parent span ID is invalid) instead of attaching an arbitrary context; this is consistent with Otel's behavior when a parent is missing, see the explanation at open-telemetry/opentelemetry-specification#3781 . The SLF4J Java API cannot be used here because the logs should be sent from the main application, not the agent.
|
@linghengqian Version 5.5.3 is scheduled for release soon. Should we adjust the milestone for this issue to 5.5.4? |
|
Fixes #32793 .
Changes proposed in this pull request:
Before committing this PR, I'm sure that I have checked the following options:
./mvnw clean install -B -T1C -Dmaven.javadoc.skip -Dmaven.jacoco.skip -e.