
impl(gax-internal): integrate RequestRecorder with gRPC transport#5368

Merged: haphungw merged 7 commits into googleapis:main from haphungw:o11y-integrate-grpc-transport on Apr 12, 2026

@haphungw
Contributor

Add a hook to record when a gRPC request starts and when it completes (successfully or with an error).

This PR implements the change for unary RPCs only. Bidirectional streaming and server streaming will be addressed in subsequent PRs.
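
For readers of this thread, here is a minimal sketch of what a start/complete hook around a unary call could look like. It is not the code in this PR: `SketchRecorder`, `on_start`, `on_complete`, and `record_unary` are hypothetical names, and a synchronous closure stands in for the async gRPC call to keep the example dependency-free.

```rust
use std::fmt::Display;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a request recorder (not the real `RequestRecorder` API).
#[derive(Debug, Default)]
pub struct SketchRecorder {
    started_at: Option<Instant>,
    elapsed: Option<Duration>,
    /// `None` while in flight, `Some(Ok(()))` on success, `Some(Err(msg))` on failure.
    outcome: Option<Result<(), String>>,
}

impl SketchRecorder {
    /// Called just before the request is sent.
    pub fn on_start(&mut self) {
        self.started_at = Some(Instant::now());
    }

    /// Called once the request finishes, whether it succeeded or failed.
    pub fn on_complete<T, E: Display>(&mut self, result: &Result<T, E>) {
        self.elapsed = self.started_at.map(|t| t.elapsed());
        self.outcome = Some(match result {
            Ok(_) => Ok(()),
            Err(e) => Err(e.to_string()),
        });
    }

    pub fn outcome(&self) -> Option<&Result<(), String>> {
        self.outcome.as_ref()
    }

    pub fn elapsed(&self) -> Option<Duration> {
        self.elapsed
    }
}

/// Wraps a unary call so the recorder sees both the start and the outcome.
pub fn record_unary<T, E, F>(recorder: &mut SketchRecorder, call: F) -> Result<T, E>
where
    E: Display,
    F: FnOnce() -> Result<T, E>,
{
    recorder.on_start();
    let result = call();
    recorder.on_complete(&result);
    result
}
```

The wrapper returns the call's result unchanged, so the caller's error handling is unaffected by the recording.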

@codecov

codecov Bot commented Apr 11, 2026

Codecov Report

❌ Patch coverage is 91.75258% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.59%. Comparing base (a60cd8f) to head (0f820e4).
⚠️ Report is 2 commits behind head on main.

Files with missing lines | Patch % | Lines
src/gax-internal/src/observability/grpc_tracing.rs | 0.00% | 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5368      +/-   ##
==========================================
- Coverage   97.80%   97.59%   -0.22%     
==========================================
  Files         222      222              
  Lines       46563    46609      +46     
==========================================
- Hits        45540    45486      -54     
- Misses       1023     1123     +100     


@haphungw haphungw marked this pull request as ready for review April 11, 2026 18:29
@haphungw haphungw requested a review from a team as a code owner April 11, 2026 18:29
Contributor

@coryan coryan left a comment


I don't think I understand this change.

use ::tracing::instrument::Instrument;
let span = $crate::client_request_signals!(info: $info, method: $method);
let recorder = $crate::observability::RequestRecorder::new($info);
if let Some(current) = $crate::observability::RequestRecorder::current() {
Contributor


I don't follow... this is where we create a request recorder and then set the recorder for a given future... Why would we notify any existing recorder (which is also unexpected) of anything?

Contributor Author


You're right, this block of code makes no sense. It was a desperate attempt to fix a bug in the storage integration test, but after double-checking, the failure happened because we could not generate the resource name for many storage methods.
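
To illustrate the pattern the reviewer describes (create a recorder, install it for the scope of one request, read it back via `current()`), here is a minimal sketch. It assumes a thread-local as a stand-in for whatever task-scoped mechanism the real code uses. `SketchRecorder`, `scope`, `current`, and `example` are hypothetical names, not the library's API.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical recorder that just remembers the events it was told about.
#[derive(Debug, Default)]
pub struct SketchRecorder {
    events: RefCell<Vec<&'static str>>,
}

impl SketchRecorder {
    pub fn note(&self, event: &'static str) {
        self.events.borrow_mut().push(event);
    }

    pub fn events(&self) -> Vec<&'static str> {
        self.events.borrow().clone()
    }
}

thread_local! {
    // The recorder for the request currently running on this thread, if any.
    static CURRENT: RefCell<Option<Rc<SketchRecorder>>> = RefCell::new(None);
}

/// Returns the recorder installed by the innermost enclosing `scope`, if any.
pub fn current() -> Option<Rc<SketchRecorder>> {
    CURRENT.with(|c| c.borrow().clone())
}

/// Installs `recorder` while `body` runs, then restores the previous value.
pub fn scope<R>(recorder: Rc<SketchRecorder>, body: impl FnOnce() -> R) -> R {
    let previous = CURRENT.with(|c| c.borrow_mut().replace(recorder));

    // Guard that restores the previous recorder even if `body` panics.
    struct Restore(Option<Rc<SketchRecorder>>);
    impl Drop for Restore {
        fn drop(&mut self) {
            let previous = self.0.take();
            CURRENT.with(|c| *c.borrow_mut() = previous);
        }
    }
    let _restore = Restore(previous);

    body()
}

/// Usage: the transport layer only talks to the recorder installed for this request.
pub fn example() -> Vec<&'static str> {
    let recorder = Rc::new(SketchRecorder::default());
    scope(recorder.clone(), || {
        if let Some(r) = current() {
            r.note("start");
        }
        if let Some(r) = current() {
            r.note("complete");
        }
    });
    // Outside the scope no recorder is installed.
    assert!(current().is_none());
    recorder.events()
}
```

In this sketch, installing a new recorder never notifies a previously installed one. The old value is only saved and restored.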

mod storage {
use google_cloud_test_utils::errors::anydump;

#[ignore]
Contributor


Why is this disabled?

Contributor Author


This requires BidiReadObject to pass. I'm not sure about the current implementation, so I wanna get this through before adding more code.
The test will be enabled again in the next PR when we add bidi streaming support.

Contributor


Ack, thanks. Consider:

#[ignore = "TODO(#....) - restore when bidi streaming works again"]

@haphungw
Contributor Author

The main motivation behind this PR (and also why I am in such a rush) is to fix a regression in our gRPC observability logic. The generated gRPC transport still expects to find resource_name in req.extensions(), but we already removed the insertions when we started the RequestRecorder refactor, which leaves gRPC spans without resource names. https://github.com/googleapis/librarian/blob/db6b993937641a143240625be669eb97f3b24270/internal/sidekick/rust/templates/grpc-client/transport.rs.mustache#L191

If we don't want to refactor immediately, an alternative is to put extensions.insert(gaxi::observability::ResourceName::new(rn)); into the generated code, as in https://github.com/googleapis/librarian/pull/5298/changes, which I think is not ideal given our current approach of using RequestRecorder.


I found all of these issues when I was verifying the storage integration tests:
https://github.com/westarle/google-cloud-rust/blob/0609d255b33cbc42d664a3e625e542c448643764/tests/o11y/src/storage_grpc_tracing.rs#L273-L288

We have three problems to deal with right now:

  • State propagation: gRPC transport currently cannot access resource_name because it was removed from request extensions.
  • RPC types: gRPC needs implementations for unary, bidirectional streaming, and server streaming RPCs (unlike HTTP, which is purely request-response).
  • Missing resource names: storage (and compute) have many methods without resource names. Our current heuristic simply does not pick them up; see feat(sidekick): refine resource identification heuristics librarian#5299

What I'm trying to do in this PR is to see if this is the right way to give gRPC access to RequestRecorder for Unary, and if this works I'll extend it to Bidi and Server Streaming.
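
As an illustration of why the streaming RPC types need separate handling, here is a sketch under simplifying assumptions: a synchronous `Iterator` of `Result` items stands in for an async response stream, and `RecordedStream` and `Outcome` are hypothetical names. For a unary call, completion is known when the single response arrives. For a stream, it is only known once the stream ends or yields an error.

```rust
/// Hypothetical completion state for a streaming response.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Outcome {
    InFlight,
    Ok,
    Error(String),
}

/// Wraps a stream of results and tracks when, and how, it completes.
pub struct RecordedStream<I> {
    inner: I,
    outcome: Outcome,
}

impl<I> RecordedStream<I> {
    pub fn new(inner: I) -> Self {
        Self { inner, outcome: Outcome::InFlight }
    }

    pub fn outcome(&self) -> &Outcome {
        &self.outcome
    }
}

impl<I, T> Iterator for RecordedStream<I>
where
    I: Iterator<Item = Result<T, String>>,
{
    type Item = Result<T, String>;

    fn next(&mut self) -> Option<Self::Item> {
        let item = self.inner.next();
        match &item {
            // End of stream without a prior error: the RPC succeeded.
            None if self.outcome == Outcome::InFlight => self.outcome = Outcome::Ok,
            // An error item completes the RPC with that error.
            Some(Err(e)) => self.outcome = Outcome::Error(e.clone()),
            // Ordinary messages (or end of stream after an error) change nothing.
            _ => {}
        }
        item
    }
}
```

The outcome stays `InFlight` while messages are still arriving, which is the main difference from the unary case.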

@coryan
Contributor

coryan commented Apr 11, 2026

I think (but I am really not sure) we have two choices:

  1. For gRPC, we use the options extension, set the resource name in the generated code (as shown in feat(sidekick/rust): inject ResourceName into gRPC request extensions in templates librarian#5298) and then populate the RequestRecorder from that extension, or
  2. Change the generated code to store the resource name directly into the RequestRecorder.

I am guessing (2) is cleaner, though maybe more risky. Are you trying to implement (1) or some alternative I did not consider?

A third (less desirable) option is to release without this attribute feature. It only affects the Storage admin operations, which are not used that often 🤷
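
The difference between choices (1) and (2) can be sketched as follows. This is illustrative only: `Extensions` here is a tiny type-keyed map written for the example, `ResourceName` and `SketchRecorder` are hypothetical stand-ins for the real types, and the resource name passed in by callers is just an example value.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Minimal type-keyed map, standing in for request extensions.
#[derive(Default)]
pub struct Extensions(HashMap<TypeId, Box<dyn Any>>);

impl Extensions {
    pub fn insert<T: Any>(&mut self, value: T) {
        self.0.insert(TypeId::of::<T>(), Box::new(value));
    }

    pub fn get<T: Any>(&self) -> Option<&T> {
        self.0.get(&TypeId::of::<T>()).and_then(|b| b.downcast_ref::<T>())
    }
}

/// Hypothetical newtype carried in the extensions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ResourceName(String);

impl ResourceName {
    pub fn new(name: impl Into<String>) -> Self {
        Self(name.into())
    }
}

/// Hypothetical recorder that only tracks the resource name.
#[derive(Debug, Default)]
pub struct SketchRecorder {
    resource_name: Option<String>,
}

impl SketchRecorder {
    pub fn set_resource_name(&mut self, name: &str) {
        self.resource_name = Some(name.to_string());
    }

    pub fn resource_name(&self) -> Option<&str> {
        self.resource_name.as_deref()
    }
}

/// Choice (1): generated code writes the extension, the transport copies it over.
pub fn via_extensions(rn: &str) -> SketchRecorder {
    let mut extensions = Extensions::default();
    extensions.insert(ResourceName::new(rn)); // done by generated code

    let mut recorder = SketchRecorder::default();
    if let Some(name) = extensions.get::<ResourceName>() {
        recorder.set_resource_name(&name.0); // done by the transport
    }
    recorder
}

/// Choice (2): generated code stores the name in the recorder directly.
pub fn via_recorder(rn: &str) -> SketchRecorder {
    let mut recorder = SketchRecorder::default();
    recorder.set_resource_name(rn);
    recorder
}
```

Both paths end with the same recorder state. Choice (1) adds an intermediate hop through the extensions, while choice (2) requires the generated code to have access to the recorder.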

Contributor

@coryan coryan left a comment


I don't think this hurts anything, so approved. I am not sure it solves anything either, but I may be wrong.

Ok(())
}

#[ignore]
Contributor


Presumably these need a TODO(#...) also.

@haphungw
Contributor Author

haphungw commented Apr 11, 2026

Yes, I was trying to do #1. I don't really think we have any other choices.

Re #2 being cleaner but more risky: what risks are we facing other than having to regenerate with librarian?

@westarle
Contributor

I'm ok with skipping the resource id attribute for storage control until after freeze to avoid risk to the release and do things 'right'

@haphungw
Contributor Author

Since we are deferring the resource ID for storage control until post-freeze, what's the general sentiment on option 1 (extensions) vs option 2 (RequestRecorder) for the long term?

I'm already trying to make #1 work for Bidi streaming in a separate branch, but if that somehow misses the mark I hope reverting is not too bad 🤦‍♂️

@haphungw haphungw merged commit f261508 into googleapis:main Apr 12, 2026
36 checks passed
@haphungw haphungw deleted the o11y-integrate-grpc-transport branch April 15, 2026 21:26

3 participants