feat(grpc): Add tonic transport #2339

Merged
merged 22 commits into from
Aug 13, 2025

Conversation

arjan-bal
Collaborator

@arjan-bal arjan-bal commented Jul 15, 2025

This PR includes the following:

  1. A transport trait that will be used by gRPC subchannels. Transports are expected to transfer serialized messages as bytes, but an in-memory transport may also transfer structs without serialization.
  2. The runtime trait is extended to include a method for creating TCP streams. Adapters are added to convert a gRPC runtime to a Hyper runtime.
  3. A transport implementation that uses tonic. To avoid a dependency on the code in tonic/src/transport, required code is copied over.
  4. A tonic codec that sends/receives Bytes. This is a temporary workaround until tonic supports bypassing the codec and receiving bytes.
  5. A test that uses the grpc tonic transport to create a bi-di stream with a tonic server.
  6. Remove the unused feature examples/tower since it was causing udeps failures due to cargo's feature resolution in workspaces.
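
As a rough illustration of item 1, the sketch below shows how one transport abstraction could carry either serialized bytes or an in-process struct. It is not the trait from this PR: `WireMessage`, `Transport`, `InMemoryTransport`, and `Ping` are hypothetical names, and the sketch is synchronous, whereas the transport described above works with async streams.

```rust
use std::any::Any;
use std::collections::VecDeque;

/// Hypothetical message representation: serialized bytes, or an in-process
/// value that skips serialization entirely.
pub enum WireMessage {
    Bytes(Vec<u8>),
    InMemory(Box<dyn Any + Send>),
}

/// Hypothetical, synchronous stand-in for a transport trait.
pub trait Transport {
    fn send(&mut self, msg: WireMessage);
    fn recv(&mut self) -> Option<WireMessage>;
}

/// Loopback transport that hands messages back in FIFO order.
#[derive(Default)]
pub struct InMemoryTransport {
    queue: VecDeque<WireMessage>,
}

impl Transport for InMemoryTransport {
    fn send(&mut self, msg: WireMessage) {
        self.queue.push_back(msg);
    }

    fn recv(&mut self) -> Option<WireMessage> {
        self.queue.pop_front()
    }
}

/// Hypothetical application message used only for this illustration.
#[derive(Debug, PartialEq)]
pub struct Ping {
    pub id: u32,
}

/// Sends a struct through the loopback transport without serializing it and
/// downcasts it back to its concrete type on receipt.
pub fn round_trip_struct(id: u32) -> Option<Ping> {
    let mut transport = InMemoryTransport::default();
    transport.send(WireMessage::InMemory(Box::new(Ping { id })));
    match transport.recv()? {
        WireMessage::InMemory(any) => any.downcast::<Ping>().ok().map(|boxed| *boxed),
        WireMessage::Bytes(_) => None,
    }
}
```

A network transport would only ever produce the `Bytes` variant; the `InMemory` variant is what lets a same-process transport skip the encode and decode steps.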

@arjan-bal arjan-bal force-pushed the grpc-tonic-transport-1 branch from 1417e9d to 66e6c10 Compare July 15, 2025 07:27
@arjan-bal
Collaborator Author

Hi @dfawley and @LucioFranco, could you please review this PR when you have a moment?

@arjan-bal arjan-bal force-pushed the grpc-tonic-transport-1 branch from fe1436e to 957377c Compare July 21, 2025 09:30
@@ -229,6 +235,7 @@ impl InternalSubchannel {
transport: Arc<dyn Transport>,
backoff: Arc<dyn Backoff>,
unregister_fn: Box<dyn FnOnce(SubchannelKey) + Send + Sync>,
runtime: Arc<dyn Runtime>,
Contributor

Curious:

Given that we will have the same runtime for all the different gRPC components that require a runtime, did we consider something like a singleton that is initialized at init time, and all the components can use a getter to retrieve and use the singleton instead of the runtime being passed to every component that needs it?

Collaborator

Different grpc channels could theoretically use different runtimes. Maybe that isn't something we need to support, but it's pretty easily attained - it just requires passing around the runtime a bit more than if it were global.

Collaborator Author

C++ passes the event engine through channel args. In my opinion passing the runtime through a function param allows for cleaner dependency injection. It also enforces that the runtime is set during channel creation, before RPCs are made.

Having a singleton runtime will force all gRPC channels in a binary to use the same runtime. I don't know if this is a con though. We can discuss this in the team meeting.

Member

This is fine, though all of these Arc<dyn ...> should have newtype wrappers to clean this up. I think passing a runtime handle around is totally fine as long as it's cheap to clone. We likely do not want users to have to shuffle a runtime around, though.
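
To make the trade-off in this thread concrete, here is a minimal sketch of injecting a runtime through a cheap-to-clone newtype instead of reading it from a global singleton. All names (`Runtime`, `RuntimeHandle`, `ThreadRuntime`, `Subchannel::run_tasks`) are assumptions for illustration, and the toy runtime spawns OS threads rather than async tasks.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical, simplified runtime trait; a real one would be async-aware.
pub trait Runtime: Send + Sync {
    fn spawn(&self, task: Box<dyn FnOnce() + Send>) -> thread::JoinHandle<()>;
}

/// Newtype over the erased runtime. Cloning only bumps the Arc refcount.
#[derive(Clone)]
pub struct RuntimeHandle(Arc<dyn Runtime>);

impl RuntimeHandle {
    pub fn new(rt: impl Runtime + 'static) -> Self {
        Self(Arc::new(rt))
    }

    pub fn spawn(&self, task: impl FnOnce() + Send + 'static) -> thread::JoinHandle<()> {
        self.0.spawn(Box::new(task))
    }
}

/// Toy runtime that runs each task on its own OS thread.
pub struct ThreadRuntime;

impl Runtime for ThreadRuntime {
    fn spawn(&self, task: Box<dyn FnOnce() + Send>) -> thread::JoinHandle<()> {
        thread::spawn(task)
    }
}

/// Component that receives its runtime by injection rather than from a global.
pub struct Subchannel {
    runtime: RuntimeHandle,
}

impl Subchannel {
    pub fn new(runtime: RuntimeHandle) -> Self {
        Self { runtime }
    }

    /// Runs `n` tasks on the injected runtime and returns how many completed.
    pub fn run_tasks(&self, n: usize) -> usize {
        let done = Arc::new(AtomicUsize::new(0));
        let handles: Vec<_> = (0..n)
            .map(|_| {
                let done = Arc::clone(&done);
                self.runtime.spawn(move || {
                    done.fetch_add(1, Ordering::SeqCst);
                })
            })
            .collect();
        for handle in handles {
            handle.join().expect("task panicked");
        }
        done.load(Ordering::SeqCst)
    }
}
```

Because the handle is a constructor argument, two channels in the same binary can be built with different runtimes, which a singleton would rule out.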

@@ -345,30 +353,34 @@ impl InternalSubchannel {
let transport = self.transport.clone();
let address = self.address().address;
let state_machine_tx = self.state_machine_event_sender.clone();
let connect_task = tokio::task::spawn(async move {
Contributor

Do we have some kind of vet equivalent to ensure that task spawning (and other features provided by the runtime) are always only used from the runtime and not from other places (like tokio or the standard library)?

Collaborator

Ultimately (pre-1.0) we want to not have any tokio runtime crates/features listed in Cargo.toml, except if you are using a tokio feature flag. That would prevent such a thing.

Collaborator Author

@arjan-bal arjan-bal Jul 23, 2025

I spent some time looking into this. I found two approaches:

  1. Use clippy's disallowed_methods, disallowed_macros, etc. to block tokio symbols like tokio::spawn, tokio::task::spawn, etc. The problem with this approach is that we need to list every symbol we want to block; there's no glob (*) operator available. It's also easy to miss the clippy warnings since they don't block PR submission.
  2. Introduce a separate crate, say grpc-runtime-tokio, for the default runtime implementation, and disable tokio's runtime features in the main grpc crate. If a function in the grpc crate tries to call tokio::spawn, it will fail to compile as the required feature will be disabled. The concern with this approach is that we need to export the runtime trait (and related types) which are unstable.

@LucioFranco would like to get your thoughts on this.

Collaborator Author

A Cargo feature ends up enabled if any (transitive) dependency enables it. I wasn't seeing any compilation failures even after removing the tokio/rt feature from the Cargo.toml. I tracked down the dependency to tower's buffer feature:

cargo tree -i tokio -e features --edges=normal -p grpc --no-default-features
tokio v1.46.1
├── tokio feature "bytes"
│   └── tokio feature "io-util"
│       └── h2 v0.4.11
│           └── h2 feature "default"
│               └── hyper v1.6.0
│                   ├── hyper feature "client"
│                   │   └── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
│                   ├── hyper feature "default"
│                   │   └── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
│                   └── hyper feature "http2"
│                       └── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
├── tokio feature "default"
│   ├── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
│   ├── h2 v0.4.11 (*)
│   ├── hyper v1.6.0 (*)
│   ├── tokio-stream v0.1.17
│   │   └── tonic v0.14.0 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/tonic)
│   │       └── tonic feature "codegen"
│   │           └── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
│   │   ├── tokio-stream feature "default"
│   │   │   └── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
│   │   └── tokio-stream feature "time"
│   │       └── tokio-stream feature "default" (*)
│   ├── tokio-util v0.7.15
│   │   └── tower v0.5.2
│   │       ├── tower feature "__common"
│   │       │   ├── tower feature "buffer"
│   │       │   │   └── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
│   │       │   ├── tower feature "limit"
│   │       │   │   └── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
│   │       │   └── tower feature "util"
│   │       │       └── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
│   │       ├── tower feature "buffer" (*)
│   │       ├── tower feature "default"
│   │       │   └── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
│   │       ├── tower feature "futures-core"
│   │       │   └── tower feature "__common" (*)
│   │       ├── tower feature "futures-util"
│   │       │   └── tower feature "util" (*)
│   │       ├── tower feature "limit" (*)
│   │       ├── tower feature "pin-project-lite"
│   │       │   ├── tower feature "__common" (*)
│   │       │   └── tower feature "util" (*)
│   │       ├── tower feature "sync_wrapper"
│   │       │   └── tower feature "util" (*)
│   │       ├── tower feature "tokio"
│   │       │   ├── tower feature "buffer" (*)
│   │       │   └── tower feature "limit" (*)
│   │       ├── tower feature "tokio-util"
│   │       │   ├── tower feature "buffer" (*)
│   │       │   └── tower feature "limit" (*)
│   │       ├── tower feature "tracing"
│   │       │   ├── tower feature "buffer" (*)
│   │       │   └── tower feature "limit" (*)
│   │       └── tower feature "util" (*)
│   │   ├── tokio-util feature "codec"
│   │   │   └── h2 v0.4.11 (*)
│   │   ├── tokio-util feature "default"
│   │   │   └── h2 v0.4.11 (*)
│   │   └── tokio-util feature "io"
│   │       └── h2 v0.4.11 (*)
│   └── tower v0.5.2 (*)
├── tokio feature "io-util" (*)
├── tokio feature "rt"
│   └── tower feature "buffer" (*)
├── tokio feature "sync"
│   ├── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
│   ├── hyper v1.6.0 (*)
│   ├── tokio-stream v0.1.17 (*)
│   ├── tokio-util v0.7.15 (*)
│   └── tower v0.5.2 (*)
│   ├── tower feature "buffer" (*)
│   └── tower feature "limit" (*)
└── tokio feature "time"
    └── grpc v0.9.0-alpha.1 (/usr/local/google/home/arjansbal/Development/tonic/grpc-tonic-transport-1/grpc)
    ├── tokio-stream feature "time" (*)
    └── tower feature "limit" (*)

Buffer has a constructor that uses tokio as the default executor. We're not using this constructor though.

Collaborator Author

I've added a default private feature for the tokio runtime that enables the tokio/rt feature flag. Due to this, tokio::spawn should not be usable outside the grpc::rt::tokio module. If tokio::spawn is used outside this module, the build will fail with default feature flags disabled, failing CI.

Member

We discussed this yesterday. For now this is fine; we can rely on tokio initially until we make some more overall progress.

@arjan-bal arjan-bal force-pushed the grpc-tonic-transport-1 branch 3 times, most recently from 41977ed to 38901f4 Compare July 24, 2025 20:39
Collaborator

@dfawley dfawley left a comment

Mostly just some rust recommendations. Obviously I couldn't compile them but I'm pretty sure they're reasonable suggestions. If they don't work, just let me know.

@arjan-bal arjan-bal force-pushed the grpc-tonic-transport-1 branch 2 times, most recently from 09c8dc2 to bfddcba Compare July 24, 2025 20:59
@arjan-bal arjan-bal force-pushed the grpc-tonic-transport-1 branch from bfddcba to 7388013 Compare July 24, 2025 21:03
@arjan-bal arjan-bal force-pushed the grpc-tonic-transport-1 branch from ef54eb2 to 77557b9 Compare July 30, 2025 14:03
@arjan-bal arjan-bal force-pushed the grpc-tonic-transport-1 branch from 77557b9 to e6afa5f Compare July 30, 2025 14:05
@arjan-bal arjan-bal force-pushed the grpc-tonic-transport-1 branch from 7cfcf7b to da64ac5 Compare July 31, 2025 09:27
@arjan-bal arjan-bal requested a review from LucioFranco July 31, 2025 13:57
@LucioFranco
Member

@arjan-bal can you resolve the conflicts?

@arjan-bal arjan-bal force-pushed the grpc-tonic-transport-1 branch from 192600a to 58478ed Compare August 4, 2025 20:01
@arjan-bal
Collaborator Author

> @arjan-bal can you resolve the conflicts?

Merged master.

struct TransportBuilder {}

struct TonicTransport {
grpc: Grpc<TonicService>,
Member

So let me know if I understand this correctly: because the new channel uses the new abstraction (Stream<Item = Message>) rather than the current Request<HttpBody> abstraction, we can temporarily use the Grpc type to add the encoding layering and HTTP body mapping for us. I think this makes sense for now, but note that this is an incorrect usage of the Grpc type.

So my final thought is that this is fine for now, we expect to change all of this code when I start on my overall abstraction rework. So based on that, this abuse of the type is fine and should help us get the demo out.

Collaborator Author

I'm not aware of the exact changes that will be made to the channel.

I'm using the Grpc type because it implements the gRPC protocol over HTTP/2. Per a suggestion from @dfawley, I've implemented a GrpcService<Body> to avoid using the tonic::transport::Channel type. This approach has a couple of immediate benefits:

  1. It enables us to pass our own runtime to hyper.
  2. It allows us to await disconnection.

The Codec that encodes Bytes is a temporary workaround, as the grpc.streaming method requires a codec. If a version of grpc.streaming becomes available that can skip the codec by directly accepting and returning a stream of bytes, this workaround can be removed.
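
The idea of a codec that passes bytes through untouched can be sketched as below. This is only an illustration under assumptions: the `Codec` trait here is a simplified, hypothetical one defined locally (tonic's real codec API is split into encoder and decoder halves over its own buffer types), and `Vec<u8>` stands in for `Bytes`.

```rust
use std::convert::Infallible;

/// Hypothetical, simplified codec trait used only for this sketch.
pub trait Codec {
    type Encode;
    type Decode;
    type Error;

    fn encode(&mut self, item: Self::Encode, dst: &mut Vec<u8>) -> Result<(), Self::Error>;
    fn decode(&mut self, src: &[u8]) -> Result<Self::Decode, Self::Error>;
}

/// Identity codec: the payload is already serialized, so encoding and
/// decoding only copy bytes.
pub struct BytesCodec;

impl Codec for BytesCodec {
    type Encode = Vec<u8>;
    type Decode = Vec<u8>;
    type Error = Infallible;

    fn encode(&mut self, item: Vec<u8>, dst: &mut Vec<u8>) -> Result<(), Infallible> {
        dst.extend_from_slice(&item);
        Ok(())
    }

    fn decode(&mut self, src: &[u8]) -> Result<Vec<u8>, Infallible> {
        Ok(src.to_vec())
    }
}

/// Encodes and then decodes a payload; the result equals the input.
pub fn round_trip(payload: &[u8]) -> Vec<u8> {
    let mut codec = BytesCodec;
    let mut wire = Vec::new();
    codec
        .encode(payload.to_vec(), &mut wire)
        .expect("identity encode cannot fail");
    codec.decode(&wire).expect("identity decode cannot fail")
}
```

Serialization then happens once, above the transport, and the codec layer exists only to satisfy an API that insists on having one.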

Member

Well, the goal of the Grpc type was to encapsulate the codec plus some generic transport. The problem here is that we are now using it within the transport, so the layering at the moment is something like codegen -> Grpc -> TonicTransport -> Grpc -> custom tonic service. The Grpc type was never designed to be used as a raw transport; in reality we should be converting things into the correct abstraction level, which at the moment is the GrpcService trait (that we want to change). I am not suggesting a change here, mostly explaining that this code will need to be changed in the future.

}
};
let (metadata, stream, extensions) = response.into_parts();
let message_stream: Pin<Box<dyn Stream<Item = Result<Box<dyn Message>, Status>> + Send>> =
Member

Can you address this clippy lint (allow is fine).

Collaborator Author

Introduced a BoxStream type to fix this finding.
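
For readers unfamiliar with the lint, the fix amounts to naming the erased stream type once. The sketch below assumes a definition for `BoxStream` (the thread only gives the name) and declares a local stand-in for the `Stream` trait so that it compiles without dependencies; `IterStream`, `boxed`, and `drain_ready` are hypothetical helpers for the demonstration.

```rust
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Local stand-in for `futures_core::Stream`, so the sketch is dependency-free.
pub trait Stream {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// Assumed shape of the alias: it names the pinned, boxed, erased stream once.
pub type BoxStream<T> = Pin<Box<dyn Stream<Item = T> + Send>>;

pub trait Message: Send {}
pub struct Status;

/// Before: Pin<Box<dyn Stream<Item = Result<Box<dyn Message>, Status>> + Send>>
pub type MessageStream = BoxStream<Result<Box<dyn Message>, Status>>;

/// Stream that yields the items of an iterator and is always ready.
pub struct IterStream<I>(pub I);

impl<I: Iterator + Unpin> Stream for IterStream<I> {
    type Item = I::Item;

    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<I::Item>> {
        Poll::Ready(self.0.next())
    }
}

/// Erases the concrete stream type behind the alias.
pub fn boxed<S>(stream: S) -> BoxStream<S::Item>
where
    S: Stream + Send + 'static,
{
    Box::pin(stream)
}

/// Polls with a no-op waker and collects items until the stream ends or
/// reports `Pending`.
pub fn drain_ready<T>(mut stream: BoxStream<T>) -> Vec<T> {
    struct NoopWake;
    impl Wake for NoopWake {
        fn wake(self: Arc<Self>) {}
    }

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut out = Vec::new();
    while let Poll::Ready(Some(item)) = stream.as_mut().poll_next(&mut cx) {
        out.push(item);
    }
    out
}
```

An `#[allow(clippy::type_complexity)]` on the binding would also have silenced the warning; the alias has the advantage of being reusable wherever the same erased type appears.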

assert_eq!(echo_reponse.message, message);
}
Err(status) => {
panic!("Error from server: {status:?}");
Member

as well as here

Collaborator Author

I removed the match statement by chaining another expect above. Using assert was becoming less readable since unwrap_err required the Ok variant to implement Debug which is not the case here.

assert!(
    result.is_ok(),
    "Error from server: {:?}",
    if let Err(e) = &result {
        e
    } else {
        unreachable!();
    }
);

Member

If you use this assert a lot you can make a custom one, though I wonder why the Ok variant doesn't implement Debug? Honestly, I think every type should implement Debug even if it just hides the internals.

Collaborator Author

> though I wonder why the Ok variant doesn't implement debug?

The Message type doesn't have Debug as a supertrait.

Collaborator

I think we should sprinkle around Debug as supertraits (as a separate change). (Related also to @easwars's PR #2380.)

Member

We used to enforce that all public types implement Debug. I can't find it now, but we should continue to do that, even if they don't actually print their internals.

Collaborator Author

Made Debug a supertrait of Message.
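
A small sketch of why the supertrait matters for tests: `Result::unwrap_err` requires the `Ok` type to implement `Debug`, and `Box<dyn Message>` only does so when `Debug` is a supertrait of `Message`. The `Message` trait below is a simplified stand-in, and `Status`, `EchoResponse`, and `call` are hypothetical names for illustration.

```rust
use std::fmt::Debug;

/// Simplified stand-in for the message trait, with Debug as a supertrait.
pub trait Message: Debug + Send {}

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
pub struct Status {
    pub code: i32,
    pub message: String,
}

/// Hypothetical response message.
#[derive(Debug)]
pub struct EchoResponse {
    pub message: String,
}

impl Message for EchoResponse {}

/// Returns either an erased message or an error status.
pub fn call(fail: bool) -> Result<Box<dyn Message>, Status> {
    if fail {
        Err(Status {
            code: 13,
            message: "internal".to_string(),
        })
    } else {
        Ok(Box::new(EchoResponse {
            message: "hello".to_string(),
        }))
    }
}

/// Compiles only because `dyn Message: Debug`; without the supertrait,
/// `unwrap_err` would be rejected for `Result<Box<dyn Message>, Status>`.
pub fn error_code_of_failed_call() -> i32 {
    call(true).unwrap_err().code
}
```

With the supertrait in place, the erased message can also be formatted with `{:?}` directly in assertion messages.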

@arjan-bal arjan-bal force-pushed the grpc-tonic-transport-1 branch from f43621e to 8cc55c5 Compare August 7, 2025 10:02
@arjan-bal arjan-bal requested a review from LucioFranco August 7, 2025 11:16
Collaborator

@dfawley dfawley left a comment

LGTM modulo the TODO

@arjan-bal arjan-bal requested a review from dfawley August 11, 2025 18:35
@@ -52,16 +54,16 @@ enum InternalSubchannelState {
}

struct InternalSubchannelConnectingState {
-abort_handle: Option<AbortHandle>,
+abort_handle: Option<Box<dyn TaskHandle>>,
Member

Imo this erased type should get its own newtype to make it clear and more readable.

Collaborator Author

Added an alias named BoxedTaskHandle.
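
For context, an alias like this might look roughly as follows. Only the names `TaskHandle` and `BoxedTaskHandle` come from the thread; the trait's `abort` method, `FlagHandle`, and `ConnectingState::cancel` are assumptions made for the sketch.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Assumed, simplified shape of the runtime's task handle trait.
pub trait TaskHandle: Send + Sync {
    fn abort(&self);
}

/// Alias for the erased handle. An alias is transparent to the type checker,
/// while a newtype struct would be a distinct type that can carry its own
/// methods and trait impls.
pub type BoxedTaskHandle = Box<dyn TaskHandle>;

/// Hypothetical handle that records an abort request in a shared flag.
pub struct FlagHandle(pub Arc<AtomicBool>);

impl TaskHandle for FlagHandle {
    fn abort(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Simplified stand-in for the connecting state that owns the handle.
pub struct ConnectingState {
    pub abort_handle: Option<BoxedTaskHandle>,
}

impl ConnectingState {
    /// Aborts the in-flight task, if any, and reports whether one existed.
    pub fn cancel(&mut self) -> bool {
        match self.abort_handle.take() {
            Some(handle) => {
                handle.abort();
                true
            }
            None => false,
        }
    }
}
```

The alias addresses readability at the use site; it does not give the stronger encapsulation that the newtype suggestion above would.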

@@ -0,0 +1,547 @@
// This file is @generated by prost-build.
Member

Do we want these generated files in src if they are for examples?

Collaborator Author

The generated file is used in the unit test.

@dfawley dfawley merged commit 29163c2 into hyperium:master Aug 13, 2025
20 checks passed
4 participants