Adam Ning edited this page Oct 1, 2025 · 38 revisions

Welcome to the crossfire-rs wiki!

The history of the project

I started the original demo in our internal shared library in 2019.10. Because async/await was relatively new at that time, there was no other option for an MPMC channel supporting async, so I decided to create a wrapper around crossbeam, which provides a lockless MPMC queue. The basic idea is to wrap the Waker in an Arc and post it to the other side through a channel, so that pending coroutines are woken by on_send() and on_recv().

On 2020.06.29, I open-sourced this piece of code and posted it at https://www.reddit.com/r/rust/comments/hi9vhj/crossfire_yet_another_async_mpmcmpsc_based_on/. Since there were other async MPMC implementations around, it did not get much attention. Nevertheless, the project was heavily used in our distributed storage, and the algorithm proved reliable. In v2.0, the code was refactored to be more readable and concise, and it shows the basic considerations of turning sync code into async.

The rules we follow while developing crossfire are documented in Async-and-Future-basics.

In 2025.06, I revisited the long-dusted code and ran comparison benchmarks against other popular channels (flume, kanal, ...). I found our performance was still competitive, but the original API design was bad, so I went for a complete rewrite and released v2.0. Within the lifecycle of v2.0, I did the following:

  • Explored the state transfer; some mistakes were made and fixed (refer to State transfer explained).

  • Found a way to minimize the waker overhead of v1.x for the idle-select scenario (v2.0.17).

  • Made some simple but very effective optimizations to the waker registry, leading to a 2x performance increase.

  • Refined the API for customized futures, moving the poll_XX functions into stream and sink to prevent misuse (v2.0.20).

  • Added tests and fixed atomic ordering for Arm. (Some problems inside Tokio were not discovered until I started testing v2.1.)

In the meantime, I started working on v2.1 with new ideas.

V2.1 compared to other channels

There are many lock-based channels besides std and tokio. But no matter how much they are optimized for the uncontended case, each time a sender or receiver accesses the shared state of the channel, it blocks access from the other side. So at most they can use only one core.

Flume

Flume is known for its rendezvous queues and for being implemented without unsafe code. It initially claimed to be faster than crossbeam, but I failed to reproduce that in our benchmarks. It has been in maintenance mode, and no async timeout API is provided.

Kanal

Kanal is very popular, probably due to its aggressive direct stack-to-stack copying idea.

I agree that it is the ultimate optimization for a lock-based channel. The sender pops a receiver handle, unlocks, moves the message directly to the receiver, and wakes it. This reduces the time spent holding the lock and skips the cost of brokering the message through the channel. Receivers will try to move a message from a pending sender into the channel in on_recv(). It might reduce the cost of context switches by doing more work while holding the lock.

But this optimization comes at a cost: it makes the async API not cancellation-safe. Users might get uncertain results when using selection macros or timeout wrappers, and they might miss the warning without reading the documentation closely.

I implemented direct-copy in crossfire v2.1 and put a lot of effort into eliminating the extra cost of a COPY state (an extra state requires extra atomic operations). After that, I ran the benchmark against the code without direct-copy: there was no benefit :(

I think the reasons are the following:

  • The benefit of direct-copy in a locked structure is certainty. But in a lockless channel, sender operations and receiver operations actually run in parallel. When on_recv() tries to do a direct copy, there is a chance that the channel is full, in which case it falls back to the logic without direct copy (the only thing left to do is wake the sender).

  • The chance for a sender to pop a receiver waker is small, because for a bounded channel with enough buffer, with both sides busy spinning, the waker registry is empty most of the time. So there is no difference in benchmark scores.

  • When on_recv() does a direct copy, instead of returning to the message-handling logic it is supposed to run, the receiver wastes time copying a message into the channel on behalf of an unrelated sender-side waker. This is bad for latency when the numbers of senders and receivers are balanced.

So in the end, I removed the direct-copy code in the async context, to make sure our API is cancellation-safe. In the blocking context, direct-copy is made optional and only enabled when the sender side is congested. A performance benefit is observed in the one-core VPS scenario.

Crossbeam

Previously, I thought the speed of crossbeam was the ceiling, since crossfire is based on crossbeam. Because one goal of v2.1 is to make the async waker a first-class citizen, on par with the thread context, I read the code of crossbeam-channel and found the difference: the initial idea of crossfire is notify_one during on_send()/on_recv(), but crossbeam-channel does notify_all. This is our potential performance advantage, although crossfire needs more states to prevent starvation.

And after removing crossbeam-channel, I found that our async performance is better than v2.0, because crossbeam-channel's waker was always empty in the async context, and now that extra cost is gone.

Single version of consumer/producer

There is currently no special treatment in the underlying queue (that may come in the future), just a special RegistrySingle. The benchmark scores show the benefit is slight; sometimes MPMC even has more throughput than MPSC when there is more than one thread. RegistrySingle does not need cancel_waker, and its memory footprint is small.

Meanwhile, safe usage of MPSC / SPSC against concurrent access must be considered. The receiver of tokio::mpsc uses &mut, but we want to keep ours immutable to avoid borrowing issues. In v2.0, we added a phantom Cell marker to AsyncTx/AsyncRx/Tx/Rx to prevent them from being accessed concurrently within an Arc.

Unbuffered (zero-bound) channels

Crossbeam, flume, and kanal all have zero-bound channels, while tokio::mpsc does not support them. I considered a draft. But in order to be cancellation-safe in the async API, it might need a back-and-forth ack, which is bad for performance and for code simplicity. I created a poll to see how many people were actually interested ( https://github.com/frostyplanet/crossfire-rs/discussions/25 ); it seems zero-bound channels are not as useful as they sound.

For the moment, I have dropped the plan to implement this feature. If you have a different opinion, you are welcome to discuss.

V2.1 internal details

Crossbeam algorithm

State transfer explained

Test

We have a large test suite covering various async runtimes and CPU architectures; refer to the Test Status section in the README.

Benchmark

2.1.4 Arm (2025-10-1)

2.1.0 vs. 2.0.26 Intel(2025-09-21)

2.0.14 Intel(2025-08-03)

2.0.0 Intel(2025-06-27)
