bpf: tcp: Exactly-once socket iteration #9150

kernel-patches-daemon-bpf · 2025-06-18T16:36:46Z

Pull request for series with
subject: bpf: tcp: Exactly-once socket iteration
version: 2
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=973515

kernel-patches-daemon-bpf · 2025-06-18T16:36:46Z

Upstream branch: cd7312a
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=973515
version: 2

kernel-patches-daemon-bpf · 2025-06-19T16:58:48Z

Upstream branch: e30329b
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=973515
version: 2

Prepare for the next patch, which needs to be able to choose either GFP_USER or GFP_NOWAIT for calls to bpf_iter_tcp_realloc_batch().

Signed-off-by: Jordan Rife <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
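A minimal user-space sketch of what passing an allocation mode into the reallocation helper looks like. It is not the kernel code. `enum alloc_mode`, `struct iter_state`, and `realloc_batch()` are hypothetical stand-ins for `gfp_t`, `struct bpf_tcp_iter_state`, and `bpf_iter_tcp_realloc_batch()`. Keeping the already-batched entries across the reallocation, and the `fail_nowait` fault-injection flag, are assumptions of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for gfp_t: may the allocation block or not? */
enum alloc_mode { ALLOC_MAY_SLEEP, ALLOC_NO_WAIT };

/* Hypothetical, trimmed-down iterator state. */
struct iter_state {
	int *batch;          /* stand-in for the batched sockets */
	unsigned int end;    /* number of valid entries (cf. end_sk) */
	unsigned int max;    /* capacity (cf. max_sk) */
	bool fail_nowait;    /* fault injection: make ALLOC_NO_WAIT fail */
};

/* Grow the batch to new_size entries; the caller chooses the mode.
 * Existing entries are kept (an assumption of this sketch).
 * On failure the old batch is left untouched. */
static int realloc_batch(struct iter_state *it, unsigned int new_size,
			 enum alloc_mode mode)
{
	int *nb;

	assert(new_size >= it->end);
	if (mode == ALLOC_NO_WAIT && it->fail_nowait)
		return -ENOMEM;
	nb = calloc(new_size, sizeof(*nb));
	if (!nb)
		return -ENOMEM;
	if (it->end)
		memcpy(nb, it->batch, it->end * sizeof(*nb));
	free(it->batch);
	it->batch = nb;
	it->max = new_size;
	return 0;
}
```

Because a failed call leaves the old batch intact, the caller is free to decide whether to retry or to stop and report -ENOMEM.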

Require that iter->batch always contains a full bucket snapshot. This invariant is important to avoid skipping or repeating sockets during iteration when combined with the next few patches. Before, there were two cases where a call to bpf_iter_tcp_batch may only capture part of a bucket:

1. When bpf_iter_tcp_realloc_batch() returns -ENOMEM.
2. When more sockets are added to the bucket while calling bpf_iter_tcp_realloc_batch(), making the updated batch size insufficient.

In cases where the batch size only covers part of a bucket, it is possible to forget which sockets were already visited, especially if we have to process a bucket in more than two batches. This forces us to choose between repeating or skipping sockets, so don't allow this:

1. Stop iteration and propagate -ENOMEM up to userspace if reallocation fails instead of continuing with a partial batch.
2. Try bpf_iter_tcp_realloc_batch() with GFP_USER just as before, but if we still aren't able to capture the full bucket, call bpf_iter_tcp_realloc_batch() again while holding the bucket lock to guarantee the bucket does not change. On the second attempt use GFP_NOWAIT since we hold onto the spin lock.

I did some manual testing to exercise the code paths where GFP_NOWAIT is used and where ERR_PTR(err) is returned. I used the realloc test cases included later in this series to trigger a scenario where a realloc happens inside bpf_iter_tcp_batch and made a small code tweak to force the first realloc attempt to allocate a too-small batch, thus requiring another attempt with GFP_NOWAIT. Some printks showed both reallocs with the tests passing:

    May 09 18:18:55 crow kernel: resize batch TCP_SEQ_STATE_LISTENING
    May 09 18:18:55 crow kernel: again GFP_USER
    May 09 18:18:55 crow kernel: resize batch TCP_SEQ_STATE_LISTENING
    May 09 18:18:55 crow kernel: again GFP_NOWAIT
    May 09 18:18:57 crow kernel: resize batch TCP_SEQ_STATE_ESTABLISHED
    May 09 18:18:57 crow kernel: again GFP_USER
    May 09 18:18:57 crow kernel: resize batch TCP_SEQ_STATE_ESTABLISHED
    May 09 18:18:57 crow kernel: again GFP_NOWAIT

With this setup, I also forced each of the bpf_iter_tcp_realloc_batch calls to return -ENOMEM to ensure that iteration ends and that the read() in userspace fails.

Signed-off-by: Jordan Rife <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
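The two-attempt strategy described in this commit message can be sketched as a single-threaded user-space toy. It is not the kernel's bpf_iter_tcp_batch(): `struct bucket`, `struct batch`, `grow_batch()`, `snapshot()`, `batch_bucket()`, and the `race` callback are hypothetical. The lock is a plain flag, `race()` simulates sockets arriving while the lock is dropped, and the 3/2 over-allocation on the first attempt is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy hash bucket: socket ids plus a "lock held" flag. */
struct bucket {
	int ids[64];
	unsigned int n;
	bool locked;
};

/* Toy batch: a snapshot of one bucket. */
struct batch {
	int *ids;
	unsigned int end, max;
};

/* may_sleep mirrors GFP_USER vs GFP_NOWAIT; user space cannot tell the
 * difference, so both paths simply call realloc(). */
static int grow_batch(struct batch *b, unsigned int size, bool may_sleep)
{
	int *nb;

	(void)may_sleep;
	nb = realloc(b->ids, size * sizeof(*nb));
	if (!nb)
		return -ENOMEM;
	b->ids = nb;
	b->max = size;
	return 0;
}

/* Copy the whole bucket; the caller holds the lock and made room. */
static void snapshot(const struct bucket *bk, struct batch *b)
{
	assert(bk->locked && bk->n <= b->max);
	if (bk->n)
		memcpy(b->ids, bk->ids, bk->n * sizeof(*b->ids));
	b->end = bk->n;
}

/* Demo race: ten sockets arrive while the lock is dropped. */
static void add_ten(struct bucket *bk)
{
	for (int i = 0; i < 10; i++)
		bk->ids[bk->n++] = 100 + i;
}

/* Returns 0 with a full snapshot of the bucket, or -ENOMEM.
 * It never returns a partial batch. */
static int batch_bucket(struct bucket *bk, struct batch *b,
			void (*race)(struct bucket *))
{
	int err;

	bk->locked = true;
	if (bk->n > b->max) {
		unsigned int need = bk->n;

		bk->locked = false;
		/* First attempt: lock dropped, so the allocation may sleep. */
		err = grow_batch(b, need * 3 / 2, true);
		if (err)
			return err;
		if (race)
			race(bk);
		bk->locked = true;
		if (bk->n > b->max) {
			/* Second attempt: the bucket can no longer change, but
			 * the lock is held, so the allocation must not sleep. */
			err = grow_batch(b, bk->n, false);
			if (err) {
				bk->locked = false;
				return err;
			}
		}
	}
	snapshot(bk, b);
	bk->locked = false;
	return 0;
}
```

The second allocation is sized to exactly the locked bucket count: while the lock is held nothing can be added, so that size is guaranteed to be enough.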

Get rid of the st_bucket_done field to simplify TCP iterator state and logic. Before, st_bucket_done could be false if bpf_iter_tcp_batch returned a partial batch; however, with the last patch ("bpf: tcp: Make sure iter->batch always contains a full bucket snapshot"), st_bucket_done == true is equivalent to iter->cur_sk == iter->end_sk.

Signed-off-by: Jordan Rife <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
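Once every batch holds a whole bucket, "is this bucket done?" can be derived from the batch cursor instead of being stored. A minimal sketch; `struct iter_pos` and `bucket_done()` are hypothetical names that only mirror the roles of `cur_sk` and `end_sk`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down cursor over one full-bucket batch. */
struct iter_pos {
	unsigned int cur;  /* next batch entry to visit (cf. cur_sk) */
	unsigned int end;  /* entries in the batch (cf. end_sk) */
};

/* With full-bucket batches, the bucket is finished exactly when every
 * batched entry has been consumed, so no separate flag is needed. */
static bool bucket_done(const struct iter_pos *it)
{
	assert(it->cur <= it->end);
	return it->cur == it->end;
}
```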

Prepare for the next patch that tracks cookies between iterations by converting struct sock **batch to union bpf_tcp_iter_batch_item *batch inside struct bpf_tcp_iter_state.

Signed-off-by: Jordan Rife <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
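The point of a union here is that one batch slot can hold either a live socket pointer or, after the reference is dropped, that socket's cookie. A user-space sketch of the idea; `struct toy_sock`, `union batch_item`, and `put_batch()` are hypothetical, and the exact members of the kernel's union bpf_tcp_iter_batch_item are not shown in this PR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical socket with a stable cookie and a reference count. */
struct toy_sock {
	uint64_t cookie;
	int refcnt;
};

/* Same slot, two interpretations: pointer while referenced, cookie after. */
union batch_item {
	struct toy_sock *sk;
	uint64_t cookie;
};

/* Drop the references held for the not-yet-visited entries [cur, end)
 * and remember only their cookies for the next read(). */
static void put_batch(union batch_item *batch, unsigned int cur,
		      unsigned int end)
{
	for (; cur < end; cur++) {
		struct toy_sock *sk = batch[cur].sk;
		uint64_t cookie = sk->cookie;

		sk->refcnt--;              /* stand-in for releasing the socket */
		batch[cur].cookie = cookie;
	}
}
```

Reusing the slot avoids a second array, at the cost that each slot's meaning depends on whether it lies before or after the cursor.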

Replace the offset-based approach for tracking progress through a bucket in the TCP table with one based on socket cookies. Remember the cookies of unprocessed sockets from the last batch and use this list to pick up where we left off or, in the case that the next socket disappears between reads, find the first socket after that point that still exists in the bucket and resume from there. This approach guarantees that all sockets that existed when iteration began and continue to exist throughout will be visited exactly once. Sockets that are added to the table during iteration may or may not be seen, but if they are they will be seen exactly once.

Signed-off-by: Jordan Rife <[email protected]>
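The resume step described above can be sketched over plain arrays of cookies. This is an illustration, not the kernel's implementation: `resume_index()` is hypothetical, the bucket is modeled as an array in list order, and returning -1 to mean "move on to the next bucket" is this sketch's convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* bucket[] holds the cookies currently in the bucket, in list order.
 * cookies[] holds the cookies the previous batch had not yet visited,
 * in their original order.  Returns the index to resume from, or -1 if
 * none of the remembered sockets survived (then skip to the next bucket). */
static int resume_index(const uint64_t *bucket, size_t n,
			const uint64_t *cookies, size_t n_cookies)
{
	for (size_t i = 0; i < n_cookies; i++)      /* earliest remembered first */
		for (size_t j = 0; j < n; j++)
			if (bucket[j] == cookies[i])
				return (int)j;
	return -1;
}
```

With an offset, removing a socket that sits before the offset shifts every later socket down by one, so the saved offset skips a socket; matching on cookies makes the resume point independent of such removals.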

Replicate the set of test cases used for UDP socket iterators to test similar scenarios for TCP listening sockets.

Signed-off-by: Jordan Rife <[email protected]>

Prepare to test TCP socket iteration over both listening and established sockets by allowing the BPF iterator programs to skip the port check.

Signed-off-by: Jordan Rife <[email protected]>

kernel-patches-daemon-bpf · 2025-06-20T18:37:40Z

Upstream branch: 99fe8af
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=973515
version: 2

Add parentheses around loopback address check to fix up logic and make the socket state filter configurable for the TCP socket iterators. Iterators can skip the socket state check by setting ss to 0.

Signed-off-by: Jordan Rife <[email protected]>
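The selftest's actual condition is not quoted in this PR, so the filter below is hypothetical. It only illustrates the class of bug that missing parentheses cause in C (`&&` binds tighter than `||`) and how an `ss == 0` wildcard for the state check can be expressed.

```c
#include <assert.h>
#include <stdbool.h>

/* What "state == ss && v4_lo || v6_lo" means in C: the && groups first,
 * so IPv6 loopback sockets bypass the state check entirely. */
static bool filter_unparenthesized(int state, int ss, bool v4_lo, bool v6_lo)
{
	return (state == ss && v4_lo) || v6_lo;
}

/* Intended logic: a state check (skipped when ss == 0) AND a loopback check. */
static bool filter_fixed(int state, int ss, bool v4_lo, bool v6_lo)
{
	return (!ss || state == ss) && (v4_lo || v6_lo);
}
```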

Prepare for bucket resume tests for established TCP sockets by making the number of ehash buckets configurable. Subsequent patches force all established sockets into the same bucket by setting ehash_buckets to one.

Signed-off-by: Jordan Rife <[email protected]>
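Why a single bucket forces every established socket together: the kernel picks an ehash bucket by masking the hash (`hash & ehash_mask` in inet_ehash_bucket()), and with one bucket the mask is 0. A sketch of that power-of-two masking; `bucket_of()` is a hypothetical helper. How the selftest sets the bucket count is not described here.

```c
#include <assert.h>
#include <stdint.h>

/* Select a bucket the way a power-of-two hash table does: mask the hash.
 * nbuckets must be a power of two, so nbuckets - 1 is an all-ones mask. */
static unsigned int bucket_of(uint32_t hash, unsigned int nbuckets)
{
	assert(nbuckets && !(nbuckets & (nbuckets - 1)));
	return hash & (nbuckets - 1);
}
```

With `nbuckets == 1` every hash maps to bucket 0, which is what lets the resume tests put many sockets into one bucket and exercise batching within it.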

Prepare for bucket resume tests for established TCP sockets by creating established sockets. Collect socket fds from connect() and accept() sides and pass them to test cases.

Signed-off-by: Jordan Rife <[email protected]>
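A user-space sketch of creating established TCP connections on loopback and keeping the fds from both the connect() and accept() sides. `make_established()` and its signature are hypothetical; the selftest's own helpers are not shown in this PR. Both ends of each connection are established sockets in the ehash table.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create n connections to one loopback listener. Client fds go to cfds[],
 * accepted fds to sfds[]. Returns the listener fd, or -1 on error
 * (closing fds on the error path is elided for brevity). */
static int make_established(int n, int *cfds, int *sfds)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int lfd = socket(AF_INET, SOCK_STREAM, 0);

	if (lfd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;                      /* let the kernel pick a port */
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) ||
	    listen(lfd, n) ||
	    getsockname(lfd, (struct sockaddr *)&addr, &len))
		return -1;

	for (int i = 0; i < n; i++) {
		cfds[i] = socket(AF_INET, SOCK_STREAM, 0);
		if (cfds[i] < 0 ||
		    connect(cfds[i], (struct sockaddr *)&addr, sizeof(addr)))
			return -1;
		sfds[i] = accept(lfd, NULL, NULL);   /* server-side socket */
		if (sfds[i] < 0)
			return -1;
	}
	return lfd;
}
```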

Prepare for bucket resume tests for established TCP sockets by creating a program to immediately destroy and remove sockets from the TCP ehash table, since close() is not deterministic.

Signed-off-by: Jordan Rife <[email protected]>

Replicate the set of test cases used for UDP socket iterators to test similar scenarios for TCP established sockets.

Signed-off-by: Jordan Rife <[email protected]>

kernel-patches-daemon-bpf bot added new bpf-next V2 V2-ci-pass labels Jun 18, 2025

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from 7801208 to a994d4a June 19, 2025 16:57

kernel-patches-daemon-bpf bot force-pushed the series/964616=>bpf-next branch from 9b86ebd to 0484556 June 19, 2025 16:58

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from a994d4a to e7d5deb June 20, 2025 18:36

jrife added 7 commits June 20, 2025 11:37

selftests/bpf: Add tests for bucket resume logic in listening sockets

2f68540

selftests/bpf: Allow for iteration over multiple ports

0a52767

jrife added 5 commits June 20, 2025 11:37

selftests/bpf: Create iter_tcp_destroy test program

06f4725

selftests/bpf: Add tests for bucket resume logic in established sockets

9c8f86d

kernel-patches-daemon-bpf bot force-pushed the series/964616=>bpf-next branch from 0484556 to 9c8f86d June 20, 2025 18:37
