
fuzz-tests: Add a test for the gossipd-connectd interface #8423


Open · wants to merge 2 commits into master

Conversation

@Chand-ra (Author)

connectd_req() in gossipd/gossipd.c is responsible for handling gossip messages from peers handed to it by connectd. Add a stateful test simulating its behaviour.
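
For context, CLN fuzz targets are built around the run() entry point that tests/fuzz/libfuzz.c dispatches to from LLVMFuzzerTestOneInput (visible in the stack traces below). The skeleton here is only an illustration of that shape, not the actual target:

#include "libfuzz.h"

void init(int *argc, char ***argv)
{
        /* One-time setup: keys, daemon state, gossip_store, etc. */
}

void run(const uint8_t *data, size_t size)
{
        /* Decode the fuzz bytes into gossipd wire messages, hand them
         * to the connectd_req() handlers, then tear everything down. */
}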

Checklist

Before submitting the PR, ensure the following tasks are completed. If an item is not applicable to your PR, please mark it as checked:

  • The changelog has been updated in the relevant commit(s) according to the guidelines.
  • Tests have been added or modified to reflect the changes.
  • Documentation has been reviewed and updated as needed.
  • Related issues have been listed and linked, including any that this PR closes.

CC: @morehouse

@Chand-ra (Author)

The test results in the following LeakSanitizer error when run on its corpus:

==116428==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x619eb51001f3 in malloc (/home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd+0x2da1f3) (BuildId: 34398486c14ff41d79a297932e621cb4f6161418)
    #1 0x619eb54deb3d in split_node /home/chandra/lightning/ccan/ccan/intmap/intmap.c:67:9
    #2 0x619eb54de8c2 in intmap_add_ /home/chandra/lightning/ccan/ccan/intmap/intmap.c:113:9
    #3 0x619eb53122d1 in map_add /home/chandra/lightning/gossipd/gossmap_manage.c:176:6
    #4 0x619eb530f8c6 in gossmap_manage_channel_announcement /home/chandra/lightning/gossipd/gossmap_manage.c:640:8
    #5 0x619eb53b2101 in handle_recv_gossip /home/chandra/lightning/tests/fuzz/../../gossipd/gossipd.c:205:12
    #6 0x619eb53a93b2 in run /home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd.c:543:4
    #7 0x619eb513e8f8 in LLVMFuzzerTestOneInput /home/chandra/lightning/tests/fuzz/libfuzz.c:25:2
    #8 0x619eb504c0c4 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd+0x2260c4) (BuildId: 34398486c14ff41d79a297932e621cb4f6161418)
    #9 0x619eb504b7b9 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*) (/home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd+0x2257b9) (BuildId: 34398486c14ff41d79a297932e621cb4f6161418)
    #10 0x619eb504d3d6 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::vector<fuzzer::SizedFile, std::allocator<fuzzer::SizedFile>>&) (/home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd+0x2273d6) (BuildId: 34398486c14ff41d79a297932e621cb4f6161418)
    #11 0x619eb504d8e7 in fuzzer::Fuzzer::Loop(std::vector<fuzzer::SizedFile, std::allocator<fuzzer::SizedFile>>&) (/home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd+0x2278e7) (BuildId: 34398486c14ff41d79a297932e621cb4f6161418)
    #12 0x619eb503addf in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd+0x214ddf) (BuildId: 34398486c14ff41d79a297932e621cb4f6161418)
    #13 0x619eb5065466 in main (/home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd+0x23f466) (BuildId: 34398486c14ff41d79a297932e621cb4f6161418)
    #14 0x7aed7c22a1c9 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #15 0x7aed7c22a28a in __libc_start_main csu/../csu/libc-start.c:360:3
    #16 0x619eb502fdc4 in _start (/home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd+0x209dc4) (BuildId: 34398486c14ff41d79a297932e621cb4f6161418)

SUMMARY: AddressSanitizer: 40 byte(s) leaked in 1 allocation(s).

INFO: a leak has been found in the initial corpus.

@morehouse (Contributor) left a comment

High level review: I like the target. Needs some rework so we consider exit to be a crash.

Haven't studied create_gossip_msg yet.

struct node_id id = node_id(privkey_from_index(tal_count(peer_ids)));
tal_arr_expand(&peer_ids, id);

msg = towire_gossipd_new_peer(tmpctx, &id, false);
@morehouse (Contributor):

Setting gossip_queries_feature = false will limit some of the code paths that can be executed, especially all the seeker stuff. If there's a good reason to do this, please add a comment here that explains why.
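
One option, sketched below, is to let the fuzzer choose the flag so both feature paths get exercised. This reuses only the helpers already visible in the diff; the variable name is illustrative:

/* Let the fuzz input decide whether the peer advertises
 * gossip_queries, so the seeker paths are reachable too. */
bool gossip_queries = fromwire_u8(&data, &size) & 1;
msg = towire_gossipd_new_peer(tmpctx, &id, gossip_queries);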

Comment on lines 520 to 539
switch (fromwire_u8(&data, &size) % 5)
{
case 0:
gossip_msg = create_gossip_msg(tmpctx, &data, &size, WIRE_CHANNEL_ANNOUNCEMENT);
break;
case 1:
gossip_msg = create_gossip_msg(tmpctx, &data, &size, WIRE_CHANNEL_UPDATE);
break;
case 2:
gossip_msg = create_gossip_msg(tmpctx, &data, &size, WIRE_NODE_ANNOUNCEMENT);
break;
case 3:
gossip_msg = create_gossip_msg(tmpctx, &data, &size, WIRE_REPLY_CHANNEL_RANGE);
break;
case 4:
gossip_msg = create_gossip_msg(tmpctx, &data, &size, WIRE_REPLY_SHORT_CHANNEL_IDS_END);
break;
default:
break;
}
@morehouse (Contributor):

Nit: perhaps this switch could go inside create_gossip_msg, so there's one less parameter to pass.
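
A minimal sketch of that refactor, assuming create_gossip_msg keeps its current cursor-style arguments (the types table and the elided body are illustrative):

static u8 *create_gossip_msg(const tal_t *ctx, const u8 **data, size_t *size)
{
        /* Pick the wire type from the fuzz input inside the helper,
         * so callers no longer pass it as a parameter. */
        static const u16 types[] = {
                WIRE_CHANNEL_ANNOUNCEMENT,
                WIRE_CHANNEL_UPDATE,
                WIRE_NODE_ANNOUNCEMENT,
                WIRE_REPLY_CHANNEL_RANGE,
                WIRE_REPLY_SHORT_CHANNEL_IDS_END,
        };
        u16 type = types[fromwire_u8(data, size) % ARRAY_SIZE(types)];

        /* ...build and return a message of this type, as before... */
}

Call sites then collapse to gossip_msg = create_gossip_msg(tmpctx, &data, &size);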


cleanup:
if (daemon)
tal_free(daemon->connectd);
@morehouse (Contributor):

Is the explicit tal_free necessary? Naively it looks like clean_tmpctx should take care of this already.

@Chand-ra (Author):

This is to circumvent the dangling-allocation error fixed in #8424.

Comment on lines 469 to 470
if (setjmp(exit_jmp) != 0)
goto cleanup;
@morehouse (Contributor):

For gossipd, we actually shouldn't consider exit to be normal. If gossipd exits, the entire CLN node shuts down, which would be a DoS vulnerability.

@Chand-ra (Author):

This path can only be triggered by status_failed() and towire_warningfmt().

I can get rid of the mock for towire_warningfmt() by linking common/wire_error.o. As for status_failed(), I don't think triggering it indicates a vulnerability; it's used pretty liberally throughout gossipd/gossipd.c to report a bad message from a peer. Maybe exit()'s behavior manifests differently in a live node than it does in our fuzzer (where it aborts the entire process).

@morehouse (Contributor):

I don't see any reason for us to quit the fuzz target for towire_warningfmt.

status_failed means "print error and exit", and should never happen in the wild. If it does, it's a DoS vulnerability.

@Chand-ra (Author):

I don't see any reason for us to quit the fuzz target for towire_warningfmt.

Right, I didn't word that clearly. We're currently mocking the behavior of towire_warningfmt(), which we don't need to do: we can simply link common/wire_error.o along with the other artifacts. I've already done this in the latest push, and it doesn't result in any crash.

status_failed means "print error and exit", and should never happen in the wild. If it does, it's a DoS vulnerability.

Oh okay, makes sense.

@morehouse (Contributor)

The test results in the following LeakSanitizer error when run on its corpus:

I looked closer at this. The accused line of code does a naked malloc (no tal use), which makes me suspect there could be a cleanup problem on each iteration of the fuzz target, since only tal-allocated memory gets freed.

We might need to manually delete all elements from the gossmap at the end of each iteration.
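
A sketch of that per-iteration cleanup, assuming the map is a ccan UINTMAP whose values are tal-allocated (all names here are illustrative): draining it with uintmap_del() releases the malloc'd interior intmap nodes that LeakSanitizer is flagging.

/* Drain the map so intmap's internal nodes (plain malloc in
 * ccan/intmap/intmap.c) are freed along with the values. */
u64 idx;
struct pending_cannounce *pca;
while ((pca = uintmap_first(&pending_map, &idx)) != NULL) {
        uintmap_del(&pending_map, idx);
        tal_free(pca);
}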

@Chand-ra (Author)

We might need to manually delete all elements from the gossmap at the end of each iteration.

This makes the target a bit messier, but it does seem to sidestep the issue at hand.

@Chand-ra (Author)

I was fixing some of the other issues with this target when it ran into yet another memory leak:

==31969==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x57fcd5d431f3 in malloc (/home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd+0x2da1f3) (BuildId: 844798d4ffd40334b8c2f0d73bff69c289ab8025)
    #1 0x57fcd612359d in split_node /home/chandra/lightning/ccan/ccan/intmap/intmap.c:67:9
    #2 0x57fcd6123322 in intmap_add_ /home/chandra/lightning/ccan/ccan/intmap/intmap.c:113:9
    #3 0x57fcd5f52a16 in add_unknown_scid /home/chandra/lightning/gossipd/seeker.c:695:7
    #4 0x57fcd5f52830 in query_unknown_channel /home/chandra/lightning/gossipd/seeker.c:1159:2
    #5 0x57fcd5fd8f95 in process_channel_update /home/chandra/lightning/tests/fuzz/../../gossipd/gossmap_manage.c:806:3
    #6 0x57fcd5fd6e3d in gossmap_manage_channel_update /home/chandra/lightning/tests/fuzz/../../gossipd/gossmap_manage.c:983:9
    #7 0x57fcd5ff04be in handle_recv_gossip /home/chandra/lightning/tests/fuzz/../../gossipd/gossipd.c:210:12
    #8 0x57fcd5fe708c in run /home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd.c:515:4
    #9 0x57fcd5d818f8 in LLVMFuzzerTestOneInput /home/chandra/lightning/tests/fuzz/libfuzz.c:25:2
    #10 0x57fcd5c8f0c4 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd+0x2260c4) (BuildId: 844798d4ffd40334b8c2f0d73bff69c289ab8025)
...
...
...

This time, the leak occurs when we send a channel_update message for a channel that the peer hasn't yet received a channel_announcement for. CLN asks the seeker to query the sending peer for the channel_announcement via query_unknown_channel(), which then tries to add the unknown scid to seeker->unknown_scids using uintmap_add(), and that allocation is what leaks.

@morehouse (Contributor)

This time, the leak occurs when we send a channel_update message for a channel that the peer hasn't yet received a channel_announcement for. CLN asks the seeker to query the sending peer for the channel_announcement via query_unknown_channel(), which then tries to add the unknown scid to seeker->unknown_scids using uintmap_add(), and that allocation is what leaks.

Then we also need to manually free that map at the end of each iteration.

@morehouse (Contributor) left a comment

Have you been able to run this fuzz target? I tried and hit a bunch of crashes immediately.

It only reproduces when I use multiple workers, and it seems the workers are trying to touch the same file (yikes).

fuzz-gossipd-connectd: gossip_store_compact: rename failed: No such file or directory (version v25.05-2-gda8afec-modded)
==174417== ERROR: libFuzzer: fuzz target exited                                 
    #7 0x00000057ce3c in status_failed common/status.c:208:2
    #8 0x000000526e9c in gossip_store_compact gossipd/gossip_store.c:360:3 
    #9 0x000000526e9c in gossip_store_new gossipd/gossip_store.c:404:11
    #10 0x0000005ef290 in setup_gossmap tests/fuzz/../../gossipd/gossmap_manage.c:453:11
    #11 0x0000005eed07 in gossmap_manage_new tests/fuzz/../../gossipd/gossmap_manage.c:485:7
    #12 0x0000005f8381 in new_daemon tests/fuzz/fuzz-gossipd-connectd.c:128:15
    #13 0x0000005f8381 in run tests/fuzz/fuzz-gossipd-connectd.c:472:26

I'll review further once all shallow crashes are fixed.

Comment on lines 469 to 470
if (setjmp(exit_jmp) != 0)
goto cleanup;
@morehouse (Contributor):

The jmp_buf is no longer needed. Please remove it.

channel_flags = 1;
}

timestamp = time_now().ts.tv_sec - 3600;
@morehouse (Contributor):

The peer could send any timestamp. It doesn't have to be recent. Perhaps we should let the fuzzer decide the timestamp.

@Chand-ra (Author):

Right, that's how I designed the fuzzer initially:

timestamp = fromwire_u32(cursor, max);

and how the NODE_ANNOUNCEMENT message is crafted, but the fuzzer was unable to get past this check, so I swapped it out for the current setup. Maybe something like

timestamp = time_now().ts.tv_sec - fromwire_u16(cursor, max);

would be better?
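
Something in that vein could also keep an occasional fully fuzzer-chosen value, so the staleness-rejection path stays reachable (a sketch; the 3:1 bias is arbitrary):

/* Mostly-plausible timestamps, with an occasional arbitrary one
 * so the "too old / too new" checks still get exercised. */
u32 timestamp;
if (fromwire_u8(cursor, max) % 4 != 0)
        timestamp = time_now().ts.tv_sec - fromwire_u16(cursor, max);
else
        timestamp = fromwire_u32(cursor, max);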

@Chand-ra (Author) commented Aug 6, 2025

Have you been able to run this fuzz target? I tried and hit a bunch of crashes immediately.

Only reproduces when I use multiple workers, and seems multiple workers are trying to touch the same file (yikes).

Yeah, we discussed this in our meeting a while ago and decided to create a new file for each fuzz run. I tried that, but the file name needs to be statically defined via #define GOSSIP_STORE_FILENAME; otherwise the rename() here starts complaining.

The next best thing I could manage was resetting the gossip_store file with unlink(GOSSIP_STORE_FILENAME) at the start of each run.
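
If per-worker isolation ever becomes necessary, one hedged option is to give each worker process its own working directory, so the statically named store never collides. A sketch only: it assumes the gossip_store path is resolved relative to the current directory, and isolate_gossip_store is a hypothetical helper.

#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* One directory per libFuzzer worker, keyed by PID, so concurrent
 * workers never rename the same gossip_store file. */
static void isolate_gossip_store(void)
{
        char dir[64];
        snprintf(dir, sizeof(dir), "fuzz-gossipd-%d", (int)getpid());
        mkdir(dir, 0700);               /* EEXIST on reruns is fine */
        chdir(dir);
        unlink(GOSSIP_STORE_FILENAME);  /* start each run fresh */
}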

Chandra Pratap added 2 commits August 6, 2025 11:42
Changelog-None: `connectd_req()` in `gossipd/gossipd.c` is
responsible for handling gossip messages from peers handed to
it by `connectd`. Add a stateful test simulating its behaviour.

Add a minimal input set as a seed corpus for the newly introduced
test. This speeds up the discovery of interesting code paths.