Description
Hello,
When using the following snippet in my code (as part of a Wiegand reader program) on a Raspberry Pi 0:
    let monitor_0: Next<AsyncLineEventHandle> = events0.next();
    let monitor_1: Next<AsyncLineEventHandle> = events1.next();
    pin!(monitor_0, monitor_1);

    select! {
        res = monitor_0 => { data = data << 1; },
        res = monitor_1 => { data = (data << 1) | 1 as u64; },
    }
it seems like the code ends up in a deadlock in the select! macro after running this snippet 128 times, after which the program uses 100% CPU but neither of the Nexts ever completes.
events0 and events1 are AsyncLineEventHandles that come from Lines that come from the same Chip.
Introducing a millisecond delay at the very top increases the number of bytes that can be read to 256.
Introducing a larger delay seems to remove the deadlock altogether, but the ordering of the events is lost, causing the data to become garbage, since the order of the events determines the final data output.
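For context, a more complete version of the loop looks roughly like the following. This is a reconstruction rather than my exact code: the chip path, the line offsets (20/21), the falling-edge flags, and the 26-bit frame length are assumptions, and the handle setup follows gpio-cdev's async-tokio example.

```rust
use futures::stream::StreamExt;
use gpio_cdev::{AsyncLineEventHandle, Chip, EventRequestFlags, LineRequestFlags};

#[tokio::main]
async fn main() -> Result<(), gpio_cdev::Error> {
    let mut chip = Chip::new("/dev/gpiochip0")?;

    // DATA0 / DATA1 lines of the Wiegand reader (offsets are placeholders).
    let line0 = chip.get_line(20)?;
    let line1 = chip.get_line(21)?;

    let mut events0 = AsyncLineEventHandle::new(line0.events(
        LineRequestFlags::INPUT,
        EventRequestFlags::FALLING_EDGE,
        "wiegand-d0",
    )?)?;
    let mut events1 = AsyncLineEventHandle::new(line1.events(
        LineRequestFlags::INPUT,
        EventRequestFlags::FALLING_EDGE,
        "wiegand-d1",
    )?)?;

    let mut data: u64 = 0;
    // Assumed 26-bit frame; which line fires determines the bit value,
    // so the ordering of the events matters.
    for _ in 0..26 {
        let monitor_0 = events0.next();
        let monitor_1 = events1.next();
        tokio::pin!(monitor_0, monitor_1);
        tokio::select! {
            _res = monitor_0 => { data = data << 1; },
            _res = monitor_1 => { data = (data << 1) | 1; },
        }
    }
    println!("frame: {:#b}", data);
    Ok(())
}
```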
I'm not certain if this is a gpio-cdev problem, a kernel problem, or if I'm simply doing this incorrectly.
Any feedback is highly appreciated.
Activity
ryankurte commented on Mar 6, 2021
huh, interesting failure! i have a bit of async experience but, the internals of this are unfamiliar to me... @mgottschlag no obligations but, any ideas?
absolutely grasping at straws but: gdb? it can be pretty rough where async is involved but will usually highlight the problem area (ime strace and cargo flamegraph can also be useful for getting a grasp of these things, though raspbian is missing a bunch of kernel tools so this may not be workable for you)
datdenkikniet commented on Mar 7, 2021
Thanks for your response :)
I've tried running it with strace, but then I get the same behaviour as with a longer-than-a-millisecond delay, where the deadlock seems to disappear but the ordering gets all messed up (presumably because of the overhead that strace induces). I'll be sure to try it with GDB, and the other two experiments you've mentioned.
A detail I forgot to mention: I'm running this in a Buildroot environment on kernel 4.19.97, in case anyone is trying to reproduce this issue. I've not quite figured out how to build a newer kernel this way, so I hope that this version isn't missing some important fixes to the gpio device driver that could affect this issue.
datdenkikniet commented on Mar 7, 2021
Additionally: I'd completely missed it while looking for similar issues, but it seems that I'm effectively trying to accomplish what #18 is describing. Sadly there's no real solution to the problem/how to do it there either, so this is just what I came up with.
datdenkikniet commented on Mar 8, 2021
When running this with gdb, it seems like the main thread doesn't actually start/get to the futex_wait syscall that it goes to when in "working" condition. The 2nd thread does successfully go to the epoll_wait() that it does under normal execution too. Alternatively, the futex_wait is completing without actually firing an event/something that tokio catches in the select!. (I tested this both with std's select + fuse and tokio's select, but same result.)
Seems like this futex_wait is actually due to the block_on that I'm using to wait for the reading of the bits to finish, so this information might not be all that relevant.
When switching to a single thread of execution (using tokio::main(max_threads = 1, core_threads = 1) or tokio::main(basic_scheduler)), the program stops working altogether. The program gets to the aforementioned futex_wait, but that's never completed.
I'll also be trying this with the gpiomon tool.
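For reference, the single-threaded configuration I'm talking about corresponds roughly to the following in Tokio 1.x terms (a sketch, not my actual code; read_wiegand_frame is a made-up placeholder for the select! loop from the issue description):

```rust
// A current-thread runtime plus an explicit block_on, roughly mirroring the
// tokio::main(basic_scheduler) / core_threads = 1 setups mentioned above.
async fn read_wiegand_frame() -> u64 {
    // Placeholder for the select!-based bit-reading loop.
    0
}

fn main() -> std::io::Result<()> {
    let rt = tokio::runtime::Builder::new_current_thread()
        .enable_all() // needed so the I/O (epoll) driver is enabled at all
        .build()?;

    // This block_on is where the main thread parks in futex_wait while the
    // async code runs on the same, single thread.
    let data = rt.block_on(read_wiegand_frame());
    println!("{:#b}", data);
    Ok(())
}
```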
datdenkikniet commented on Mar 9, 2021
I've tried this with gpiomon now (i.e. gpiomon -lf gpiochip0 20 21), and with that it seems to work just fine. I get the correct output consistently, and it doesn't hang. I'll see if I can somehow figure out what's going on.
datdenkikniet commented on Mar 9, 2021
After some more investigation, it seems that it might be an issue in Tokio itself or the way Streams are implemented, somehow. I made my own fork and updated it to Tokio 1 (master...datdenkikniet:tokio-update), but it seems to exhibit the exact same behaviour. What happens is: the poll_next function implemented for AsyncLineEventHandle is called many times, but seemingly never completes (none of the match arms are matched, nor is an error produced by the ready! call). In the background it must complete somehow, since the memory usage of the program does not change, but somewhere something is not going right.
Removing the select! in favour of an await on a single one of the events doesn't change anything, either. The number of events that can be read is still the same.
I'm unsure if this issue still belongs in gpio-cdev (possibly an incorrect implementation of poll_event?) or in Tokio/Futures. Any guidance would be appreciated.
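To make the poll_next observation a bit more concrete, the shape of the code I'm describing is roughly the following toy example (this is not gpio-cdev's actual implementation, just an illustration of the ready!-then-match structure; all names here are made up):

```rust
use std::pin::Pin;
use std::task::{Context, Poll};

use futures::ready;
use futures::stream::Stream;

// A toy stream that delegates to an inner stream, in the same
// ready!-then-match shape as the AsyncLineEventHandle poll_next.
struct ToyEvents<S> {
    inner: S,
}

impl<S> Stream for ToyEvents<S>
where
    S: Stream<Item = std::io::Result<u8>> + Unpin,
{
    type Item = std::io::Result<u8>;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        let this = self.get_mut();

        // If the inner poll returns Poll::Pending, ready! makes this whole
        // function return Poll::Pending right here, so none of the match
        // arms below are ever reached -- which matches what I'm seeing.
        let item = ready!(Pin::new(&mut this.inner).poll_next(cx));

        match item {
            Some(Ok(byte)) => Poll::Ready(Some(Ok(byte))),
            Some(Err(e)) => Poll::Ready(Some(Err(e))),
            None => Poll::Ready(None),
        }
    }
}
```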
mgottschlag commented on Mar 9, 2021
I am sorry I cannot contribute - I actually do not have much low-level tokio experience.
Anyways, so the summary right now is:
- Could it be select!? I had problems with select! in the past - the macro has non-obvious failure modes, requirements on the futures which trigger those failure modes when not met, and I kept confusing the two different macros from futures and tokio, so it would be nice to rule out select! as a reason for the problems.
- An await on one line instead of select! on two does not fix the problem? This is weird, the code used to work fairly reliably for me for such use cases. I use tokio::select! with a single line without problems.
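To illustrate what I mean about the two macros (just a from-memory sketch, not code from this project; the helper functions and names are made up): futures::select! needs fused, Unpin futures, while tokio::select! polls plain futures directly.

```rust
use futures::stream::{Stream, StreamExt};
use futures::FutureExt;

// Read one bit with the futures crate's select!: the futures have to be
// fused (and Unpin), hence the .fuse() calls.
async fn one_bit_futures<S: Stream + Unpin>(d0: &mut S, d1: &mut S) -> u64 {
    let mut ev0 = d0.next().fuse();
    let mut ev1 = d1.next().fuse();
    futures::select! {
        _ = ev0 => 0,
        _ = ev1 => 1,
    }
}

// The same thing with tokio::select!: no fusing required, the macro pins
// and polls the futures itself.
async fn one_bit_tokio<S: Stream + Unpin>(d0: &mut S, d1: &mut S) -> u64 {
    tokio::select! {
        _ = d0.next() => 0,
        _ = d1.next() => 1,
    }
}
```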
datdenkikniet commented on Mar 9, 2021
Whoops, pressed the wrong button.
Thanks for the feedback.
Yes, your summary seems to reflect what I've found so far. AFAIK I'm using select! correctly, and I'm definitely using Tokio's select!. I'll remove all of the references to the 2nd line and re-try; maybe it starts working again in that case.
Yeah, even when doing everything with a single line (i.e. only getting 1 line from the Chip, getting an async handle to its events, etc.) the issue still occurs.
datdenkikniet commented on Mar 11, 2021
So, I wasn't entirely correct:
What happens is that the file.poll_read() in Tokio 1.2 (or poll_read_ready() in the older version) is simply never ready. I'd missed that the ready! macro actually returns Poll::Pending if the result of the Poll that is passed as an argument is also Poll::Pending.
Very unsure why it happens.
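For anyone else who trips over the same thing: ready!(expr) behaves roughly like this hand-written equivalent (my own paraphrase, not the actual macro source):

```rust
use std::task::Poll;

// Roughly what `let value = ready!(inner_poll);` does inside a poll function:
// a Pending result from the inner poll makes the *outer* function return
// Pending immediately, so nothing after the ready! call runs.
fn unwrap_or_propagate_pending(inner_poll: Poll<u32>) -> Poll<u32> {
    let value = match inner_poll {
        Poll::Ready(v) => v,
        Poll::Pending => return Poll::Pending,
    };
    Poll::Ready(value + 1)
}
```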