Message ID | 20231207063406.556770-2-vi@endrift.com |
---|---|
State | New |
Headers | show |
Series | Input: uinput - Multiple concurrency fixes in ff request handling | expand |
Hi Vicki, On Wed, Dec 06, 2023 at 10:34:05PM -0800, Vicki Pfau wrote: > Currently, uinput_request_submit will only fail if the request wait times out. > However, in other places this wait is interruptable, and in this specific > location it can lead to issues, such as causing system suspend to hang until > the request times out. Could you please explain how a sleeping process can cause suspend to hang? > Since the timeout is so long, this can cause the > appearance of a total system freeze. Making the wait interruptable resolves > this and possibly further issues. I think you are trying to find a justification too hard and it does not make sense, however I agree that allowing to kill the process issuing the request without waiting for the timeout to expire if the other side is stuck might be desirable. I think the best way to use wait_for_completion_killable_timeout() so that stray signals do not disturb userspace, but the processes can still be terminated. Thanks.
Hi Dmitry, On 12/8/23 11:32, Dmitry Torokhov wrote: > Hi Vicki, > > On Wed, Dec 06, 2023 at 10:34:05PM -0800, Vicki Pfau wrote: >> Currently, uinput_request_submit will only fail if the request wait times out. >> However, in other places this wait is interruptable, and in this specific >> location it can lead to issues, such as causing system suspend to hang until >> the request times out. > > Could you please explain how a sleeping process can cause suspend to > hang? While I'm not 100% sure how it happens, given I found this by reproducing it before I came up with a theory for why it happened, my guess is that as it's trying to suspend all of userspace programs, it suspends the process that owns the uinput handle, so it can't continue to service requests, while the other process hangs in the uninterruptable call, blocking suspend for 30 seconds until the call times out. > >> Since the timeout is so long, this can cause the >> appearance of a total system freeze. Making the wait interruptable resolves >> this and possibly further issues. > > I think you are trying to find a justification too hard and it does not > make sense, however I agree that allowing to kill the process issuing > the request without waiting for the timeout to expire if the other side > is stuck might be desirable. This isn't reaching. As I said above, I discovered the patched line of code *after* observing suspend hanging for 30 seconds while trying to reproduce another bug. I wrote this patch, retested, and found that it now suspended immediately, leading to a visible -ERESTARTSYS in strace on coming back from suspend. I can post the reproduction case somewhere, but the test program is only the evdev client end, with the uinput side being Steam, which I don't have source code for. > > I think the best way to use wait_for_completion_killable_timeout() > so that stray signals do not disturb userspace, but the processes can > still be terminated. There's already a mutex_lock_interruptable in uinput_request_send that could cause this to fall back to userspace under similar circumstances. The only difference I can find, which is admittedly a bug in this patch now that I look at it again, is that uinput_dev_event would get called twice, leading to the request getting duplicated. If there's a better way to handle the suspend case let me know, but this is not a hypothetical issue. > > Thanks. > Vicki
Hi Dmitry On 12/8/23 19:24, Vicki Pfau wrote: > Hi Dmitry, > > On 12/8/23 11:32, Dmitry Torokhov wrote: >> Hi Vicki, >> >> On Wed, Dec 06, 2023 at 10:34:05PM -0800, Vicki Pfau wrote: >>> Currently, uinput_request_submit will only fail if the request wait times out. >>> However, in other places this wait is interruptable, and in this specific >>> location it can lead to issues, such as causing system suspend to hang until >>> the request times out. >> >> Could you please explain how a sleeping process can cause suspend to >> hang? > > While I'm not 100% sure how it happens, given I found this by reproducing it before I came up with a theory for why it happened, my guess is that as it's trying to suspend all of userspace programs, it suspends the process that owns the uinput handle, so it can't continue to service requests, while the other process hangs in the uninterruptable call, blocking suspend for 30 seconds until the call times out. > >> >>> Since the timeout is so long, this can cause the >>> appearance of a total system freeze. Making the wait interruptable resolves >>> this and possibly further issues. >> >> I think you are trying to find a justification too hard and it does not >> make sense, however I agree that allowing to kill the process issuing >> the request without waiting for the timeout to expire if the other side >> is stuck might be desirable. > > This isn't reaching. As I said above, I discovered the patched line of code *after* observing suspend hanging for 30 seconds while trying to reproduce another bug. I wrote this patch, retested, and found that it now suspended immediately, leading to a visible -ERESTARTSYS in strace on coming back from suspend. > > I can post the reproduction case somewhere, but the test program is only the evdev client end, with the uinput side being Steam, which I don't have source code for. > >> >> I think the best way to use wait_for_completion_killable_timeout() >> so that stray signals do not disturb userspace, but the processes can >> still be terminated. > > There's already a mutex_lock_interruptable in uinput_request_send that could cause this to fall back to userspace under similar circumstances. The only difference I can find, which is admittedly a bug in this patch now that I look at it again, is that uinput_dev_event would get called twice, leading to the request getting duplicated. After further investigation, it seems this would still be the case even if the request times out--an invalid request would get left in the buffer, which means that while this is a new way to trigger the issue, it's not actually a new issue. It seems to me that this driver could use a lot of love to get it into better shape, which I could work on, but I'm not actually sure where to begin. Especially if we don't want to break ABI. > > If there's a better way to handle the suspend case let me know, but this is not a hypothetical issue. > >> >> Thanks. >> > > Vicki Vici
Hello Dmitry, It's been almost a month since I replied addressing your concerns on this patch. Can you please comment? On 12/14/23 19:04, Vicki Pfau wrote: > Hi Dmitry > > On 12/8/23 19:24, Vicki Pfau wrote: >> Hi Dmitry, >> >> On 12/8/23 11:32, Dmitry Torokhov wrote: >>> Hi Vicki, >>> >>> On Wed, Dec 06, 2023 at 10:34:05PM -0800, Vicki Pfau wrote: >>>> Currently, uinput_request_submit will only fail if the request wait times out. >>>> However, in other places this wait is interruptable, and in this specific >>>> location it can lead to issues, such as causing system suspend to hang until >>>> the request times out. >>> >>> Could you please explain how a sleeping process can cause suspend to >>> hang? >> >> While I'm not 100% sure how it happens, given I found this by reproducing it before I came up with a theory for why it happened, my guess is that as it's trying to suspend all of userspace programs, it suspends the process that owns the uinput handle, so it can't continue to service requests, while the other process hangs in the uninterruptable call, blocking suspend for 30 seconds until the call times out. >> >>> >>>> Since the timeout is so long, this can cause the >>>> appearance of a total system freeze. Making the wait interruptable resolves >>>> this and possibly further issues. >>> >>> I think you are trying to find a justification too hard and it does not >>> make sense, however I agree that allowing to kill the process issuing >>> the request without waiting for the timeout to expire if the other side >>> is stuck might be desirable. >> >> This isn't reaching. As I said above, I discovered the patched line of code *after* observing suspend hanging for 30 seconds while trying to reproduce another bug. I wrote this patch, retested, and found that it now suspended immediately, leading to a visible -ERESTARTSYS in strace on coming back from suspend. >> >> I can post the reproduction case somewhere, but the test program is only the evdev client end, with the uinput side being Steam, which I don't have source code for. >> >>> >>> I think the best way to use wait_for_completion_killable_timeout() >>> so that stray signals do not disturb userspace, but the processes can >>> still be terminated. >> >> There's already a mutex_lock_interruptable in uinput_request_send that could cause this to fall back to userspace under similar circumstances. The only difference I can find, which is admittedly a bug in this patch now that I look at it again, is that uinput_dev_event would get called twice, leading to the request getting duplicated. > > After further investigation, it seems this would still be the case even if the request times out--an invalid request would get left in the buffer, which means that while this is a new way to trigger the issue, it's not actually a new issue. > > It seems to me that this driver could use a lot of love to get it into better shape, which I could work on, but I'm not actually sure where to begin. Especially if we don't want to break ABI. > >> >> If there's a better way to handle the suspend case let me know, but this is not a hypothetical issue. >> >>> >>> Thanks. >>> >> >> Vicki > > Vici Vicki
Hi Vicki, On Thu, Dec 14, 2023 at 07:04:09PM -0800, Vicki Pfau wrote: > Hi Dmitry > > On 12/8/23 19:24, Vicki Pfau wrote: > > Hi Dmitry, > > > > On 12/8/23 11:32, Dmitry Torokhov wrote: > >> Hi Vicki, > >> > >> On Wed, Dec 06, 2023 at 10:34:05PM -0800, Vicki Pfau wrote: > >>> Currently, uinput_request_submit will only fail if the request wait times out. > >>> However, in other places this wait is interruptable, and in this specific > >>> location it can lead to issues, such as causing system suspend to hang until > >>> the request times out. > >> > >> Could you please explain how a sleeping process can cause suspend to > >> hang? > > > > While I'm not 100% sure how it happens, given I found this by > > reproducing it before I came up with a theory for why it happened, > > my guess is that as it's trying to suspend all of userspace > > programs, it suspends the process that owns the uinput handle, so it > > can't continue to service requests, while the other process hangs in > > the uninterruptable call, blocking suspend for 30 seconds until the > > call times out. > > > >> > >>> Since the timeout is so long, this can cause the > >>> appearance of a total system freeze. Making the wait interruptable resolves > >>> this and possibly further issues. > >> > >> I think you are trying to find a justification too hard and it does not > >> make sense, however I agree that allowing to kill the process issuing > >> the request without waiting for the timeout to expire if the other side > >> is stuck might be desirable. > > > > This isn't reaching. As I said above, I discovered the patched line > > of code *after* observing suspend hanging for 30 seconds while > > trying to reproduce another bug. I wrote this patch, retested, and > > found that it now suspended immediately, leading to a visible > > -ERESTARTSYS in strace on coming back from suspend. > > I must apologize, you indeed weren't reaching. As far as I can see, putting tasks into the freezer (which happens during system suspend) is done via delivering a fake signal to the task. So the task indeed needs to be in an interruptible state, uninterruptible tasks result in system failing to suspend. > > I can post the reproduction case somewhere, but the test program is > > only the evdev client end, with the uinput side being Steam, which I > > don't have source code for. > > > >> > >> I think the best way to use wait_for_completion_killable_timeout() > >> so that stray signals do not disturb userspace, but the processes can > >> still be terminated. > > > > There's already a mutex_lock_interruptable in uinput_request_send > > that could cause this to fall back to userspace under similar > > circumstances. The only difference I can find, which is admittedly a > > bug in this patch now that I look at it again, is that > > uinput_dev_event would get called twice, leading to the request > > getting duplicated. > > After further investigation, it seems this would still be the case > even if the request times out--an invalid request would get left in > the buffer, which means that while this is a new way to trigger the > issue, it's not actually a new issue. No, I disagree that it is the same issue. The timeout condition is pretty much fatal, I expect the caller to exit or stop using the device if request times out (because either the real device is not responding, or userspace is not responding, and there is no indication that they will start responding any time soon). That is why the timeout value is so generous (30 seconds). In this case we definitely not expect the request to be re-submitted, either automatically, or explicitly by userspace. If we make waiting on the request interruptible we may get interrupted by a stray signal, and I do not know how both producer (the process issuing the uinput request) and consumer of the request, will react to essentially duplicate requests being sent. I believe we can split this into 2 separate issues: 1. The fact that it is not possible terminate the producer process while it is waiting for request to be handled (for 30 seconds). I think this can be safely resolved by switching to wait_for_completion_killable_timeout(). This will allow fatal signals to break the wait, and for the process to exit. 2. Producer task failing to enter refrigerator and breaking suspend. I wonder if the best way to handle that is for uinput to create and register wakeup source, and then use __pm_stay_awake() and __pm_relax() to indicate to the rest of the system that suspend is blocked. I believe userspace should be able to handle this and repeat suspend attempt when the condition clears... Rafael, do you have any suggestions here? And I wonder, could we make killable tasks also enter refrigerator? Also, now that I think about it more, we should not use slot number for request->id, as I expect the common situation is to have 1 outstanding request, so all requests have id 0. Instead we should have a counter and increase it. This way timeout handling will be more robust, we will not mistake delayed response to the previous request as response to the current one. Thanks.
diff --git a/drivers/input/misc/uinput.c b/drivers/input/misc/uinput.c index d98212d55108..0330e72798db 100644 --- a/drivers/input/misc/uinput.c +++ b/drivers/input/misc/uinput.c @@ -183,7 +183,11 @@ static int uinput_request_submit(struct uinput_device *udev, if (retval) goto out; - if (!wait_for_completion_timeout(&request->done, 30 * HZ)) { + retval = wait_for_completion_interruptible_timeout(&request->done, 30 * HZ); + if (retval == -ERESTARTSYS) + goto out; + + if (!retval) { retval = -ETIMEDOUT; goto out; }
Currently, uinput_request_submit will only fail if the request wait times out. However, in other places this wait is interruptable, and in this specific location it can lead to issues, such as causing system suspend to hang until the request times out. Since the timeout is so long, this can cause the appearance of a total system freeze. Making the wait interruptable resolves this and possibly further issues. Signed-off-by: Vicki Pfau <vi@endrift.com> --- drivers/input/misc/uinput.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)