Message ID | 20170124101757.19676-1-linus.walleij@linaro.org |
---|---|
Headers | show |
Series | mmc: block: command issue cleanups | expand |
On 24 January 2017 at 11:17, Linus Walleij <linus.walleij@linaro.org> wrote: > The function mmc_blk_issue_rw_rq() is hopelessly convoluted and > need to be refactored to it can be understood by humans. > > In the process I found some weird magic return values passed > around for no good reason. > > Things are more readable after this. > > This work is done towards the goal of breaking the function in > two parts: one just submitting the requests and one checking the > result and possibly resubmitting the command on error, so we > can make the usual path (non-errorpath) smooth and quick, and > be called directly when the driver completes a request. > > That in turn is a prerequisite for proper blk-mq integration > with the MMC/SD stack. > > All that comes later. > > Linus Walleij (6): > mmc: block: break out mmc_blk_rw_cmd_abort() > mmc: block: break out mmc_blk_rw_start_new() > mmc: block: do not assign mq_rq when aborting command > mmc: block: inline command abortions > mmc: block: introduce new_areq and old_areq > mmc: block: stop passing around pointless return values > > drivers/mmc/core/block.c | 108 ++++++++++++++++++++++++++--------------------- > drivers/mmc/core/block.h | 2 +- > 2 files changed, 60 insertions(+), 50 deletions(-) > > -- > 2.9.3 > Thanks, applied for next! Kind regards Uffe -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
+Maxime On 26 January 2017 at 09:07, Ulf Hansson <ulf.hansson@linaro.org> wrote: > On 24 January 2017 at 11:17, Linus Walleij <linus.walleij@linaro.org> wrote: >> The function mmc_blk_issue_rw_rq() is hopelessly convoluted and >> need to be refactored to it can be understood by humans. >> >> In the process I found some weird magic return values passed >> around for no good reason. >> >> Things are more readable after this. >> >> This work is done towards the goal of breaking the function in >> two parts: one just submitting the requests and one checking the >> result and possibly resubmitting the command on error, so we >> can make the usual path (non-errorpath) smooth and quick, and >> be called directly when the driver completes a request. >> >> That in turn is a prerequisite for proper blk-mq integration >> with the MMC/SD stack. >> >> All that comes later. >> >> Linus Walleij (6): >> mmc: block: break out mmc_blk_rw_cmd_abort() >> mmc: block: break out mmc_blk_rw_start_new() >> mmc: block: do not assign mq_rq when aborting command >> mmc: block: inline command abortions >> mmc: block: introduce new_areq and old_areq >> mmc: block: stop passing around pointless return values >> >> drivers/mmc/core/block.c | 108 ++++++++++++++++++++++++++--------------------- >> drivers/mmc/core/block.h | 2 +- >> 2 files changed, 60 insertions(+), 50 deletions(-) >> >> -- >> 2.9.3 >> > > Thanks, applied for next! Linus, Seems like this series may have issues. I have looked at boot reports from kernelci, and particular the reports for https://kernelci.org/boot/sun7i-a20-bananapi/job/ulfh/ are interesting. Apparently, this board has an SD card attached. There have been errors reported in the log for a while when doing data transfers, although none of these errors have triggered the kernelci to report a boot error. However, I suspect that some of the changes in this series make it worse. Perhaps because of a changed error handling the mmc block layer!? Particular, look at the difference between these [1] boot logs, it might give you some hints. I have also added Maxime to this thread, perhaps he can help out with the sunxi mmc driver. Kind regards Uffe [1] Successful boot - sun7i-a20-bananapi: https://storage.kernelci.org/ulfh/v4.10-rc5-86-g2ed14a2aac8a/arm-sunxi_defconfig/lab-baylibre-seattle/boot-sun7i-a20-bananapi.txt Failing boot - sun7i-a20-bananapi: https://storage.kernelci.org/ulfh/v4.10-rc5-97-ge1defa2da4d3/arm-sunxi_defconfig/lab-baylibre-seattle/boot-sun7i-a20-bananapi.txt Kind regards Uffe -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Ulf, On Fri, Jan 27, 2017 at 08:58:16AM +0100, Ulf Hansson wrote: > On 26 January 2017 at 09:07, Ulf Hansson <ulf.hansson@linaro.org> wrote: > > On 24 January 2017 at 11:17, Linus Walleij <linus.walleij@linaro.org> wrote: > >> The function mmc_blk_issue_rw_rq() is hopelessly convoluted and > >> need to be refactored to it can be understood by humans. > >> > >> In the process I found some weird magic return values passed > >> around for no good reason. > >> > >> Things are more readable after this. > >> > >> This work is done towards the goal of breaking the function in > >> two parts: one just submitting the requests and one checking the > >> result and possibly resubmitting the command on error, so we > >> can make the usual path (non-errorpath) smooth and quick, and > >> be called directly when the driver completes a request. > >> > >> That in turn is a prerequisite for proper blk-mq integration > >> with the MMC/SD stack. > >> > >> All that comes later. > >> > >> Linus Walleij (6): > >> mmc: block: break out mmc_blk_rw_cmd_abort() > >> mmc: block: break out mmc_blk_rw_start_new() > >> mmc: block: do not assign mq_rq when aborting command > >> mmc: block: inline command abortions > >> mmc: block: introduce new_areq and old_areq > >> mmc: block: stop passing around pointless return values > >> > >> drivers/mmc/core/block.c | 108 ++++++++++++++++++++++++++--------------------- > >> drivers/mmc/core/block.h | 2 +- > >> 2 files changed, 60 insertions(+), 50 deletions(-) > >> > >> -- > >> 2.9.3 > >> > > > > Thanks, applied for next! > > Linus, > > Seems like this series may have issues. I have looked at boot reports > from kernelci, and particular the reports for > https://kernelci.org/boot/sun7i-a20-bananapi/job/ulfh/ are > interesting. > > Apparently, this board has an SD card attached. There have been errors > reported in the log for a while when doing data transfers, although > none of these errors have triggered the kernelci to report a boot > error. > > However, I suspect that some of the changes in this series make it > worse. Perhaps because of a changed error handling the mmc block > layer!? > > Particular, look at the difference between these [1] boot logs, it > might give you some hints. I have also added Maxime to this thread, > perhaps he can help out with the sunxi mmc driver. We also had a patch on the pinctrl side in 4.10, whose side effects would be to disable the pull ups on some boards that have not been configuring their pins properly (while the previous behaviour was to keep what the bootloader set). That board seems to fall in that category. That behaviour has been reverted, and Linus merged it already, so it might be entirely unrelated to that serie. Maxime -- Maxime Ripard, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com
On Fri, Jan 27, 2017 at 8:58 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote: >>> Linus Walleij (6): >>> mmc: block: break out mmc_blk_rw_cmd_abort() >>> mmc: block: break out mmc_blk_rw_start_new() >>> mmc: block: do not assign mq_rq when aborting command >>> mmc: block: inline command abortions >>> mmc: block: introduce new_areq and old_areq >>> mmc: block: stop passing around pointless return values (...) > Seems like this series may have issues. I have looked at boot reports > from kernelci, and particular the reports for > https://kernelci.org/boot/sun7i-a20-bananapi/job/ulfh/ are > interesting. > > Apparently, this board has an SD card attached. There have been errors > reported in the log for a while when doing data transfers, although > none of these errors have triggered the kernelci to report a boot > error. Damned I wish I could be hands-on with this system and bisect it. It's very helpful with shaky systems really. Sadly the errors are hard to reproduce :( The old errors look like so: [ 6.099124] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD SBE !! [ 6.105211] sunxi-mmc 1c0f000.mmc: data error, sending stop command [ 6.122394] mmcblk0: timed out sending r/w cmd command, card status 0x900 [ 6.665013] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !! [ 6.671011] sunxi-mmc 1c0f000.mmc: data error, sending stop command [ 6.677812] mmcblk0: timed out sending r/w cmd command, card status 0x900 [ 7.123727] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !! [ 7.129692] sunxi-mmc 1c0f000.mmc: data error, sending stop command [ 7.136489] mmcblk0: timed out sending r/w cmd command, card status 0x900 [ 7.143349] blk_update_request: I/O error, dev mmcblk0, sector 124800 [ 7.493691] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !! [ 7.499651] sunxi-mmc 1c0f000.mmc: data error, sending stop command [ 7.506229] mmcblk0: timed out sending r/w cmd command, card status 0x900 [ 7.943641] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !! [ 7.949595] sunxi-mmc 1c0f000.mmc: data error, sending stop command [ 7.956222] mmcblk0: timed out sending r/w cmd command, card status 0x900 [ 7.963010] blk_update_request: I/O error, dev mmcblk0, sector 124800 [ 7.969499] Buffer I/O error on dev mmcblk0p1, logical block 15344, async page read [ 8.321411] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !! [ 8.327378] sunxi-mmc 1c0f000.mmc: data error, sending stop command [ 8.334018] mmcblk0: timed out sending r/w cmd command, card status 0x900 [ 8.763338] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !! [ 8.769276] sunxi-mmc 1c0f000.mmc: data error, sending stop command [ 8.775960] mmcblk0: timed out sending r/w cmd command, card status 0x900 [ 8.782750] blk_update_request: I/O error, dev mmcblk0, sector 124928 [ 9.125126] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !! [ 9.131084] sunxi-mmc 1c0f000.mmc: data error, sending stop command [ 9.137624] mmcblk0: timed out sending r/w cmd command, card status 0x900 [ 9.144445] blk_update_request: I/O error, dev mmcblk0, sector 124928 [ 9.150881] Buffer I/O error on dev mmcblk0p2, logical block 0, async page read So something was causing errors on the read command. > However, I suspect that some of the changes in this series make it > worse. Perhaps because of a changed error handling the mmc block > layer!? > > Particular, look at the difference between these [1] boot logs, it > might give you some hints. I have also added Maxime to this thread, > perhaps he can help out with the sunxi mmc driver. The new errors look like so: [ 6.099171] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD SBE !! [ 6.105259] sunxi-mmc 1c0f000.mmc: data error, sending stop command [ 6.127415] mmcblk0: timed out sending r/w cmd command, card status 0x900 [ 6.666628] sunxi-mmc 1c0f000.mmc: smc 0 err, cmd 18, RD DTO !! [ 6.672626] sunxi-mmc 1c0f000.mmc: data error, sending stop command [ 6.679420] mmcblk0: timed out sending r/w cmd command, card status 0x900 [ 7.503256] sunxi-mmc 1c0f000.mmc: fatal err update clk timeout [ 8.623257] sunxi-mmc 1c0f000.mmc: fatal err update clk timeout This "fatal err update clk timeout" is new and is coming from the driver. [ 8.630370] mmc0: tried to reset card, got error -5 [ 8.635309] blk_update_request: I/O error, dev mmcblk0, sector 124800 [ 8.642366] mmcblk0: error -5 sending status command, retrying [ 8.648279] mmcblk0: error -5 sending status command, retrying [ 8.654132] mmcblk0: error -5 sending status command, aborting [ 8.659961] blk_update_request: I/O error, dev mmcblk0, sector 7167872 [ 8.667201] mmcblk0: error -5 sending status command, retrying [ 8.673031] mmcblk0: error -5 sending status command, retrying [ 8.678916] mmcblk0: error -5 sending status command, aborting [ 8.684758] blk_update_request: I/O error, dev mmcblk0, sector 124800 [ 8.691195] Buffer I/O error on dev mmcblk0p1, logical block 15344, async page read [ 8.700492] mmcblk0: error -5 sending status command, retrying [ 8.706405] mmcblk0: error -5 sending status command, retrying [ 8.712232] mmcblk0: error -5 sending status command, aborting [ 8.718103] blk_update_request: I/O error, dev mmcblk0, sector 7167872 [ 8.724643] Buffer I/O error on dev mmcblk0p2, logical block 880368, async page read [ 8.732403] Unable to handle kernel NULL pointer dereference at virtual address 00000028 [ 8.740501] pgd = c0004000 [ 8.743204] [00000028] *pgd=00000000 [ 8.746794] Internal error: Oops: 17 [#1] SMP ARM [ 8.751491] Modules linked in: [ 8.754547] CPU: 0 PID: 65 Comm: mmcqd/0 Not tainted 4.10.0-rc5-00097-ge1defa2da4d3 #1 [ 8.762450] Hardware name: Allwinner sun7i (A20) Family [ 8.767666] task: ee9b0fc0 task.stack: ee9d8000 [ 8.772201] PC is at mmc_blk_rw_rq_prep+0x20/0x39c [ 8.776985] LR is at mmc_blk_issue_rw_rq+0x124/0x370 And now it is crashing at mmc_blk_rw_rq_prep() + 0x20 so I suspect it is one of these: struct mmc_blk_request *brq = &mqrq->brq; struct request *req = mqrq->req; struct mmc_blk_data *md = mq->blkdata; I guess the first: mqrq is NULL. So I suspect the oneliner in commit 0ebd6e72b5ee2592625d5ae567a729345dfe07b6 "mmc: block: do not assign mq_rq when aborting command" That is not easily reverted so I will sent a RTF patch to restore the behaviour. Yours, Linus Walleij -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jan 30, 2017 at 2:05 PM, Linus Walleij <linus.walleij@linaro.org> wrote: > On Fri, Jan 27, 2017 at 8:58 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote: > And now it is crashing at mmc_blk_rw_rq_prep() + 0x20 so I suspect it is one of > these: > > struct mmc_blk_request *brq = &mqrq->brq; > struct request *req = mqrq->req; > struct mmc_blk_data *md = mq->blkdata; > > I guess the first: mqrq is NULL. > > So I suspect the oneliner in commit 0ebd6e72b5ee2592625d5ae567a729345dfe07b6 > "mmc: block: do not assign mq_rq when aborting command" Bah looking closer at that it just doesn't make any logical sense at all. I now suspect the NULL check removed in mmc_blk_rw_start_new() to be behind this, so sent a patch to restore it. No idea how to test it or if it's the real problem. I had to rebase my further patches on top too, so if you merge this I need to resend the next series. Yours, Linus Walleij -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html