Message ID | 20170207172446.4528-1-paolo.valente@linaro.org |
---|---|
Series | WIP branch for bfq-mq |
On Tue, 2017-02-07 at 18:24 +0100, Paolo Valente wrote:
> [1] https://github.com/Algodev-github/bfq-mq
Hello Paolo,
That branch includes two changes of the version suffix (EXTRAVERSION in Makefile).
Please don't do that but set CONFIG_LOCALVERSION in .config to add a suffix to
the kernel version string.
Thanks,
Bart.
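A minimal sketch of the CONFIG_LOCALVERSION approach suggested above, assuming an already generated .config and GNU sed. The helper name set_localversion and the suffix "-bfq-mq" are illustrative only and do not come from the branch.

```shell
# Hypothetical helper: set CONFIG_LOCALVERSION in a .config file
# instead of editing EXTRAVERSION in the top-level Makefile.
#   $1 = path to the .config file
#   $2 = suffix to append to the kernel version string (example: "-bfq-mq")
# The suffix must not contain '|' or '&', which are special to the sed call.
set_localversion() {
    config_file=$1
    suffix=$2
    if grep -q '^CONFIG_LOCALVERSION=' "$config_file"; then
        # Replace the existing setting in place (sed -i as in GNU sed).
        sed -i "s|^CONFIG_LOCALVERSION=.*|CONFIG_LOCALVERSION=\"$suffix\"|" "$config_file"
    else
        # Append the setting if the option is not present yet.
        printf 'CONFIG_LOCALVERSION="%s"\n' "$suffix" >> "$config_file"
    fi
}
```

Inside a kernel tree this would be called as `set_localversion .config "-bfq-mq"`, with the Makefile left untouched. The in-tree `scripts/config --set-str LOCALVERSION "-bfq-mq"` should achieve the same result.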
On Tue, 2017-02-07 at 18:24 +0100, Paolo Valente wrote:
> 2) Enable people to test this first version bfq-mq.
Hello Paolo,
I installed this version of bfq-mq on a server that boots from a SATA
disk. That server boots fine with kernel v4.10-rc7 but not with this
tree. The first 30 seconds of the boot process seem to proceed normally
but after that time the messages on the console stop scrolling and
about another 30 seconds later the server reboots. I haven't found
anything useful in the system log. I configured the block layer as
follows:
$ grep '^C.*_MQ_' .config
CONFIG_BLK_MQ_PCI=y
CONFIG_MQ_IOSCHED_BFQ=y
CONFIG_MQ_IOSCHED_DEADLINE=y
CONFIG_MQ_IOSCHED_NONE=y
CONFIG_DEFAULT_MQ_BFQ_MQ=y
CONFIG_DEFAULT_MQ_IOSCHED="bfq-mq"
CONFIG_SCSI_MQ_DEFAULT=y
CONFIG_DM_MQ_DEFAULT=y
Bart.
> On 10 Feb 2017, at 17:45, Bart Van Assche <Bart.VanAssche@sandisk.com> wrote:
>
> On Tue, 2017-02-07 at 18:24 +0100, Paolo Valente wrote:
>> 2) Enable people to test this first version bfq-mq.
>
> Hello Paolo,
>
> I installed this version of bfq-mq on a server that boots from a SATA
> disk. That server boots fine with kernel v4.10-rc7 but not with this
> tree. The first 30 seconds of the boot process seem to proceed normally
> but after that time the messages on the console stop scrolling and
> about another 30 seconds later the server reboots. I haven't found
> anything useful in the system log. I configured the block layer as
> follows:
>
> $ grep '^C.*_MQ_' .config
> CONFIG_BLK_MQ_PCI=y
> CONFIG_MQ_IOSCHED_BFQ=y
> CONFIG_MQ_IOSCHED_DEADLINE=y
> CONFIG_MQ_IOSCHED_NONE=y
> CONFIG_DEFAULT_MQ_BFQ_MQ=y
> CONFIG_DEFAULT_MQ_IOSCHED="bfq-mq"
> CONFIG_SCSI_MQ_DEFAULT=y
> CONFIG_DM_MQ_DEFAULT=y
>

Could you reconfigure with none or mq-deadline as default, check
whether the system boots, and, if it does, switch manually to bfq-mq,
check what happens, and, in the likely case of a failure, try to get
the oops?

Thank you very much,
Paolo

> Bart.
> Western Digital Corporation (and its subsidiaries) E-mail Confidentiality Notice & Disclaimer:
>
> This e-mail and any files transmitted with it may contain confidential or legally privileged information of WDC and/or its affiliates, and are intended solely for the use of the individual or entity to which they are addressed. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited. If you have received this e-mail in error, please notify the sender immediately and delete the e-mail in its entirety from your system.
> On 10 Feb 2017, at 17:08, Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>
> On Tue, 2017-02-07 at 18:24 +0100, Paolo Valente wrote:
>> [1] https://github.com/Algodev-github/bfq-mq
>
> Hello Paolo,
>
> That branch includes two changes of the version suffix (EXTRAVERSION in Makefile).
> Please don't do that but set CONFIG_LOCALVERSION in .config to add a suffix to
> the kernel version string.
>

I know, thanks. Unfortunately, you will probably find many other
irregular things in that sort of private branch (as for that suffix,
for some reason it was handy for me to have it tracked by git).

Thanks,
Paolo

> Thanks,
>
> Bart.
On 02/10/2017 08:49 AM, Paolo Valente wrote:
>> $ grep '^C.*_MQ_' .config
>> CONFIG_BLK_MQ_PCI=y
>> CONFIG_MQ_IOSCHED_BFQ=y
>> CONFIG_MQ_IOSCHED_DEADLINE=y
>> CONFIG_MQ_IOSCHED_NONE=y
>> CONFIG_DEFAULT_MQ_BFQ_MQ=y
>> CONFIG_DEFAULT_MQ_IOSCHED="bfq-mq"
>> CONFIG_SCSI_MQ_DEFAULT=y
>> CONFIG_DM_MQ_DEFAULT=y
>>
>
> Could you reconfigure with none or mq-deadline as default, check
> whether the system boots, and, if it does, switch manually to bfq-mq,
> check what happens, and, in the likely case of a failure, try to get
> the oops?

Hello Paolo,

I just finished performing that test with the following kernel config:
$ grep '^C.*_MQ_' .config
CONFIG_BLK_MQ_PCI=y
CONFIG_MQ_IOSCHED_BFQ=y
CONFIG_MQ_IOSCHED_DEADLINE=y
CONFIG_MQ_IOSCHED_NONE=y
CONFIG_DEFAULT_MQ_DEADLINE=y
CONFIG_DEFAULT_MQ_IOSCHED="mq-deadline"
CONFIG_SCSI_MQ_DEFAULT=y
CONFIG_DM_MQ_DEFAULT=y

After the system came up I logged in, switched to the bfq-mq scheduler
and ran several I/O tests against the boot disk. Sorry but nothing
interesting appeared in the kernel log.

Bart.
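The manual switch described in this message goes through the per-device sysfs attribute queue/scheduler. The following is a hedged sketch of that step, not the commands Bart actually ran: the helper names are hypothetical, the device name sda is an assumption, and the optional sysfs-root argument exists only so the helpers can be tried against a scratch directory.

```shell
# Hypothetical helpers around the block-layer scheduler attribute.
# The last (optional) argument is the sysfs root and defaults to /sys.

# Print the path of the scheduler attribute for device $1.
scheduler_file() {
    printf '%s/block/%s/queue/scheduler\n' "${2:-/sys}" "$1"
}

# Print the active scheduler of device $1.
# The kernel lists all schedulers and marks the active one with [brackets].
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$(scheduler_file "$1" "$2")"
}

# Select scheduler $2 for device $1 by writing its name to the attribute.
# On a real system this requires root.
set_scheduler() {
    echo "$2" > "$(scheduler_file "$1" "$3")"
}
```

On a booted test system, `set_scheduler sda bfq-mq` followed by `active_scheduler sda` would perform and confirm the switch, assuming the boot disk is sda and the scheduler is registered under the name bfq-mq as in the config above.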
On Tue, 2017-02-07 at 18:24 +0100, Paolo Valente wrote:
> (lock assertions, BUG_ONs, ...).
Hello Paolo,
If you are using BUG_ON(), does that mean that you are not aware of Linus'
opinion about BUG_ON()? Please read https://lkml.org/lkml/2016/10/4/1.
Thanks,
Bart.
> On 10 Feb 2017, at 19:34, Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>
> On Tue, 2017-02-07 at 18:24 +0100, Paolo Valente wrote:
>> (lock assertions, BUG_ONs, ...).
>
> Hello Paolo,
>
> If you are using BUG_ON(), does that mean that you are not aware of Linus'
> opinion about BUG_ON()? Please read https://lkml.org/lkml/2016/10/4/1.
>

I am, thanks. But this is a testing version, overfull of assertions
as a form of hysterical defensive programming. I will of course remove
all halting assertions in the submission for merging.

Thanks,
Paolo

> Thanks,
>
> Bart.
> On 10 Feb 2017, at 19:13, Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>
> On 02/10/2017 08:49 AM, Paolo Valente wrote:
>>> $ grep '^C.*_MQ_' .config
>>> CONFIG_BLK_MQ_PCI=y
>>> CONFIG_MQ_IOSCHED_BFQ=y
>>> CONFIG_MQ_IOSCHED_DEADLINE=y
>>> CONFIG_MQ_IOSCHED_NONE=y
>>> CONFIG_DEFAULT_MQ_BFQ_MQ=y
>>> CONFIG_DEFAULT_MQ_IOSCHED="bfq-mq"
>>> CONFIG_SCSI_MQ_DEFAULT=y
>>> CONFIG_DM_MQ_DEFAULT=y
>>>
>>
>> Could you reconfigure with none or mq-deadline as default, check
>> whether the system boots, and, if it does, switch manually to bfq-mq,
>> check what happens, and, in the likely case of a failure, try to get
>> the oops?
>
> Hello Paolo,
>
> I just finished performing that test with the following kernel config:
> $ grep '^C.*_MQ_' .config
> CONFIG_BLK_MQ_PCI=y
> CONFIG_MQ_IOSCHED_BFQ=y
> CONFIG_MQ_IOSCHED_DEADLINE=y
> CONFIG_MQ_IOSCHED_NONE=y
> CONFIG_DEFAULT_MQ_DEADLINE=y
> CONFIG_DEFAULT_MQ_IOSCHED="mq-deadline"
> CONFIG_SCSI_MQ_DEFAULT=y
> CONFIG_DM_MQ_DEFAULT=y
>
> After the system came up I logged in, switched to the bfq-mq scheduler
> and ran several I/O tests against the boot disk.

Without any failure, right?

Unfortunately, as you can imagine, no boot failure occurred on
any of my test systems so far :(

This version of bfq-mq can be configured to print all its activity in
the kernel log, by just defining a macro. This will of course slow
down the system so much as to make it probably unusable, if bfq-mq is
active from boot. Yet, the failure may still occur early enough to
make this approach useful to discover where bfq-mq gets stuck. As of
now I have no better ideas. Any suggestion is welcome.

Thanks,
Paolo

> Sorry but nothing
> interesting appeared in the kernel log.
>
> Bart.
> On 10 Feb 2017, at 20:49, Paolo Valente <paolo.valente@linaro.org> wrote:
>
>>
>> On 10 Feb 2017, at 19:13, Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>>
>> On 02/10/2017 08:49 AM, Paolo Valente wrote:
>>>> $ grep '^C.*_MQ_' .config
>>>> CONFIG_BLK_MQ_PCI=y
>>>> CONFIG_MQ_IOSCHED_BFQ=y
>>>> CONFIG_MQ_IOSCHED_DEADLINE=y
>>>> CONFIG_MQ_IOSCHED_NONE=y
>>>> CONFIG_DEFAULT_MQ_BFQ_MQ=y
>>>> CONFIG_DEFAULT_MQ_IOSCHED="bfq-mq"
>>>> CONFIG_SCSI_MQ_DEFAULT=y
>>>> CONFIG_DM_MQ_DEFAULT=y
>>>>
>>>
>>> Could you reconfigure with none or mq-deadline as default, check
>>> whether the system boots, and, if it does, switch manually to bfq-mq,
>>> check what happens, and, in the likely case of a failure, try to get
>>> the oops?
>>
>> Hello Paolo,
>>
>> I just finished performing that test with the following kernel config:
>> $ grep '^C.*_MQ_' .config
>> CONFIG_BLK_MQ_PCI=y
>> CONFIG_MQ_IOSCHED_BFQ=y
>> CONFIG_MQ_IOSCHED_DEADLINE=y
>> CONFIG_MQ_IOSCHED_NONE=y
>> CONFIG_DEFAULT_MQ_DEADLINE=y
>> CONFIG_DEFAULT_MQ_IOSCHED="mq-deadline"
>> CONFIG_SCSI_MQ_DEFAULT=y
>> CONFIG_DM_MQ_DEFAULT=y
>>
>> After the system came up I logged in, switched to the bfq-mq scheduler
>> and ran several I/O tests against the boot disk.
>
> Without any failure, right?
>
> Unfortunately, as you can imagine, no boot failure occurred on
> any of my test systems so far :(
>
> This version of bfq-mq can be configured to print all its activity in
> the kernel log, by just defining a macro. This will of course slow
> down the system so much as to make it probably unusable, if bfq-mq is
> active from boot. Yet, the failure may still occur early enough to
> make this approach useful to discover where bfq-mq gets stuck. As of
> now I have no better ideas. Any suggestion is welcome.
>

Hi Bart,
I have found a machine that crashes at boot, not only when bfq-mq is
chosen as the default scheduler, but also when mq-deadline is.

I have found and just reported the cause of the failure, together with
a fix. This is probably not the cause of your failure, but what do you
think about trying this fix? BTW, I have rebased the branch [1] against
the new commits in Jens' for-4.11/next.

Otherwise, if you have no news or suggestions, would you be willing to
try my micro-logging proposal?

Thanks,
Paolo

[1] https://github.com/Algodev-github/bfq-mq

> Thanks,
> Paolo
>
>> Sorry but nothing
>> interesting appeared in the kernel log.
>>
>> Bart.
> On 07 Feb 2017, at 18:24, Paolo Valente <paolo.valente@linaro.org> wrote:
>
> Hi,
>
> I have finally pushed here [1] the current WIP branch of bfq for
> blk-mq, which I have tentatively named bfq-mq.
>
> This branch *IS NOT* meant for merging into mainline and contains code
> that may easily violate code style, and not only that, in many
> places. Commits implement the following main steps:
> 1) Add the last version of bfq for blk
> 2) Clone bfq source files into identical bfq-mq source files
> 3) Modify bfq-mq files to get a working version of bfq for blk-mq
> (cgroups support not yet functional)
>
> In my intentions, the main goals of this branch are:
>
> 1) Show, as soon as I could, the changes I made to let bfq-mq comply
> with blk-mq-sched framework. I thought this could be particularly
> useful for Jens, being BFQ identical to CFQ in terms of hook
> interfaces and io-context handling, and almost identical in terms of
> request-merging.
>
> 2) Enable people to test this first version bfq-mq. Code is purposely
> overfull of log messages and invariant checks that halt the system on
> failure (lock assertions, BUG_ONs, ...).
>
> To make it easier to revise commits, I'm sending the patches that
> transform bfq into bfq-mq (last four patches in the branch [1]). They
> work on two files, bfq-mq-iosched.c and bfq-mq.h, which, at the
> beginning, are just copies of bfq-iosched.c and bfq.h.
>

Hi,
this is just to inform that, as I just wrote to Bart, I have rebased
the branch [1] against the current content of for-4.11/next.

Jens, Omar, did you find the time to have a look at the main commits
or to run some tests?

Thanks,
Paolo

[1] https://github.com/Algodev-github/bfq-mq

> Thanks,
> Paolo
>
> [1] https://github.com/Algodev-github/bfq-mq
>
> Paolo Valente (4):
>   blk-mq: pass bio to blk_mq_sched_get_rq_priv
>   Move thinktime from bic to bfqq
>   Embed bfq-ioc.c and add locking on request queue
>   Modify interface and operation to comply with blk-mq-sched
>
>  block/bfq-cgroup.c       |   4 -
>  block/bfq-mq-iosched.c   | 852 +++++++++++++++++++++++++++++------------------
>  block/bfq-mq.h           |  65 ++--
>  block/blk-mq-sched.c     |   8 +-
>  block/blk-mq-sched.h     |   5 +-
>  include/linux/elevator.h |   2 +-
>  6 files changed, 567 insertions(+), 369 deletions(-)
>
> --
> 2.10.0
On Mon, 2017-02-13 at 22:07 +0100, Paolo Valente wrote:
> but what do you think about trying this fix?

Sorry but with ... the same server I used for the previous test still
didn't boot up properly. A screenshot is available at
https://goo.gl/photos/Za9QVGCNe2BJBwxVA.

> Otherwise, if you have no news or suggestions, would you be willing to
> try my micro-logging proposal https://github.com/Algodev-github/bfq-mq?

Sorry but it's not clear to me what logging mechanism you are referring
to and how to enable it? Are you perhaps referring to
CONFIG_BFQ_REDIRECT_TO_CONSOLE?

Thanks,

Bart.
On 02/13/2017 02:09 PM, Paolo Valente wrote:
>
>> On 07 Feb 2017, at 18:24, Paolo Valente <paolo.valente@linaro.org> wrote:
>>
>> Hi,
>>
>> I have finally pushed here [1] the current WIP branch of bfq for
>> blk-mq, which I have tentatively named bfq-mq.
>>
>> This branch *IS NOT* meant for merging into mainline and contains code
>> that may easily violate code style, and not only that, in many
>> places. Commits implement the following main steps:
>> 1) Add the last version of bfq for blk
>> 2) Clone bfq source files into identical bfq-mq source files
>> 3) Modify bfq-mq files to get a working version of bfq for blk-mq
>> (cgroups support not yet functional)
>>
>> In my intentions, the main goals of this branch are:
>>
>> 1) Show, as soon as I could, the changes I made to let bfq-mq comply
>> with blk-mq-sched framework. I thought this could be particularly
>> useful for Jens, being BFQ identical to CFQ in terms of hook
>> interfaces and io-context handling, and almost identical in terms of
>> request-merging.
>>
>> 2) Enable people to test this first version bfq-mq. Code is purposely
>> overfull of log messages and invariant checks that halt the system on
>> failure (lock assertions, BUG_ONs, ...).
>>
>> To make it easier to revise commits, I'm sending the patches that
>> transform bfq into bfq-mq (last four patches in the branch [1]). They
>> work on two files, bfq-mq-iosched.c and bfq-mq.h, which, at the
>> beginning, are just copies of bfq-iosched.c and bfq.h.
>>
>
> Hi,
> this is just to inform that, as I just wrote to Bart, I have rebased
> the branch [1] against the current content of for-4.11/next.
>
> Jens, Omar, did you find the time to have a look at the main commits
> or to run some tests?

I only looked at the core change you proposed for passing in the bio as
well, and Omar fixed up the icq exit part and I also applied that patch.
I haven't looked at any of the bfq-mq patches at all yet.

Not sure what I can do with those, I don't think those are particularly
useful to anyone but you. Might make more sense to post the conversion
for review as a whole.

--
Jens Axboe
On Mon, 2017-02-13 at 22:07 +0000, Bart Van Assche wrote:
> On Mon, 2017-02-13 at 22:07 +0100, Paolo Valente wrote:
> > but what do you think about trying this fix?
>
> Sorry but with ... the same server I used for the previous test still
> didn't boot up properly. A screenshot is available at
> https://goo.gl/photos/Za9QVGCNe2BJBwxVA.
>
> > Otherwise, if you have no news or suggestions, would you be willing to
> > try my micro-logging proposal https://github.com/Algodev-github/bfq-mq?
>
> Sorry but it's not clear to me what logging mechanism you are referring
> to and how to enable it? Are you perhaps referring to
> CONFIG_BFQ_REDIRECT_TO_CONSOLE?

Anyway, a second screenshot has been added to the same album after I had
applied the following patch:

diff --git a/block/Makefile b/block/Makefile
index 1c04fe19e825..bf472ac82c08 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -2,6 +2,8 @@
 # Makefile for the kernel block layer
 #

+KBUILD_CFLAGS += -DCONFIG_BFQ_REDIRECT_TO_CONSOLE
+
 obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-tag.o blk-sysfs.o \
             blk-flush.o blk-settings.o blk-ioc.o blk-map.o \
             blk-exec.o blk-merge.o blk-softirq.o blk-timeout.o \

Bart.
> On 13 Feb 2017, at 23:38, Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>
> On Mon, 2017-02-13 at 22:07 +0000, Bart Van Assche wrote:
>> On Mon, 2017-02-13 at 22:07 +0100, Paolo Valente wrote:
>>> but what do you think about trying this fix?
>>
>> Sorry but with ... the same server I used for the previous test still
>> didn't boot up properly. A screenshot is available at
>> https://goo.gl/photos/Za9QVGCNe2BJBwxVA.
>>
>>> Otherwise, if you have no news or suggestions, would you be willing to
>>> try my micro-logging proposal https://github.com/Algodev-github/bfq-mq?
>>
>> Sorry but it's not clear to me what logging mechanism you are referring
>> to and how to enable it? Are you perhaps referring to
>> CONFIG_BFQ_REDIRECT_TO_CONSOLE?
>
> Anyway, a second screenshot has been added to the same album after I had
> applied the following patch:
>

Hi Bart,
thanks for this second attempt of yours. Although, unfortunately, it
does not give a clear indication of the exact cause of your hang (apart
from a possible deadlock), your log helped me notice another bug.

At any rate, as I have just written to Jens, I have pushed a new
version of the branch [1] (I have not just added new commits, but also
integrated some old commits with new changes, to get it done more
quickly). The branch now contains both a fix for the above bug and,
more importantly, a fix for the circular dependencies that were still
lurking around. Could you please test it?

Crossing my fingers,
Paolo

[1] https://github.com/Algodev-github/bfq-mq

> diff --git a/block/Makefile b/block/Makefile
> index 1c04fe19e825..bf472ac82c08 100644
> --- a/block/Makefile
> +++ b/block/Makefile
> @@ -2,6 +2,8 @@
> # Makefile for the kernel block layer
> #
>
> +KBUILD_CFLAGS += -DCONFIG_BFQ_REDIRECT_TO_CONSOLE
> +
>
> obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-tag.o blk-sysfs.o \
> blk-flush.o blk-settings.o blk-ioc.o blk-map.o \
> blk-exec.o blk-merge.o blk-softirq.o blk-timeout.o \
>
> Bart.
On Wed, 2017-02-22 at 22:29 +0100, Paolo Valente wrote:
> thanks for this second attempt of yours. Although, unfortunately, it
> does not give a clear indication of the exact cause of your hang (apart
> from a possible deadlock), your log helped me notice another bug.
>
> At any rate, as I have just written to Jens, I have pushed a new
> version of the branch [1] (I have not just added new commits, but also
> integrated some old commits with new changes, to get it done more
> quickly). The branch now contains both a fix for the above bug and,
> more importantly, a fix for the circular dependencies that were still
> lurking around. Could you please test it?

Hello Paolo,

I have good news: the same test system boots normally with the same
kernel config I used during my previous tests and with the latest
bfq-mq code (commit a965d19585c0) merged with kernel v4.10.

Thanks,

Bart.
> On 24 Feb 2017, at 19:44, Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>
> On Wed, 2017-02-22 at 22:29 +0100, Paolo Valente wrote:
>> thanks for this second attempt of yours. Although, unfortunately, it
>> does not give a clear indication of the exact cause of your hang (apart
>> from a possible deadlock), your log helped me notice another bug.
>>
>> At any rate, as I have just written to Jens, I have pushed a new
>> version of the branch [1] (I have not just added new commits, but also
>> integrated some old commits with new changes, to get it done more
>> quickly). The branch now contains both a fix for the above bug and,
>> more importantly, a fix for the circular dependencies that were still
>> lurking around. Could you please test it?
>
> Hello Paolo,
>

Hi

> I have good news: the same test system boots normally with the same
> kernel config I used during my previous tests and with the latest
> bfq-mq code (commit a965d19585c0) merged with kernel v4.10.
>

Whew, I was longing for your reply, thanks :)

Should you want to have a look at it, I have just finished completing
cgroups support too, as you have probably already read in my
previous email.

Thanks,
Paolo

> Thanks,
>
> Bart.