Message ID | cover.1563782844.git.baolin.wang@linaro.org |
---|---|
Series | Add MMC packed function |
Hi,

On Mon, 22 Jul 2019 at 21:10, Baolin Wang <baolin.wang@linaro.org> wrote:
>
> Hi All,
>
> Now some SD/MMC controllers can support packed command or packed request,
> that means it can package multiple requests to host controller to be handled
> at one time, which can improve the I/O performance. Thus this patchset is
> used to add the MMC packed function to support packed request or packed
> command.
>
> In this patch set, I implemented the SD host ADMA3 transfer mode to support
> packed request. The ADMA3 transfer mode can process a multi-block data transfer
> by using a pair of command descriptor and ADMA2 descriptor. In future we can
> easily expand the MMC packed function to support packed command.
>
> Below are some comparison data between packed request and non-packed request
> with fio tool. The fio command I used is like below with changing the
> '--rw' parameter and enabling the direct IO flag to measure the actual hardware
> transfer speed.
>
> ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read --bs=4K --size=512M --group_reporting --numjobs=20 --name=test_read
>
> My eMMC card working at HS400 Enhanced strobe mode:
> [ 2.229856] mmc0: new HS400 Enhanced strobe MMC card at address 0001
> [ 2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB
> [ 2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 MiB
> [ 2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 MiB
> [ 2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 MiB, chardev (248:0)
>
> 1. Non-packed request
> I tested 3 times for each case and output an average speed.
>
> 1) Sequential read:
> Speed: 28.9MiB/s, 26.4MiB/s, 30.9MiB/s
> Average speed: 28.7MiB/s
>
> 2) Random read:
> Speed: 18.2MiB/s, 8.9MiB/s, 15.8MiB/s
> Average speed: 14.3MiB/s
>
> 3) Sequential write:
> Speed: 21.1MiB/s, 27.9MiB/s, 25MiB/s
> Average speed: 24.7MiB/s
>
> 4) Random write:
> Speed: 21.5MiB/s, 18.1MiB/s, 18.1MiB/s
> Average speed: 19.2MiB/s
>
> 2. Packed request
> In packed request mode, I set the host controller to package a maximum of 10
> requests at one time (actually I can increase the package number), and I
> enabled read/write packed request mode. Also I tested 3 times for each
> case and output an average speed.
>
> 1) Sequential read:
> Speed: 165MiB/s, 167MiB/s, 164MiB/s
> Average speed: 165.3MiB/s
>
> 2) Random read:
> Speed: 147MiB/s, 141MiB/s, 144MiB/s
> Average speed: 144MiB/s
>
> 3) Sequential write:
> Speed: 87.8MiB/s, 89.1MiB/s, 90.0MiB/s
> Average speed: 89MiB/s
>
> 4) Random write:
> Speed: 90.9MiB/s, 89.8MiB/s, 90.4MiB/s
> Average speed: 90.4MiB/s
>
> From the above data, we can see the packed request can improve the performance greatly.
> Any comments are welcome. Thanks a lot.

Any comments for this patch set? Thanks.

>
> Baolin Wang (7):
>   blk-mq: Export blk_mq_hctx_has_pending() function
>   mmc: core: Add MMC packed request function
>   mmc: host: sdhci: Introduce ADMA3 transfer mode
>   mmc: host: sdhci: Factor out the command configuration
>   mmc: host: sdhci: Remove redundant sg_count member of struct
>     sdhci_host
>   mmc: host: sdhci: Add MMC packed request support
>   mmc: host: sdhci-sprd: Add MMC packed request support
>
>  block/blk-mq.c                |    3 +-
>  drivers/mmc/core/Kconfig      |    2 +
>  drivers/mmc/core/Makefile     |    1 +
>  drivers/mmc/core/block.c      |   71 +++++-
>  drivers/mmc/core/block.h      |    3 +-
>  drivers/mmc/core/core.c       |   51 ++++
>  drivers/mmc/core/core.h       |    3 +
>  drivers/mmc/core/packed.c     |  478 ++++++++++++++++++++++++++++++++++++++
>  drivers/mmc/core/queue.c      |   28 ++-
>  drivers/mmc/host/Kconfig      |    1 +
>  drivers/mmc/host/sdhci-sprd.c |   22 +-
>  drivers/mmc/host/sdhci.c      |  513 +++++++++++++++++++++++++++++++++++------
>  drivers/mmc/host/sdhci.h      |   59 ++++-
>  include/linux/blk-mq.h        |    1 +
>  include/linux/mmc/core.h      |    1 +
>  include/linux/mmc/host.h      |    3 +
>  include/linux/mmc/packed.h    |  123 ++++++++++
>  17 files changed, 1286 insertions(+), 77 deletions(-)
>  create mode 100644 drivers/mmc/core/packed.c
>  create mode 100644 include/linux/mmc/packed.h
>
> --
> 1.7.9.5
>

--
Baolin Wang
Best Regards
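For readers who want to check the arithmetic in the cover letter, the following is a minimal sketch that recomputes the three-run averages and the resulting packed versus non-packed ratios. The throughput figures are copied from the message above; the dictionary layout and the helper names `mean` and `speedups` are illustrative assumptions and are not part of the patch set. On these figures the ratios come out between roughly 3.6x (sequential write) and 10x (random read).

```python
# Throughput samples in MiB/s, copied from the cover letter (3 runs per case).
non_packed = {
    "sequential read": [28.9, 26.4, 30.9],
    "random read": [18.2, 8.9, 15.8],
    "sequential write": [21.1, 27.9, 25.0],
    "random write": [21.5, 18.1, 18.1],
}
packed = {
    "sequential read": [165.0, 167.0, 164.0],
    "random read": [147.0, 141.0, 144.0],
    "sequential write": [87.8, 89.1, 90.0],
    "random write": [90.9, 89.8, 90.4],
}


def mean(values):
    """Arithmetic mean of a non-empty list of samples."""
    return sum(values) / len(values)


def speedups(baseline, improved):
    """Ratio of mean throughput (improved / baseline) for every test case."""
    return {case: mean(improved[case]) / mean(baseline[case]) for case in baseline}


if __name__ == "__main__":
    for case, factor in speedups(non_packed, packed).items():
        print(
            f"{case}: {mean(non_packed[case]):.1f} -> "
            f"{mean(packed[case]):.1f} MiB/s ({factor:.1f}x)"
        )
```

Note that the random read baseline has a wide spread (8.9 to 18.2 MiB/s), so the ratio for that case is the least certain of the four.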
On 12/08/19 12:44 PM, Baolin Wang wrote:
> Hi Adrian,
>
> On Mon, 12 Aug 2019 at 16:59, Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>> On 12/08/19 8:20 AM, Baolin Wang wrote:
>>> Hi,
>>>
>>> On Mon, 22 Jul 2019 at 21:10, Baolin Wang <baolin.wang@linaro.org> wrote:
>>>> [...]
>>>> 1. Non-packed request
>>>> I tested 3 times for each case and output an average speed.
>>>>
>>>> 1) Sequential read:
>>>> Speed: 28.9MiB/s, 26.4MiB/s, 30.9MiB/s
>>>> Average speed: 28.7MiB/s
>>
>> This seems surprisingly low for a HS400ES card. Do you know why that is?
>
> I've set the clock to 400M, but it seems the hardware did not output
> the corresponding clock. I will check my hardware.
>
>>>> [...]
>>>> Any comments are welcome. Thanks a lot.
>>>
>>> Any comments for this patch set? Thanks.
>>
>> Did you consider adapting the CQE interface?
>
> I am not very familiar with CQE, since my controller did not support
> it. But the MMC packed function had introduced some callbacks to help
> for different controllers to do packed request, so I think it is easy
> to adapt the CQE interface.
>

I meant did you consider using the CQE interface instead of creating another
one?
On Mon, 12 Aug 2019 at 18:52, Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 12/08/19 12:44 PM, Baolin Wang wrote:
> > [...]
> >> Did you consider adapting the CQE interface?
> >
> > I am not very familiar with CQE, since my controller did not support
> > it. But the MMC packed function had introduced some callbacks to help
> > for different controllers to do packed request, so I think it is easy
> > to adapt the CQE interface.
> >
>
> I meant did you consider using the CQE interface instead of creating another
> one?

Sorry for the misunderstanding.

I think the core/core.c modification can use the CQE interface, but there
are some differences in core/block.c, and I think they are different
mechanisms. I also want to avoid affecting CQE and normal transfers, so I
think adding MMC packed related interfaces will be easier to read and
maintain.

--
Baolin Wang
Best Regards
Hi Adrian,

On Mon, 12 Aug 2019 at 17:44, Baolin Wang <baolin.wang@linaro.org> wrote:
>
> Hi Adrian,
>
> On Mon, 12 Aug 2019 at 16:59, Adrian Hunter <adrian.hunter@intel.com> wrote:
> > [...]
> > >> 1) Sequential read:
> > >> Speed: 28.9MiB/s, 26.4MiB/s, 30.9MiB/s
> > >> Average speed: 28.7MiB/s
> >
> > This seems surprisingly low for a HS400ES card. Do you know why that is?
>
> I've set the clock to 400M, but it seems the hardware did not output
> the corresponding clock. I will check my hardware.

I've checked my hardware and did not find any problem. The reason for the
low speed is that I set bs=4K; when I changed it to bs=1M, the speed went up
to 251MiB/s.

./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read --bs=1M --size=512M --group_reporting --numjobs=20 --name=test_read
READ: bw=251MiB/s (263MB/s), 251MiB/s-251MiB/s (263MB/s-263MB/s), io=10.0GiB (10.7GB), run=40826-40826msec

--
Baolin Wang
Best Regards
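The fio summary line above can be cross-checked with a few lines of arithmetic, and the same arithmetic shows why the block size matters so much. The sketch below is an illustration only: the helper names `bandwidth_mib_s` and `requests_per_second` are assumptions, and it assumes each of the 20 jobs transfers its full 512 MiB, which is consistent with the reported io=10.0GiB.

```python
MIB = 1024 * 1024  # bytes per MiB


def bandwidth_mib_s(total_bytes, runtime_ms):
    """Aggregate bandwidth in MiB/s from bytes moved and runtime in ms."""
    return total_bytes / MIB / (runtime_ms / 1000.0)


def requests_per_second(bw_mib_s, block_size_bytes):
    """Number of block-sized requests per second implied by a bandwidth."""
    return bw_mib_s * MIB / block_size_bytes


# --numjobs=20 with --size=512M each: 10 GiB in total (assumed, matches io=10.0GiB).
total_bytes = 20 * 512 * MIB
bw_1m = bandwidth_mib_s(total_bytes, 40826)  # run=40826msec from the fio output

# Request rates implied by the two non-packed sequential read results.
rate_4k = requests_per_second(28.7, 4 * 1024)  # bs=4K, 28.7 MiB/s average
rate_1m = requests_per_second(251.0, MIB)      # bs=1M, 251 MiB/s

if __name__ == "__main__":
    print(f"bs=1M bandwidth: {bw_1m:.1f} MiB/s ({bw_1m * MIB / 1e6:.0f} MB/s)")
    print(f"bs=4K: {rate_4k:.0f} requests/s, bs=1M: {rate_1m:.0f} requests/s")
```

One reading of these numbers is that the 4K case is limited by per-request overhead rather than by the bus: it issues thousands of requests per second while moving far less data than the 1M case, which is the kind of workload where packing several requests into one transfer would be expected to help.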