[v6,00/18] Use block pr_ops in LIO

Message ID	20230407200551.12660-1-michael.christie@oracle.com
Headers:
Return-Path: <linux-scsi-owner@vger.kernel.org>
From: Mike Christie <michael.christie@oracle.com>
To: bvanassche@acm.org, hch@lst.de, martin.petersen@oracle.com, linux-scsi@vger.kernel.org, james.bottomley@hansenpartnership.com, linux-block@vger.kernel.org, dm-devel@redhat.com, snitzer@kernel.org, axboe@kernel.dk, linux-nvme@lists.infradead.org, chaitanyak@nvidia.com, kbusch@kernel.org, target-devel@vger.kernel.org
Subject: [PATCH v6 00/18] Use block pr_ops in LIO
Date: Fri, 7 Apr 2023 15:05:33 -0500
Message-Id: <20230407200551.12660-1-michael.christie@oracle.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
Series: Use block pr_ops in LIO
[v6,00/18] Use block pr_ops in LIO
[v6,01/18] block: Add PR callouts for read keys and reservation
[v6,02/18] block: Rename BLK_STS_NEXUS to BLK_STS_RESV_CONFLICT
[v6,03/18] scsi: Rename sd_pr_command
[v6,04/18] scsi: Move sd_pr_type to scsi_common
[v6,05/18] scsi: Add support for block PR read keys/reservation
[v6,06/18] dm: Add support for block PR read keys/reservation
[v6,07/18] nvme: Fix reservation status related structs
[v6,08/18] nvme: Don't hardcode the data len for pr commands
[v6,09/18] nvme: Move pr code to it's own file
[v6,10/18] nvme: Add helper to send pr command
[v6,11/18] nvme: Add pr_ops read_keys support
[v6,12/18] nvme: Add a nvme_pr_type enum
[v6,13/18] nvme: Add pr_ops read_reservation support
[v6,14/18] scsi: target: Rename sbc_ops to exec_cmd_ops
[v6,15/18] scsi: target: Allow backends to hook into PR handling
[v6,16/18] scsi: target: Pass struct target_opcode_descriptor to enabled
[v6,17/18] scsi: target: Report and detect unsupported PR commands
[v6,18/18] scsi: target: Add block PR support to iblock

Mike Christie April 7, 2023, 8:05 p.m. UTC

The patches in this thread allow us to use the block pr_ops with LIO's
target_core_iblock module to support cluster applications in VMs. They
were built over Linus's tree. They also apply over linux-next and
Martin's and Jens's trees.

Currently, to use Windows clustering or Linux clustering (pacemaker +
cluster labs scsi fence agents) in VMs with LIO and vhost-scsi, you have
to use tcmu or pscsi, or use a cluster-aware FS/framework for the LIO pr
file. Setting up a cluster FS/framework is a pain and a waste when your real
backend device is already a distributed device. pscsi and tcmu are
nice for specific use cases, but iblock gives you the best performance and
allows you to use stacked devices like dm-multipath. So these patches
allow iblock to work like pscsi/tcmu, where they can pass a PR command to
the backend module. iblock will then use the pr_ops to pass the PR
command to the real devices, similar to what we do for unmap today.
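
As a rough illustration of the flow described above (this is not code from
the series), the userspace model below shows a backend hook that forwards a
PR register request to a per-device table of callouts instead of emulating
it. Every type and function name here (model_bdev, model_pr_ops,
backend_pr_register, and so on) is hypothetical, and the toy device logic
is only an assumption made so the sketch can run on its own.

```c
/* Userspace model only: all names below are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum model_status { MODEL_OK = 0, MODEL_RESV_CONFLICT, MODEL_NOT_SUPPORTED };

struct model_bdev;

/* Loosely mirrors the idea of a per-device table of PR callouts. */
struct model_pr_ops {
	enum model_status (*pr_register)(struct model_bdev *bdev,
					 uint64_t old_key, uint64_t new_key);
};

struct model_bdev {
	const struct model_pr_ops *pr_ops; /* NULL if the device has no PR support */
	uint64_t registered_key;           /* 0 means "no key registered" */
};

/* A toy "real device": accepts the registration only if old_key matches. */
static enum model_status toy_pr_register(struct model_bdev *bdev,
					 uint64_t old_key, uint64_t new_key)
{
	if (bdev->registered_key != old_key)
		return MODEL_RESV_CONFLICT;
	bdev->registered_key = new_key;
	return MODEL_OK;
}

static const struct model_pr_ops toy_pr_ops = {
	.pr_register = toy_pr_register,
};

/* Backend hook: forward the PR request to the device rather than emulate it. */
static enum model_status backend_pr_register(struct model_bdev *bdev,
					     uint64_t old_key, uint64_t new_key)
{
	assert(bdev != NULL);
	if (!bdev->pr_ops || !bdev->pr_ops->pr_register)
		return MODEL_NOT_SUPPORTED;
	return bdev->pr_ops->pr_register(bdev, old_key, new_key);
}
```

The point of the model is only that the backend does no reservation
bookkeeping itself: a device without callouts reports "not supported", and
a conflict reported by the device is passed back up unchanged.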

The patches are separated in the following groups:
Patch 1 - 2:
- Add block layer callouts for reading reservations and rename reservation
error code.
Patch 3 - 5:
- SCSI support for new callouts.
Patch 6:
- DM support for new callouts.
Patch 7 - 13:
- NVMe support for new callouts.
Patch 14 - 18:
- LIO support for new callouts.

This patchset has been tested with the libiscsi PGR ops and with the Windows
failover cluster verification test. Note that for scsi backend devices we
need this patchset:

https://lore.kernel.org/linux-scsi/20230123221046.125483-1-michael.christie@oracle.com/T/#m4834a643ffb5bac2529d65d40906d3cfbdd9b1b7

to handle UAs. To keep this patchset small and easier to review, that work
is being done separately. To make merging easier, this patchset and the one
above do not have any conflicts, so they can be merged in different trees.

v6:
- Drop dm use comment.
- Move scsi pr type conversion code to scsi_common.c/.h.
- Fix le NVME_EXTENDED_DATA_STRUCT use.
- Fix sparse warnings like sense_reason_t use.

v5:
- Use []/struct_size with nvme reservation structs
- Add Keith's copyright to pr.c
- Drop else in nvme_send_pr_command
- Fix PR_EXCLUSIVE_ACCESS_ALL_REGS use in block_pr_type_from_nvme
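
For readers unfamiliar with the "[]/struct_size" item above, here is a
minimal userspace sketch of the pattern: a header struct that ends in a
flexible array member, plus a size helper that saturates instead of
overflowing. The struct layout and the names (model_reg_ctrl,
model_resv_status, model_struct_size) are assumptions for illustration and
do not reproduce the real nvme reservation structs or the kernel helper.

```c
/* Userspace sketch; struct layout and names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_reg_ctrl {
	uint64_t rkey; /* one registered key per entry */
};

static_assert(sizeof(struct model_reg_ctrl) == 8, "unexpected entry size");

struct model_resv_status {
	uint32_t gen;
	uint16_t regctl;                   /* number of entries that follow */
	struct model_reg_ctrl regctl_ds[]; /* flexible array member */
};

/*
 * Size of the header plus n trailing entries. Returns SIZE_MAX if the
 * computation would overflow, so a later allocation fails cleanly.
 */
static size_t model_struct_size(size_t n)
{
	size_t elem = sizeof(struct model_reg_ctrl);
	size_t hdr = sizeof(struct model_resv_status);

	if (n > (SIZE_MAX - hdr) / elem)
		return SIZE_MAX;
	return hdr + n * elem;
}
```

Using a flexible array member rather than a fixed-length or one-element
array keeps sizeof() equal to the header size, so the buffer length is
always computed from the actual number of entries.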

v4:
- Pass read_keys number of keys instead of array len
- Keep the switch use when converting between block and scsi/nvme PR
types. Drop default case so compiler spits out warning if in the future
a new value is added.
- Add helper for handling
nvme_send_ns_head_pr_command/nvme_send_ns_pr_command
- Use void * instead of u8* for passing data buffer.
- Rename status variable to rs.
- Have caller init buffer/structs instead of nvme/scsi callouts.
- Drop blk_status to err code.
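
The "switch without a default case" item above can be sketched as follows.
This is a standalone userspace illustration, not the code from the patches:
the enum and function names are hypothetical stand-ins, and the numeric
values were copied by hand for illustration, so check the real headers
before relying on them.

```c
/* Standalone sketch; enum and function names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the block layer PR types. */
enum blk_pr_type_model {
	BLK_PR_WRITE_EXCLUSIVE = 1,
	BLK_PR_EXCLUSIVE_ACCESS = 2,
	BLK_PR_WRITE_EXCLUSIVE_REG_ONLY = 3,
	BLK_PR_EXCLUSIVE_ACCESS_REG_ONLY = 4,
	BLK_PR_WRITE_EXCLUSIVE_ALL_REGS = 5,
	BLK_PR_EXCLUSIVE_ACCESS_ALL_REGS = 6,
};

/* Stand-in for the SCSI persistent reservation type codes. */
enum scsi_pr_type_model {
	SCSI_WRITE_EXCLUSIVE = 0x01,
	SCSI_EXCLUSIVE_ACCESS = 0x03,
	SCSI_WRITE_EXCLUSIVE_REG_ONLY = 0x05,
	SCSI_EXCLUSIVE_ACCESS_REG_ONLY = 0x06,
	SCSI_WRITE_EXCLUSIVE_ALL_REGS = 0x07,
	SCSI_EXCLUSIVE_ACCESS_ALL_REGS = 0x08,
};

/*
 * Convert a block PR type to a SCSI PR type. There is deliberately no
 * default case: with -Wall (which enables -Wswitch) the compiler warns if
 * a new enum value is added and not handled here.
 * Returns 0 on success, -1 if the value is not a known type.
 */
static int block_to_scsi_pr_type(enum blk_pr_type_model type, uint8_t *out)
{
	assert(out != NULL);
	switch (type) {
	case BLK_PR_WRITE_EXCLUSIVE:
		*out = SCSI_WRITE_EXCLUSIVE;
		return 0;
	case BLK_PR_EXCLUSIVE_ACCESS:
		*out = SCSI_EXCLUSIVE_ACCESS;
		return 0;
	case BLK_PR_WRITE_EXCLUSIVE_REG_ONLY:
		*out = SCSI_WRITE_EXCLUSIVE_REG_ONLY;
		return 0;
	case BLK_PR_EXCLUSIVE_ACCESS_REG_ONLY:
		*out = SCSI_EXCLUSIVE_ACCESS_REG_ONLY;
		return 0;
	case BLK_PR_WRITE_EXCLUSIVE_ALL_REGS:
		*out = SCSI_WRITE_EXCLUSIVE_ALL_REGS;
		return 0;
	case BLK_PR_EXCLUSIVE_ACCESS_ALL_REGS:
		*out = SCSI_EXCLUSIVE_ACCESS_ALL_REGS;
		return 0;
	}
	return -1; /* value outside the enum, e.g. from untrusted input */
}
```

The trade-off is that an out-of-range value has to be handled after the
switch, but in exchange a forgotten case shows up at build time rather than
at run time.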

v3:
- Fix patch subject formatting.
- Fix coding style.
- Rearrange patches so helpers are added with users to avoid compilation
errors.
- Move pr type conversion to array and add nvme_pr_type.
- Add Extended Data Structure control flag enum and use in code for checks.
- Move nvme pr code to new file.
- Add more info to patch subjects about why we need to add blk_status
to pr_ops.
- Use generic SCSI passthrough error handling interface.
- Fix checkpatch --strict errors. Note that I kept the existing coding
style that it complained about because it looked like it was the preferred
style for the code and I didn't want a mix and match.

v2:
- Drop BLK_STS_NEXUS rename changes. Will do separately.
- Add NVMe support.
- Fixed bug mentioned by Christoph in target_core_iblock where a variable
was not initialized.
- Fixed sd pr_ops UA handling issue found when running libiscsi PGR tests.
- Added patches to allow pr_ops to pass up a BLK_STS so we could return
a RESERVATION_CONFLICT status when a pr_ops callout fails.

Naresh Kamboju April 12, 2023, 9:36 a.m. UTC | #1

[Sorry for adding you in CC.]

While running the LTP controllers test suite with this patch set applied on
top of next-20230406, the following kernel panic was noticed on qemu-i386.

crash log:
---------
pids 1 TINFO: timeout per run is 0h 25m 0s
pids 1 TINFO: test starts with cgroup version 1
pids 1 TINFO: Running testcase 6 with 10 processes
pids 1 TINFO: set a limit that is smaller than current number of pids
<6>[  782.211806] cgroup: fork rejected by pids controller in /ltp/test-2724
pids 1 TPASS: fork failed as expected
/opt/ltp/testcases/bin/tst_test.sh: line 150:  2760 Killed                  pids_task2
/opt/ltp/testcases/bin/tst_test.sh: line 150:  2761 Killed                  pids_task2
/opt/ltp/testcases/bin/tst_test.sh: line 150:  2762 Killed                  pids_task2
/opt/ltp/testcases/bin/tst_test.sh: line 150:  2763 Killed                  pids_task2
/opt/ltp/testcases/bin/tst_test.sh: line 150:  2764 Killed                  pids_task2
/opt/ltp/testcases/bin/tst_test.sh: line 150:  2765 Killed                  pids_task2
/opt/ltp/testcases/bin/tst_test.sh: line 150:  2766 Killed                  pids_task2
/opt/ltp/testcases/bin/tst_test.sh: line 150:  2767 Killed                  pids_task2
/opt/ltp/testcases/bin/tst_test.sh: line 150:  2768 Killed                  pids_task2
/opt/ltp/testcases/bin/tst_test.sh: line 150:  2769 Killed                  pids_task2
<4>[  782.594441] int3: 0000 [#1] PREEMPT SMP
<4>[  782.594783] CPU: 1 PID: 2724 Comm: pids.sh Not tainted 6.3.0-rc5-next-20230406 #1
<4>[  782.594915] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
<4>[  782.595168] EIP: get_page_from_freelist+0x157/0xccc
<4>[  782.595745] Code: 48 04 8d 43 08 85 c9 0f 85 fe 07 00 00 3b 73 0c 0f 82 f5 07 00 00 89 45 c8 8b 45 c8 8b 00 89 45 e0 85 c0 0f 84 fc 07 00 00 3e <8d> 74 26 00 8b 45 cc 80 78 14 00 0f 84 90 05 00 00 8b 45 e0 8b 58
<4>[  782.595850] EAX: 00000000 EBX: c7ab5ce4 ECX: 00000800 EDX: 00000000
<4>[  782.595889] ESI: 00400dc0 EDI: 00000000 EBP: c7ab5c78 ESP: c7ab5c04
<4>[  782.595928] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00000297
<4>[  782.596017] CR0: 80050033 CR2: 086d1d54 CR3: 06eaa000 CR4: 000006d0
<4>[  782.596188] Call Trace:
<4>[  782.596503]  ? exc_int3+0x10/0x130
<4>[  782.596630]  ? __irq_exit_rcu+0x15/0xcc
<4>[  782.596655]  ? sysvec_call_function+0x3c/0x3c
<4>[  782.596677]  ? __irq_exit_rcu+0x15/0xcc
<4>[  782.596694]  ? irqentry_exit+0x26/0x58
<4>[  782.596733]  __alloc_pages+0x156/0xf50
<4>[  782.596783]  ? handle_exception+0x133/0x133
<4>[  782.596817]  ? __rcu_read_unlock+0x1e/0x30
<4>[  782.596843]  ? sysvec_call_function+0x3c/0x3c
<4>[  782.596880]  ? __irq_exit_rcu+0x15/0xcc
<4>[  782.596903]  pte_alloc_one+0x23/0x88
<4>[  782.596928]  __pte_alloc+0x21/0xb0
<4>[  782.596981]  ? trace_hardirqs_on+0x2c/0x88
<4>[  782.597002]  copy_page_range+0x67d/0xb40
<4>[  782.597049]  ? mas_wr_modify+0x10e/0x364
<4>[  782.597069]  ? mod_objcg_state+0x99/0x378
<4>[  782.597103]  ? mas_wr_store_entry.isra.0+0x10c/0x534
<4>[  782.597127]  ? mas_store+0x45/0xb0
<4>[  782.597160]  copy_process+0x1de6/0x1f9c
<4>[  782.597179]  ? lockref_get_not_dead+0x2c/0x38
<4>[  782.597246]  kernel_clone+0xc1/0x3dc
<4>[  782.597279]  __ia32_sys_clone+0x71/0x8c
<4>[  782.597320]  __do_fast_syscall_32+0x4c/0xb8
<4>[  782.597340]  do_fast_syscall_32+0x32/0x74
<4>[  782.597361]  do_SYSENTER_32+0x15/0x24
<4>[  782.597381]  entry_SYSENTER_32+0x98/0xf1
<4>[  782.597484] EIP: 0xb7f3e579
<4>[  782.597838] Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
<4>[  782.597856] EAX: ffffffda EBX: 01200011 ECX: 00000000 EDX: 00000000
<4>[  782.597876] ESI: 00000000 EDI: b7f398e8 EBP: b7f01e3c ESP: bfa4599c
<4>[  782.597889] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000286
<4>[  782.598167] Modules linked in:
<4>[  782.616852]  ---[ end trace 0000000000000000 ] ---
<4>[  782.616998] EIP: get_page_from_freelist+0x157/0xccc
<4>[  782.617213] Code: 48 04 8d 43 08 85 c9 0f 85 fe 07 00 00 3b 73 0c 0f 82 f5 07 00 00 89 45 c8 8b 45 c8 8b 00 89 45 e0 85 c0 0f 84 fc 07 00 00 3e <8d> 74 26 00 8b 45 cc 80 78 14 00 0f 84 90 05 00 00 8b 45 e0 8b 58
<4>[  782.617234] EAX: 00000000 EBX: c7ab5ce4 ECX: 00000800 EDX: 00000000
<4>[  782.617247] ESI: 00400dc0 EDI: 00000000 EBP: c7ab5c78 ESP: c7ab5c04
<4>[  782.617260] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00000297
<4>[  782.617276] CR0: 80050033 CR2: 086d1d54 CR3: 06eaa000 CR4: 000006d0
<0>[  782.617472] Kernel panic - not syncing: Fatal exception in interrupt
<0>[  782.619115] Kernel Offset: disabled


Crash log details,
https://qa-reports.linaro.org/~anders.roxell/linux-mainline-patches/build/lore_kernel_org_linux-block_20230404140835_25166-1-sergei_shtepa_veeam_com/testrun/16171777/suite/log-parser-test/test/check-kernel-panic/details/
https://qa-reports.linaro.org/~anders.roxell/linux-mainline-patches/build/lore_kernel_org_linux-block_20230404140835_25166-1-sergei_shtepa_veeam_com/testrun/16171777/suite/log-parser-test/test/check-kernel-panic/log

Build artifacts links,
https://storage.tuxsuite.com/public/linaro/anders/builds/2OGpRflnb14kT3aJdPte4NdjKoT/
vmlinux: https://storage.tuxsuite.com/public/linaro/anders/builds/2OGpRflnb14kT3aJdPte4NdjKoT/vmlinux.xz
System.map: https://storage.tuxsuite.com/public/linaro/anders/builds/2OGpRflnb14kT3aJdPte4NdjKoT/System.map


steps to reproduce:
# To install tuxrun on your system globally:
# sudo pip3 install -U tuxrun==0.41.0
#
# See https://tuxrun.org/ for complete documentation.

tuxrun   \
 --runtime podman   \
 --device qemu-i386   \
 --kernel https://storage.tuxsuite.com/public/linaro/anders/builds/2OGpRflnb14kT3aJdPte4NdjKoT/bzImage   \
 --modules https://storage.tuxsuite.com/public/linaro/anders/builds/2OGpRflnb14kT3aJdPte4NdjKoT/modules.tar.xz   \
 --rootfs https://storage.tuxsuite.com/public/linaro/lkft/oebuilds/2OFRZUbhWDZYvEcYrKKj1AJ618K/images/intel-core2-32/lkft-tux-image-intel-core2-32-20230410193126.rootfs.ext4.gz   \
 --parameters SKIPFILE=skipfile-lkft.yaml   \
 --parameters SHARD_NUMBER=10   \
 --parameters SHARD_INDEX=10   \
 --image docker.io/lavasoftware/lava-dispatcher:2023.01.0020.gc1598238f   \
 --tests ltp-controllers   \
 --timeouts boot=15 ltp-controllers=80


--
Linaro LKFT
https://lkft.linaro.org

Naresh Kamboju April 12, 2023, 10:25 a.m. UTC | #2

On Wed, 12 Apr 2023 at 15:06, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
>
> [sorry for the adding you in CC]
>
> While running LTP controllers test suite on this patch set applied on top of
> the next-20230406 and the following kernel panic noticed on qemu-i386.

Also noticed on qemu-x86_64.

Crash log link,
------------------
- https://qa-reports.linaro.org/~anders.roxell/linux-mainline-patches/build/lore_kernel_org_linux-block_20230404140835_25166-1-sergei_shtepa_veeam_com/testrun/16171908/suite/log-parser-test/test/check-kernel-panic/log
- https://qa-reports.linaro.org/~anders.roxell/linux-mainline-patches/build/lore_kernel_org_linux-block_20230404140835_25166-1-sergei_shtepa_veeam_com/testrun/16171908/suite/log-parser-test/tests/

lore link,
https://lore.kernel.org/linux-block/20230407200551.12660-1-michael.christie@oracle.com/


--
Linaro LKFT
https://lkft.linaro.org

Naresh Kamboju April 12, 2023, 10:57 a.m. UTC | #3

[Sorry for adding you in CC.]

While running the LTP controllers test suite with this patch set applied on
top of next-20230406, the following kernel panic was noticed on qemu-x86_64.


Lore link: https://lore.kernel.org/linux-block/20230407200551.12660-1-michael.christie@oracle.com/


cpuset_inherit 31 TPASS: mem_exclusive: Inherited information is right!
cpuset_inherit 33 TPASS: mem_hardwall: Inherited information is right!
<4>[ 1234.875309] int3: 0000 [#1] PREEMPT SMP PTI
<4>[ 1234.875748] CPU: 1 PID: 32990 Comm: umount Not tainted 6.3.0-rc5-next-20230406 #1
<4>[ 1234.875946] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
<4>[ 1234.876062] RIP: 0010:__alloc_pages+0xdf/0x310
<4>[ 1234.876666] Code: 4c 89 45 b0 83 3d b8 1c cb 01 00 0f 85 a2 01 00 00 44 89 f0 c1 e8 03 83 e0 03 89 75 a0 89 45 c0 be 01 00 00 00 4c 89 45 98 0f <1f> 44 00 00 4d 89 c5 44 89 f0 44 89 75 a4 41 f7 c6 00 04 00 00 74
<4>[ 1234.876885] RSP: 0000:ffff8b9305f47c10 EFLAGS: 00000202
<4>[ 1234.877418] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
<4>[ 1234.877433] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000140cca
<4>[ 1234.877446] RBP: ffff8b9305f47c78 R08: 0000000000000000 R09: 0000000000000000
<4>[ 1234.877460] R10: 0000000000100cca R11: 0000000000000000 R12: 0000000000000003
<4>[ 1234.877472] R13: ffff88d28b753b00 R14: 0000000000140cca R15: ffff88d2ffffb300
<4>[ 1234.877518] FS:  0000000000000000(0000) GS:ffff88d2fbd00000(0000) knlGS:0000000000000000
<4>[ 1234.877547] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 1234.877568] CR2: 0000000000408ee0 CR3: 0000000111432000 CR4: 00000000000006e0
<4>[ 1234.877662] Call Trace:
<4>[ 1234.877828]  <TASK>
<4>[ 1234.877954]  __folio_alloc+0x1e/0x50
<4>[ 1234.878051]  vma_alloc_folio+0x4af/0x520
<4>[ 1234.878066]  ? _raw_spin_unlock+0x1a/0x40
<4>[ 1234.878079]  ? do_wp_page+0x164/0xdd0
<4>[ 1234.878093]  ? trace_preempt_on+0x1e/0x80
<4>[ 1234.878105]  ? preempt_count_sub+0x63/0x80
<4>[ 1234.878118]  do_wp_page+0x3cb/0xdd0
<4>[ 1234.878129]  ? handle_mm_fault+0x739/0x19e0
<4>[ 1234.878142]  ? _raw_spin_lock+0x23/0x50
<4>[ 1234.878159]  handle_mm_fault+0x770/0x19e0
<4>[ 1234.878170]  ? up_write+0x52/0xe0
<4>[ 1234.878196]  do_user_addr_fault+0x4d4/0x6c0
<4>[ 1234.878210]  ? trace_hardirqs_off_finish+0x38/0x90
<4>[ 1234.878222]  exc_page_fault+0x80/0x1d0
<4>[ 1234.878234]  asm_exc_page_fault+0x2b/0x30
<4>[ 1234.878291] RIP: 0033:0x7f3f910e65f6
<4>[ 1234.878542] Code: 2f 0a 00 00 e8 5b 23 ff ff 48 89 85 60 fd ff ff 48 8b 85 88 fd ff ff 48 8b 80 e8 00 00 00 48 85 c0 74 0b 48 8b 9d 68 fd ff ff <48> 89 58 08 48 8b 85 68 fd ff ff c7 40 18 01 00 00 00 e8 43 2b fe
<4>[ 1234.878552] RSP: 002b:00007fff6f8dce50 EFLAGS: 00000206
<4>[ 1234.878563] RAX: 0000000000408ed8 RBX: 00007f3f910fe0d8 RCX: 00007f3f910c6270
<4>[ 1234.878569] RDX: 0000000000000000 RSI: 00007f3f910fe2a0 RDI: 00007fff6f8dcf10
<4>[ 1234.878575] RBP: 00007fff6f8dd110 R08: 0000000000000000 R09: 0000000000000007
<4>[ 1234.878581] R10: 0000000000000000 R11: 0000000000000008 R12: 0000000000000000
<4>[ 1234.878585] R13: 00007f3f910fe2a0 R14: 00007fff6f8dced0 R15: 0000000000000000
<4>[ 1234.878636]  </TASK>
<4>[ 1234.878693] Modules linked in:
<4>[ 1234.894886] ---[ end trace 0000000000000000 ]---
<4>[ 1234.895879] RIP: 0010:__alloc_pages+0xdf/0x310
<4>[ 1234.895921] Code: 4c 89 45 b0 83 3d b8 1c cb 01 00 0f 85 a2 01 00 00 44 89 f0 c1 e8 03 83 e0 03 89 75 a0 89 45 c0 be 01 00 00 00 4c 89 45 98 0f <1f> 44 00 00 4d 89 c5 44 89 f0 44 89 75 a4 41 f7 c6 00 04 00 00 74
<4>[ 1234.895936] RSP: 0000:ffff8b9305f47c10 EFLAGS: 00000202
<4>[ 1234.895956] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
<4>[ 1234.895966] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000140cca
<4>[ 1234.895977] RBP: ffff8b9305f47c78 R08: 0000000000000000 R09: 0000000000000000
<4>[ 1234.895985] R10: 0000000000100cca R11: 0000000000000000 R12: 0000000000000003
<4>[ 1234.895993] R13: ffff88d28b753b00 R14: 0000000000140cca R15: ffff88d2ffffb300
<4>[ 1234.896003] FS:  0000000000000000(0000) GS:ffff88d2fbd00000(0000) knlGS:0000000000000000
<4>[ 1234.896016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 1234.896026] CR2: 0000000000408ee0 CR3: 0000000111432000 CR4: 00000000000006e0
<0>[ 1234.896379] Kernel panic - not syncing: Fatal exception in interrupt
<0>[ 1234.900046] Kernel Offset: 0x8800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Links:
https://qa-reports.linaro.org/~anders.roxell/linux-mainline-patches/build/lore_kernel_org_linux-block_20230407200551_12660-1-michael_christie_oracle_com/testrun/16172098/suite/log-parser-test/test/check-kernel-panic/log
https://qa-reports.linaro.org/~anders.roxell/linux-mainline-patches/build/lore_kernel_org_linux-block_20230407200551_12660-1-michael_christie_oracle_com/testrun/16172061/suite/log-parser-test/test/check-kernel-panic/log

https://qa-reports.linaro.org/~anders.roxell/linux-mainline-patches/build/lore_kernel_org_linux-block_20230407200551_12660-1-michael_christie_oracle_com/testrun/16172098/suite/log-parser-test/tests/
https://qa-reports.linaro.org/~anders.roxell/linux-mainline-patches/build/lore_kernel_org_linux-block_20230407200551_12660-1-michael_christie_oracle_com/testrun/16172061/suite/log-parser-test/tests/


Steps to reproduce:
-------------------

# To install tuxrun on your system globally:
# sudo pip3 install -U tuxrun==0.41.0
#
# See https://tuxrun.org/ for complete documentation.

tuxrun   \
 --runtime podman   \
 --device qemu-x86_64   \
 --kernel https://storage.tuxsuite.com/public/linaro/anders/builds/2OGpdVeBBG4Gj6aACnSdSGva2LN/bzImage   \
 --modules https://storage.tuxsuite.com/public/linaro/anders/builds/2OGpdVeBBG4Gj6aACnSdSGva2LN/modules.tar.xz   \
 --rootfs https://storage.tuxsuite.com/public/linaro/lkft/oebuilds/2OFRZUnV2Q9jOFdE3gH3Gq2v692/images/intel-corei7-64/lkft-tux-image-intel-corei7-64-20230410193144.rootfs.ext4.gz   \
 --parameters SKIPFILE=skipfile-lkft.yaml   \
 --parameters SHARD_NUMBER=10   \
 --parameters SHARD_INDEX=10   \
 --image docker.io/lavasoftware/lava-dispatcher:2023.01.0020.gc1598238f   \
 --tests ltp-controllers   \
 --timeouts boot=15 ltp-controllers=80



--
Linaro LKFT
https://lkft.linaro.org

Mike Christie April 12, 2023, 6:28 p.m. UTC | #4

On 4/12/23 5:25 AM, Naresh Kamboju wrote:
> On Wed, 12 Apr 2023 at 15:06, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
>>
>> [sorry for the adding you in CC]
>>
>> While running LTP controllers test suite on this patch set applied on top of
>> the next-20230406 and the following kernel panic noticed on qemu-i386.
> 
> Also noticed on qemu-x86_64.
> 
> Crash log link,
> ------------------
> - https://qa-reports.linaro.org/~anders.roxell/linux-mainline-patches/build/lore_kernel_org_linux-block_20230404140835_25166-1-sergei_shtepa_veeam_com/testrun/16171908/suite/log-parser-test/test/check-kernel-panic/log
> - https://qa-reports.linaro.org/~anders.roxell/linux-mainline-patches/build/lore_kernel_org_linux-block_20230404140835_25166-1-sergei_shtepa_veeam_com/testrun/16171908/suite/log-parser-test/tests/

Can you point me to the original report? I don't think my patches are the cause of
the failure, or if they are, there is a crazy bug.

I think you pointed me to the wrong link above, because it looks like that's
for a different patchset. Or did I misunderstand the testing and that link has my
patches included?

I did see my patches tested:

https://qa-reports.linaro.org/~anders.roxell/linux-mainline-patches/

but they seem to fail in similar places as other failures that day, and the
failures don't seem related to my patches. It doesn't look like you are doing
anything nvme or block pr ioctl related and just failing on forks and OOM.
It looks like you are booting from a scsi device but I only touched the scsi
layer's code for persistent reservations and the tests don't seem to be
using that code.

> 
> lore link,
> https://lore.kernel.org/linux-block/20230407200551.12660-1-michael.christie@oracle.com/
> 
> 
> --
> Linaro LKFT
> https://lkft.linaro.org

Naresh Kamboju April 13, 2023, 9:50 a.m. UTC | #5

On Wed, 12 Apr 2023 at 23:59, Mike Christie <michael.christie@oracle.com> wrote:
>
> On 4/12/23 5:25 AM, Naresh Kamboju wrote:
> > On Wed, 12 Apr 2023 at 15:06, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
> >>
> >> [sorry for the adding you in CC]
> >>
> >> While running LTP controllers test suite on this patch set applied on top of
> >> the next-20230406 and the following kernel panic noticed on qemu-i386.
> >
> > Also noticed on qemu-x86_64.
> >
> > Crash log link,
> > ------------------
> > - https://qa-reports.linaro.org/~anders.roxell/linux-mainline-patches/build/lore_kernel_org_linux-block_20230404140835_25166-1-sergei_shtepa_veeam_com/testrun/16171908/suite/log-parser-test/test/check-kernel-panic/log
> > - https://qa-reports.linaro.org/~anders.roxell/linux-mainline-patches/build/lore_kernel_org_linux-block_20230404140835_25166-1-sergei_shtepa_veeam_com/testrun/16171908/suite/log-parser-test/tests/
>
> Can you point me to the original report? I don't think my patches are the cause of
> the failure, or if they are there is a crazy bug.
>
> Above, I think you pointed me to the wrong link above because it looks like that's
> for a different patchset. Or did I misunderstand the testing and that link has my
> patches included?
>
> I did see my patches tested:
>
> https://qa-reports.linaro.org/~anders.roxell/linux-mainline-patches/
>
> but they seem to fail in similar places as other failures that day, and the
> failures don't seem related to my patches. It doesn't look like you are doing
> anything nvme or block pr ioctl related and just failing on forks and OOM.
> It looks like you are booting from a scsi device but I only touched the scsi
> layer's code for persistent reservations and the tests don't seem to be
> using that code.

Thanks for the analysis of these reports.
Since the build is based on top of Linux next-20230406, I will re-validate
the base and get back to you with my findings.

- Naresh


>
>
>
> >
> > lore link,
> > https://lore.kernel.org/linux-block/20230407200551.12660-1-michael.christie@oracle.com/
> >
> >
> > --
> > Linaro LKFT
> > https://lkft.linaro.org
>
