Message ID | 20250205000249.123054-1-slava@dubeyko.com |
---|---|
Series | ceph: fix generic/421 test failure |
Hi David,

Have you tried the fix? Does it fix the issue on your side?

Thanks,
Slava.

On Tue, 2025-02-04 at 16:02 -0800, Viacheslav Dubeyko wrote:
> From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
>
> The generic/421 fails to finish because of the issue:
>
> [...]
Okay... I *think* that fixes the hang. There was one case where I saw the
hang, but I'm not sure that I had your patches applied or whether I'd managed
to boot the previous kernel that didn't.
So, just with respect to fixing the hang:
Tested-by: David Howells <dhowells@redhat.com>
There's still the issue of encrypted filenames occasionally showing through
which generic/397 is showing up - but I don't think your patches here fix
that, right?
David
On Fri, 2025-02-14 at 17:19 +0000, David Howells wrote:
> [...]
> There's still the issue of encrypted filenames occasionally showing through
> which generic/397 is showing up - but I don't think your patches here fix
> that, right?
>

This patchset doesn't fix the generic/397 issue. I sent another patch
([PATCH v2] ceph: Fix kernel crash in generic/397 test) [1] before this one
with the fix.

Thanks,
Slava.

[1] https://lore.kernel.org/all/CAO8a2SjrDL5TqW70P3yyqv8X-B5jfQRg-eMTs9Nbntr8=Mwbog@mail.gmail.com/T/
Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:

> > There's still the issue of encrypted filenames occasionally showing through
> > which generic/397 is showing up - but I don't think your patches here fix
> > that, right?
> >
>
> This patchset doesn't fix the generic/397 issue. I sent another patch ([PATCH
> v2] ceph: Fix kernel crash in generic/397 test) [1] before this one with the
> fix.

That doesn't fix the problem either. That seems to be fixing a crash, not:

generic/397 - output mismatch (see /root/xfstests-dev/results//generic/397.out.bad)
    --- tests/generic/397.out 2024-09-12 12:36:14.167441927 +0100
    +++ /root/xfstests-dev/results//generic/397.out.bad 2025-02-14 20:34:10.365900035 +0000
    @@ -1,13 +1,27 @@
     QA output created by 397
    +Only in /xfstest.scratch/ref_dir: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
    +Only in /xfstest.scratch/edir: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy����Sd�S�e��[�@���7,�� [�g��
    +Only in /xfstest.scratch/edir: 70h6RnwpEg1PMtJp9yQ,2g
    +Only in /xfstest.scratch/edir: HHBOImQ7cdmsZKNhc5yPCX+XKu0+dn4VViEQzd0q3Ig
    +Only in /xfstest.scratch/edir: HXYO3UK3FrxqwSZaNnQ5zQ
    +Only in /xfstest.scratch/edir: PecH6opy8KkkB8ir8Oz0pw
    ...
    (Run 'diff -u /root/xfstests-dev/tests/generic/397.out /root/xfstests-dev/results//generic/397.out.bad' to see the entire diff)

David
On Fri, 2025-02-14 at 20:35 +0000, David Howells wrote:
> Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
> [...]
> That doesn't fix the problem either. That seems to be fixing a crash, not:
>
> generic/397 - output mismatch (see /root/xfstests-dev/results//generic/397.out.bad)
> [...]

Do you mean that you applied this modification?
---
 fs/ceph/addr.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 85936f6d2bf7..5e6ba92219f3 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -396,6 +396,15 @@ static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq)
 	struct page **pages;
 	size_t page_off;
 
+	/*
+	 * The io_iter.count needs to be corrected to aligned length.
+	 * Otherwise, iov_iter_get_pages_alloc2() operates with
+	 * the initial unaligned length value. As a result,
+	 * ceph_msg_data_cursor_init() triggers BUG_ON() in the case
+	 * if msg->sparse_read_total > msg->data_length.
+	 */
+	subreq->io_iter.count = len;
+
 	err = iov_iter_get_pages_alloc2(&subreq->io_iter, &pages, len, &page_off);
 	if (err < 0) {
 		doutc(cl, "%llx.%llx failed to allocate pages, %d\n",
@@ -405,6 +414,7 @@ static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq)
 
 	/* should always give us a page-aligned read */
 	WARN_ON_ONCE(page_off);
+
 	len = err;
 	err = 0;
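The comment in this hunk turns on the gap between the original, unaligned request length and the aligned len. The arithmetic can be sketched in C as below, assuming a 4096-byte alignment unit chosen only for illustration. align_read_range() and bytes_extracted() are hypothetical helpers, not ceph code; bytes_extracted() only models the fact that extracting pages from an iov_iter cannot return more than the iterator's current count.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed alignment unit for this sketch; not taken from the ceph code. */
#define ALIGN_UNIT 4096ULL

/* Hypothetical helper: widen [*off, *off + *len) outward to ALIGN_UNIT boundaries. */
static void align_read_range(uint64_t *off, uint64_t *len)
{
	assert((ALIGN_UNIT & (ALIGN_UNIT - 1)) == 0);	/* must be a power of two */
	uint64_t start = *off & ~(ALIGN_UNIT - 1);
	uint64_t end = (*off + *len + ALIGN_UNIT - 1) & ~(ALIGN_UNIT - 1);

	*off = start;
	*len = end - start;
}

/*
 * Simplified model of a page-extraction call: it hands back at most
 * min(iterator count, requested maximum) bytes.
 */
static uint64_t bytes_extracted(uint64_t iter_count, uint64_t maxsize)
{
	return iter_count < maxsize ? iter_count : maxsize;
}
```

With off = 100 and len = 5000, the aligned read covers 8192 bytes starting at 0, but an iterator whose count still holds 5000 yields only 5000 bytes. That is roughly the kind of mismatch between the data length and the sparse-read total that the comment ties to the BUG_ON(); setting the count to the aligned len before extraction removes it in the model.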
Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
> Do you mean that you applied this modification?
See:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-fixes
for what I have applied.
David
On Fri, 2025-02-14 at 20:52 +0000, David Howells wrote:
> Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
>
> > Do you mean that you applied this modification?
>
> See:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-fixes
> [...]

OK. I didn't see such output during the testing:

> generic/397 - output mismatch (see /root/xfstests-dev/results//generic/397.out.bad)
> [...]

Let me double check the test again.

Thanks,
Slava.
On Fri, 2025-02-14 at 20:52 +0000, David Howells wrote:
> Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
>
> > Do you mean that you applied this modification?
>
> See:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-fixes
> [...]

I took your branch [1] and compiled the kernel:

git status
HEAD detached at origin/netfs-fixes

But I cannot reproduce the issue:

sudo ./check -g encrypt
FSTYP         -- ceph
PLATFORM      -- Linux/x86_64 ceph-testing-0001 6.14.0-rc2+ #1 SMP PREEMPT_DYNAMIC Fri Feb 14 23:04:17 UTC 2025
MKFS_OPTIONS  -- 127.0.0.1:40137:/scratch
MOUNT_OPTIONS -- -o name=fs,secret=<secret>,ms_mode=crc,nowsync,copyfrom 127.0.0.1:40137:/scratch /mnt/scratch

generic/395 15s ... 10s
generic/396 12s ... 9s
generic/397 13s ... 11s
generic/398 1s ... [not run] kernel doesn't support renameat2 syscall
generic/399 28s ... [not run] Filesystem ceph not supported in _scratch_mkfs_sized_encrypted
generic/419 1s ... [not run] kernel doesn't support renameat2 syscall
generic/421 17s ... 13s
generic/429 24s ... 22s
generic/435 1115s ... 873s
generic/440 18s ... 13s
generic/548 2s ... [not run] xfs_io fiemap failed (old kernel/wrong fs?)
generic/549 2s ... [not run] encryption policy '-c 5 -n 6 -f 0' is unusable; probably missing kernel crypto API support
generic/550 4s ... [not run] encryption policy '-c 9 -n 9 -f 0' is unusable; probably missing kernel crypto API support
generic/576 [not run] fsverity utility required, skipped this test
generic/580 18s ... 15s
generic/581 21s ... 20s
generic/582 2s ... [not run] xfs_io fiemap failed (old kernel/wrong fs?)
generic/583 2s ... [not run] encryption policy '-c 5 -n 6 -v 2 -f 0' is unusable; probably missing kernel crypto API support
generic/584 3s ... [not run] encryption policy '-c 9 -n 9 -v 2 -f 0' is unusable; probably missing kernel crypto API support
generic/592 3s ... [not run] kernel does not support encryption policy: '-c 1 -n 4 -v 2 -f 8'
generic/593 18s ... 14s
generic/595 20s ... 19s
generic/602 2s ... [not run] kernel does not support encryption policy: '-c 1 -n 4 -v 2 -f 16'
generic/613 5s ... [not run] _get_encryption_nonce() isn't implemented on ceph
generic/621 6s ... [not run] kernel doesn't support renameat2 syscall
generic/693 6s ... [not run] encryption policy '-c 1 -n 10 -v 2 -f 0' is unusable; probably missing kernel crypto API support
generic/739 [not run] xfs_io set_encpolicy doesn't support -s
Ran: generic/395 generic/396 generic/397 generic/398 generic/399 generic/419 generic/421 generic/429 generic/435 generic/440 generic/548 generic/549 generic/550 generic/576 generic/580 generic/581 generic/582 generic/583 generic/584 generic/592 generic/593 generic/595 generic/602 generic/613 generic/621 generic/693 generic/739
Not run: generic/398 generic/399 generic/419 generic/548 generic/549 generic/550 generic/576 generic/582 generic/583 generic/584 generic/592 generic/602 generic/613 generic/621 generic/693 generic/739
Passed all 27 tests

[1] https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
On Tue, 04 Feb 2025 16:02:45 -0800, Viacheslav Dubeyko wrote:
> From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
>
> The generic/421 fails to finish because of the issue:
>
> [...]

Applied to the vfs-6.15.ceph branch of the vfs/vfs.git tree.
Patches in the vfs-6.15.ceph branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.15.ceph

[1/4] ceph: extend ceph_writeback_ctl for ceph_writepages_start() refactoring
      https://git.kernel.org/vfs/vfs/c/f08068df4aa4
[2/4] ceph: introduce ceph_process_folio_batch() method
      https://git.kernel.org/vfs/vfs/c/ce80b76dd327
[3/4] ceph: introduce ceph_submit_write() method
      https://git.kernel.org/vfs/vfs/c/1551ec61dc55
[4/4] ceph: fix generic/421 test failure
      https://git.kernel.org/vfs/vfs/c/fd7449d937e7
From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>

The generic/421 test fails to finish because of the following issue:

Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.894678] INFO: task kworker/u48:0:11 blocked for more than 122 seconds.
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.895403] Not tainted 6.13.0-rc5+ #1
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.895867] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.896633] task:kworker/u48:0 state:D stack:0 pid:11 tgid:11 ppid:2 flags:0x00004000
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.896641] Workqueue: writeback wb_workfn (flush-ceph-24)
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897614] Call Trace:
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897620] <TASK>
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897629] __schedule+0x443/0x16b0
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897637] schedule+0x2b/0x140
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897640] io_schedule+0x4c/0x80
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897643] folio_wait_bit_common+0x11b/0x310
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897646] ? _raw_spin_unlock_irq+0xe/0x50
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897652] ? __pfx_wake_page_function+0x10/0x10
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897655] __folio_lock+0x17/0x30
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897658] ceph_writepages_start+0xca9/0x1fb0
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897663] ? fsnotify_remove_queued_event+0x2f/0x40
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897668] do_writepages+0xd2/0x240
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897672] __writeback_single_inode+0x44/0x350
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897675] writeback_sb_inodes+0x25c/0x550
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897680] wb_writeback+0x89/0x310
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897683] ? finish_task_switch.isra.0+0x97/0x310
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897687] wb_workfn+0xb5/0x410
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897689] process_one_work+0x188/0x3d0
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897692] worker_thread+0x2b5/0x3c0
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897694] ? __pfx_worker_thread+0x10/0x10
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897696] kthread+0xe1/0x120
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897699] ? __pfx_kthread+0x10/0x10
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897701] ret_from_fork+0x43/0x70
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897705] ? __pfx_kthread+0x10/0x10
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897707] ret_from_fork_asm+0x1a/0x30
Jan 3 14:25:27 ceph-testing-0001 kernel: [ 369.897711] </TASK>

There are several issues here:
(1) ceph_kill_sb() doesn't wait for the flushing of all dirty folios/pages
    to finish because of the racy nature of mdsc->stopping_blockers. As a
    result, mdsc->stopping becomes CEPH_MDSC_STOPPING_FLUSHED too early.
(2) ceph_inc_osd_stopping_blocker(fsc->mdsc) fails to increment
    mdsc->stopping_blockers. Consequently, already locked folios/pages are
    never unlocked, and the logic tries to lock the same page a second time.
(3) The folio_batch of dirty pages found by filemap_get_folios_tag() is not
    processed properly. This is why some dirty pages are simply never
    processed, and dirty folios/pages remain after unmount anyway.
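Issues (1) and (2) both revolve around the stopping-blocker handshake between writeback and unmount. The C sketch below is a hypothetical userspace model of that handshake, not the kernel code: struct stop_ctl, inc_blocker(), dec_blocker(), kill_sb_model() and writepages_model() are invented names, a pthread mutex and condition variable stand in for the kernel's locking, and it assumes that taking a blocker is refused once unmount has started flushing. Holding one blocker across the whole writepages call mirrors fix (5) in the cover letter; what the real ceph_writepages_start() does when the blocker cannot be taken is not modelled, this sketch simply returns -1.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical states, loosely named after the CEPH_MDSC_STOPPING_* idea. */
enum stop_state { STOP_RUNNING, STOP_FLUSHING, STOP_FLUSHED };

struct stop_ctl {
	pthread_mutex_t lock;
	pthread_cond_t drained;	/* signalled when the last blocker leaves */
	enum stop_state state;
	int blockers;		/* stands in for mdsc->stopping_blockers */
};

#define STOP_CTL_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, STOP_RUNNING, 0 }

/* Take a blocker; refused (returns false) once unmount has begun flushing. */
static bool inc_blocker(struct stop_ctl *s)
{
	bool ok;

	pthread_mutex_lock(&s->lock);
	ok = (s->state == STOP_RUNNING);
	if (ok)
		s->blockers++;
	pthread_mutex_unlock(&s->lock);
	return ok;
}

/* Drop a blocker; wake the unmount path if it is waiting for the last one. */
static void dec_blocker(struct stop_ctl *s)
{
	pthread_mutex_lock(&s->lock);
	assert(s->blockers > 0);
	if (--s->blockers == 0 && s->state != STOP_RUNNING)
		pthread_cond_broadcast(&s->drained);
	pthread_mutex_unlock(&s->lock);
}

/* Unmount side: refuse new blockers, wait for existing ones, then mark flushed. */
static void kill_sb_model(struct stop_ctl *s)
{
	pthread_mutex_lock(&s->lock);
	s->state = STOP_FLUSHING;
	while (s->blockers > 0)
		pthread_cond_wait(&s->drained, &s->lock);
	s->state = STOP_FLUSHED;
	pthread_mutex_unlock(&s->lock);
}

/*
 * Writeback entry in the spirit of fix (5): hold one blocker for the whole
 * call, so unmount cannot reach STOP_FLUSHED while this work is in flight.
 */
static int writepages_model(struct stop_ctl *s, int nr_folios, int *written)
{
	if (!inc_blocker(s))
		return -1;		/* unmount already flushing: start no new work */
	*written += nr_folios;	/* placeholder for lock/submit/unlock of folios */
	dec_blocker(s);
	return 0;
}
```

The key property is that the state check and the increment happen under the same lock, so a writer either gets in before unmount flips the state (and unmount then waits for it) or is turned away; there is no window in which unmount declares itself flushed while a writer is still between those two steps.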
This patchset refactors the ceph_writepages_start() method and fixes the
issues by:
(1) introducing a dirty_folios counter and a flush_end_wq wait queue in
    struct ceph_mds_client;
(2) making ceph_dirty_folio() increment the dirty_folios counter;
(3) making writepages_finish() decrement the dirty_folios counter and wake
    up all waiters on the queue if the counter is less than or equal to zero;
(4) adding logic to ceph_kill_sb() that checks the value of the dirty_folios
    counter and waits if it is greater than zero;
(5) calling ceph_inc_osd_stopping_blocker() at the beginning of
    ceph_writepages_start() and ceph_dec_osd_stopping_blocker() at its end,
    in order to resolve the racy nature of mdsc->stopping_blockers.

sudo ./check generic/421
FSTYP         -- ceph
PLATFORM      -- Linux/x86_64 ceph-testing-0001 6.13.0+ #137 SMP PREEMPT_DYNAMIC Mon Feb 3 20:30:08 UTC 2025
MKFS_OPTIONS  -- 127.0.0.1:40551:/scratch
MOUNT_OPTIONS -- -o name=fs,secret=<secret>,ms_mode=crc,nowsync,copyfrom 127.0.0.1:40551:/scratch /mnt/scratch

generic/421 7s ... 4s
Ran: generic/421
Passed all 1 tests

Viacheslav Dubeyko (4):
  ceph: extend ceph_writeback_ctl for ceph_writepages_start() refactoring
  ceph: introduce ceph_process_folio_batch() method
  ceph: introduce ceph_submit_write() method
  ceph: fix generic/421 test failure

 fs/ceph/addr.c       | 1110 +++++++++++++++++++++++++---------------
 fs/ceph/mds_client.c |    2 +
 fs/ceph/mds_client.h |    3 +
 fs/ceph/super.c      |   11 +
 4 files changed, 746 insertions(+), 380 deletions(-)
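Steps (1) to (4) of the fix amount to a counter paired with a wait queue. The C sketch below is a minimal userspace model of that pattern, assuming a pthread mutex and condition variable in place of whatever synchronization the kernel patch actually uses; struct flush_tracker, tracker_dirty(), tracker_finish(), tracker_wait_flushed(), struct wb_job and writeback_worker() are hypothetical names, not ceph functions.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model of the dirty_folios counter plus flush_end_wq. */
struct flush_tracker {
	pthread_mutex_t lock;
	pthread_cond_t flush_end;	/* stands in for the flush_end_wq wait queue */
	long dirty_folios;
};

#define FLUSH_TRACKER_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

/* Step (2): called whenever a folio becomes dirty. */
static void tracker_dirty(struct flush_tracker *t)
{
	pthread_mutex_lock(&t->lock);
	t->dirty_folios++;
	pthread_mutex_unlock(&t->lock);
}

/* Step (3): called when writeback of a folio completes. */
static void tracker_finish(struct flush_tracker *t)
{
	pthread_mutex_lock(&t->lock);
	t->dirty_folios--;
	if (t->dirty_folios <= 0)
		pthread_cond_broadcast(&t->flush_end);	/* wake all waiters */
	pthread_mutex_unlock(&t->lock);
}

/* Step (4): the unmount path waits while dirty folios remain. */
static void tracker_wait_flushed(struct flush_tracker *t)
{
	pthread_mutex_lock(&t->lock);
	while (t->dirty_folios > 0)
		pthread_cond_wait(&t->flush_end, &t->lock);
	assert(t->dirty_folios <= 0);
	pthread_mutex_unlock(&t->lock);
}

/* Hypothetical writeback worker: completes `pending` folios one by one. */
struct wb_job {
	struct flush_tracker *t;
	int pending;
};

static void *writeback_worker(void *arg)
{
	struct wb_job *job = arg;

	for (int i = 0; i < job->pending; i++)
		tracker_finish(job->t);
	return NULL;
}
```

The wait loop re-checks the counter after every wakeup, so a spurious wakeup cannot let the unmount path proceed while folios are still dirty. Waking on "less than or equal to zero" rather than exactly zero follows step (3) as written.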