Message ID | 20231003110556.140317-1-vshankar@redhat.com |
---|---|
State | New |
Headers | show |
Series | Revert "ceph: enable async dirops by default" | expand |
On 10/4/23 21:15, Venky Shankar wrote: > Hi Ilya, > > After some digging and talking to Jeff, I figured that it's possible > to disable async dirops from the mds side by setting > `mds_client_delegate_inos_pct` config to 0: > > - name: mds_client_delegate_inos_pct > type: uint > level: advanced > desc: percentage of preallocated inos to delegate to client > default: 50 > services: > - mds > > So, I guess this patch is really not required. We can suggest this > config update to users and document it for now. We lack tests with > this config disabled, so I'll be adding the same before recommending > it out. Will keep you posted. Venky, Jeff This option could disable the async create operation but could it also disable the async unlink operation ? Thanks - Xiubo > On Wed, Oct 4, 2023 at 10:16 AM Venky Shankar <vshankar@redhat.com> wrote: >> Hi Ilya, >> >> On Tue, Oct 3, 2023 at 5:00 PM Ilya Dryomov <idryomov@gmail.com> wrote: >>> On Tue, Oct 3, 2023 at 1:06 PM Venky Shankar <vshankar@redhat.com> wrote: >>>> From: Xiubo Li <xiubli@redhat.com> >>>> >>>> This reverts commit f7a67b463fb83a4b9b11ceaa8ec4950b8fb7f902. >>>> >>>> We have identified an issue in the MDS affecting CephFS users using >>>> the kernel driver. The issue was first introduced in the octopus >>>> release that added support for clients to perform asynchronous >>>> directory operations using the `nowsync` mount option. The issue >>>> presents itself as an MDS crash resembling (any of) the following >>>> crashes: >>>> >>>> https://tracker.ceph.com/issues/61009 >>>> https://tracker.ceph.com/issues/58489 >>>> >>>> There is no apparent data loss or corruption, but since the underlying >>>> cause is related to an (operation) ordering issue, the extent of the >>>> problem could surface in other forms - most likely MDS crashes >>>> involving preallocated inodes. >>>> >>>> The fix is being reviewed and is being worked on priority: >>>> >>>> https://github.com/ceph/ceph/pull/53752 >>>> >>>> As a workaround, we recommend (kernel) clients be remounted with the >>>> `wsync` mount option which disables asynchronous directory operations >>>> (depending on the kernel version being used, the default could be >>>> `nowsync`). >>>> >>>> This change reverts the default, so, async dirops is disabled (by default). >>> Hi Venky, >>> >>> Given that the fix is now up and being reviewed on priority, does it >>> still make sense to change the default? >>> >>> According to Xiubo, https://tracker.ceph.com/issues/58489 which morphed >>> into https://tracker.ceph.com/issues/61009 isn't the only concern -- he >>> also brought up https://tracker.ceph.com/issues/62810. If the move to >>> revert (change of default) is also prompted by that issue, it should be >>> described in the patch. >> Fair enough -- I'll push out with an updated commit message. >> >>> Thanks, >>> >>> Ilya >>> >> >> -- >> Cheers, >> Venky > >
diff --git a/fs/ceph/super.c b/fs/ceph/super.c index 5ec102f6b1ac..2bf6ccc9887b 100644 --- a/fs/ceph/super.c +++ b/fs/ceph/super.c @@ -742,8 +742,8 @@ static int ceph_show_options(struct seq_file *m, struct dentry *root) if (fsopt->flags & CEPH_MOUNT_OPT_CLEANRECOVER) seq_show_option(m, "recover_session", "clean"); - if (!(fsopt->flags & CEPH_MOUNT_OPT_ASYNC_DIROPS)) - seq_puts(m, ",wsync"); + if (fsopt->flags & CEPH_MOUNT_OPT_ASYNC_DIROPS) + seq_puts(m, ",nowsync"); if (fsopt->flags & CEPH_MOUNT_OPT_NOPAGECACHE) seq_puts(m, ",nopagecache"); if (fsopt->flags & CEPH_MOUNT_OPT_SPARSEREAD) diff --git a/fs/ceph/super.h b/fs/ceph/super.h index 7f4b62182a5d..a5476892896c 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -47,8 +47,7 @@ #define CEPH_MOUNT_OPT_DEFAULT \ (CEPH_MOUNT_OPT_DCACHE | \ - CEPH_MOUNT_OPT_NOCOPYFROM | \ - CEPH_MOUNT_OPT_ASYNC_DIROPS) + CEPH_MOUNT_OPT_NOCOPYFROM) #define ceph_set_mount_opt(fsc, opt) \ (fsc)->mount_options->flags |= CEPH_MOUNT_OPT_##opt