Message ID | 20220311132206.24515-2-xiaoguang.wang@linux.alibaba.com |
---|---|
State | Superseded |
Headers | show |
Series | [1/2] scsi: target: tcmu: Fix possible page UAF | expand |
hello, Gentle ping. Regards, Xiaoguang Wang > Currently tcmu_vma_fault() uses udev->cmdr_lock to avoid concurrent > find_free_blocks(), which unmaps idle pages and truncates them. This > work is really like many filesystem's truncate operations, but they > use address_space->invalidate_lock to protect race. > > This patch replaces cmdr_lock with address_space->invalidate_lock in > tcmu fault procedure, which will also make page-fault have concurrency. > > Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> > --- > drivers/target/target_core_user.c | 13 +++++++++---- > 1 file changed, 9 insertions(+), 4 deletions(-) > > diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c > index 06a5c4086551..e0a62623ccd7 100644 > --- a/drivers/target/target_core_user.c > +++ b/drivers/target/target_core_user.c > @@ -1815,13 +1815,14 @@ static int tcmu_find_mem_index(struct vm_area_struct *vma) > > static struct page *tcmu_try_get_data_page(struct tcmu_dev *udev, uint32_t dpi) > { > + struct address_space *mapping = udev->inode->i_mapping; > struct page *page; > > - mutex_lock(&udev->cmdr_lock); > + filemap_invalidate_lock_shared(mapping); > page = xa_load(&udev->data_pages, dpi); > if (likely(page)) { > get_page(page); > - mutex_unlock(&udev->cmdr_lock); > + filemap_invalidate_unlock_shared(mapping); > return page; > } > > @@ -1831,7 +1832,7 @@ static struct page *tcmu_try_get_data_page(struct tcmu_dev *udev, uint32_t dpi) > */ > pr_err("Invalid addr to data page mapping (dpi %u) on device %s\n", > dpi, udev->name); > - mutex_unlock(&udev->cmdr_lock); > + filemap_invalidate_unlock_shared(mapping); > > return NULL; > } > @@ -3111,6 +3112,7 @@ static void find_free_blocks(void) > loff_t off; > u32 pages_freed, total_pages_freed = 0; > u32 start, end, block, total_blocks_freed = 0; > + struct address_space *mapping; > > if (atomic_read(&global_page_count) <= tcmu_global_max_pages) > return; > @@ -3134,6 +3136,7 @@ static void find_free_blocks(void) > continue; > } > > + mapping = udev->inode->i_mapping; > end = udev->dbi_max + 1; > block = find_last_bit(udev->data_bitmap, end); > if (block == udev->dbi_max) { > @@ -3152,12 +3155,14 @@ static void find_free_blocks(void) > udev->dbi_max = block; > } > > + filemap_invalidate_lock(mapping); > /* Here will truncate the data area from off */ > off = udev->data_off + (loff_t)start * udev->data_blk_size; > - unmap_mapping_range(udev->inode->i_mapping, off, 0, 1); > + unmap_mapping_range(mapping, off, 0, 1); > > /* Release the block pages */ > pages_freed = tcmu_blocks_release(udev, start, end - 1); > + filemap_invalidate_unlock(mapping); > mutex_unlock(&udev->cmdr_lock); > > total_pages_freed += pages_freed;
Sorry for the late response. Currently I'm quite busy. In your earlier mail you described a possible dead lock. With this patch applied, are you sure a similar deadlock cannot happen? Additionally, let's assume tcmu_vma_fault/tcmu_try_get_data_page - after having found a valid page to map - is interrupted after releasing the invalidate_lock. Are there any locks held to prevent find_free_blocks from jumping in and possibly remove that page from xarray and try to remove it from the mmapped area? If not, we might end up mapping a no longer valid page. Of course, this would be a long standing problem not caused by your change. But if there would be a problem, we should try to fix it when touching this code, I think. Unfortunately I didn't manage yet to check which locks are involved during page fault handling and unmap_mapping_range. Bodo On 16.03.22 11:43, Xiaoguang Wang wrote: > hello, > > Gentle ping. > > Regards, > Xiaoguang Wang > >> Currently tcmu_vma_fault() uses udev->cmdr_lock to avoid concurrent >> find_free_blocks(), which unmaps idle pages and truncates them. This >> work is really like many filesystem's truncate operations, but they >> use address_space->invalidate_lock to protect race. >> >> This patch replaces cmdr_lock with address_space->invalidate_lock in >> tcmu fault procedure, which will also make page-fault have concurrency. >> >> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> >> --- >> drivers/target/target_core_user.c | 13 +++++++++---- >> 1 file changed, 9 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/target/target_core_user.c >> b/drivers/target/target_core_user.c >> index 06a5c4086551..e0a62623ccd7 100644 >> --- a/drivers/target/target_core_user.c >> +++ b/drivers/target/target_core_user.c >> @@ -1815,13 +1815,14 @@ static int tcmu_find_mem_index(struct >> vm_area_struct *vma) >> static struct page *tcmu_try_get_data_page(struct tcmu_dev *udev, >> uint32_t dpi) >> { >> + struct address_space *mapping = udev->inode->i_mapping; >> struct page *page; >> - mutex_lock(&udev->cmdr_lock); >> + filemap_invalidate_lock_shared(mapping); >> page = xa_load(&udev->data_pages, dpi); >> if (likely(page)) { >> get_page(page); >> - mutex_unlock(&udev->cmdr_lock); >> + filemap_invalidate_unlock_shared(mapping); >> return page; >> } >> @@ -1831,7 +1832,7 @@ static struct page >> *tcmu_try_get_data_page(struct tcmu_dev *udev, uint32_t dpi) >> */ >> pr_err("Invalid addr to data page mapping (dpi %u) on device %s\n", >> dpi, udev->name); >> - mutex_unlock(&udev->cmdr_lock); >> + filemap_invalidate_unlock_shared(mapping); >> return NULL; >> } >> @@ -3111,6 +3112,7 @@ static void find_free_blocks(void) >> loff_t off; >> u32 pages_freed, total_pages_freed = 0; >> u32 start, end, block, total_blocks_freed = 0; >> + struct address_space *mapping; >> if (atomic_read(&global_page_count) <= tcmu_global_max_pages) >> return; >> @@ -3134,6 +3136,7 @@ static void find_free_blocks(void) >> continue; >> } >> + mapping = udev->inode->i_mapping; >> end = udev->dbi_max + 1; >> block = find_last_bit(udev->data_bitmap, end); >> if (block == udev->dbi_max) { >> @@ -3152,12 +3155,14 @@ static void find_free_blocks(void) >> udev->dbi_max = block; >> } >> + filemap_invalidate_lock(mapping); >> /* Here will truncate the data area from off */ >> off = udev->data_off + (loff_t)start * udev->data_blk_size; >> - unmap_mapping_range(udev->inode->i_mapping, off, 0, 1); >> + unmap_mapping_range(mapping, off, 0, 1); >> /* Release the block pages */ >> pages_freed = tcmu_blocks_release(udev, start, end - 1); >> + filemap_invalidate_unlock(mapping); >> mutex_unlock(&udev->cmdr_lock); >> total_pages_freed += pages_freed; >
hi, > Sorry for the late response. Currently I'm quite busy. Really never mind :) > > In your earlier mail you described a possible dead lock. > With this patch applied, are you sure a similar deadlock cannot > happen? AFAIK, this patch will solve the deadlock. > > Additionally, let's assume tcmu_vma_fault/tcmu_try_get_data_page > - after having found a valid page to map - is interrupted after > releasing the invalidate_lock. Are there any locks held to prevent > find_free_blocks from jumping in and possibly remove that page from > xarray and try to remove it from the mmapped area? > If not, we might end up mapping a no longer valid page. Yeah, after tcmu_try_get_data_page() returns, find_free_blocks() definitely may come in and do unmap_mapping_range() and tcmu_blocks_release(), but I think it won't cause problems: 1) Since page fault procedure and unmap_mapping_range are designed to be able to run concurrently, they sync at pte_offset_map_lock(). See => do_user_addr_fault ==> handle_mm_fault ===> __handle_mm_fault ====> do_fault =====> do_shared_fault =======> finish_fault ========> pte_offset_map_lock ========> do_set_pte ========> pte_unmap_unlock and in find_free_blocks(): => unmap_mapping_range == > unmap_mapping_range_tree ===> zap_page_range_single ====> unmap_page_range =====> zap_p4d_range ======> zap_pud_range ========> zap_pmd_range ==========> zap_pte_range ===========> pte_offset_map_lock ===========> pte_clear_not_present_full ===========> pte_unmap_unlock(start_pte, ptl); So what I want to express is that because of the concurrency of page fault procedure and unmap_mapping_range(), one will either see a valid map, or not. And if not, because this page exceeds dbi_max, a later page fault will happen, and will get sigbus, but it's reasonable. As for your question, tcmu_try_get_data_page() finds a page successfully, this page will get a refcount properly, if later unmap_mapping_range() and tcmu_blocks_release() come in, just after tcmu_try_get_data_page() returns and before tcmu_vma_fault() returns, then actually tcmu_blocks_release() won't free this page because there is one refcount. So yes, we'll map a no longer valid page, but this page also won't be re-used, unless the map is unmapped later(process exits or killed), then put_page() will be called and page will finally be given back to mm subsystem. > > Of course, this would be a long standing problem not caused by your > change. But if there would be a problem, we should try to fix it > when touching this code, I think. > Unfortunately I didn't manage yet to check which locks are involved > during page fault handling and unmap_mapping_range. At least for my knowledge, page fault will hold mmap_read_lock() and pte lock, unmap_mapping_range() will hold mapping->i_mmap_rwsem and pte lock. Regards, Xiaoguang Wang > > Bodo > > On 16.03.22 11:43, Xiaoguang Wang wrote: >> hello, >> >> Gentle ping. >> >> Regards, >> Xiaoguang Wang >> >>> Currently tcmu_vma_fault() uses udev->cmdr_lock to avoid concurrent >>> find_free_blocks(), which unmaps idle pages and truncates them. This >>> work is really like many filesystem's truncate operations, but they >>> use address_space->invalidate_lock to protect race. >>> >>> This patch replaces cmdr_lock with address_space->invalidate_lock in >>> tcmu fault procedure, which will also make page-fault have concurrency. >>> >>> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> >>> --- >>> drivers/target/target_core_user.c | 13 +++++++++---- >>> 1 file changed, 9 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/target/target_core_user.c >>> b/drivers/target/target_core_user.c >>> index 06a5c4086551..e0a62623ccd7 100644 >>> --- a/drivers/target/target_core_user.c >>> +++ b/drivers/target/target_core_user.c >>> @@ -1815,13 +1815,14 @@ static int tcmu_find_mem_index(struct >>> vm_area_struct *vma) >>> static struct page *tcmu_try_get_data_page(struct tcmu_dev *udev, >>> uint32_t dpi) >>> { >>> + struct address_space *mapping = udev->inode->i_mapping; >>> struct page *page; >>> - mutex_lock(&udev->cmdr_lock); >>> + filemap_invalidate_lock_shared(mapping); >>> page = xa_load(&udev->data_pages, dpi); >>> if (likely(page)) { >>> get_page(page); >>> - mutex_unlock(&udev->cmdr_lock); >>> + filemap_invalidate_unlock_shared(mapping); >>> return page; >>> } >>> @@ -1831,7 +1832,7 @@ static struct page >>> *tcmu_try_get_data_page(struct tcmu_dev *udev, uint32_t dpi) >>> */ >>> pr_err("Invalid addr to data page mapping (dpi %u) on device >>> %s\n", >>> dpi, udev->name); >>> - mutex_unlock(&udev->cmdr_lock); >>> + filemap_invalidate_unlock_shared(mapping); >>> return NULL; >>> } >>> @@ -3111,6 +3112,7 @@ static void find_free_blocks(void) >>> loff_t off; >>> u32 pages_freed, total_pages_freed = 0; >>> u32 start, end, block, total_blocks_freed = 0; >>> + struct address_space *mapping; >>> if (atomic_read(&global_page_count) <= tcmu_global_max_pages) >>> return; >>> @@ -3134,6 +3136,7 @@ static void find_free_blocks(void) >>> continue; >>> } >>> + mapping = udev->inode->i_mapping; >>> end = udev->dbi_max + 1; >>> block = find_last_bit(udev->data_bitmap, end); >>> if (block == udev->dbi_max) { >>> @@ -3152,12 +3155,14 @@ static void find_free_blocks(void) >>> udev->dbi_max = block; >>> } >>> + filemap_invalidate_lock(mapping); >>> /* Here will truncate the data area from off */ >>> off = udev->data_off + (loff_t)start * udev->data_blk_size; >>> - unmap_mapping_range(udev->inode->i_mapping, off, 0, 1); >>> + unmap_mapping_range(mapping, off, 0, 1); >>> /* Release the block pages */ >>> pages_freed = tcmu_blocks_release(udev, start, end - 1); >>> + filemap_invalidate_unlock(mapping); >>> mutex_unlock(&udev->cmdr_lock); >>> total_pages_freed += pages_freed; >>
hi, > hi, > >> Sorry for the late response. Currently I'm quite busy. > Really never mind :) > >> >> In your earlier mail you described a possible dead lock. >> With this patch applied, are you sure a similar deadlock cannot >> happen? > AFAIK, this patch will solve the deadlock. > >> >> Additionally, let's assume tcmu_vma_fault/tcmu_try_get_data_page >> - after having found a valid page to map - is interrupted after >> releasing the invalidate_lock. Are there any locks held to prevent >> find_free_blocks from jumping in and possibly remove that page from >> xarray and try to remove it from the mmapped area? >> If not, we might end up mapping a no longer valid page. > Yeah, after tcmu_try_get_data_page() returns, find_free_blocks() > definitely > may come in and do unmap_mapping_range() and tcmu_blocks_release(), > but I think it won't cause problems: > 1) Since page fault procedure and unmap_mapping_range are designed to > be able to run concurrently, they sync at pte_offset_map_lock(). See > => do_user_addr_fault > ==> handle_mm_fault > ===> __handle_mm_fault > ====> do_fault > =====> do_shared_fault > =======> finish_fault > ========> pte_offset_map_lock > ========> do_set_pte > ========> pte_unmap_unlock > > and in find_free_blocks(): > => unmap_mapping_range > == > unmap_mapping_range_tree > ===> zap_page_range_single > ====> unmap_page_range > =====> zap_p4d_range > ======> zap_pud_range > ========> zap_pmd_range > ==========> zap_pte_range > ===========> pte_offset_map_lock > ===========> pte_clear_not_present_full > ===========> pte_unmap_unlock(start_pte, ptl); > > So what I want to express is that because of the concurrency of page > fault > procedure and unmap_mapping_range(), one will either see a valid map, or > not. And if not, because this page exceeds dbi_max, a later page fault > will > happen, and will get sigbus, but it's reasonable. > > As for your question, tcmu_try_get_data_page() finds a page successfully, > this page will get a refcount properly, if later unmap_mapping_range() > and > tcmu_blocks_release() come in, just after tcmu_try_get_data_page() > returns and > before tcmu_vma_fault() returns, then actually tcmu_blocks_release() > won't > free this page because there is one refcount. So yes, we'll map a no > longer > valid page, but this page also won't be re-used, unless the map is > unmapped > later(process exits or killed), then put_page() will be called and > page will finally > be given back to mm subsystem. After thinking more about this problem, if we now have a valid map which points to a truncated page, and this offset of this page in data_bitmap is freed. If later another command runs in, it reuse the previous freed slot in data_bitmap. Though we'll allocate new page for this slot in data_area, but seems no page fault will happen again, because we have a valid map.. so real request's data will lose. As you say, indeed this would be a long standing problem, we'll need to have a deeper look at codes. Regards, Xiaoguang Wang > >> >> Of course, this would be a long standing problem not caused by your >> change. But if there would be a problem, we should try to fix it >> when touching this code, I think. >> Unfortunately I didn't manage yet to check which locks are involved >> during page fault handling and unmap_mapping_range. > At least for my knowledge, page fault will hold mmap_read_lock() and > pte lock, unmap_mapping_range() will hold mapping->i_mmap_rwsem > and pte lock. > > Regards, > Xiaoguang Wang >> >> Bodo >> >> On 16.03.22 11:43, Xiaoguang Wang wrote: >>> hello, >>> >>> Gentle ping. >>> >>> Regards, >>> Xiaoguang Wang >>> >>>> Currently tcmu_vma_fault() uses udev->cmdr_lock to avoid concurrent >>>> find_free_blocks(), which unmaps idle pages and truncates them. This >>>> work is really like many filesystem's truncate operations, but they >>>> use address_space->invalidate_lock to protect race. >>>> >>>> This patch replaces cmdr_lock with address_space->invalidate_lock in >>>> tcmu fault procedure, which will also make page-fault have >>>> concurrency. >>>> >>>> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> >>>> --- >>>> drivers/target/target_core_user.c | 13 +++++++++---- >>>> 1 file changed, 9 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/drivers/target/target_core_user.c >>>> b/drivers/target/target_core_user.c >>>> index 06a5c4086551..e0a62623ccd7 100644 >>>> --- a/drivers/target/target_core_user.c >>>> +++ b/drivers/target/target_core_user.c >>>> @@ -1815,13 +1815,14 @@ static int tcmu_find_mem_index(struct >>>> vm_area_struct *vma) >>>> static struct page *tcmu_try_get_data_page(struct tcmu_dev *udev, >>>> uint32_t dpi) >>>> { >>>> + struct address_space *mapping = udev->inode->i_mapping; >>>> struct page *page; >>>> - mutex_lock(&udev->cmdr_lock); >>>> + filemap_invalidate_lock_shared(mapping); >>>> page = xa_load(&udev->data_pages, dpi); >>>> if (likely(page)) { >>>> get_page(page); >>>> - mutex_unlock(&udev->cmdr_lock); >>>> + filemap_invalidate_unlock_shared(mapping); >>>> return page; >>>> } >>>> @@ -1831,7 +1832,7 @@ static struct page >>>> *tcmu_try_get_data_page(struct tcmu_dev *udev, uint32_t dpi) >>>> */ >>>> pr_err("Invalid addr to data page mapping (dpi %u) on device >>>> %s\n", >>>> dpi, udev->name); >>>> - mutex_unlock(&udev->cmdr_lock); >>>> + filemap_invalidate_unlock_shared(mapping); >>>> return NULL; >>>> } >>>> @@ -3111,6 +3112,7 @@ static void find_free_blocks(void) >>>> loff_t off; >>>> u32 pages_freed, total_pages_freed = 0; >>>> u32 start, end, block, total_blocks_freed = 0; >>>> + struct address_space *mapping; >>>> if (atomic_read(&global_page_count) <= tcmu_global_max_pages) >>>> return; >>>> @@ -3134,6 +3136,7 @@ static void find_free_blocks(void) >>>> continue; >>>> } >>>> + mapping = udev->inode->i_mapping; >>>> end = udev->dbi_max + 1; >>>> block = find_last_bit(udev->data_bitmap, end); >>>> if (block == udev->dbi_max) { >>>> @@ -3152,12 +3155,14 @@ static void find_free_blocks(void) >>>> udev->dbi_max = block; >>>> } >>>> + filemap_invalidate_lock(mapping); >>>> /* Here will truncate the data area from off */ >>>> off = udev->data_off + (loff_t)start * udev->data_blk_size; >>>> - unmap_mapping_range(udev->inode->i_mapping, off, 0, 1); >>>> + unmap_mapping_range(mapping, off, 0, 1); >>>> /* Release the block pages */ >>>> pages_freed = tcmu_blocks_release(udev, start, end - 1); >>>> + filemap_invalidate_unlock(mapping); >>>> mutex_unlock(&udev->cmdr_lock); >>>> total_pages_freed += pages_freed; >>>
diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c index 06a5c4086551..e0a62623ccd7 100644 --- a/drivers/target/target_core_user.c +++ b/drivers/target/target_core_user.c @@ -1815,13 +1815,14 @@ static int tcmu_find_mem_index(struct vm_area_struct *vma) static struct page *tcmu_try_get_data_page(struct tcmu_dev *udev, uint32_t dpi) { + struct address_space *mapping = udev->inode->i_mapping; struct page *page; - mutex_lock(&udev->cmdr_lock); + filemap_invalidate_lock_shared(mapping); page = xa_load(&udev->data_pages, dpi); if (likely(page)) { get_page(page); - mutex_unlock(&udev->cmdr_lock); + filemap_invalidate_unlock_shared(mapping); return page; } @@ -1831,7 +1832,7 @@ static struct page *tcmu_try_get_data_page(struct tcmu_dev *udev, uint32_t dpi) */ pr_err("Invalid addr to data page mapping (dpi %u) on device %s\n", dpi, udev->name); - mutex_unlock(&udev->cmdr_lock); + filemap_invalidate_unlock_shared(mapping); return NULL; } @@ -3111,6 +3112,7 @@ static void find_free_blocks(void) loff_t off; u32 pages_freed, total_pages_freed = 0; u32 start, end, block, total_blocks_freed = 0; + struct address_space *mapping; if (atomic_read(&global_page_count) <= tcmu_global_max_pages) return; @@ -3134,6 +3136,7 @@ static void find_free_blocks(void) continue; } + mapping = udev->inode->i_mapping; end = udev->dbi_max + 1; block = find_last_bit(udev->data_bitmap, end); if (block == udev->dbi_max) { @@ -3152,12 +3155,14 @@ static void find_free_blocks(void) udev->dbi_max = block; } + filemap_invalidate_lock(mapping); /* Here will truncate the data area from off */ off = udev->data_off + (loff_t)start * udev->data_blk_size; - unmap_mapping_range(udev->inode->i_mapping, off, 0, 1); + unmap_mapping_range(mapping, off, 0, 1); /* Release the block pages */ pages_freed = tcmu_blocks_release(udev, start, end - 1); + filemap_invalidate_unlock(mapping); mutex_unlock(&udev->cmdr_lock); total_pages_freed += pages_freed;
Currently tcmu_vma_fault() uses udev->cmdr_lock to avoid concurrent find_free_blocks(), which unmaps idle pages and truncates them. This work is really like many filesystem's truncate operations, but they use address_space->invalidate_lock to protect race. This patch replaces cmdr_lock with address_space->invalidate_lock in tcmu fault procedure, which will also make page-fault have concurrency. Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> --- drivers/target/target_core_user.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)