Message ID: 20250603221949.53272-1-Smita.KoralahalliChannabasappa@amd.com
Series: Add managed SOFT RESERVE resource handling
Smita,

Thanks for your awesome work. I just tested the scenarios you listed, and
they work as expected. Thanks again. (Minor comments inline)

Tested-by: Li Zhijian <lizhijian@fujitsu.com>

To the CXL community,

The scenarios mentioned here essentially cover what a correct firmware may
provide. However, I would like to discuss one more scenario that I can
simulate with a modified QEMU:

The E820 exposes a SOFT RESERVED region that is the same as a CFMW, but the
HDM decoders are not committed. This means no region will be auto-created
during boot. As an example, after boot, the iomem tree is as follows:

1050000000-304fffffff : CXL Window 0
  1050000000-304fffffff : Soft Reserved

<No region>

In this case, the SOFT RESERVED resource is not trimmed, so the end-user
cannot create a new region.

My question is: Is this scenario a problem? If it is, should we fix it in
this patchset or create a new patch?

On 04/06/2025 06:19, Smita Koralahalli wrote:
> Add the ability to manage SOFT RESERVE iomem resources prior to them being
> added to the iomem resource tree. This allows drivers, such as CXL, to
> remove any pieces of the SOFT RESERVE resource that intersect with created
> CXL regions.
>
> The current approach of leaving the SOFT RESERVE resources as is can cause
> failures during hotplug of devices, such as CXL, because the resource is
> not available for reuse after teardown of the device.
>
> The approach is to add SOFT RESERVE resources to a separate tree during
> boot.

There is no special tree at all since v3.

> This allows any drivers to update the SOFT RESERVE resources before
> they are merged into the iomem resource tree. In addition a notifier chain
> is added so that drivers can be notified when these SOFT RESERVE resources
> are added to the iomem resource tree.
>
> The CXL driver is modified to use a worker thread that waits for the CXL
> PCI and CXL mem drivers to be loaded and for their probe routines to
> complete. Then the driver walks through any created CXL regions to trim any
> intersections with SOFT RESERVE resources in the iomem tree.
>
> The dax driver uses the new soft reserve notifier chain so it can consume
> any remaining SOFT RESERVES once they're added to the iomem tree.
>
> The following scenarios have been tested:
>
> Example 1: Exact alignment, soft reserved is a child of the region
>
> |---------- "Soft Reserved" -----------|
> |-------------- "Region #" ------------|
>
> Before:
> 1050000000-304fffffff : CXL Window 0
>   1050000000-304fffffff : region0
>     1050000000-304fffffff : Soft Reserved
>       1080000000-2fffffffff : dax0.0

BTW, I'm curious how to set up a dax device with an address range different
from that of its corresponding region.

>         1080000000-2fffffffff : System RAM (kmem)
>
> After:
> 1050000000-304fffffff : CXL Window 0
>   1050000000-304fffffff : region1
>     1080000000-2fffffffff : dax0.0
>       1080000000-2fffffffff : System RAM (kmem)
>
> Example 2: Start and/or end aligned and soft reserved spans multiple
> regions

Tested.

> |----------- "Soft Reserved" -----------|
> |-------- "Region #" -------|
> or
> |----------- "Soft Reserved" -----------|
> |-------- "Region #" -------|

Typo? The second diagram should presumably be end-aligned:

|----------- "Soft Reserved" -----------|
             |-------- "Region #" -------|

> Example 3: No alignment
>
> |---------- "Soft Reserved" ----------|
>      |---- "Region #" ----|

Tested.

Thanks
Zhijian
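To make the uncommitted-decoder scenario above concrete, here is a minimal
C sketch of one possible handling. The helper name and the
decoders_committed flag are invented for illustration; nothing like this
exists in the posted series:

#include <linux/ioport.h>

/*
 * Hypothetical: a Soft Reserved range that matches a CFMW whose HDM
 * decoders were never committed has no auto-created region, so the
 * trim logic never touches it.  Releasing the resource instead would
 * return the range to the iomem pool and let the end-user create a
 * region by hand.
 */
static void cxl_softreserv_release_unclaimed(struct resource *soft_reserved,
					     bool decoders_committed)
{
	if (decoders_committed)
		return;	/* an auto-created region covers this range */

	/* No region will ever claim this range; hand it back */
	release_resource(soft_reserved);
}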
On 04/06/2025 06:19, Smita Koralahalli wrote:
> Introduce a pci_loaded flag similar to mem_active, and define
> mark_cxl_pci_loaded() to indicate when the PCI driver has initialized.
>
> This will be used by other CXL components, such as cxl_acpi and the soft
> reserved resource handling logic, to coordinate initialization and ensure
> that dependent operations proceed only after both cxl_pci and cxl_mem
> drivers are ready.
>
> Co-developed-by: Nathan Fontenot <Nathan.Fontenot@amd.com>
> Signed-off-by: Nathan Fontenot <Nathan.Fontenot@amd.com>
> Co-developed-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
>  drivers/cxl/core/suspend.c | 8 ++++++++
>  drivers/cxl/cxlpci.h       | 1 +
>  drivers/cxl/pci.c          | 2 ++
>  3 files changed, 11 insertions(+)
>
> diff --git a/drivers/cxl/core/suspend.c b/drivers/cxl/core/suspend.c
> index 5ba4b4de0e33..72818a2c8ec8 100644
> --- a/drivers/cxl/core/suspend.c
> +++ b/drivers/cxl/core/suspend.c
> @@ -3,8 +3,10 @@
>  #include <linux/atomic.h>
>  #include <linux/export.h>
>  #include "cxlmem.h"
> +#include "cxlpci.h"
>
>  static atomic_t mem_active;
> +static atomic_t pci_loaded;

I find it odd to place these changes in suspend.c.

Also, I noticed that in a subsequent patch, DJ has mentioned (and I agree)
that this patch is unnecessary.

Thanks
Zhijian

>
>  bool cxl_mem_active(void)
>  {
> @@ -25,3 +27,9 @@ void cxl_mem_active_dec(void)
>  	atomic_dec(&mem_active);
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_mem_active_dec, "CXL");
> +
> +void mark_cxl_pci_loaded(void)
> +{
> +	atomic_inc(&pci_loaded);
> +}
> +EXPORT_SYMBOL_NS_GPL(mark_cxl_pci_loaded, "CXL");
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index 54e219b0049e..5a811ac63fcf 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -135,4 +135,5 @@ void read_cdat_data(struct cxl_port *port);
>  void cxl_cor_error_detected(struct pci_dev *pdev);
>  pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
>  				    pci_channel_state_t state);
> +void mark_cxl_pci_loaded(void);
>  #endif /* __CXL_PCI_H__ */
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 785aa2af5eaa..b019bd324dba 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1189,6 +1189,8 @@ static int __init cxl_pci_driver_init(void)
>  	if (rc)
>  		pci_unregister_driver(&cxl_pci_driver);
>
> +	mark_cxl_pci_loaded();
> +
>  	return rc;
>  }
>
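For context, the consumer side of this flag arrives in the next patch of
the series; quoting it here:

static bool cxl_pci_loaded(void)
{
	if (IS_ENABLED(CONFIG_CXL_PCI))
		return atomic_read(&pci_loaded) != 0;

	return false;
}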
On 04/06/2025 06:19, Smita Koralahalli wrote:
>  drivers/cxl/acpi.c         | 23 +++++++++++++++++++++++
>  drivers/cxl/core/suspend.c | 21 +++++++++++++++++++++
>  drivers/cxl/cxl.h          |  2 ++
>  3 files changed, 46 insertions(+)
>
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index cb14829bb9be..978f63b32b41 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -813,6 +813,24 @@ static int pair_cxl_resource(struct device *dev, void *data)
>  	return 0;
>  }
>
> +static void cxl_softreserv_mem_work_fn(struct work_struct *work)
> +{
> +	/* Wait for cxl_pci and cxl_mem drivers to load */
> +	cxl_wait_for_pci_mem();
> +
> +	/*
> +	 * Wait for the driver probe routines to complete after cxl_pci
> +	 * and cxl_mem drivers are loaded.
> +	 */
> +	wait_for_device_probe();
> +}
> +static DECLARE_WORK(cxl_sr_work, cxl_softreserv_mem_work_fn);
> +
> +static void cxl_softreserv_mem_update(void)
> +{
> +	schedule_work(&cxl_sr_work);
> +}
> +
>  static int cxl_acpi_probe(struct platform_device *pdev)
>  {
>  	int rc;
> @@ -887,6 +905,10 @@ static int cxl_acpi_probe(struct platform_device *pdev)
>
>  	/* In case PCI is scanned before ACPI re-trigger memdev attach */
>  	cxl_bus_rescan();
> +
> +	/* Update SOFT RESERVE resources that intersect with CXL regions */
> +	cxl_softreserv_mem_update();
> +
>  	return 0;
>  }
>
> @@ -918,6 +940,7 @@ static int __init cxl_acpi_init(void)
>
>  static void __exit cxl_acpi_exit(void)
>  {
> +	cancel_work_sync(&cxl_sr_work);
>  	platform_driver_unregister(&cxl_acpi_driver);
>  	cxl_bus_drain();
>  }
> diff --git a/drivers/cxl/core/suspend.c b/drivers/cxl/core/suspend.c
> index 72818a2c8ec8..c0d8f70aed56 100644
> --- a/drivers/cxl/core/suspend.c
> +++ b/drivers/cxl/core/suspend.c
> @@ -2,12 +2,15 @@
>  /* Copyright(c) 2022 Intel Corporation. All rights reserved. */
>  #include <linux/atomic.h>
>  #include <linux/export.h>
> +#include <linux/wait.h>
>  #include "cxlmem.h"
>  #include "cxlpci.h"
>
>  static atomic_t mem_active;
>  static atomic_t pci_loaded;
>
> +static DECLARE_WAIT_QUEUE_HEAD(cxl_wait_queue);

Given that this file (suspend.c) focuses on power management functions, it
might be more appropriate to move the wait queue declaration and its
related changes to acpi.c, where its caller is.

Thanks
Zhijian

> +
>  bool cxl_mem_active(void)
>  {
>  	if (IS_ENABLED(CONFIG_CXL_MEM))
> @@ -19,6 +22,7 @@ bool cxl_mem_active(void)
>  void cxl_mem_active_inc(void)
>  {
>  	atomic_inc(&mem_active);
> +	wake_up(&cxl_wait_queue);
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_mem_active_inc, "CXL");
>
> @@ -28,8 +32,25 @@ void cxl_mem_active_dec(void)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_mem_active_dec, "CXL");
>
> +static bool cxl_pci_loaded(void)
> +{
> +	if (IS_ENABLED(CONFIG_CXL_PCI))
> +		return atomic_read(&pci_loaded) != 0;
> +
> +	return false;
> +}
> +
>  void mark_cxl_pci_loaded(void)
>  {
>  	atomic_inc(&pci_loaded);
> +	wake_up(&cxl_wait_queue);
>  }
>  EXPORT_SYMBOL_NS_GPL(mark_cxl_pci_loaded, "CXL");
> +
> +void cxl_wait_for_pci_mem(void)
> +{
> +	if (!wait_event_timeout(cxl_wait_queue, cxl_pci_loaded() &&
> +				cxl_mem_active(), 30 * HZ))
> +		pr_debug("Timeout waiting for cxl_pci or cxl_mem probing\n");
> +}
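A rough sketch of that suggestion, with the wait queue and its waiter
living in drivers/cxl/acpi.c. Since cxl_core cannot call into cxl_acpi
directly, the wake-ups would have to be routed through some notification
hook; the hook name below is hypothetical and its registration with the
core is omitted:

/* drivers/cxl/acpi.c -- sketch only, not the posted code */
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(cxl_wait_queue);

/* Hypothetical hook invoked by cxl_core in place of its direct wake_up() */
static void cxl_softreserv_wake(void)
{
	wake_up(&cxl_wait_queue);
}

static void cxl_wait_for_pci_mem(void)
{
	if (!wait_event_timeout(cxl_wait_queue,
				cxl_pci_loaded() && cxl_mem_active(),
				30 * HZ))
		pr_debug("Timeout waiting for cxl_pci or cxl_mem probing\n");
}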
On 6/4/25 2:40 AM, Zhijian Li (Fujitsu) wrote:
>
> On 04/06/2025 06:19, Smita Koralahalli wrote:
>>  drivers/cxl/acpi.c         | 23 +++++++++++++++++++++++
>>  drivers/cxl/core/suspend.c | 21 +++++++++++++++++++++
>>  drivers/cxl/cxl.h          |  2 ++
>>  3 files changed, 46 insertions(+)
>>
>> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
>> index cb14829bb9be..978f63b32b41 100644
>> --- a/drivers/cxl/acpi.c
>> +++ b/drivers/cxl/acpi.c
>> @@ -813,6 +813,24 @@ static int pair_cxl_resource(struct device *dev, void *data)
>>  	return 0;
>>  }
>>
>> +static void cxl_softreserv_mem_work_fn(struct work_struct *work)
>> +{
>> +	/* Wait for cxl_pci and cxl_mem drivers to load */
>> +	cxl_wait_for_pci_mem();
>> +
>> +	/*
>> +	 * Wait for the driver probe routines to complete after cxl_pci
>> +	 * and cxl_mem drivers are loaded.
>> +	 */
>> +	wait_for_device_probe();
>> +}
>> +static DECLARE_WORK(cxl_sr_work, cxl_softreserv_mem_work_fn);
>> +
>> +static void cxl_softreserv_mem_update(void)
>> +{
>> +	schedule_work(&cxl_sr_work);
>> +}
>> +
>>  static int cxl_acpi_probe(struct platform_device *pdev)
>>  {
>>  	int rc;
>> @@ -887,6 +905,10 @@ static int cxl_acpi_probe(struct platform_device *pdev)
>>
>>  	/* In case PCI is scanned before ACPI re-trigger memdev attach */
>>  	cxl_bus_rescan();
>> +
>> +	/* Update SOFT RESERVE resources that intersect with CXL regions */
>> +	cxl_softreserv_mem_update();
>> +
>>  	return 0;
>>  }
>>
>> @@ -918,6 +940,7 @@ static int __init cxl_acpi_init(void)
>>
>>  static void __exit cxl_acpi_exit(void)
>>  {
>> +	cancel_work_sync(&cxl_sr_work);
>>  	platform_driver_unregister(&cxl_acpi_driver);
>>  	cxl_bus_drain();
>>  }
>> diff --git a/drivers/cxl/core/suspend.c b/drivers/cxl/core/suspend.c
>> index 72818a2c8ec8..c0d8f70aed56 100644
>> --- a/drivers/cxl/core/suspend.c
>> +++ b/drivers/cxl/core/suspend.c
>> @@ -2,12 +2,15 @@
>>  /* Copyright(c) 2022 Intel Corporation. All rights reserved. */
>>  #include <linux/atomic.h>
>>  #include <linux/export.h>
>> +#include <linux/wait.h>
>>  #include "cxlmem.h"
>>  #include "cxlpci.h"
>>
>>  static atomic_t mem_active;
>>  static atomic_t pci_loaded;
>>
>> +static DECLARE_WAIT_QUEUE_HEAD(cxl_wait_queue);
>
> Given that this file (suspend.c) focuses on power management functions,
> it might be more appropriate to move the wait queue declaration and its
> related changes to acpi.c, where its caller is.

You mean drivers/cxl/acpi.c and not drivers/cxl/core/acpi.c, right? The
core one is my mistake and I'm going to create a patch to remove that.

DJ

>
> Thanks
> Zhijian
>
>> +
>>  bool cxl_mem_active(void)
>>  {
>>  	if (IS_ENABLED(CONFIG_CXL_MEM))
>> @@ -19,6 +22,7 @@ bool cxl_mem_active(void)
>>  void cxl_mem_active_inc(void)
>>  {
>>  	atomic_inc(&mem_active);
>> +	wake_up(&cxl_wait_queue);
>>  }
>>  EXPORT_SYMBOL_NS_GPL(cxl_mem_active_inc, "CXL");
>>
>> @@ -28,8 +32,25 @@ void cxl_mem_active_dec(void)
>>  }
>>  EXPORT_SYMBOL_NS_GPL(cxl_mem_active_dec, "CXL");
>>
>> +static bool cxl_pci_loaded(void)
>> +{
>> +	if (IS_ENABLED(CONFIG_CXL_PCI))
>> +		return atomic_read(&pci_loaded) != 0;
>> +
>> +	return false;
>> +}
>> +
>>  void mark_cxl_pci_loaded(void)
>>  {
>>  	atomic_inc(&pci_loaded);
>> +	wake_up(&cxl_wait_queue);
>>  }
>>  EXPORT_SYMBOL_NS_GPL(mark_cxl_pci_loaded, "CXL");
>> +
>> +void cxl_wait_for_pci_mem(void)
>> +{
>> +	if (!wait_event_timeout(cxl_wait_queue, cxl_pci_loaded() &&
>> +				cxl_mem_active(), 30 * HZ))
>> +		pr_debug("Timeout waiting for cxl_pci or cxl_mem probing\n");
>> +}
On 6/3/2025 4:45 PM, Dave Jiang wrote:
>
> On 6/3/25 3:19 PM, Smita Koralahalli wrote:
>> Introduce a waitqueue mechanism to coordinate initialization between the
>> cxl_pci and cxl_mem drivers.
>>
>> Launch a background worker from cxl_acpi_probe() that waits for both
>> drivers to complete initialization before invoking
>> wait_for_device_probe(). Without this, the probe completion wait could
>> begin prematurely, before the drivers are present, leading to missed
>> updates.
>>
>> Co-developed-by: Nathan Fontenot <Nathan.Fontenot@amd.com>
>> Signed-off-by: Nathan Fontenot <Nathan.Fontenot@amd.com>
>> Co-developed-by: Terry Bowman <terry.bowman@amd.com>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>> ---
>>  drivers/cxl/acpi.c         | 23 +++++++++++++++++++++++
>>  drivers/cxl/core/suspend.c | 21 +++++++++++++++++++++
>>  drivers/cxl/cxl.h          |  2 ++
>>  3 files changed, 46 insertions(+)
>>
>> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
>> index cb14829bb9be..978f63b32b41 100644
>> --- a/drivers/cxl/acpi.c
>> +++ b/drivers/cxl/acpi.c
>> @@ -813,6 +813,24 @@ static int pair_cxl_resource(struct device *dev, void *data)
>>  	return 0;
>>  }
>>
>> +static void cxl_softreserv_mem_work_fn(struct work_struct *work)
>> +{
>> +	/* Wait for cxl_pci and cxl_mem drivers to load */
>> +	cxl_wait_for_pci_mem();
>> +
>> +	/*
>> +	 * Wait for the driver probe routines to complete after cxl_pci
>> +	 * and cxl_mem drivers are loaded.
>> +	 */
>> +	wait_for_device_probe();
>> +}
>> +static DECLARE_WORK(cxl_sr_work, cxl_softreserv_mem_work_fn);
>> +
>> +static void cxl_softreserv_mem_update(void)
>> +{
>> +	schedule_work(&cxl_sr_work);
>> +}
>> +
>>  static int cxl_acpi_probe(struct platform_device *pdev)
>>  {
>>  	int rc;
>> @@ -887,6 +905,10 @@ static int cxl_acpi_probe(struct platform_device *pdev)
>>
>>  	/* In case PCI is scanned before ACPI re-trigger memdev attach */
>>  	cxl_bus_rescan();
>> +
>> +	/* Update SOFT RESERVE resources that intersect with CXL regions */
>> +	cxl_softreserv_mem_update();
>> +
>>  	return 0;
>>  }
>>
>> @@ -918,6 +940,7 @@ static int __init cxl_acpi_init(void)
>>
>>  static void __exit cxl_acpi_exit(void)
>>  {
>> +	cancel_work_sync(&cxl_sr_work);
>>  	platform_driver_unregister(&cxl_acpi_driver);
>>  	cxl_bus_drain();
>>  }
>> diff --git a/drivers/cxl/core/suspend.c b/drivers/cxl/core/suspend.c
>> index 72818a2c8ec8..c0d8f70aed56 100644
>> --- a/drivers/cxl/core/suspend.c
>> +++ b/drivers/cxl/core/suspend.c
>> @@ -2,12 +2,15 @@
>>  /* Copyright(c) 2022 Intel Corporation. All rights reserved. */
>>  #include <linux/atomic.h>
>>  #include <linux/export.h>
>> +#include <linux/wait.h>
>>  #include "cxlmem.h"
>>  #include "cxlpci.h"
>>
>>  static atomic_t mem_active;
>>  static atomic_t pci_loaded;
>>
>> +static DECLARE_WAIT_QUEUE_HEAD(cxl_wait_queue);
>> +
>>  bool cxl_mem_active(void)
>>  {
>>  	if (IS_ENABLED(CONFIG_CXL_MEM))
>> @@ -19,6 +22,7 @@ bool cxl_mem_active(void)
>>  void cxl_mem_active_inc(void)
>>  {
>>  	atomic_inc(&mem_active);
>> +	wake_up(&cxl_wait_queue);
>>  }
>>  EXPORT_SYMBOL_NS_GPL(cxl_mem_active_inc, "CXL");
>>
>> @@ -28,8 +32,25 @@ void cxl_mem_active_dec(void)
>>  }
>>  EXPORT_SYMBOL_NS_GPL(cxl_mem_active_dec, "CXL");
>>
>> +static bool cxl_pci_loaded(void)
>> +{
>> +	if (IS_ENABLED(CONFIG_CXL_PCI))
>> +		return atomic_read(&pci_loaded) != 0;
>> +
>> +	return false;
>> +}
>> +
>>  void mark_cxl_pci_loaded(void)
>>  {
>>  	atomic_inc(&pci_loaded);
>> +	wake_up(&cxl_wait_queue);
>>  }
>>  EXPORT_SYMBOL_NS_GPL(mark_cxl_pci_loaded, "CXL");
>> +
>> +void cxl_wait_for_pci_mem(void)
>> +{
>> +	if (!wait_event_timeout(cxl_wait_queue, cxl_pci_loaded() &&
>> +				cxl_mem_active(), 30 * HZ))
>
> I'm trying to understand why cxl_pci_loaded() is needed. cxl_mem_active()
> goes above 0 when a cxl_mem_probe() instance succeeds. cxl_mem_probe()
> being triggered implies that an instance of cxl_pci_probe() has been
> called, since cxl_mem_probe() is triggered from devm_cxl_add_memdev()
> once the memdev is added and the cxl_mem driver has also been loaded.
> So does cxl_mem_active() not imply cxl_pci_loaded(), making it
> unnecessary?

Yeah, you are right. I will remove this check.

Thanks
Smita

>
> DJ
>
>> +		pr_debug("Timeout waiting for cxl_pci or cxl_mem probing\n");
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_wait_for_pci_mem, "CXL");
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index a9ab46eb0610..1ba7d39c2991 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -902,6 +902,8 @@ void cxl_coordinates_combine(struct access_coordinate *out,
>>
>>  bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port);
>>
>> +void cxl_wait_for_pci_mem(void);
>> +
>>  /*
>>   * Unit test builds overrides this to __weak, find the 'strong' version
>>   * of these symbols in tools/testing/cxl/.
>
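For reference, a minimal sketch of the simplification agreed above,
assuming the cxl_pci_loaded() check is simply dropped from the posted
cxl_wait_for_pci_mem():

void cxl_wait_for_pci_mem(void)
{
	/*
	 * cxl_mem_active() going non-zero already implies a successful
	 * cxl_pci probe, so waiting on it alone is sufficient.
	 */
	if (!wait_event_timeout(cxl_wait_queue, cxl_mem_active(), 30 * HZ))
		pr_debug("Timeout waiting for cxl_mem probing\n");
}
EXPORT_SYMBOL_NS_GPL(cxl_wait_for_pci_mem, "CXL");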
On 04/06/2025 22:35, Dave Jiang wrote:
>>>
>>> +static DECLARE_WAIT_QUEUE_HEAD(cxl_wait_queue);
>>
>> Given that this file (suspend.c) focuses on power management functions,
>> it might be more appropriate to move the wait queue declaration and its
>> related changes to acpi.c, where its caller is.
>
> You mean drivers/cxl/acpi.c and not drivers/cxl/core/acpi.c, right?

Yes.

> The core one is my mistake and I'm going to create a patch to remove that.

Completely missed its existence - thank you for catching and cleaning that
up!

Thanks
Zhijian

>
> DJ