Message ID: 20191108152508.4039168-1-jean-philippe@linaro.org
Series: iommu: Add PASID support to Arm SMMUv3
On Fri, 8 Nov 2019 16:25:04 +0100
Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:

> When a master supports substream ID, allocate a table with multiple
> context descriptors for its stage-1 domain. For the moment S1CDMax is
> still 0 in the STE, so the additional context descriptors are ignored.
>
> Context descriptor tables are allocated once for the first master attached
> to a domain. Therefore attaching multiple devices with different SSID
> sizes is tricky, and we currently don't support it.
>
> As a future improvement it would be nice to at least support attaching a
> SSID-capable device to a domain that isn't using SSID, by reallocating the
> SSID table. This would allow supporting a SSID-capable device that is in
> the same IOMMU group as a bridge, for example. Varying SSID size is less
> of a concern, since the PCIe specification "highly recommends" that
> devices supporting PASID implement all 20 bits of it.
>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>

Hmm. There are several different refactors in here alongside a few new
bits. Would be nice to break it up more to make life even easier for
reviewers. It's not 'so' complex that it's really a problem though,
so you could leave it as is if you really want to.

One carry-over inline on zeroing a coherent allocation...

> ---
>  drivers/iommu/arm-smmu-v3.c | 117 ++++++++++++++++++++++++++----------
>  1 file changed, 85 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 33488da8f742..122bed0168a3 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -553,16 +553,22 @@ struct arm_smmu_strtab_l1_desc {
>          dma_addr_t l2ptr_dma;
>  };
>
> +struct arm_smmu_ctx_desc {
> +        u16 asid;
> +        u64 ttbr;
> +        u64 tcr;
> +        u64 mair;
> +};
> +
> +struct arm_smmu_cd_table {
> +        __le64 *ptr;
> +        dma_addr_t ptr_dma;
> +};
> +
>  struct arm_smmu_s1_cfg {
> -        __le64 *cdptr;
> -        dma_addr_t cdptr_dma;
> -
> -        struct arm_smmu_ctx_desc {
> -                u16 asid;
> -                u64 ttbr;
> -                u64 tcr;
> -                u64 mair;
> -        } cd;
> +        u8 s1cdmax;
> +        struct arm_smmu_cd_table table;
> +        struct arm_smmu_ctx_desc cd;

It might have been a tiny bit nicer to have a precursor patch that did
the change to a pair of structs. Then only functional changes would be
in here.

>  };
>
>  struct arm_smmu_s2_cfg {
> @@ -1450,6 +1456,31 @@ static int arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
>  }
>
>  /* Context descriptor manipulation functions */
> +static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
> +                                        struct arm_smmu_cd_table *table,
> +                                        size_t num_entries)
> +{
> +        size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
> +
> +        table->ptr = dmam_alloc_coherent(smmu->dev, size, &table->ptr_dma,
> +                                         GFP_KERNEL | __GFP_ZERO);

We dropped dma_zalloc_coherent because we now zero in dma_alloc_coherent
anyway. Hence I'm fairly sure that __GFP_ZERO should have no effect.

https://lore.kernel.org/patchwork/patch/1031536/

Am I missing some special corner case here?
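For reference, a minimal sketch of the helper without the flag, assuming
the behaviour since the dma_zalloc_coherent() removal linked above, where
dma_alloc_coherent() (and thus the managed dmam_ variant) already returns
zeroed memory. This is an illustration, not the patch's code:

        /* Sketch only: __GFP_ZERO dropped, the allocation is zeroed anyway */
        static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
                                                struct arm_smmu_cd_table *table,
                                                size_t num_entries)
        {
                size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);

                table->ptr = dmam_alloc_coherent(smmu->dev, size,
                                                 &table->ptr_dma, GFP_KERNEL);
                if (!table->ptr) {
                        dev_warn(smmu->dev,
                                 "failed to allocate context descriptor table\n");
                        return -ENOMEM;
                }
                return 0;
        }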
> +        if (!table->ptr) {
> +                dev_warn(smmu->dev,
> +                         "failed to allocate context descriptor table\n");
> +                return -ENOMEM;
> +        }
> +        return 0;
> +}
> +
> +static void arm_smmu_free_cd_leaf_table(struct arm_smmu_device *smmu,
> +                                        struct arm_smmu_cd_table *table,
> +                                        size_t num_entries)
> +{
> +        size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
> +
> +        dmam_free_coherent(smmu->dev, size, table->ptr, table->ptr_dma);
> +}
> +
>  static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
>  {
>          u64 val = 0;
> @@ -1471,6 +1502,7 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
>                                      struct arm_smmu_s1_cfg *cfg)
>  {
>          u64 val;
> +        __le64 *cdptr = cfg->table.ptr;

The changes in here would all be in the purely mechanical refactor of
the structure patch.

>
>          /*
>           * We don't need to issue any invalidation here, as we'll invalidate
> @@ -1488,12 +1520,29 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
>          if (smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
>                  val |= CTXDESC_CD_0_S;
>
> -        cfg->cdptr[0] = cpu_to_le64(val);
> +        cdptr[0] = cpu_to_le64(val);
>
>          val = cfg->cd.ttbr & CTXDESC_CD_1_TTB0_MASK;
> -        cfg->cdptr[1] = cpu_to_le64(val);
> +        cdptr[1] = cpu_to_le64(val);
>
> -        cfg->cdptr[3] = cpu_to_le64(cfg->cd.mair);
> +        cdptr[3] = cpu_to_le64(cfg->cd.mair);
> +}
> +
> +static int arm_smmu_alloc_cd_tables(struct arm_smmu_domain *smmu_domain)
> +{
> +        struct arm_smmu_device *smmu = smmu_domain->smmu;
> +        struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> +
> +        return arm_smmu_alloc_cd_leaf_table(smmu, &cfg->table,
> +                                            1 << cfg->s1cdmax);
> +}
> +
> +static void arm_smmu_free_cd_tables(struct arm_smmu_domain *smmu_domain)
> +{
> +        struct arm_smmu_device *smmu = smmu_domain->smmu;
> +        struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> +
> +        arm_smmu_free_cd_leaf_table(smmu, &cfg->table, 1 << cfg->s1cdmax);
>  }
>
>  /* Stream table manipulation functions */
> @@ -1624,7 +1673,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>                      !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
>                          dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
>
> -                val |= (s1_cfg->cdptr_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
> +                val |= (s1_cfg->table.ptr_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
>                          FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS);
>          }
>
> @@ -2138,12 +2187,8 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
>          if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>                  struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
>
> -                if (cfg->cdptr) {
> -                        dmam_free_coherent(smmu_domain->smmu->dev,
> -                                           CTXDESC_CD_DWORDS << 3,
> -                                           cfg->cdptr,
> -                                           cfg->cdptr_dma);
> -
> +                if (cfg->table.ptr) {
> +                        arm_smmu_free_cd_tables(smmu_domain);
>                          arm_smmu_bitmap_free(smmu->asid_map, cfg->cd.asid);
>                  }
>          } else {
> @@ -2156,6 +2201,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
>  }
>
>  static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
> +                                       struct arm_smmu_master *master,
>                                         struct io_pgtable_cfg *pgtbl_cfg)
>  {
>          int ret;
> @@ -2167,19 +2213,19 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
>          if (asid < 0)
>                  return asid;
>
> -        cfg->cdptr = dmam_alloc_coherent(smmu->dev, CTXDESC_CD_DWORDS << 3,
> -                                         &cfg->cdptr_dma,
> -                                         GFP_KERNEL | __GFP_ZERO);
> -        if (!cfg->cdptr) {
> -                dev_warn(smmu->dev, "failed to allocate context descriptor\n");
> -                ret = -ENOMEM;
> +        cfg->s1cdmax = master->ssid_bits;
> +
> +        ret = arm_smmu_alloc_cd_tables(smmu_domain);
> +        if (ret)
>                  goto out_free_asid;
> -        }
>
>          cfg->cd.asid = (u16)asid;
>          cfg->cd.ttbr = pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
>          cfg->cd.tcr = pgtbl_cfg->arm_lpae_s1_cfg.tcr;
>          cfg->cd.mair = pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
> +
> +        arm_smmu_write_ctx_desc(smmu, cfg);
> +
>          return 0;
>
>  out_free_asid:
> @@ -2188,6 +2234,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
>  }
>
>  static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
> +                                       struct arm_smmu_master *master,
>                                         struct io_pgtable_cfg *pgtbl_cfg)
>  {
>          int vmid;
> @@ -2204,7 +2251,8 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
>          return 0;
>  }
>
> -static int arm_smmu_domain_finalise(struct iommu_domain *domain)
> +static int arm_smmu_domain_finalise(struct iommu_domain *domain,
> +                                    struct arm_smmu_master *master)
>  {
>          int ret;
>          unsigned long ias, oas;
> @@ -2212,6 +2260,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
>          struct io_pgtable_cfg pgtbl_cfg;
>          struct io_pgtable_ops *pgtbl_ops;
>          int (*finalise_stage_fn)(struct arm_smmu_domain *,
> +                                 struct arm_smmu_master *,
>                                   struct io_pgtable_cfg *);
>          struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>          struct arm_smmu_device *smmu = smmu_domain->smmu;
> @@ -2266,7 +2315,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
>          domain->geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1;
>          domain->geometry.force_aperture = true;
>
> -        ret = finalise_stage_fn(smmu_domain, &pgtbl_cfg);
> +        ret = finalise_stage_fn(smmu_domain, master, &pgtbl_cfg);
>          if (ret < 0) {
>                  free_io_pgtable_ops(pgtbl_ops);
>                  return ret;
> @@ -2419,7 +2468,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>
>          if (!smmu_domain->smmu) {
>                  smmu_domain->smmu = smmu;
> -                ret = arm_smmu_domain_finalise(domain);
> +                ret = arm_smmu_domain_finalise(domain, master);
>                  if (ret) {
>                          smmu_domain->smmu = NULL;
>                          goto out_unlock;
> @@ -2431,6 +2480,13 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>                          dev_name(smmu->dev));
>                  ret = -ENXIO;
>                  goto out_unlock;
> +        } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 &&
> +                   master->ssid_bits != smmu_domain->s1_cfg.s1cdmax) {
> +                dev_err(dev,
> +                        "cannot attach to incompatible domain (%u SSID bits != %u)\n",
> +                        smmu_domain->s1_cfg.s1cdmax, master->ssid_bits);
> +                ret = -EINVAL;
> +                goto out_unlock;
>          }
>
>          master->domain = smmu_domain;
> @@ -2438,9 +2494,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>          if (smmu_domain->stage != ARM_SMMU_DOMAIN_BYPASS)
>                  master->ats_enabled = arm_smmu_ats_supported(master);
>
> -        if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1)
> -                arm_smmu_write_ctx_desc(smmu, &smmu_domain->s1_cfg);
> -

Whilst it seems fine, perhaps a note on the 'why' of moving this into
finalise_s1 would be good in the patch description.

>          arm_smmu_install_ste_for_dev(master);
>
>          spin_lock_irqsave(&smmu_domain->devices_lock, flags);
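As a back-of-the-envelope check on the sizing this patch sets up (a
sketch; cd_table_bytes() is invented here, the constants are the
patch's): a linear context-descriptor table holds 2^s1cdmax entries of
CTXDESC_CD_DWORDS 64-bit words each, so its size grows quickly with the
master's SSID width.

        /*
         * Illustration only: bytes needed for a linear CD table.
         *   s1cdmax = 0  ->   64 B   (one descriptor, no substreams)
         *   s1cdmax = 10 ->   64 KiB
         *   s1cdmax = 20 ->   64 MiB (hence the second table level
         *                             added later in the series)
         */
        static size_t cd_table_bytes(u8 s1cdmax)
        {
                return (1UL << s1cdmax) * (CTXDESC_CD_DWORDS << 3);
        }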
On Fri, 8 Nov 2019 16:25:06 +0100
Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:

> The SMMU can support up to 20 bits of SSID. Add a second level of page
> tables to accommodate this. Devices that support more than 1024 SSIDs now
> have a table of 1024 L1 entries (8kB), pointing to tables of 1024 context
> descriptors (64kB), allocated on demand.
>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>

Hi Jean-Philippe,

There seems to be a disconnect in here between clearing device-managed
entities by hand, which normally implies we'll reallocate them later,
and clearing pointers that are used in the control flow of allocation.
I'm looking at this a bit in isolation, so I'm not quite sure how they
are used.

> ---
>  drivers/iommu/arm-smmu-v3.c | 137 +++++++++++++++++++++++++++++++++---
>  1 file changed, 126 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index df7d45503c65..82eac89ee187 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -224,6 +224,7 @@
>
>  #define STRTAB_STE_0_S1FMT              GENMASK_ULL(5, 4)
>  #define STRTAB_STE_0_S1FMT_LINEAR       0
> +#define STRTAB_STE_0_S1FMT_64K_L2       2
>  #define STRTAB_STE_0_S1CTXPTR_MASK      GENMASK_ULL(51, 6)
>  #define STRTAB_STE_0_S1CDMAX            GENMASK_ULL(63, 59)
>
> @@ -263,7 +264,20 @@
>
>  #define STRTAB_STE_3_S2TTB_MASK         GENMASK_ULL(51, 4)
>
> -/* Context descriptor (stage-1 only) */
> +/*
> + * Context descriptors.
> + *
> + * Linear: when less than 1024 SSIDs are supported
> + * 2lvl: at most 1024 L1 entries,
> + *       1024 lazy entries per table.
> + */
> +#define CTXDESC_SPLIT                   10
> +#define CTXDESC_L2_ENTRIES              (1 << CTXDESC_SPLIT)
> +
> +#define CTXDESC_L1_DESC_DWORDS          1
> +#define CTXDESC_L1_DESC_VALID           1
> +#define CTXDESC_L1_DESC_L2PTR_MASK      GENMASK_ULL(51, 12)
> +
>  #define CTXDESC_CD_DWORDS               8
>  #define CTXDESC_CD_0_TCR_T0SZ           GENMASK_ULL(5, 0)
>  #define ARM64_TCR_T0SZ                  GENMASK_ULL(5, 0)
> @@ -577,7 +591,10 @@ struct arm_smmu_cd_table {
>  struct arm_smmu_s1_cfg {
>          u8 s1fmt;
>          u8 s1cdmax;
> -        struct arm_smmu_cd_table table;
> +        struct arm_smmu_cd_table *tables;
> +        size_t num_tables;
> +        __le64 *l1ptr;
> +        dma_addr_t l1ptr_dma;
>          struct arm_smmu_ctx_desc cd;
>  };
>
> @@ -1521,12 +1538,51 @@ static void arm_smmu_free_cd_leaf_table(struct arm_smmu_device *smmu,
>  {
>          size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
>
> +        if (!table->ptr)
> +                return;
>          dmam_free_coherent(smmu->dev, size, table->ptr, table->ptr_dma);
>  }
>
> -static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_s1_cfg *cfg, u32 ssid)
> +static void arm_smmu_write_cd_l1_desc(__le64 *dst,
> +                                      struct arm_smmu_cd_table *table)
>  {
> -        return cfg->table.ptr + ssid * CTXDESC_CD_DWORDS;
> +        u64 val = (table->ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) |
> +                  CTXDESC_L1_DESC_VALID;
> +
> +        WRITE_ONCE(*dst, cpu_to_le64(val));
> +}
> +
> +static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_domain *smmu_domain,
> +                                   u32 ssid)
> +{
> +        __le64 *l1ptr;
> +        unsigned int idx;
> +        struct arm_smmu_cd_table *table;
> +        struct arm_smmu_device *smmu = smmu_domain->smmu;
> +        struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> +
> +        if (cfg->s1fmt == STRTAB_STE_0_S1FMT_LINEAR) {
> +                table = &cfg->tables[0];
> +                idx = ssid;
> +        } else {
> +                idx = ssid >> CTXDESC_SPLIT;
> +                if (idx >= cfg->num_tables)
> +                        return NULL;
> +
> +                table = &cfg->tables[idx];
> +                if (!table->ptr) {
> +                        if (arm_smmu_alloc_cd_leaf_table(smmu, table,
> +                                                         CTXDESC_L2_ENTRIES))
> +                                return NULL;
> +
> +                        l1ptr = cfg->l1ptr + idx * CTXDESC_L1_DESC_DWORDS;
> +                        arm_smmu_write_cd_l1_desc(l1ptr, table);
> +                        /* An invalid L1CD can be cached */
> +                        arm_smmu_sync_cd(smmu_domain, ssid, false);
> +                }
> +                idx = ssid & (CTXDESC_L2_ENTRIES - 1);
> +        }
> +        return table->ptr + idx * CTXDESC_CD_DWORDS;
>  }
>
>  static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
> @@ -1552,7 +1608,7 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
>          u64 val;
>          bool cd_live;
>          struct arm_smmu_device *smmu = smmu_domain->smmu;
> -        __le64 *cdptr = arm_smmu_get_cd_ptr(&smmu_domain->s1_cfg, ssid);
> +        __le64 *cdptr = arm_smmu_get_cd_ptr(smmu_domain, ssid);
>
>          /*
>           * This function handles the following cases:
> @@ -1612,20 +1668,76 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
>
>  static int arm_smmu_alloc_cd_tables(struct arm_smmu_domain *smmu_domain)
>  {
> +        int ret;
> +        size_t size = 0;
> +        size_t max_contexts;
>          struct arm_smmu_device *smmu = smmu_domain->smmu;
>          struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
>
> -        cfg->s1fmt = STRTAB_STE_0_S1FMT_LINEAR;
> -        return arm_smmu_alloc_cd_leaf_table(smmu, &cfg->table,
> -                                            1 << cfg->s1cdmax);
> +        max_contexts = 1 << cfg->s1cdmax;
> +
> +        if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB) ||
> +            max_contexts <= CTXDESC_L2_ENTRIES) {
> +                cfg->s1fmt = STRTAB_STE_0_S1FMT_LINEAR;
> +                cfg->num_tables = 1;
> +        } else {
> +                cfg->s1fmt = STRTAB_STE_0_S1FMT_64K_L2;
> +                cfg->num_tables = DIV_ROUND_UP(max_contexts,
> +                                               CTXDESC_L2_ENTRIES);
> +
> +                size = cfg->num_tables * (CTXDESC_L1_DESC_DWORDS << 3);
> +                cfg->l1ptr = dmam_alloc_coherent(smmu->dev, size,
> +                                                 &cfg->l1ptr_dma,
> +                                                 GFP_KERNEL | __GFP_ZERO);

As before, I'm fairly sure __GFP_ZERO doesn't give you anything extra.

> +                if (!cfg->l1ptr) {
> +                        dev_warn(smmu->dev, "failed to allocate L1 context table\n");
> +                        return -ENOMEM;
> +                }
> +        }
> +
> +        cfg->tables = devm_kzalloc(smmu->dev, sizeof(struct arm_smmu_cd_table) *
> +                                   cfg->num_tables, GFP_KERNEL);
> +        if (!cfg->tables) {
> +                ret = -ENOMEM;
> +                goto err_free_l1;
> +        }
> +
> +        /* With two levels, leaf tables are allocated lazily */

This comment is a kind of odd one: it is actually talking about what
'doesn't' happen here, I think. Perhaps:

        /*
         * Only allocate a leaf table for linear case.
         * With two levels, the leaf tables are allocated lazily.
         */

> +        if (!cfg->l1ptr) {
> +                ret = arm_smmu_alloc_cd_leaf_table(smmu, &cfg->tables[0],
> +                                                   max_contexts);
> +                if (ret)
> +                        goto err_free_tables;
> +        }
> +
> +        return 0;
> +
> +err_free_tables:
> +        devm_kfree(smmu->dev, cfg->tables);
> +err_free_l1:
> +        if (cfg->l1ptr)
> +                dmam_free_coherent(smmu->dev, size, cfg->l1ptr, cfg->l1ptr_dma);

This cleanup only occurs if we have had an error. Is there potential for
this to rerun at some point later? If so, we should be careful to also
reset the relevant pointers - e.g. cfg->l1ptr = NULL - as they are used
to control the flow above. If there is no chance of a rerun, why bother
cleaning them up at all? Something has gone horribly wrong, so let the
eventual smmu cleanup deal with them.
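To make the suggestion concrete, a sketch of the defensive variant (the
helper name is invented here; the point is only the pointer reset after
the managed free):

        /*
         * Sketch only, assuming the alloc/free path can run more than
         * once: reset cfg->l1ptr after freeing, because both
         * arm_smmu_alloc_cd_tables() and arm_smmu_get_cd_ptr() branch
         * on it to pick between the linear and two-level layouts.
         */
        static void arm_smmu_free_cd_l1_table(struct arm_smmu_device *smmu,
                                              struct arm_smmu_s1_cfg *cfg)
        {
                size_t size = cfg->num_tables * (CTXDESC_L1_DESC_DWORDS << 3);

                if (!cfg->l1ptr)
                        return;

                dmam_free_coherent(smmu->dev, size, cfg->l1ptr, cfg->l1ptr_dma);
                cfg->l1ptr = NULL;      /* keep the control flow consistent on rerun */
                cfg->l1ptr_dma = 0;
        }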
> +        return ret;
>  }
>
>  static void arm_smmu_free_cd_tables(struct arm_smmu_domain *smmu_domain)
>  {
> +        int i;
>          struct arm_smmu_device *smmu = smmu_domain->smmu;
>          struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> +        size_t num_leaf_entries = 1 << cfg->s1cdmax;
> +        struct arm_smmu_cd_table *table = cfg->tables;
>
> -        arm_smmu_free_cd_leaf_table(smmu, &cfg->table, 1 << cfg->s1cdmax);
> +        if (cfg->l1ptr) {
> +                size_t size = cfg->num_tables * (CTXDESC_L1_DESC_DWORDS << 3);
> +
> +                dmam_free_coherent(smmu->dev, size, cfg->l1ptr, cfg->l1ptr_dma);

As above, if we can call this in a fashion that makes sense other than
in eventual smmu teardown, then we need to be careful to reset the
pointers. If not, then why are we clearing managed resources by hand
anyway?

> +                num_leaf_entries = CTXDESC_L2_ENTRIES;
> +        }
> +
> +        for (i = 0; i < cfg->num_tables; i++, table++)
> +                arm_smmu_free_cd_leaf_table(smmu, table, num_leaf_entries);
> +        devm_kfree(smmu->dev, cfg->tables);
>  }
>
>  /* Stream table manipulation functions */
> @@ -1745,6 +1857,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>          }
>
>          if (s1_cfg) {
> +                dma_addr_t ptr_dma = s1_cfg->l1ptr ? s1_cfg->l1ptr_dma :
> +                                     s1_cfg->tables[0].ptr_dma;
> +
>                  BUG_ON(ste_live);
>                  dst[1] = cpu_to_le64(
>                           FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
> @@ -1757,7 +1872,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>                      !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
>                          dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
>
> -                val |= (s1_cfg->table.ptr_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
> +                val |= (ptr_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
>                          FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) |
>                          FIELD_PREP(STRTAB_STE_0_S1CDMAX, s1_cfg->s1cdmax) |
>                          FIELD_PREP(STRTAB_STE_0_S1FMT, s1_cfg->s1fmt);
> @@ -2273,7 +2388,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
>          if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>                  struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
>
> -                if (cfg->table.ptr) {
> +                if (cfg->tables) {
>                          arm_smmu_free_cd_tables(smmu_domain);
>                          arm_smmu_bitmap_free(smmu->asid_map, cfg->cd.asid);
>                  }
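For readers following the arithmetic, a small sketch of how the patch
splits an SSID across the two levels (the helper name is invented; the
constants are the patch's):

        /*
         * Illustration: with CTXDESC_SPLIT = 10, the upper SSID bits
         * pick the L1 entry and the lower 10 bits pick the descriptor
         * within a 1024-entry leaf table.
         * E.g. ssid 0x12345 -> L1 index 0x48, leaf index 0x345.
         */
        static void ssid_to_indices(u32 ssid, unsigned int *l1_idx,
                                    unsigned int *leaf_idx)
        {
                *l1_idx = ssid >> CTXDESC_SPLIT;                /* which leaf table */
                *leaf_idx = ssid & (CTXDESC_L2_ENTRIES - 1);    /* entry within it */
        }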
On Fri, 8 Nov 2019 16:25:07 +0100
Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:

> Let add_device() clean up after itself. The iommu_bus_init() function
> does call remove_device() on error, but other sites (e.g. of_iommu) do
> not.
>
> Don't free level-2 stream tables because we'd have to track if we
> allocated each of them or if they are used by other endpoints. It's not
> worth the hassle since they are managed resources.
>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>

Potentially some fun around the reordering of the last few actions, but
it doesn't seem there is any real connection between them, so this
should be fine.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  drivers/iommu/arm-smmu-v3.c | 28 +++++++++++++++++++++-------
>  1 file changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 82eac89ee187..88ec0bf33492 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -2826,14 +2826,16 @@ static int arm_smmu_add_device(struct device *dev)
>          for (i = 0; i < master->num_sids; i++) {
>                  u32 sid = master->sids[i];
>
> -                if (!arm_smmu_sid_in_range(smmu, sid))
> -                        return -ERANGE;
> +                if (!arm_smmu_sid_in_range(smmu, sid)) {
> +                        ret = -ERANGE;
> +                        goto err_free_master;
> +                }
>
>                  /* Ensure l2 strtab is initialised */
>                  if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
>                          ret = arm_smmu_init_l2_strtab(smmu, sid);
>                          if (ret)
> -                                return ret;
> +                                goto err_free_master;
>                  }
>          }
>
> @@ -2843,13 +2845,25 @@ static int arm_smmu_add_device(struct device *dev)
>          master->ssid_bits = min_t(u8, master->ssid_bits,
>                                    CTXDESC_LINEAR_CDMAX);
>
> +        ret = iommu_device_link(&smmu->iommu, dev);
> +        if (ret)
> +                goto err_free_master;
> +
>          group = iommu_group_get_for_dev(dev);
> -        if (!IS_ERR(group)) {
> -                iommu_group_put(group);
> -                iommu_device_link(&smmu->iommu, dev);
> +        if (IS_ERR(group)) {
> +                ret = PTR_ERR(group);
> +                goto err_unlink;
>          }
>
> -        return PTR_ERR_OR_ZERO(group);
> +        iommu_group_put(group);
> +        return 0;
> +
> +err_unlink:
> +        iommu_device_unlink(&smmu->iommu, dev);
> +err_free_master:
> +        kfree(master);
> +        fwspec->iommu_priv = NULL;
> +        return ret;
>  }
>
>  static void arm_smmu_remove_device(struct device *dev)
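To spell out the reordering: resources are now acquired as master
allocation, then iommu_device_link(), then iommu_group_get_for_dev(),
and the error labels unwind strictly in reverse. A condensed sketch of
the idiom (the helper wrapping is invented for illustration; the calls
are the ones in the diff):

        /*
         * Sketch of the unwind idiom in the reworked add_device():
         * acquire in one order, release in exactly the reverse order,
         * so each label only undoes what was set up before the jump.
         */
        static int add_device_tail_sketch(struct arm_smmu_device *smmu,
                                          struct device *dev,
                                          struct arm_smmu_master *master,
                                          struct iommu_fwspec *fwspec)
        {
                struct iommu_group *group;
                int ret;

                ret = iommu_device_link(&smmu->iommu, dev);     /* acquire #2 */
                if (ret)
                        goto err_free_master;

                group = iommu_group_get_for_dev(dev);           /* acquire #3 */
                if (IS_ERR(group)) {
                        ret = PTR_ERR(group);
                        goto err_unlink;
                }
                iommu_group_put(group);
                return 0;

        err_unlink:
                iommu_device_unlink(&smmu->iommu, dev);         /* undo #2 */
        err_free_master:
                kfree(master);                                  /* undo #1 */
                fwspec->iommu_priv = NULL;
                return ret;
        }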
On 2019/11/8 11:25 PM, Jean-Philippe Brucker wrote:
> This is version 2 of the series I sent a while ago [1], adding PASID
> support to the Arm SMMUv3 driver.
>
> Changes since v1:
> * Dropped the patch adding auxiliary domain support. It's an easy way to
>   test PASID, by populating PASID contexts using iommu_map/unmap(), but
>   I don't know if it will ever have real users.
>
>   Since v1 I changed my testing gear, and am using the zip accelerator
>   [2] instead of a software model. It only uses SVA and testing
>   auxiliary domains would require additional changes that would never go
>   upstream. SVA requires another 20 patches (including I/O page faults)
>   that I will send later, but at least I know that this will get used.
>
> * ioasid patch has been carried by Jacob and should be merged for v5.5 [3]
>
> * Split patch "Add support for Substream IDs" into patches 4 and 5.
>
> * Added IORT support (patch 3) and addressed other comments.
>
> [1] https://lore.kernel.org/linux-iommu/20190610184714.6786-1-jean-philippe.brucker@arm.com/
> [2] https://lore.kernel.org/linux-iommu/1572331216-9503-1-git-send-email-zhangfei.gao@linaro.org/
> [3] https://lore.kernel.org/linux-iommu/1570045363-24856-1-git-send-email-jacob.jun.pan@linux.intel.com/
>
> Jean-Philippe Brucker (8):
>   dt-bindings: document PASID property for IOMMU masters
>   iommu/arm-smmu-v3: Support platform SSID
>   ACPI/IORT: Support PASID for platform devices
>   iommu/arm-smmu-v3: Prepare for SSID support
>   iommu/arm-smmu-v3: Add support for Substream IDs
>   iommu/arm-smmu-v3: Add second level of context descriptor table
>   iommu/arm-smmu-v3: Improve add_device() error handling
>   iommu/arm-smmu-v3: Add support for PCI PASID

Thanks Jean for the patches.

The series tested well on the Hisilicon KunPeng920 platform.

Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
On Mon, Nov 11, 2019 at 02:38:11PM +0000, Jonathan Cameron wrote:
> Hmm. There are several different refactors in here alongside a few new
> bits. Would be nice to break it up more to make life even easier for
> reviewers. It's not 'so' complex that it's really a problem though,
> so you could leave it as is if you really want to.

Sure, I'll see if I can split it more in the next version.

> > +        table->ptr = dmam_alloc_coherent(smmu->dev, size, &table->ptr_dma,
> > +                                         GFP_KERNEL | __GFP_ZERO);
>
> We dropped dma_zalloc_coherent because we now zero in dma_alloc_coherent
> anyway. Hence I'm fairly sure that __GFP_ZERO should have no effect.
>
> https://lore.kernel.org/patchwork/patch/1031536/
>
> Am I missing some special corner case here?

Here I just copied the GFP flags already in use, but removing all
__GFP_ZERO from the driver would make a good cleanup patch.

> > -        if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1)
> > -                arm_smmu_write_ctx_desc(smmu, &smmu_domain->s1_cfg);
> > -
>
> Whilst it seems fine, perhaps a note on the 'why' of moving this into
> finalise_s1 would be good in the patch description.

Ok. Since it's only there to simplify the handling of allocation failure
in a subsequent patch, I think I'll move that part over there.

Thanks,
Jean