Code expects to observe the same number of buffers returned from dma_map_sg() as it passed to sg_alloc_table_from_pages(). This doesn't hold true universally, especially on systems with an IOMMU.
The IOMMU driver tries to combine buffers into a single DMA address as much as it can. The right thing is to tell the DMA layer how much combining the IOMMU can do.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 1 +
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 1 +
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 1 +
 4 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 8e28270..1b031eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -851,7 +851,7 @@ static int gmc_v6_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		dev_warn(adev->dev, "amdgpu: No coherent DMA available.\n");
 	}
-
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
 	r = gmc_v6_0_init_microcode(adev);
 	if (r) {
 		dev_err(adev->dev, "Failed to load mc firmware!\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index 86e9d682..0a4b2cc1 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -999,6 +999,7 @@ static int gmc_v7_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		pr_warn("amdgpu: No coherent DMA available\n");
 	}
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);

 	r = gmc_v7_0_init_microcode(adev);
 	if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 9a813d8..b171529 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -1096,6 +1096,7 @@ static int gmc_v8_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		pr_warn("amdgpu: No coherent DMA available\n");
 	}
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);

 	r = gmc_v8_0_init_microcode(adev);
 	if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 3b7e7af..36e658ab 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -855,6 +855,7 @@ static int gmc_v9_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		printk(KERN_WARNING "amdgpu: No coherent DMA available.\n");
 	}
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);

 	r = gmc_v9_0_mc_init(adev);
 	if (r)
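For context, the fragile pattern the commit message describes looks roughly like the sketch below; map_pages_fragile() is a hypothetical helper for illustration, not the actual amdgpu code:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static int map_pages_fragile(struct device *dev, struct page **pages,
			     unsigned int n_pages, struct sg_table *sgt)
{
	int r;

	/* One scatterlist entry per physically contiguous run of pages. */
	r = sg_alloc_table_from_pages(sgt, pages, n_pages, 0,
				      (unsigned long)n_pages << PAGE_SHIFT,
				      GFP_KERNEL);
	if (r)
		return r;

	r = dma_map_sg(dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
	/*
	 * BUG: with an IOMMU, dma_map_sg() may merge entries and return
	 * fewer than sgt->nents, so this rejects perfectly valid mappings
	 * (cleanup elided for brevity).
	 */
	if (r != sgt->nents)
		return -EIO;
	return 0;
}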
On Tue, Apr 10, 2018 at 04:59:55PM -0400, Sinan Kaya wrote:
Code expects to observe the same number of buffers returned from dma_map_sg() as it passed to sg_alloc_table_from_pages(). This doesn't hold true universally, especially on systems with an IOMMU.
The IOMMU driver tries to combine buffers into a single DMA address as much as it can. The right thing is to tell the DMA layer how much combining the IOMMU can do.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Sinan, I guess Christian's suggestion is to add it in the amdgpu_device_init() function, like here:
8<-------
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 0e798b3..9b96771 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2339,6 +2339,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	/* init the mode config */
 	drm_mode_config_init(adev->ddev);

+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
+
 	r = amdgpu_device_ip_init(adev);
 	if (r) {
 		/* failed in exclusive mode due to timeout */
8<-------
After that, we don't need to do it in each generation.
Thanks, Ray
On 11.04.2018 at 08:26, Huang Rui wrote:
On Tue, Apr 10, 2018 at 04:59:55PM -0400, Sinan Kaya wrote:
Code expects to observe the same number of buffers returned from dma_map_sg() as it passed to sg_alloc_table_from_pages(). This doesn't hold true universally, especially on systems with an IOMMU.
The IOMMU driver tries to combine buffers into a single DMA address as much as it can. The right thing is to tell the DMA layer how much combining the IOMMU can do.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Sinan, I guess Christian's suggestion is to add it in the amdgpu_device_init() function, like here:
Yes, exactly. Looks like Sinan just tried to place the call next to pci_set_dma_mask()/pci_set_consistent_dma_mask().
Not necessarily a bad idea, but in this case not optimal.
Sinan, please refine your patch as Rui suggested and resend.
Thanks, Christian.
8<-------
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 0e798b3..9b96771 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2339,6 +2339,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	/* init the mode config */
 	drm_mode_config_init(adev->ddev);

+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
+
 	r = amdgpu_device_ip_init(adev);
 	if (r) {
 		/* failed in exclusive mode due to timeout */
8<-------
After that, we don't need to do it in each generation.
Thanks, Ray
On 10/04/18 21:59, Sinan Kaya wrote:
Code expects to observe the same number of buffers returned from dma_map_sg() as it passed to sg_alloc_table_from_pages(). This doesn't hold true universally, especially on systems with an IOMMU.
So why not fix said code? It's clearly not a real hardware limitation, and the map_sg() APIs have potentially returned fewer than nents since forever, so there's really no excuse.
The IOMMU driver tries to combine buffers into a single DMA address as much as it can. The right thing is to tell the DMA layer how much combining the IOMMU can do.
Disagree; this is a dodgy hack, since you'll now end up passing scatterlists into dma_map_sg() which already violate max_seg_size to begin with, and I think a conscientious DMA API implementation would be within its rights to fail the mapping for that reason (I know arm64 happens not to, but that was a deliberate design decision to make my life easier at the time).
As a short-term fix, at least do something like what i915 does and constrain the table allocation to the desired segment size as well, so things remain self-consistent. But still never claim that faking a hardware constraint as a workaround for a driver shortcoming is "the right thing to do" ;)
Robin.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 1 +
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 1 +
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 1 +
 4 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 8e28270..1b031eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -851,7 +851,7 @@ static int gmc_v6_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		dev_warn(adev->dev, "amdgpu: No coherent DMA available.\n");
 	}
-
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
 	r = gmc_v6_0_init_microcode(adev);
 	if (r) {
 		dev_err(adev->dev, "Failed to load mc firmware!\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index 86e9d682..0a4b2cc1 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -999,6 +999,7 @@ static int gmc_v7_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		pr_warn("amdgpu: No coherent DMA available\n");
 	}
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);

 	r = gmc_v7_0_init_microcode(adev);
 	if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 9a813d8..b171529 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -1096,6 +1096,7 @@ static int gmc_v8_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		pr_warn("amdgpu: No coherent DMA available\n");
 	}
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);

 	r = gmc_v8_0_init_microcode(adev);
 	if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 3b7e7af..36e658ab 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -855,6 +855,7 @@ static int gmc_v9_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		printk(KERN_WARNING "amdgpu: No coherent DMA available.\n");
 	}
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);

 	r = gmc_v9_0_mc_init(adev);
 	if (r)
On 4/11/2018 8:03 AM, Robin Murphy wrote:
On 10/04/18 21:59, Sinan Kaya wrote:
Code expects to observe the same number of buffers returned from dma_map_sg() as it passed to sg_alloc_table_from_pages(). This doesn't hold true universally, especially on systems with an IOMMU.
So why not fix said code? It's clearly not a real hardware limitation, and the map_sg() APIs have potentially returned fewer than nents since forever, so there's really no excuse.
Sure, I'll take a better fix if there is one.
The IOMMU driver tries to combine buffers into a single DMA address as much as it can. The right thing is to tell the DMA layer how much combining the IOMMU can do.
Disagree; this is a dodgy hack, since you'll now end up passing scatterlists into dma_map_sg() which already violate max_seg_size to begin with, and I think a conscientious DMA API implementation would be within its rights to fail the mapping for that reason (I know arm64 happens not to, but that was a deliberate design decision to make my life easier at the time).
As a short-term fix, at least do something like what i915 does and constrain the table allocation to the desired segment size as well, so things remain self-consistent. But still never claim that faking a hardware constraint as a workaround for a driver shortcoming is "the right thing to do" ;)
You are asking for something like this from here, right?
https://elixir.bootlin.com/linux/v4.16.1/source/drivers/gpu/drm/i915/i915_ge...
ret = sg_alloc_table(st, obj->mm.pages->nents, GFP_KERNEL);
if (ret)
	goto err_free;

src = obj->mm.pages->sgl;
dst = st->sgl;
for (i = 0; i < obj->mm.pages->nents; i++) {
	sg_set_page(dst, sg_page(src), src->length, 0);
	dst = sg_next(dst);
	src = sg_next(src);
}
This seems to allocate the scatter gather list and fill it in manually before passing it to dma_map_sg(). I'll give it a try.
Just double checking.
Robin.
Signed-off-by: Sinan Kaya okaya@codeaurora.org
On 11/04/18 15:33, Sinan Kaya wrote:
On 4/11/2018 8:03 AM, Robin Murphy wrote:
On 10/04/18 21:59, Sinan Kaya wrote:
Code expects to observe the same number of buffers returned from dma_map_sg() as it passed to sg_alloc_table_from_pages(). This doesn't hold true universally, especially on systems with an IOMMU.
So why not fix said code? It's clearly not a real hardware limitation, and the map_sg() APIs have potentially returned fewer than nents since forever, so there's really no excuse.
Sure, I'll take a better fix if there is one.
The IOMMU driver tries to combine buffers into a single DMA address as much as it can. The right thing is to tell the DMA layer how much combining the IOMMU can do.
Disagree; this is a dodgy hack, since you'll now end up passing scatterlists into dma_map_sg() which already violate max_seg_size to begin with, and I think a conscientious DMA API implementation would be within its rights to fail the mapping for that reason (I know arm64 happens not to, but that was a deliberate design decision to make my life easier at the time).
As a short-term fix, at least do something like what i915 does and constrain the table allocation to the desired segment size as well, so things remain self-consistent. But still never claim that faking a hardware constraint as a workaround for a driver shortcoming is "the right thing to do" ;)
You are asking for something like this from here, right?
https://elixir.bootlin.com/linux/v4.16.1/source/drivers/gpu/drm/i915/i915_ge...
I just looked for callers of __sg_alloc_table_from_pages() with a limit other than SCATTERLIST_MAX_SEGMENT, and found __i915_gem_userptr_alloc_pages(), which at first glance appears to be doing something vaguely similar to amdgpu_ttm_tt_pin_userptr().
Robin.
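Concretely, the i915-style approach Robin points at amounts to building the table with the same segment limit the device advertises, so the list handed to dma_map_sg() never violates it. A minimal sketch, assuming a hypothetical alloc_sgt_capped() helper loosely modelled on __i915_gem_userptr_alloc_pages():

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static int alloc_sgt_capped(struct device *dev, struct page **pages,
			    unsigned int n_pages, struct sg_table *sgt)
{
	/* Honour the device's advertised DMA segment limit... */
	unsigned int max_seg = dma_get_max_seg_size(dev);

	/* ...so no single scatterlist entry ever exceeds it. */
	return __sg_alloc_table_from_pages(sgt, pages, n_pages, 0,
					   (unsigned long)n_pages << PAGE_SHIFT,
					   max_seg, GFP_KERNEL);
}

This keeps the table self-consistent with max_seg_size before mapping, which is the "constrain the table allocation" half of the suggestion.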
On Wed, Apr 11, 2018 at 01:03:59PM +0100, Robin Murphy wrote:
On 10/04/18 21:59, Sinan Kaya wrote:
Code expects to observe the same number of buffers returned from dma_map_sg() as it passed to sg_alloc_table_from_pages(). This doesn't hold true universally, especially on systems with an IOMMU.
So why not fix said code? It's clearly not a real hardware limitation, and the map_sg() APIs have potentially returned fewer than nents since forever, so there's really no excuse.
Yes, relying on dma_map_sg returning the same number of entries as were passed to it is completely bogus.
The IOMMU driver tries to combine buffers into a single DMA address as much as it can. The right thing is to tell the DMA layer how much combining the IOMMU can do.
Disagree; this is a dodgy hack, since you'll now end up passing scatterlists into dma_map_sg() which already violate max_seg_size to begin with, and I think a conscientious DMA API implementation would be within its rights to fail the mapping for that reason (I know arm64 happens not to, but that was a deliberate design decision to make my life easier at the time).
Agreed.
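To make Christoph's point concrete, the robust pattern is to consume the count that dma_map_sg() actually returns rather than the nents that went in. A minimal sketch; map_and_walk() is a hypothetical illustration, not code from any of the drivers discussed:

#include <linux/dma-mapping.h>
#include <linux/printk.h>
#include <linux/scatterlist.h>

static int map_and_walk(struct device *dev, struct sg_table *sgt)
{
	struct scatterlist *sg;
	int mapped, i;

	/* dma_map_sg() returns the number of mapped (possibly merged)
	 * segments, which may be less than sgt->nents; 0 means failure. */
	mapped = dma_map_sg(dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
	if (!mapped)
		return -ENOMEM;

	/* Walk only the merged DMA segments, not the original page list. */
	for_each_sg(sgt->sgl, sg, mapped, i)
		pr_debug("seg %d: addr %pad len %u\n",
			 i, &sg_dma_address(sg), sg_dma_len(sg));

	return 0;
}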
On 12.04.2018 at 08:26, Christoph Hellwig wrote:
On Wed, Apr 11, 2018 at 01:03:59PM +0100, Robin Murphy wrote:
On 10/04/18 21:59, Sinan Kaya wrote:
Code expects to observe the same number of buffers returned from dma_map_sg() as it passed to sg_alloc_table_from_pages(). This doesn't hold true universally, especially on systems with an IOMMU.
So why not fix said code? It's clearly not a real hardware limitation, and the map_sg() APIs have potentially returned fewer than nents since forever, so there's really no excuse.
Yes, relying on dma_map_sg returning the same number of entries as were passed to it is completely bogus.
I agree that the common DRM functions should be able to deal with both, but we should let the driver side decide if it wants multiple addresses combined or not.
The IOMMU driver tries to combine buffers into a single DMA address as much as it can. The right thing is to tell the DMA layer how much combining the IOMMU can do.
Disagree; this is a dodgy hack, since you'll now end up passing scatterlists into dma_map_sg() which already violate max_seg_size to begin with, and I think a conscientious DMA API implementation would be within its rights to fail the mapping for that reason (I know arm64 happens not to, but that was a deliberate design decision to make my life easier at the time).
Agreed.
Sounds like my understanding of max_seg_size is incorrect. What exactly should it describe?
Thanks, Christian.
On 12/04/18 10:42, Christian König wrote:
On 12.04.2018 at 08:26, Christoph Hellwig wrote:
On Wed, Apr 11, 2018 at 01:03:59PM +0100, Robin Murphy wrote:
On 10/04/18 21:59, Sinan Kaya wrote:
Code expects to observe the same number of buffers returned from dma_map_sg() as it passed to sg_alloc_table_from_pages(). This doesn't hold true universally, especially on systems with an IOMMU.
So why not fix said code? It's clearly not a real hardware limitation, and the map_sg() APIs have potentially returned fewer than nents since forever, so there's really no excuse.
Yes, relying on dma_map_sg returning the same number of entries as were passed to it is completely bogus.
I agree that the common DRM functions should be able to deal with both, but we should let the driver side decide if it wants multiple addresses combined or not.
The IOMMU driver tries to combine buffers into a single DMA address as much as it can. The right thing is to tell the DMA layer how much combining the IOMMU can do.
Disagree; this is a dodgy hack, since you'll now end up passing scatterlists into dma_map_sg() which already violate max_seg_size to begin with, and I think a conscientious DMA API implementation would be within its rights to fail the mapping for that reason (I know arm64 happens not to, but that was a deliberate design decision to make my life easier at the time).
Agreed.
Sounds like my understanding of max_seg_size is incorrect. What exactly should it describe?
The segment size and boundary mask are there to represent a device's hardware scatter-gather capabilities - a lot of things have limits on the size and alignment of the data pointed to by a single descriptor (e.g. an xHCI TRB, where the data buffer must not cross a 64KB boundary) - although they can also be relevant to non-scatter-gather devices if they just have limits on how big/aligned a single DMA transfer can be.
Robin.
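To illustrate what Robin describes, a driver whose DMA engine genuinely has such limits would declare them as in the sketch below. The device and its constraint values are made up for the example (a hypothetical engine that moves at most 64KB per descriptor and cannot cross a 64KB boundary, much like the xHCI TRB case above):

#include <linux/dma-mapping.h>
#include <linux/sizes.h>

static int example_declare_dma_limits(struct device *dev)
{
	int ret;

	/* Each scatter-gather descriptor moves at most 64KB... */
	ret = dma_set_max_seg_size(dev, SZ_64K);
	if (ret)
		return ret;

	/* ...and a single segment must not cross a 64KB boundary. */
	return dma_set_seg_boundary(dev, SZ_64K - 1);
}

These values describe the hardware itself, which is why using them to encode a software assumption, as in the patch under discussion, is what Robin objects to.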