2018-06-06 14:19 GMT+02:00 Christian König christian.koenig@amd.com:
On 06.06.2018 at 14:08, Gabriel C wrote:
2018-06-06 13:33 GMT+02:00 Christian König christian.koenig@amd.com:
On 06.06.2018 at 13:28, Gabriel C wrote:
2018-04-11 7:02 GMT+02:00 Gabriel C nix.or.die@gmail.com:
2018-04-11 6:00 GMT+02:00 Gabriel C nix.or.die@gmail.com:
2018-04-09 11:42 GMT+02:00 Christian König ckoenig.leichtzumerken@gmail.com:
On 07.04.2018 at 00:00, Jean-Marc Valin wrote:
...
I can help test code for 4.17/++ if you wish, but that is a *different* story.
I quickly tested 4.16.0-11490-gb284d4d5a678; both the amdgpu and radeon drivers are broken in this one.
radeon tells:
...
[ 6.337838] [drm] PCIE GART of 2048M enabled (table at 0x00000000001D6000).
[ 6.338210] radeon 0000:21:00.0: (-12) create WB bo failed
[ 6.338214] radeon 0000:21:00.0: disabling GPU acceleration
...
I have the same Issue now on final 4.17.
Actually Michel came up with a fix for the performance regression which is now backported to older kernels as well.
So the original issue of this mail thread should be fixed by now.
OK, I will test as soon as I get the GPU to work :))
I also played with the BIOS options, which does not fix anything but changes the error message.
With IOMMU and SR-IOV disabled, the error changes to this:
[ 7.092044] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)
[ 7.092059] radeon 0000:21:00.0: disabling GPU acceleration
While I could work around the SWIOTLB bugs in 4.15 and 4.16, 4.17 seems to kill the GPU with no way for me to make it work (at least I could not find any workaround so far).
That actually sounds like something completely different. Can you provide a full dmesg of radeon and/or amdgpu?
Sure, here they are from boots with IOMMU/SR-IOV on and off in the BIOS:
http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-o...
http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-o...
Also, nothing else changed in that setup; I am just testing kernel 4.17.
That has nothing to do with the driver, nor with the original bug you reported. The problem is that SME is active, and that is currently not supported at all with that hardware.
OK, so are we now playing kernel and AMD hardware roulette on each release?
SME was set up like this with kernel 4.16.x here and everything worked.
Also, if you no longer support SME at all on that hardware, even though it worked before, please add proper error handling and proper dmesg messages letting the user know:
radeon: xxxx: SME not supported on that hardware anymore, please disable SME...
radeon: xxxx: Update your GPU < or whatever >
How hard would that be?
No one but developers can guess from these error messages why their hardware suddenly isn't working anymore after just updating the kernel.
Try to disable SME either in the BIOS or on the kernel command line.
Yes, that works, but it is not the point.
Really, you just can't break users' setups like this.
On 2018-06-06 03:33 PM, Gabriel C wrote:
I have the same Issue now on final 4.17.
Please file a bug report, and ideally bisect which commit(s) introduced the issue(s).
SME was like this in kernel 4.16.x here and all worked.
If that is true, again please bisect which commit broke it.
All the reports I've seen before this indicated that at least amdgpu has never worked with SME (which BTW doesn't mean it's never going to work or that we don't want to support it, just that as far as we know it's currently not working).
On 06.06.2018 at 16:12, Michel Dänzer wrote:
All the reports I've seen before this indicated that at least amdgpu has never worked with SME (which BTW doesn't mean it's never going to work or that we don't want to support it, just that as far as we know it's currently not working).
At least in theory it should work when we use the coherent DMA allocator.
If that really worked before, then the most likely commit to have broken it is:
commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
Author: Chunming Zhou david1.zhou@amd.com
Date: Fri Feb 9 10:44:09 2018 +0800
drm/amdgpu: only enable swiotlb alloc when need v2
get the max io mapping address of system memory to see if it is over our card accessing range.
v2: move checking later
Signed-off-by: Chunming Zhou david1.zhou@amd.com
Reviewed-by: Monk Liu monk.liu@amd.com
Reviewed-by: Christian König christian.koenig@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
Currently looking into how we could somehow improve this detection.
Regards, Christian.
On 2018-06-06 04:44 PM, Christian König wrote:
If that really worked before, then the most likely commit to have broken it is:
commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
Author: Chunming Zhou david1.zhou@amd.com
Date: Fri Feb 9 10:44:09 2018 +0800
drm/amdgpu: only enable swiotlb alloc when need v2
Currently looking into how we could somehow improve this detection.
I guess this could fit for Gabriel, but e.g. https://bugs.freedesktop.org/104437 says amdgpu was already broken with SME in 4.15, if not 4.14 (I suspect there was simply no SME support earlier).
2018-06-06 17:03 GMT+02:00 Michel Dänzer michel@daenzer.net:
I guess this could fit for Gabriel, but e.g. https://bugs.freedesktop.org/104437 says amdgpu was already broken with SME in 4.15, if not 4.14 (I suspect there was simply no SME support earlier).
I got strange performance issues with 4.15 and 4.16, but SME was on in that setup (even before it hit mainline) and it never broke the GPU like this.
There is a 4.16.13 boot dmesg which has no such issue:
http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-...
With the setup as is, booting 4.16.x works, while 4.17 throws the errors.
--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
On 06.06.2018 at 17:44, Gabriel C wrote:
I guess this could fit for Gabriel, but e.g. https://bugs.freedesktop.org/104437 says amdgpu was already broken with SME in 4.15, if not 4.14 (I suspect there was simply no SME support earlier).
And what I totally missed is that Gabriel is using radeon and not amdgpu.
So Gabriel, you need to revert this one for testing:
commit 1bc3d3cce8c3b44c2b5ac6cee98c830bb40e6b0f
Author: Chunming Zhou david1.zhou@amd.com
Date: Fri Feb 9 10:44:10 2018 +0800
drm/radeon: only enable swiotlb path when need v2
swiotlb expands our card accessing range, but its path always is slower than ttm pool allocation. So add condition to use it.
v2: move a bit later
Signed-off-by: Chunming Zhou david1.zhou@amd.com
Reviewed-by: Monk Liu monk.liu@amd.com
Reviewed-by: Christian König christian.koenig@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
Link: https://patchwork.freedesktop.org/patch/msgid/20180209024410.1469-3-david1.z...
I got strange performance issues with 4.15 and 4.16, but SME was on in that setup (even before it hit mainline) and it never broke the GPU like this.
Well that is very interesting, you are the first one who reports that SME + GFX works in some way. So far we only got negative reports for that.
With the setup as is, booting 4.16.x works, while 4.17 throws the errors.
Please do the bisect if the patch I've mentioned above doesn't help.
Thanks, Christian.
2018-06-07 9:07 GMT+02:00 Christian König christian.koenig@amd.com:
Please do the bisect if the patch I've mentioned above doesn't help.
OK, done. The bisect points to:
b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 is the first bad commit
commit b468620f2a1dfdcfddfd6fa54367b8bcc1b51248
Author: Christoph Hellwig hch@lst.de
Date: Mon Mar 19 11:38:19 2018 +0100
iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}()
This cleans up the code a lot by removing duplicate logic.
Tested-by: Tom Lendacky thomas.lendacky@amd.com
Tested-by: Joerg Roedel jroedel@suse.de
Signed-off-by: Christoph Hellwig hch@lst.de
Reviewed-by: Thomas Gleixner tglx@linutronix.de
Acked-by: Joerg Roedel jroedel@suse.de
Cc: David Woodhouse dwmw2@infradead.org
Cc: Joerg Roedel joro@8bytes.org
Cc: Jon Mason jdmason@kudzu.us
Cc: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Muli Ben-Yehuda mulix@mulix.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: iommu@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20180319103826.12853-8-hch@lst.de
Signed-off-by: Ingo Molnar mingo@kernel.org
I'll try to revert this once I'm home.
BR
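For readers following along: the bisect reported above is, at its core, a binary search over the commits between the last known-good kernel (4.16.x here) and the first known-bad one (4.17). The sketch below only illustrates that idea. It assumes a linear history, and the commit list and the `boots_without_gpu` callback are made up to stand in for "build, boot, check dmesg"; only the short hash b468620f2a1d comes from the thread. The real `git bisect` also copes with merges and skipped commits.

```python
from typing import Callable, List, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and once a commit is bad all later ones are too.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the breakage is at mid or earlier
        else:
            lo = mid  # the breakage is after mid
    return commits[hi]


# Hypothetical linear history, purely for illustration.
history = ["good-4.16", "c1", "c2", "b468620f2a1d", "c4", "c5", "bad-4.17"]
broken_from = history.index("b468620f2a1d")
tested: List[str] = []


def boots_without_gpu(commit: str) -> bool:
    """Stand-in for a test boot: records the commit and reports whether it is bad."""
    tested.append(commit)
    return history.index(commit) >= broken_from


result = first_bad_commit(history, boots_without_gpu)
print(result, "found after", len(tested), "test boots")
```

The point of the halving is that the number of test boots grows with the logarithm of the commit range, which is what makes a bisect between two kernel releases practical.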
I can confirm reverting b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 fixes that issue for me.
The GPU is working fine with SME enabled.
Now, with a working GPU :), I can also confirm that performance is back to normal without any other workarounds.
The only app still acting up a bit is Firefox, with just minor frame drops, but nothing too bad (probably a Firefox bug too).
chromium/chrome is fine: even with 10 tabs open, each one playing a video on YouTube, there are no glitches at all.
Desktop is also fine now, could not find anything wrong.
BR
Hi Christopher,
On 07.06.2018 at 18:24, Gabriel C wrote:
[SNIP] Ok done.. bisect points to:
b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 is the first bad commit
commit b468620f2a1dfdcfddfd6fa54367b8bcc1b51248
Author: Christoph Hellwig hch@lst.de
Date: Mon Mar 19 11:38:19 2018 +0100
iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}()
This cleans up the code a lot by removing duplicate logic.
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jon Mason <jdmason@kudzu.us>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Muli Ben-Yehuda <mulix@mulix.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: iommu@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20180319103826.12853-8-hch@lst.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
I'll try to revert this once I'm home.
I can confirm reverting b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 fixes that issue for me.
Any idea what could cause that? Basically, this patch breaks radeon when SME is enabled.
Thanks for testing, Christian.
Hi Christoph,
On 08.06.2018 at 08:01, Christoph Hellwig wrote:
On Thu, Jun 07, 2018 at 07:20:37PM +0200, Christian König wrote:
Hi Christopher,
I don't see a Christopher on the Cc list..
Sorry, auto-uncorrection. I indeed meant you :)
Christian.
On Thu, Jun 07, 2018 at 02:32:46PM +0200, Gabriel C wrote:
Ok done.. bisect points to:
What is the failure mode you are seeing? Can't find anything in the mail unfortunately.
On 08.06.2018 at 08:02, Christoph Hellwig wrote:
What is the failure mode you are seeing? Can't find anything in the mail unfortunately.
As far as I have analyzed it, we now get -ENOMEM from dma_alloc_attrs() in drivers/gpu/drm/ttm/ttm_page_alloc_dma.c when the IOMMU is enabled.
Still need to figure out which parameters we want to use for the allocation, but I think it is only 4k or 8k.
Regards, Christian.
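As a side note for readers, the `(-12)` in the earlier `create WB bo failed` line and the -ENOMEM mentioned here are the same error: the kernel returns negated errno values, and ENOMEM is 12 on Linux. A quick userspace check using only Python's standard `errno` and `os` modules follows; the helper name `describe_kernel_error` is made up for this illustration and is not part of any kernel or driver tooling.

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Map a negative kernel-style return code to its symbolic errno name."""
    name = errno.errorcode.get(-code, "UNKNOWN")
    return f"{code} = -{name} ({os.strerror(-code)})"


# The value taken from the radeon dmesg line quoted in this thread.
print(describe_kernel_error(-12))
```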
2018-06-08 8:52 GMT+02:00 Christian König christian.koenig@amd.com:
Still need to figure out which parameters we want to use for the allocation, but I think it is only 4k or 8k.
If you guys need me to test something, or to run debug patches or patches of any sort, just let me know.
BR
I think the prime issue is that dma_direct_alloc respects the DMA mask, which we don't need if we are actually using the IOMMU. This would be mostly harmless except for the SEV bit high in the address that makes the checks fail.
For now I'd say revert this commit for 4.17/4.18-rc and I'll look into addressing these issues properly.
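To illustrate the failure described above: with memory encryption active, the encryption bit sits high in the address, so an address that would otherwise fit under a device's DMA mask no longer does. The sketch below is a simplified model and not the kernel's code. The bit position (47), the 40-bit device mask, and the sample address are assumptions chosen only for illustration; the 8k size echoes the guess made earlier in the thread.

```python
ENC_BIT = 47          # assumed position of the encryption bit
DEVICE_DMA_BITS = 40  # assumed addressing limit of the GPU


def dma_mask(bits: int) -> int:
    """Build a mask with the low `bits` bits set, like a device DMA mask."""
    return (1 << bits) - 1


def fits_mask(addr: int, size: int, mask: int) -> bool:
    """True if the last byte of the buffer is addressable under the mask."""
    return addr + size - 1 <= mask


phys = 0x1_2345_6000  # an ordinary page-aligned address (made up)
size = 8192
mask = dma_mask(DEVICE_DMA_BITS)

# Same buffer, checked without and with the encryption bit set in the address.
plain = fits_mask(phys, size, mask)
encrypted = fits_mask(phys | (1 << ENC_BIT), size, mask)
print("plain address fits:", plain)
print("address with encryption bit fits:", encrypted)
```

In this model the allocation itself is fine; only the mask comparison fails once the high bit is part of the address, which matches the "mostly harmless except for" wording above.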
On Mon, Jun 11, 2018 at 12:07 AM Christoph Hellwig hch@lst.de wrote:
For now I'd say revert this commit for 4.17/4.18-rc and I'll look into addressing these issues properly.
Ok, reverted in my tree, and marked for stable (for 4.17). Thanks,
Linus
2018-06-06 16:44 GMT+02:00 Christian König christian.koenig@amd.com:
If that really worked before, then the most likely commit to have broken it is:
commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
Author: Chunming Zhou david1.zhou@amd.com
Date: Fri Feb 9 10:44:09 2018 +0800
drm/amdgpu: only enable swiotlb alloc when need v2
get the max io mapping address of system memory to see if it is over our card accessing range.
v2: move checking later
Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It is not this one; I've built a kernel with this reverted.
I'll do a bisect tonight or tomorrow.
On 06.06.2018 at 15:33, Gabriel C wrote:
Also, if you no longer support SME at all on that hardware, even though it worked before, please add proper error handling and proper dmesg messages letting the user know:
radeon: xxxx: SME not supported on that hardware anymore, please disable SME...
radeon: xxxx: Update your GPU < or whatever >
How hard would that be?
Yes, to be precise, it isn't the job of the GFX driver to care about such things.
It is a well known and documented limitation of SME that it is in general mostly incompatible with GFX (or compute) hardware, and it actually doesn't matter which hardware or driver you use.
In other words what happens is that as soon as you use GFX (or compute) SME gets disabled transparently.
The problem is that this happens only on the DMA slow path we just disabled because of the performance problems.
Going to propose to revert that or at least only use it when SME is disabled.
Regards, Christian.
dri-devel@lists.freedesktop.org