Hi David,
I'm using the 3.10.53-rt56 kernel and am hitting a problem in r600_dma_ring_test() when VRAM is mapped as write-combining: no matter how long the polling runs, the old value (0xCAFEDEAD) is read.
Looking with a hardware analyzer at what actually happens on the PCIe bus, the memory is accessed with 32-byte loads (8 words at a time). That is, when the memory is mapped as write-combining, the processor converts every readl() into a 32-byte load transaction.
After doing some more experiments, it seems that the Radeon has some kind of cache that keeps the old value (0xCAFEDEAD), and this cache is invalidated when: 1) some other VRAM address is accessed, or 2) the processor issues a 4-byte load transaction.
The problem is that as long as the memory is write-combining, the CPU converts all loads into 32-byte transactions, so the test fails with a timeout. But if I comment out this particular ring test, everything seems to work fine (tested with Doom 3).
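For reference, the part of the test that times out is the CPU-side poll of the VRAM scratch location. Roughly (paraphrased from r600.c in my tree, trimmed for brevity):

    void __iomem *ptr = (void *)rdev->vram_scratch.ptr;
    unsigned i;
    u32 tmp;

    /* seed the scratch word, then ask the DMA ring to overwrite it */
    writel(0xCAFEDEAD, ptr);
    /* ... DMA_PACKET_WRITE of 0xDEADBEEF to rdev->vram_scratch.gpu_addr ... */

    /* poll until the DMA engine's write becomes visible to the CPU */
    for (i = 0; i < rdev->usec_timeout; i++) {
        tmp = readl(ptr);   /* on this box: a 32-byte WC read, never a 4-byte load */
        if (tmp == 0xDEADBEEF)
            break;
        DRM_UDELAY(1);
    }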
Is it possible that the situation r600_dma_ring_test() checks for does not happen in real life, and I should be OK commenting it out?
Or maybe the test is broken and some cache-flushing command must be written into the ring buffer?
BTW this is an out-of-tree architecture, so bisecting is not possible.
Hi Alexander,
in the ring test we write the values 0xDEADBEEF and 0xCAFEDEAD into registers, not VRAM.
And the register BAR shouldn't be accessed write-combined, because that could lead to a couple of ordering problems. Why do you think the access is done write-combined?
For VRAM it is true that we have a couple of different caches between the CPU and the actual memory, which need to be flushed explicitly if you want to see a value written by the GPU.
Regards, Christian.
On 09.10.2014 at 13:39, Alexander Fyodorov wrote:
Hi David,
I'm using the 3.10.53-rt56 kernel and am hitting a problem in r600_dma_ring_test() when VRAM is mapped as write-combining: no matter how long the polling runs, the old value (0xCAFEDEAD) is read.
Looking with a hardware analyzer at what actually happens on the PCIe bus, the memory is accessed with 32-byte loads (8 words at a time). That is, when the memory is mapped as write-combining, the processor converts every readl() into a 32-byte load transaction.
After doing some more experiments, it seems that the Radeon has some kind of cache that keeps the old value (0xCAFEDEAD), and this cache is invalidated when:
- some other VRAM address is accessed, or
- the processor issues a 4-byte load transaction.
The problem is that as long as the memory is write-combining, the CPU converts all loads into 32-byte transactions, so the test fails with a timeout. But if I comment out this particular ring test, everything seems to work fine (tested with Doom 3).
Is it possible that the situation r600_dma_ring_test() checks for does not happen in real life, and I should be OK commenting it out?
Or maybe the test is broken and some cache-flushing command must be written into the ring buffer?
BTW this is an out-of-tree architecture, so bisecting is not possible.
09.10.2014, 21:42, "Christian König" christian.koenig@amd.com:
Hi Alexander,
in the ring test we write the values 0xDEADBEEF and 0xCAFEDEAD into registers, not VRAM.
And the register BAR shouldn't be accessed write-combined, because that could lead to a couple of ordering problems. Why do you think the access is done write-combined?
Because there is this code in r600_dma_ring_test():

    void __iomem *ptr = (void *)rdev->vram_scratch.ptr;

And vram_scratch is allocated in r600_vram_scratch_init() with domain RADEON_GEM_DOMAIN_VRAM, which implies write-combining. I assumed that this means it points to video memory.
Also, when I look at the page table attributes I can see that it is indeed mapped as write-combining. In this test only the "rdev->rmmio" area was mapped as UC (the one that radeon_ring_commit() writes to in order to start execution).
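For completeness, the allocation path in my tree looks roughly like this (condensed, error handling dropped; the exact radeon_bo_create() argument list may differ between kernel versions). The scratch BO is created and pinned in VRAM and then kmapped, which is where the write-combining CPU mapping comes from:

    /* condensed sketch of r600_vram_scratch_init() */
    r = radeon_bo_create(rdev, RADEON_GPU_PAGE_SIZE, PAGE_SIZE, true,
                         RADEON_GEM_DOMAIN_VRAM, NULL,
                         &rdev->vram_scratch.robj);
    r = radeon_bo_reserve(rdev->vram_scratch.robj, false);
    r = radeon_bo_pin(rdev->vram_scratch.robj, RADEON_GEM_DOMAIN_VRAM,
                      &rdev->vram_scratch.gpu_addr);
    /* CPU mapping of the VRAM BO -- this is the pointer the ring test polls */
    r = radeon_bo_kmap(rdev->vram_scratch.robj,
                       (void **)&rdev->vram_scratch.ptr);
    radeon_bo_unreserve(rdev->vram_scratch.robj);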
For VRAM it is true that we have a couple of different caches between the CPU and the actual memory, which need to be flushed explicitly if you want to see a value written by the GPU.
Then maybe such a flush is what I need. How do I put it in the instruction ring buffer?
On 09.10.2014 at 20:15, Alexander Fyodorov wrote:
09.10.2014, 21:42, "Christian König" christian.koenig@amd.com:
Hi Alexander,
in the ring test we write the values 0xDEADBEEF and 0xCAFEDEAD into registers, not VRAM.
And the register BAR shouldn't be accessed write-combined, because that could lead to a couple of ordering problems. Why do you think the access is done write-combined?
Because there is this code in r600_dma_ring_test():

    void __iomem *ptr = (void *)rdev->vram_scratch.ptr;

And vram_scratch is allocated in r600_vram_scratch_init() with domain RADEON_GEM_DOMAIN_VRAM, which implies write-combining. I assumed that this means it points to video memory.
Ah! Sorry, you are talking about the DMA ring test, not the GFX ring test. Right, in this case we use a bit of VRAM for the test.
Also, when I look at the page table attributes I can see that it is indeed mapped as write-combining. In this test only the "rdev->rmmio" area was mapped as UC (the one that radeon_ring_commit() writes to in order to start execution).
Correct, that's the register BAR.
For VRAM it is true that we have a couple of different caches between the CPU and the actual memory, which need to be flushed explicitly if you want to see a value written by the GPU.
Then maybe such a flush is what I need. How do I put it in the instruction ring buffer?
At least we need to flush the HDP, but what hardware generation is this exactly? Some R6xx don't support hardware flushes in the ring buffer.
Try to call r600_mmio_hdp_flush(rdev) from the loop which checks the value written.
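Untested sketch of what I mean (if your 3.10 tree doesn't have r600_mmio_hdp_flush(), the same MMIO flush is done by r600_ioctl_wait_idle()):

    /* flush the HDP read path before every poll so the CPU sees the
     * DMA engine's write instead of stale cached data */
    for (i = 0; i < rdev->usec_timeout; i++) {
        r600_mmio_hdp_flush(rdev);
        tmp = readl(ptr);
        if (tmp == 0xDEADBEEF)
            break;
        DRM_UDELAY(1);
    }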
Regards, Christian.
09.10.2014, 22:32, "Christian König" christian.koenig@amd.com:
On 09.10.2014 at 20:15, Alexander Fyodorov wrote:
09.10.2014, 21:42, "Christian König" christian.koenig@amd.com:
For VRAM it is true that we have a couple of different caches between the CPU and the actual memory, which need to be flushed explicitly if you want to see a value written by the GPU.
Then maybe such a flush is what I need. How do I put it in the instruction ring buffer?
At least we need to flush the HDP, but what hardware generation is this exactly? Some R6xx don't support hardware flushes in the ring buffer.
I observed the problem on HD2400 and HD6670.
Try to call r600_mmio_hdp_flush(rdev) from the loop which checks the value written.
Yep, it helped. Here is the patch against 3.10.53, tested on HD2400.
Flush VRAM cache before each read when polling.
Signed-off-by: Alexander Fyodorov <halcy at yandex.ru>
Index: drivers/gpu/drm/radeon/r600.c
===================================================================
--- drivers/gpu/drm/radeon/r600.c	(revision 11647)
+++ drivers/gpu/drm/radeon/r600.c	(working copy)
@@ -2899,6 +2899,7 @@
 	radeon_ring_unlock_commit(rdev, ring);
 
 	for (i = 0; i < rdev->usec_timeout; i++) {
+		r600_ioctl_wait_idle(rdev, rdev->vram_scratch.robj);
 		tmp = readl(ptr);
 		if (tmp == 0xDEADBEEF)
 			break;
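For reference, r600_ioctl_wait_idle() is the MMIO HDP flush helper that already exists in 3.10; it boils down to roughly this (trimmed from r600.c in my tree, so details may differ elsewhere):

    /* r7xx hw bug: write HDP_DEBUG1 and read back from the framebuffer
     * instead of poking HDP_MEM_COHERENCY_FLUSH_CNTL */
    if ((rdev->family >= CHIP_RV770) && (rdev->family <= CHIP_RV740) &&
        rdev->vram_scratch.ptr && !(rdev->flags & RADEON_IS_AGP)) {
        void __iomem *ptr = (void *)rdev->vram_scratch.ptr;

        WREG32(HDP_DEBUG1, 0);
        (void)readl(ptr);
    } else
        WREG32(R_005480_HDP_MEM_COHERENCY_FLUSH_CNTL, 0x1);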
On Thu, Oct 9, 2014 at 3:10 PM, Alexander Fyodorov halcy@yandex.ru wrote:
09.10.2014, 22:32, "Christian König" christian.koenig@amd.com:
On 09.10.2014 at 20:15, Alexander Fyodorov wrote:
09.10.2014, 21:42, "Christian König" christian.koenig@amd.com:
For VRAM it is true that we have a couple of different caches between the CPU and the actual memory, which need to be flushed explicitly if you want to see a value written by the GPU.
Then maybe such a flush is what I need. How do I put it in the instruction ring buffer?
At least we need to flush the HDP, but what hardware generation is this exactly? Some R6xx don't support hardware flushes in the ring buffer.
I observed the problem on HD2400 and HD6670.
Try to call r600_mmio_hdp_flush(rdev) from the loop which checks the value written.
Yep, it helped. Here is the patch against 3.10.53, tested on HD2400.
Flush VRAM cache before each read when polling.
Signed-off-by: Alexander Fyodorov <halcy at yandex.ru>
Index: drivers/gpu/drm/radeon/r600.c
===================================================================
--- drivers/gpu/drm/radeon/r600.c	(revision 11647)
+++ drivers/gpu/drm/radeon/r600.c	(working copy)
@@ -2899,6 +2899,7 @@
 	radeon_ring_unlock_commit(rdev, ring);
 
 	for (i = 0; i < rdev->usec_timeout; i++) {
+		r600_ioctl_wait_idle(rdev, rdev->vram_scratch.robj);
 		tmp = readl(ptr);
 		if (tmp == 0xDEADBEEF)
 			break;
I think I'd prefer to just switch the test to use GART memory, since this code is shared by different ASICs that may not all implement the HDP flush the same way. We can just reserve a couple of slots in the WB page.
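Something like this (untested sketch of the idea; R600_WB_DMA_RING_TEST_OFFSET is just a placeholder for a slot that would have to be reserved in radeon.h): seed a 32-bit word in the GART-backed writeback page, have the DMA engine write to its GPU address, and poll the CPU-side copy, which needs no HDP flush.

    /* placeholder offset (in bytes) into the writeback page */
    unsigned index = R600_WB_DMA_RING_TEST_OFFSET;
    u64 gpu_addr = rdev->wb.gpu_addr + index;
    unsigned i;
    u32 tmp = 0xCAFEDEAD;

    rdev->wb.wb[index / 4] = cpu_to_le32(tmp);

    /* ... DMA_PACKET_WRITE of 0xDEADBEEF to gpu_addr, as before ... */

    for (i = 0; i < rdev->usec_timeout; i++) {
        tmp = le32_to_cpu(rdev->wb.wb[index / 4]);
        if (tmp == 0xDEADBEEF)
            break;
        DRM_UDELAY(1);
    }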
Alex
On Fri, Oct 10, 2014 at 12:00 PM, Alex Deucher alexdeucher@gmail.com wrote:
On Thu, Oct 9, 2014 at 3:10 PM, Alexander Fyodorov halcy@yandex.ru wrote:
09.10.2014, 22:32, "Christian König" christian.koenig@amd.com:
On 09.10.2014 at 20:15, Alexander Fyodorov wrote:
09.10.2014, 21:42, "Christian König" christian.koenig@amd.com:
For VRAM it is true that we have a couple of different caches between the CPU and the actual memory, which need to be flushed explicitly if you want to see a value written by the GPU.
Then maybe such a flush is what I need. How do I put it in the instruction ring buffer?
At least we need to flush the HDP, but what hardware generation is this exactly? Some R6xx don't support hardware flushes in the ring buffer.
I observed the problem on HD2400 and HD6670.
Try to call r600_mmio_hdp_flush(rdev) from the loop which checks the value written.
Yep, it helped. Here is the patch against 3.10.53, tested on HD2400.
Flush VRAM cache before each read when polling.
Signed-off-by: Alexander Fyodorov <halcy at yandex.ru>
Index: drivers/gpu/drm/radeon/r600.c
===================================================================
--- drivers/gpu/drm/radeon/r600.c	(revision 11647)
+++ drivers/gpu/drm/radeon/r600.c	(working copy)
@@ -2899,6 +2899,7 @@
 	radeon_ring_unlock_commit(rdev, ring);
 
 	for (i = 0; i < rdev->usec_timeout; i++) {
+		r600_ioctl_wait_idle(rdev, rdev->vram_scratch.robj);
 		tmp = readl(ptr);
 		if (tmp == 0xDEADBEEF)
 			break;
I think I'd prefer to just switch the test to use GART memory, since this code is shared by different ASICs that may not all implement the HDP flush the same way. We can just reserve a couple of slots in the WB page.
Also newer versions of the test will need a similar fix.
Alex
On 10.10.2014 at 18:02, Alex Deucher wrote:
On Fri, Oct 10, 2014 at 12:00 PM, Alex Deucher alexdeucher@gmail.com wrote:
On Thu, Oct 9, 2014 at 3:10 PM, Alexander Fyodorov halcy@yandex.ru wrote:
09.10.2014, 22:32, "Christian König" christian.koenig@amd.com:
On 09.10.2014 at 20:15, Alexander Fyodorov wrote:
09.10.2014, 21:42, "Christian König" christian.koenig@amd.com:
For VRAM it is true that we have a couple of different caches between the CPU and the actual memory, which need to be flushed explicitly if you want to see a value written by the GPU.
Then maybe such a flush is what I need. How do I put it in the instruction ring buffer?
At least we need to flush the HDP, but what hardware generation is this exactly? Some R6xx don't support hardware flushes in the ring buffer.
I observed the problem on HD2400 and HD6670.
Try to call r600_mmio_hdp_flush(rdev) from the loop which checks the value written.
Yep, it helped. Here is the patch against 3.10.53, tested on HD2400.
Flush VRAM cache before each read when polling.
Signed-off-by: Alexander Fyodorov <halcy at yandex.ru>
Index: drivers/gpu/drm/radeon/r600.c
===================================================================
--- drivers/gpu/drm/radeon/r600.c	(revision 11647)
+++ drivers/gpu/drm/radeon/r600.c	(working copy)
@@ -2899,6 +2899,7 @@
 	radeon_ring_unlock_commit(rdev, ring);
 
 	for (i = 0; i < rdev->usec_timeout; i++) {
+		r600_ioctl_wait_idle(rdev, rdev->vram_scratch.robj);
 		tmp = readl(ptr);
 		if (tmp == 0xDEADBEEF)
 			break;
I think I'd prefer to just switch the test to use GART memory, since this code is shared by different ASICs that may not all implement the HDP flush the same way. We can just reserve a couple of slots in the WB page.
Works for me as well. We could also grab a few bytes of GART using the SA manager.
Also newer versions of the test will need a similar fix.
See the patch I already sent you; the SDMA on CIK is the only one not using this ring test.
Christian.
Alex
On Fri, Oct 10, 2014 at 12:00 PM, Alex Deucher alexdeucher@gmail.com wrote:
On Thu, Oct 9, 2014 at 3:10 PM, Alexander Fyodorov halcy@yandex.ru wrote:
09.10.2014, 22:32, "Christian König" christian.koenig@amd.com:
On 09.10.2014 at 20:15, Alexander Fyodorov wrote:
09.10.2014, 21:42, "Christian König" christian.koenig@amd.com:
For VRAM it is true that we have a couple of different caches between the CPU and the actual memory, which need to be flushed explicitly if you want to see a value written by the GPU.
Then maybe such a flush is what I need. How do I put it in the instruction ring buffer?
At least we need to flush the HDP, but what hardware generation is this exactly? Some R6xx don't support hardware flushes in the ring buffer.
I observed the problem on HD2400 and HD6670.
Try to call r600_mmio_hdp_flush(rdev) from the loop which checks the value written.
Yep, it helped. Here is the patch against 3.10.53, tested on HD2400.
Flush VRAM cache before each read when polling.
Signed-off-by: Alexander Fyodorov <halcy at yandex.ru>
Index: drivers/gpu/drm/radeon/r600.c
===================================================================
--- drivers/gpu/drm/radeon/r600.c	(revision 11647)
+++ drivers/gpu/drm/radeon/r600.c	(working copy)
@@ -2899,6 +2899,7 @@
 	radeon_ring_unlock_commit(rdev, ring);
 
 	for (i = 0; i < rdev->usec_timeout; i++) {
+		r600_ioctl_wait_idle(rdev, rdev->vram_scratch.robj);
 		tmp = readl(ptr);
 		if (tmp == 0xDEADBEEF)
 			break;
I think I'd prefer to just switch the test to use GART memory, since this code is shared by different ASICs that may not all implement the HDP flush the same way. We can just reserve a couple of slots in the WB page.
Does the attached patch work for you as well?
Alex
13.10.2014, 21:50, "Alex Deucher" alexdeucher@gmail.com:
On Fri, Oct 10, 2014 at 12:00 PM, Alex Deucher alexdeucher@gmail.com wrote:
I think I'd prefer to just switch the test to use GART memory, since this code is shared by different ASICs that may not all implement the HDP flush the same way. We can just reserve a couple of slots in the WB page.
Does the attached patch work for you as well?
Yes, although it became a little bit slower - it now completes in 2 usecs.