On Thu, Sep 05, 2013 at 10:14:32PM +0800, Chen Jie wrote:
Hi all,
This thread is about http://lists.freedesktop.org/archives/dri-devel/2013-April/037598.html.
We recently found something interesting about UVD-based playback on the Loongson 3A platform, and also found a way to fix the problem.
First, we found that the memcpy in [mesa]src/gallium/drivers/radeon/radeon_uvd.c caused the problem:
- If memcpy is implemented through 16B or 8B load/store instructions,
  it normally causes video mosaic. When we insert a memcmp after the copying code in memcpy, it reports that src and dest are not equal.
- If memcpy uses 1B load/store instructions only, the memcmp after the
  copying code reports equal.
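A minimal sketch of the comparison described in the two points above, for anyone who wants to repeat it against a mapped buffer. The helper names (copy_bytes, copy_words, copy_matches) are hypothetical and chosen only for illustration. This is not the memcpy implementation used on the platform, and the 16B variant is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes using 1-byte stores only.
 * The volatile destination keeps the compiler from merging the
 * stores into wider ones. */
void copy_bytes(volatile void *dst, const void *src, size_t n)
{
    volatile uint8_t *d = dst;
    const uint8_t *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* Hypothetical helper: copy n bytes using 8-byte loads/stores.
 * Both pointers must be 8-byte aligned and n a multiple of 8. */
void copy_words(volatile void *dst, const void *src, size_t n)
{
    assert(n % 8 == 0);
    assert((uintptr_t)dst % 8 == 0 && (uintptr_t)src % 8 == 0);

    volatile uint64_t *d = dst;
    const uint64_t *s = src;

    for (size_t i = 0; i < n / 8; i++)
        d[i] = s[i];
}

/* Returns 1 if dst holds exactly the bytes of src, 0 otherwise.
 * This plays the role of the memcmp inserted after the copy. */
int copy_matches(const void *dst, const void *src, size_t n)
{
    return memcmp(dst, src, n) == 0;
}
```

On ordinary cached system memory both variants should report equal. The interesting case is pointing dst at the mapped UVD buffer and checking whether only copy_words reports a mismatch.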
Then we found that the following changeset fixes our problem:
diff --git a/src/gallium/drivers/radeon/radeon_uvd.c b/src/gallium/drivers/radeon/radeon_uvd.c
index 2f98de2..f9599b6 100644
--- a/src/gallium/drivers/radeon/radeon_uvd.c
+++ b/src/gallium/drivers/radeon/radeon_uvd.c
@@ -162,7 +162,7 @@ static bool create_buffer(struct ruvd_decoder *dec, unsigned size)
 {
     buffer->buf = dec->ws->buffer_create(dec->ws, size, 4096, false,
-                                         RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM);
+                                         RADEON_DOMAIN_GTT);
     if (!buffer->buf)
         return false;
The VRAM is mapped to an uncached area on our platform, so my question is: what could go wrong when using >4B load/store instructions in the UVD workflow? Any ideas?
How do you map the VRAM into the user process address space? I.e., do you have something like Intel PAT, or something like MTRR, or something else?
In other words, can you map a region of I/O memory (GPU VRAM in this case) into the process address space and mark it as uncached, so that none of the accesses to it go through the CPU cache?
Cheers, Jerome