On Mon, Sep 10, 2012 at 8:02 AM, Christian König deathsimple@vodafone.de wrote:
On 10.09.2012 13:12, Michel Dänzer wrote:
On Mon, 2012-09-10 at 11:13 +0200, Christian König wrote:
Only increase the higher 32bits if we really detect a wrap around.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=54129 https://bugs.freedesktop.org/show_bug.cgi?id=54662
Possible fixes: https://bugzilla.redhat.com/show_bug.cgi?id=846505 https://bugzilla.redhat.com/show_bug.cgi?id=845639
Signed-off-by: Christian König deathsimple@vodafone.de Cc: stable@vger.kernel.org
drivers/gpu/drm/radeon/radeon_fence.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 7b737b9..4781e13 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -160,7 +160,7 @@ void radeon_fence_process(struct radeon_device *rdev, int ring) do { seq = radeon_fence_read(rdev, ring); seq |= last_seq & 0xffffffff00000000LL;
if (seq < last_seq) {
if (seq < (last_seq - 0x80000000LL)) { seq += 0x100000000LL; }
Can you provide a bit more explanation for this change? In particular, how could the code previously detect a wraparound when there was none, and why is this the proper fix?
Honestly I also don't really understand how this bug happened in the first place.
We extend the 32bit fences supported by hardware by testing if a previously read fence value is smaller than the value we read now:
if (seq < last_seq) {
But the problem seems to be that on some systems we do get fence values that are decreasing, e.g. instead of 5, 6, 7, 8 we get 5, 7, 6, 8 (or maybe 5, 6, 0, 7, 8 because somebody accidentally overwrites the fence value).
It might be related to a hardware bug, or the algorithm is flawed in a way I currently don't see. Anyway the old code we had wasn't so picky about such problems and the patch just tries to make the current code as robust as the old code was, which indeed seems to solve the problems we see.
The wrap around detection still works (tested by setting the initial fence value to 0xfffffff0 and letting it wrap around shortly after start), so I think it we can safely commit this.
Christian.
If fence read ever gave the value 0 then your patch is postponing the issue until last fence reach >= 0x1 8000 0001 (which will takes month/years of uptime to happen :)). Having a log of all emitted fence value and all read en fence value would probably be helpful so we have a better clue of what's going on.
Sadly i don't think we don't receive fence in order but rather that we sometimes receive a 0 value fence, if we did receive fence out of order than the issue would also have happen with previous code.
Otherwise patch is ack but please add a better comment about fence value most likely being 0 for unknow reasons or receive out of order (thought i don't think so).
Cheers, Jerome