One subtest of mesa/demos/gltestperf takes 9 seconds to complete, so to prevent an unnecessary gpu reset followed by a hardlock, I am increasing the interval to 10 seconds after which a GPU is considered in a locked-up state. This is on RV530. However, with a little slower GPU, we would surpass the interval easily, so this is not a good fix for gltestperf.
Nevertheless, this commit also fixes hardlocks in the applications which render at speed of less than 1 frame per second, where the whole frame consists of only one command stream. The game Tiny & Big is an example. This bar is now lowered to 0.1 fps.
Now the question comes down to whether we should (often unsuccessfully) reset the GPU at all? Once we have stable enough drivers, we won't have to. Has the time come already?
If possible, this commit should go to stable as well.
Signed-off-by: Marek Olšák maraeo@gmail.com --- drivers/gpu/drm/radeon/r100.c | 13 +------------ 1 files changed, 1 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index e817a0b..ec64b36 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -2020,18 +2020,7 @@ bool r100_gpu_cp_is_lockup(struct radeon_device *rdev, struct r100_gpu_lockup *l return false; } elapsed = jiffies_to_msecs(cjiffies - lockup->last_jiffies); - if (elapsed >= 3000) { - /* very likely the improbable case where current - * rptr is equal to last recorded, a while ago, rptr - * this is more likely a false positive update tracking - * information which should force us to be recall at - * latter point - */ - lockup->last_cp_rptr = cp->rptr; - lockup->last_jiffies = jiffies; - return false; - } - if (elapsed >= 1000) { + if (elapsed >= 10000) { dev_err(rdev->dev, "GPU lockup CP stall for more than %lumsec\n", elapsed); return true; }
This commit fixes bogus CS rejection if it contains a sequence of the following operations:
- Set the color buffer 0. track->cb[i].robj becomes non-NULL. - Render. - Set a larger zbuffer than the previously-set color buffer. - Set a larger scissor area as well. - Set the color channel mask to 0 to do depth-only rendering. - Render. --> rejected, because track->cb[i].robj remained non-NULL, therefore the conditional checking for the color channel mask and friends is not performed, and the larger scissor area causes the rejection.
This fixes bugs: - https://bugs.freedesktop.org/show_bug.cgi?id=29762 - https://bugs.freedesktop.org/show_bug.cgi?id=28869 And maybe some others which seem to look the same.
If possible, this commit should go to stable as well.
Signed-off-by: Marek Olšák maraeo@gmail.com --- drivers/gpu/drm/radeon/r100.c | 11 ++++++----- 1 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index ec64b36..e151f16 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -3297,13 +3297,14 @@ int r100_cs_track_check(struct radeon_device *rdev, struct r100_cs_track *track) unsigned long size; unsigned prim_walk; unsigned nverts; + unsigned num_cb = track->num_cb;
- for (i = 0; i < track->num_cb; i++) { + if (!track->zb_cb_clear && !track->color_channel_mask && + !track->blend_read_enable) + num_cb = 0; + + for (i = 0; i < num_cb; i++) { if (track->cb[i].robj == NULL) { - if (!(track->zb_cb_clear || track->color_channel_mask || - track->blend_read_enable)) { - continue; - } DRM_ERROR("[drm] No buffer for color buffer %d !\n", i); return -EINVAL; }
Tested-by: Sven Arvidsson sa@whiz.se
Posted here: https://bugs.freedesktop.org/show_bug.cgi?id=30036#c6
On Sun, Sep 12, 2010 at 5:09 AM, Marek Olšák maraeo@gmail.com wrote:
This commit fixes bogus CS rejection if it contains a sequence of the following operations:
- Set the color buffer 0. track->cb[i].robj becomes non-NULL.
- Render.
- Set a larger zbuffer than the previously-set color buffer.
- Set a larger scissor area as well.
- Set the color channel mask to 0 to do depth-only rendering.
- Render. --> rejected, because track->cb[i].robj remained non-NULL,
therefore the conditional checking for the color channel mask and friends is not performed, and the larger scissor area causes the rejection.
This fixes bugs:
- https://bugs.freedesktop.org/show_bug.cgi?id=29762
- https://bugs.freedesktop.org/show_bug.cgi?id=28869
And maybe some others which seem to look the same.
If possible, this commit should go to stable as well.
Signed-off-by: Marek Olšák maraeo@gmail.com
drivers/gpu/drm/radeon/r100.c | 11 ++++++----- 1 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index ec64b36..e151f16 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -3297,13 +3297,14 @@ int r100_cs_track_check(struct radeon_device *rdev, struct r100_cs_track *track) unsigned long size; unsigned prim_walk; unsigned nverts;
unsigned num_cb = track->num_cb;
for (i = 0; i < track->num_cb; i++) {
if (!track->zb_cb_clear && !track->color_channel_mask &&
!track->blend_read_enable)
num_cb = 0;
for (i = 0; i < num_cb; i++) { if (track->cb[i].robj == NULL) {
if (!(track->zb_cb_clear ||
track->color_channel_mask ||
track->blend_read_enable)) {
continue;
} DRM_ERROR("[drm] No buffer for color buffer %d !\n",
i); return -EINVAL; } -- 1.7.0.4
Oh I almost forgot. I removed the funky "(elapsed >= 3000)" conditional which funnily did not mean a hardlock. I don't understand what it was meant to be and what it was for. Any idea?
Marek
On Sun, Sep 12, 2010 at 5:09 AM, Marek Olšák maraeo@gmail.com wrote:
One subtest of mesa/demos/gltestperf takes 9 seconds to complete, so to prevent an unnecessary gpu reset followed by a hardlock, I am increasing the interval to 10 seconds after which a GPU is considered in a locked-up state. This is on RV530. However, with a little slower GPU, we would surpass the interval easily, so this is not a good fix for gltestperf.
Nevertheless, this commit also fixes hardlocks in the applications which render at speed of less than 1 frame per second, where the whole frame consists of only one command stream. The game Tiny & Big is an example. This bar is now lowered to 0.1 fps.
Now the question comes down to whether we should (often unsuccessfully) reset the GPU at all? Once we have stable enough drivers, we won't have to. Has the time come already?
If possible, this commit should go to stable as well.
Signed-off-by: Marek Olšák maraeo@gmail.com
drivers/gpu/drm/radeon/r100.c | 13 +------------ 1 files changed, 1 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index e817a0b..ec64b36 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -2020,18 +2020,7 @@ bool r100_gpu_cp_is_lockup(struct radeon_device *rdev, struct r100_gpu_lockup *l return false; } elapsed = jiffies_to_msecs(cjiffies - lockup->last_jiffies);
if (elapsed >= 3000) {
/* very likely the improbable case where current
* rptr is equal to last recorded, a while ago, rptr
* this is more likely a false positive update tracking
* information which should force us to be recall at
* latter point
*/
lockup->last_cp_rptr = cp->rptr;
lockup->last_jiffies = jiffies;
return false;
}
if (elapsed >= 1000) {
if (elapsed >= 10000) { dev_err(rdev->dev, "GPU lockup CP stall for more than
%lumsec\n", elapsed); return true; } -- 1.7.0.4
dri-devel@lists.freedesktop.org